SUC 3.0

Data citation

Språkbanken (2024). SUC 3.0 (updated: 2024-06-03). [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/wy84-ar30

Stockholm-Umeå Corpus 3.0

The Stockholm-Umeå Corpus (SUC) is a collection of Swedish texts from the 1990s, totalling one million words. The corpus is balanced, meaning that it covers a range of text types and stylistic levels. The texts are annotated with part-of-speech tags, morphological analysis, and lemmas (all of which can be considered gold-standard data), as well as some structural and functional information.

Version 1.0 was developed in co-operation between Gunnel Källgren at Stockholm University and Eva Ejerhed at Umeå University and was made available in 1997 by the department of linguistics at Stockholm University.

Version 2.0 was made available in 2006 by Sofia Gustafsson Capkova and Britt Hartmann at the department of linguistics at Stockholm University. It contains the same texts as SUC 1.0 but is extended with some annotation. Additionally, SUC 2.0 contains bonus materials. TigerSUC is SUC 2.0 converted to TIGER-XML by Martin Volk. StorSUC is additional SUC material of four million words.

Version 3.0 has been available since 2012. It contains improved annotations, as well as unannotated texts totalling seven million words. (For the TIGER-XML version, Suc2c, Suc2d, and the DTDs, we still refer to version 2.0.)

Additional information about the compilation and annotation of SUC can be found in the SUC 2.0 manual [PDF].

Språkbanken distributes SUC 2.0 and SUC 3.0 in two variants:

SUC 2.0 and SUC 3.0: freely available for research; require a signed licence
SUCX 2.0 and SUCX 3.0: sentences in scrambled order; enriched with automatic annotations; downloadable without restrictions

SUC is freely available for research, but every user must sign an individual licence with the department of linguistics at Stockholm University. Since 1 December 2008, SUC licensing has been delegated to Språkbanken Text at the University of Gothenburg.

Appendix 3 of the SUC licence [PDF] must be printed, signed, and sent either to sb-info@svenska.gu.se or to:

SUC-licens
Språkbanken Text
Institutionen för svenska, flerspråkighet och språkteknologi
Göteborgs universitet
Box 200
405 30 Göteborg

Once we have received and registered the signed licence, we will contact you with a download link.

Accessible through

Access	Platform	Licence
https://spraakbanken.gu.se/korp/#?corpus=suc3 (scrambled)		CC-BY-4.0

Download

File	Size	Modified	Licence
suc3.xml.bz2 corpus (XML, scrambled)	84.44 MB	2024-06-03	CC-BY-4.0
stats_suc3.csv.zip word statistics (CSV)	1.43 MB	2025-04-22	CC-BY-4.0
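
As a first look at the downloaded corpus file, the following minimal Python sketch streams through a bz2-compressed XML file without unpacking it to disk and counts how often each element name occurs. It makes no assumptions about the element or attribute names used in suc3.xml.bz2; the function name count_elements and the local file path are illustrative only and are not part of Språkbanken's distribution.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path):
    """Stream a bz2-compressed XML file and count occurrences of each element tag."""
    counts = Counter()
    # bz2.open decompresses on the fly, so the XML is never written to disk.
    with bz2.open(path, "rb") as handle:
        # iterparse yields each element once its closing tag has been read.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            counts[elem.tag] += 1
            # Drop the element's text, attributes and children to limit memory use.
            elem.clear()
    return counts


if __name__ == "__main__":
    # Assumption: the file has been downloaded to the current directory.
    for tag, n in count_elements("suc3.xml.bz2").most_common():
        print(f"{tag}\t{n}")
```

The resulting tag inventory shows which elements the file uses for texts, sentences, and tokens, which is a useful starting point before writing a more specific reader.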
