Språkbanken Text is a part of Språkbanken.

SweLL-gold

Citation Information

Språkbanken Text. SweLL-gold [Data set]. Språkbanken Text. https://doi.org/10.23695/2k47-y432
Essays written by adult learners of Swedish, manually pseudonymized and correction-annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period: 2017-2020.

The SweLL-gold corpus consists of essays written by adult learners of Swedish. It was collected between 2017 and 2020 in the SweLL project and contains 502 essays that have been pseudonymized, normalized, and correction-annotated.


Cite as

Elena Volodina, Lena Granstedt, Arild Matsson, Beáta Megyesi, Ildikó Pilán, Julia Prentice, Dan Rosén, Lisa Rudebeck, Carl-Johan Schenström, Gunlög Sundberg, Mats Wirén (2019): The SweLL Language Learner Corpus: From Design to Annotation, in Northern European Journal of Language Technology, volume 6, pages 67-104.

Annotation

Essays are manually pseudonymized, normalized, and correction-annotated for orthographic, lexical, morphological, syntactic, and punctuation errors according to the guidelines available at [1]. Essays are also linguistically annotated with Sparv (part-of-speech tagging, lemmatization, and dependency parsing). Personal learner metadata is also available.

Caveats

Data collection is limited to a small geographical area and a short period of time. Although several native language backgrounds are represented, their distribution is highly unbalanced, so the corpus is not well suited to native language identification tasks.

Intended uses

The corpus is primarily intended for second language acquisition (SLA) research and for developing grammatical error correction (GEC) and automatic pseudonymization systems.

References

Type

  • Corpus

Language

Swedish

Size

Texts: 502

Keywords

  • L2 Swedish
  • second language
  • language learning
  • essays

Contact

Språkbanken Text
sb-info@svenska.gu.se