
Learner Language

Data citation information

Språkbanken Text (2025). Learner Language (updated: 2025-01-19). [Data set]. Språkbanken Text. https://doi.org/10.23695/svn8-rt31
Learner Language is a collection of corpora and lexical resources that describe learner language. The corpora include both texts/audio produced by language learners and the texts/speech that learners are exposed to (what they read or hear, e.g. course books). Some resources derived from these corpora are also part of the collection.

Included resources

  • COCTAILL is a corpus of course books used for teaching Swedish as a second language at CEFR levels A1, A2, B1, B2, and C1.
  • SweLL-gold is a second language learner corpus, featuring pseudonymization, normalization and correction-annotation.
  • SweLL-pilot is a second language learner corpus, featuring CEFR labeling.
  • DaLAJ resources are a collection of sentence pairs (original - corrected) containing one error each.
  • MultiGED (Multilingual Grammatical Error Detection) is a dataset for grammatical error detection, featuring five languages (Czech, German, English, Italian, Swedish). The data is organized by sentences, where each token is annotated as correct or incorrect (c or i). The corrected version is not provided. MultiGED has been used for a shared task (https://spraakbanken.github.io/multiged-2023/).
  • MuClaGED (Multi-Class Grammatical Error Detection) is a dataset for Swedish only, organized by sentences, with each incorrect token associated with the type of correction (Orthography, Syntax, Morphology, etc.) and the type of edit (Addition, Deletion, Replacement).
  • MultiGEC (Multilingual Grammatical Error Correction) is a dataset for grammatical error correction, featuring twelve languages (Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian). The data is organized by essay pairs (original - corrected). MultiGEC has been used for a shared task (https://spraakbanken.github.io/multigec-2025/).
  • SVALex is a wordlist generated from the COCTAILL corpus.
  • SweLLex is a wordlist generated from the SweLL-pilot essays.
  • Sen*Lex is a sense-based wordlist, combining SVALex and SweLLex in one.
  • CoDeRooMor is a morphologically annotated list based on Sen*Lex, featuring annotations for word-building morphemes (roots, prefixes, suffixes, etc.) and word-building mechanisms (affixation, compounding, etc.).
  • Swe-MWELex is a list of multi-word expressions (MWEs) based on Sen*Lex, with CEFR labels and a subcategorization of MWEs into several types.
  • L2Lex-Adj is a list of adjectives based on Sen*Lex, with CEFR labels, information on adjectival declensions, and frequencies.
  • L2Lex-AdjAdv is a list of adjectives and adverbs based on Sen*Lex, with CEFR labels, information on patterns of comparative degrees, and frequencies.
  • Kelly is a wordlist covering the approximately 8,000 most frequent words in web texts, assigned to the six CEFR levels based on frequency information.
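
The token-level labeling described for MultiGED (each token marked c or i, grouped by sentence) can be illustrated with a minimal Python sketch. The file layout assumed here, one token and its label per line separated by a tab with a blank line between sentences, is an assumption for illustration and is not specified on this page; the function names and the sample sentences are hypothetical and not taken from the dataset.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, label) pairs; the label is "c" (correct)
# or "i" (incorrect), following the description of MultiGED above.
Sentence = List[Tuple[str, str]]


def read_labeled_sentences(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token<TAB>label' lines into sentences.

    Assumed layout (not confirmed by the source): one token per line,
    a tab between token and label, and a blank line between sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        token, label = line.split("\t")
        if label not in ("c", "i"):
            raise ValueError(f"unexpected label {label!r} for token {token!r}")
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


def incorrect_tokens(sentence: Sentence) -> List[str]:
    """Return the tokens labeled as incorrect in one sentence."""
    return [token for token, label in sentence if label == "i"]


# Invented sample data, for illustration only.
sample = "Jag\tc\nbor\tc\ni\tc\nen\ti\nhus\tc\n\nDet\tc\när\tc\nbra\tc\n"
sentences = read_labeled_sentences(sample.splitlines())
for sentence in sentences:
    print(" ".join(token for token, _ in sentence), incorrect_tokens(sentence))
```

Because the corrected version is not provided in MultiGED, a reader like this can only report which tokens are flagged, not what the correction would be.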

Intended use

Research, development and pedagogical applications within (second) language acquisition and intelligent computer-assisted language learning

References

  • Elena Volodina, Ildikó Pilán, Stian Rødven-Eide, Hannes Heidarsson (2014): You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language, in NEALT Proceedings Series, volume 22, pages 128-144. https://gup.ub.gu.se/publication/206132?lang=sv
  • Elena Volodina (2024): On two SweLL learner corpora – SweLL-pilot and SweLL-gold, in Proceedings of the Huminfra Conference (HiC 2024), 10-11 January, 2024, Gothenburg, Sweden / edited by Elena Volodina, Gerlof Bouma, Markus Forsberg, Dimitrios Kokkinakis, David Alfter, Mats Fridlund, Christian Horn, Lars Ahrenberg, Anna Blåder, pages 83-94. https://gup.ub.gu.se/publication/345597?lang=sv
  • Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell (2016): SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies, in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 23-28, 2016, Portorož, Slovenia. https://gup.ub.gu.se/publication/248141?lang=sv
  • Elena Volodina, Lena Granstedt, Arild Matsson, Beáta Megyesi, Ildikó Pilán, Julia Prentice, Dan Rosén, Lisa Rudebeck, Carl-Johan Schenström, Gunlög Sundberg, Mats Wirén (2019): The SweLL Language Learner Corpus: From Design to Annotation, in Northern European Journal of Language Technology, volume 6, pages 67-104. https://gup.ub.gu.se/publication/285609?lang=sv
  • Elena Volodina, Yousuf Ali Mohammed, Aleksandrs Berdicevskis, Gerlof Bouma, Joey Öhman (2023): DaLAJ-GED – a dataset for Grammatical Error Detection tasks on Swedish, in Proceedings of the 12th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2023) / edited by David Alfter, Elena Volodina, Thomas François, Arne Jönsson and Evelina Rennes, pages 94-101. https://gup.ub.gu.se/publication/326817?lang=sv
  • Elena Volodina, Christopher Bryant, Andrew Caines, Orphée De Clercq, Jennifer-Carmen Frey, Elizaveta Ershova, Alexandr Rosen, Olga Vinogradova (2023): MultiGED-2023 shared task at NLP4CALL: Multilingual Grammatical Error Detection, in Proceedings of the 12th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2023). https://gup.ub.gu.se/publication/331652?lang=sv
  • Judit Casademont Moner, Elena Volodina (2022): Swedish MuClaGED: A new dataset for Grammatical Error Detection in Swedish, in Proceedings of the 11th Workshop on Natural Language Processing for Computer-Assisted Language Learning (NLP4CALL 2022). https://gup.ub.gu.se/publication/321955?lang=sv
  • Elena Volodina, Ildikó Pilán, Lorena Llozhi, Baptiste Degryse, Thomas François (2016): SweLLex: second language learners' productive vocabulary, in Linköping Electronic Conference Proceedings. Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016. https://gup.ub.gu.se/publication/248090?lang=sv
  • Thomas François, Elena Volodina, Ildikó Pilán, Anaïs Tack (2016): SVALex: a CEFR-graded lexical resource for Swedish foreign and second language learners, in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 23-28, 2016, Portorož, Slovenia. https://gup.ub.gu.se/publication/248142?lang=sv
  • Elena Volodina, Sofie Johansson Kokkinakis (2012): Introducing Swedish Kelly-list, a new free e-resource for Swedish, in LREC 2012 Proceedings, volume 2012. https://gup.ub.gu.se/publication/154723?lang=sv
  • Elena Volodina, Yousuf Ali Mohammed, Therese Lindström Tiedemann (2021): CoDeRooMor: A new dataset for non-inflectional morphology studies of Swedish, in 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) Proceedings, May 31–2 June, 2021, Reykjavik, Iceland Online / Simon Dobnik, Lilja Øvrelid (Editors). https://gup.ub.gu.se/publication/311724?lang=sv

Datasets in the collection

Type

  • Corpus
  • Collection

Language

Swedish
multiple languages

Size

Resources: 22

Keywords

  • second language (L2)
  • learner language
  • language learning
  • essays
  • course books
  • word lists

Updated

2025-01-19

Contact

sb-info@svenska.gu.se