Språkbanken Text is a department within Språkbanken.

SweLL

Standard reference

Elena Volodina (2024): On two SweLL learner corpora – SweLL-pilot and SweLL-gold. In Proceedings of the Huminfra Conference (HiC 2024), 10-11 January 2024, Gothenburg, Sweden, edited by Elena Volodina, Gerlof Bouma, Markus Forsberg, Dimitrios Kokkinakis, David Alfter, Mats Fridlund, Christian Horn, Lars Ahrenberg, Anna Blåder, pages 83-94.

Data citation

Språkbanken Text (2025). SweLL (updated: 2025-01-19). [Data set]. Språkbanken Text. https://doi.org/10.23695/b4wj-b251

SweLL (Swedish Learner Language) is a collection of the SweLL corpora and of derived resources that originate from those corpora. The SweLL corpora consist of texts written by learners whose first language is not Swedish (second language corpora). All texts were collected in test settings (not as homework assignments).

Included resources

  • SweLL-gold is a second language learner corpus, featuring pseudonymization, normalization and correction annotation
  • SweLL-pilot is a second language learner corpus, featuring CEFR labeling
  • DaLAJ resources are a collection of sentence pairs (original - corrected) containing one error each
  • MultiGED (Multilingual Grammatical Error Detection) is a dataset for grammatical error detection, featuring five languages (Czech, German, English, Italian, Swedish). The data is organized by sentence, and each token is annotated as correct or incorrect (c or i). The corrected version is not provided. MultiGED has been used for a shared task (https://spraakbanken.github.io/multiged-2023/)
  • MuClaGED (Multi-Class Grammatical Error Detection) is a dataset for Swedish only, organized by sentence, in which each incorrect token is associated with the type of correction (Orthography, Syntax, Morphology, etc.) and the type of edit (Addition, Deletion, Replacement)
  • MultiGEC (Multilingual Grammatical Error Correction) is a dataset for grammatical error correction, featuring twelve languages (Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian). The data is organized by essay pairs (original - corrected). MultiGEC has been used for a shared task (https://spraakbanken.github.io/multigec-2025/)
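
The token-level annotation described for MultiGED (each token marked c or i) could be read along the lines of the sketch below. This page does not specify the file layout, so the sketch assumes one token per line with the label after a tab and a blank line between sentences. The function names `read_token_labels` and `incorrect_tokens` are hypothetical, and the sample sentence is invented for illustration rather than taken from the dataset.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, label) pairs; label is "c" or "i".
Sentence = List[Tuple[str, str]]


def read_token_labels(lines: Iterable[str]) -> List[Sentence]:
    """Group token/label lines into sentences.

    Assumed layout (not confirmed by the resource page): one token per
    line, token and label separated by a tab, blank line between sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the last tab so the label is always the final field.
        token, label = line.rsplit("\t", 1)
        if label not in ("c", "i"):
            raise ValueError(f"unexpected label: {label!r}")
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


def incorrect_tokens(sentence: Sentence) -> List[str]:
    """Return the tokens labelled as incorrect in one sentence."""
    return [token for token, label in sentence if label == "i"]


# Invented sample data, for illustration only.
sample = [
    "Jag\tc\n",
    "bor\tc\n",
    "på\ti\n",
    "Göteborg\tc\n",
    "\n",
    "Hej\tc\n",
]
parsed = read_token_labels(sample)
```

Passing an open file handle instead of `sample` would work the same way, since the function only iterates over lines. The actual distribution format should be checked against the shared task documentation before relying on this layout.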

Intended use

Research, development and pedagogical applications within second language acquisition and intelligent computer-assisted language learning

Type

  • Corpus
  • Collection

Language

Swedish
multiple languages

Size

Resources: 9

Keywords

  • L2 Swedish
  • second language
  • language learning
  • essays
  • Swedish learner language

Updated

2025-01-19

Contact

Språkbanken
sb-info@svenska.gu.se