Språkbanken Text is a part of Språkbanken.

MuClaGED

Standard reference

Judit Casademont Moner, Elena Volodina (2022): Swedish MuClaGED: A new dataset for Grammatical Error Detection in Swedish. In Proceedings of the 11th Workshop on Natural Language Processing for Computer-Assisted Language Learning (NLP4CALL 2022).

Data citation

Judit Casademont Moner, & Elena Volodina (2025). MuClaGED (updated: 2025-01-19). [Data set]. Språkbanken Text. https://doi.org/10.23695/q9v4-vt57
MuClaGED is a dataset for multi-class Grammatical Error Detection for Swedish. The dataset is based on the SweLL-gold corpus.


Dataset description

The data is provided in a tab-separated format with five columns: token ID, token, list of error codes for addition, list of error codes for deletion, and list of error codes for replacement. The standard reference article describes the data format in more detail.
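
As an illustration of the five-column layout, the following Python sketch reads such rows into named fields. It is not an official loader. Several details are assumptions that the description above does not confirm: that codes within a cell are comma-separated, that an empty cell, `_`, or `[]` means "no error", and that blank lines can be skipped. The names `TokenRow`, `split_codes`, and `parse_rows` are hypothetical; check the standard reference article for the actual cell format before relying on this.

```python
from typing import Iterable, NamedTuple


class TokenRow(NamedTuple):
    """One row of the five-column file (field names are illustrative)."""
    token_id: str
    token: str
    addition: list[str]
    deletion: list[str]
    replacement: list[str]


def split_codes(cell: str, sep: str = ",") -> list[str]:
    """Split one error-code cell into individual codes.

    Assumption: codes are separated by `sep`, and an empty cell,
    "_" or "[]" stands for "no error codes".
    """
    cell = cell.strip()
    if cell in ("", "_", "[]"):
        return []
    cell = cell.strip("[]")  # tolerate a bracketed list notation
    codes = (part.strip(" '\"") for part in cell.split(sep))
    return [code for code in codes if code]


def parse_rows(lines: Iterable[str]) -> list[TokenRow]:
    """Parse tab-separated lines into TokenRow records, skipping blank lines."""
    rows = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines carry no token data
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {number}: expected 5 columns, got {len(fields)}")
        token_id, token, add, dele, rep = fields
        rows.append(
            TokenRow(token_id, token, split_codes(add), split_codes(dele), split_codes(rep))
        )
    return rows
```

To use it on a downloaded file, pass an open file handle, for example `parse_rows(open(path, encoding="utf-8"))`, where `path` is wherever the data was saved locally.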

License: CLARIN-ID, -PRIV, -NORED, -BY (https://www.kielipankki.fi/support/clarin-eula/#res).

Annotation

Each token carries an error label (a high-level error type) and the edit type applied to correct it (addition, deletion, or replacement).
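
A minimal Python sketch of how the two annotation layers could be combined into token-level labels, assuming each token's annotation has already been read into three lists of codes, one per edit type. The function names are hypothetical, and the codes `X`, `Y`, and `Z` in the usage note are placeholders, not actual MuClaGED error labels.

```python
from collections import Counter

# The three edit types named in the annotation description.
EDIT_TYPES = ("addition", "deletion", "replacement")


def token_labels(addition, deletion, replacement):
    """Return (edit_type, error_code) pairs for a single token."""
    cells = dict(zip(EDIT_TYPES, (addition, deletion, replacement)))
    return [(edit, code) for edit in EDIT_TYPES for code in cells[edit]]


def is_erroneous(addition, deletion, replacement):
    """Binary detection label: True if the token has any error code."""
    return bool(addition or deletion or replacement)


def edit_type_counts(tokens):
    """Count error codes per edit type over (addition, deletion, replacement) triples."""
    counts = Counter()
    for addition, deletion, replacement in tokens:
        for edit, _code in token_labels(addition, deletion, replacement):
            counts[edit] += 1
    return counts
```

For example, `token_labels([], ["X"], ["Y", "Z"])` yields one deletion pair and two replacement pairs, while `is_erroneous([], [], [])` is `False` for a token with no annotations. The binary label suits plain error detection; the pairs suit the multi-class setting.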

Intended uses

Grammatical Error Detection and labeling, (Second) Language Acquisition studies, Learner Corpus Research, Noisy User-produced Data.

Accessible through

Access Platform Licence
misc

Type

  • Corpus
  • Training and evaluation data

Language

Swedish

Size

Sentences: 8,553
Tokens: 155,415

Keywords

  • grammatical error detection
  • token-level detection
  • error labeling
  • error edit labeling
  • language learning
  • sentences

Creators

  • Judit Casademont Moner
  • Elena Volodina

Updated

2025-01-19

Contact

Språkbanken Text, Sweden
sb-info@svenska.gu.se