MuClaGED

Standard reference

Judit Casademont Moner and Elena Volodina (2022). Swedish MuClaGED: A new dataset for Grammatical Error Detection in Swedish. In Proceedings of the 11th Workshop on Natural Language Processing for Computer-Assisted Language Learning (NLP4CALL 2022).

Data citation

Casademont Moner, Judit, & Volodina, Elena (2025). MuClaGED (updated: 2025-01-19). [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/q9v4-vt57

MuClaGED is a dataset for multi-class Grammatical Error Detection for Swedish. The dataset is based on the SweLL-gold corpus.

Dataset description

The data is provided in a tab-separated format with five columns: token ID, token, list of error codes for addition, list of error codes for deletion, and list of error codes for replacement. See the standard reference article for more on the data format.
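
As a rough illustration of the five-column layout described above, the following Python sketch reads tab-separated rows into simple records. Only the column order comes from this description. The way code lists are written inside a cell (assumed here to be comma-separated, with `_`, `[]` or an empty cell meaning no codes), the names `TokenRow`, `split_codes` and `read_rows`, and the placeholder codes in the usage note are all hypothetical; check the standard reference article for the actual encoding before relying on this.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Assumption (not confirmed by the dataset description): codes within a list
# column are comma-separated, and an empty list appears as "", "_" or "[]".
CODE_SEPARATOR = ","
EMPTY_MARKERS = {"", "_", "[]"}


@dataclass
class TokenRow:
    token_id: str
    token: str
    addition: List[str]      # error codes corrected by adding material
    deletion: List[str]      # error codes corrected by deleting material
    replacement: List[str]   # error codes corrected by replacing material


def split_codes(cell: str) -> List[str]:
    """Turn one list column into a Python list, under the assumptions above."""
    cell = cell.strip()
    if cell in EMPTY_MARKERS:
        return []
    parts = cell.strip("[]").split(CODE_SEPARATOR)
    return [part.strip() for part in parts if part.strip()]


def read_rows(lines: Iterable[str]) -> Iterator[TokenRow]:
    """Yield one TokenRow per line that has exactly five tab-separated fields."""
    # QUOTE_NONE keeps quotation marks inside tokens as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 5:
            continue  # skip blank lines and anything that is not a token row
        token_id, token, add, delete, replace = fields
        yield TokenRow(
            token_id,
            token,
            split_codes(add),
            split_codes(delete),
            split_codes(replace),
        )
```

With a file on disk, something like `rows = list(read_rows(open(path, encoding="utf-8")))` would collect the tokens, where `path` points to a local copy of the data. A token could then be treated as erroneous when any of its three lists is non-empty, again assuming the empty-cell convention guessed at above.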

License: CLARIN-ID, -PRIV, -NORED, -BY (https://www.kielipankki.fi/support/clarin-eula/#res).

Annotation

Each token has an error label (a high-level error type) and the edit type that was applied to correct it (addition, deletion, or replacement).

Intended uses

Grammatical Error Detection and labeling, (Second) Language Acquisition studies, Learner Corpus Research, Noisy User-produced Data.

Accessible through

Access	Platform	Licence
https://sunet.artologik.net/gu/swell		Other
