UD_Swedish-SweLL is a parallel Universal Dependencies treebank based on SweLL, the Swedish Learner Language corpus. Its first version, released as part of UD 2.17, consists of 510 randomly selected sentence-correction pairs from SweLL-gold, a corpus of essays written by adult learners of Swedish as a second language (L2). For more information about the treebank, see the official README file.
Data citation
Masciolini, Arianna, Berdicevskis, Aleksandrs, & Szawerna, Maria Irena (2025). UD2.17_Swedish-SweLL (updated: 2025-11-19). [Data set]. Språkbanken Text. https://doi.org/10.23695/fpnc-1v66
Additional ways to cite the dataset.
Annotation
In addition to the annotations available in the source corpus (pseudonymization, error tagging and normalization), each token is lemmatized, UPOS-tagged and annotated with dependency relations following the Universal Dependencies standard. The annotators are themselves speakers of Swedish as a second language.
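UD treebanks are distributed in the tab-separated CoNLL-U format, so the token-level annotations described above can be read with a few lines of code. The following is a minimal sketch, not an official loader: the function name `parse_conllu` and the sample sentence are hypothetical illustrations and are not taken from the treebank, and sentence-level comment lines (where metadata lives) are simply skipped.

```python
from typing import Dict, List

# The ten tab-separated columns defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments are ignored in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if not cols[0].isdigit():
            # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Made-up example sentence for illustration only (not from the treebank).
SAMPLE = "\n".join([
    "# text = Jag bor här",
    "\t".join(["1", "Jag", "jag", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bor", "bo", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "här", "här", "ADV", "_", "_", "2", "advmod", "_", "_"]),
    "",
])

sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["lemma"], token["upos"], token["head"], token["deprel"])
```

Each token dict exposes the lemma, UPOS tag, head index and dependency relation under the column names listed in `FIELDS`.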
Caveats
- Lemmas, POS tags and dependencies were systematically validated by hand. Morphological features, on the other hand, were only checked for tokens marked as learner errors in the source corpus and/or whose automatic lemmatization, POS tagging and/or dependency annotation was found to be incorrect.
- This release contains only part of the learner metadata available for SweLL-gold. To obtain the full-metadata version of the treebank, apply for access to SweLL-gold.
Intended uses
(Cross-lingual) second language acquisition (SLA) studies and dependency parser evaluation.
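Dependency parser evaluation is commonly reported as unlabeled and labeled attachment scores (UAS and LAS). The sketch below shows how these could be computed for one sentence, assuming the gold and predicted analyses share the same tokenization; the function name `attachment_scores` and the example arcs are hypothetical, and this is not the official UD evaluation procedure.

```python
from typing import List, Tuple

# One token's analysis: (head index, dependency relation).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two token-aligned analyses of the same text."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # UAS counts tokens whose head is correct, regardless of the label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and relation label are both correct.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / len(gold), las_hits / len(gold)


# Made-up arcs for a four-token sentence (illustration only).
gold = [(2, "nsubj"), (0, "root"), (2, "advmod"), (2, "obl")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "obl")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")  # UAS = 0.75, LAS = 0.50
```

Because the treebank pairs each learner sentence with its correction, the same function could be applied separately to the original and corrected sides to compare how a parser copes with each.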
Accessible through
| Access | Platform | Licence |
|---|---|---|
| | | CC-BY-SA-4.0 |
Download
| File | Size | Modified | Licence |
|---|---|---|---|
| | 218.8 KB | 2025-11-19 | CC-BY-SA-4.0 |