UD_Swedish-SweLL is a parallel Universal Dependencies treebank based on SweLL, the Swedish Learner Language corpus, a collection of essays written by adult learners of Swedish as a second language (L2). The current version, released as part of UD 2.18, includes a test set consisting of 510 randomly selected sentences, as well as 134 sentences authored by French speakers. All sentences are extracted from SweLL-gold and come with corrections and error tags. For more information about the treebank, see the official README file.
Data citation
Masciolini, Arianna, Berdicevskis, Aleksandrs, Szawerna, Maria Irena, & Grand-Clement, Caroline (2026). UD2.18_Swedish-SweLL (updated: 2026-06-15). [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/6bxr-zx80
Additional ways to cite the dataset.
Annotation
On top of the annotations available in the source corpus (pseudonymization, error tagging and normalization), each token is lemmatized, UPOS-tagged and annotated with dependency relations following the Universal Dependencies standard. The annotators are themselves speakers of Swedish as a second language.
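As a rough illustration of how these token-level annotations can be read, the sketch below parses text in CoNLL-U, the standard ten-column UD file format, and keeps the lemma, UPOS tag, head and dependency relation of each token. The `Token` type, the `read_conllu` helper and the sample sentence are hypothetical and made up for this example; the sentence is not taken from the treebank.

```python
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    tok_id: str   # CoNLL-U column 1 (ID)
    form: str     # column 2 (FORM)
    lemma: str    # column 3 (LEMMA)
    upos: str     # column 4 (UPOS)
    head: str     # column 7 (HEAD)
    deprel: str   # column 8 (DEPREL)


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of syntactic words per sentence from CoNLL-U text."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as sent_id and text
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token ranges and empty nodes
        sentence.append(Token(cols[0], cols[1], cols[2], cols[3], cols[6], cols[7]))
    if sentence:
        yield sentence


# Invented sample sentence ("Jag bor här"), for illustration only.
SAMPLE = (
    "# sent_id = example-1\n"
    "# text = Jag bor här\n"
    "1\tJag\tjag\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbor\tbo\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\thär\thär\tADV\t_\t_\t2\tadvmod\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE))
for tok in sentences[0]:
    print(tok.form, tok.lemma, tok.upos, tok.head, tok.deprel)
```

To process the actual treebank files, the same function could be applied to the contents of a downloaded `.conllu` file read with UTF-8 encoding.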
Caveats
- Lemmas, POS tags and dependencies were systematically validated by hand. Morphological features, on the other hand, were only checked for tokens marked as learner errors in the source corpus and/or tokens whose automatic lemmatization, POS tagging and/or dependency annotation was found to be incorrect.
- This release only contains part of the learner metadata available for SweLL-gold. To obtain the full-metadata version of the treebank, apply for access to SweLL-gold.
Intended uses
(Cross-lingual) SLA studies, dependency parser evaluation.
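For the parser evaluation use case, the usual metrics are the unlabeled and labeled attachment scores (UAS and LAS). The following is a minimal sketch of how they are computed, assuming that the gold and predicted analyses share the same tokenization; the `attachment_scores` function and the toy data are hypothetical and not part of the treebank distribution.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency relation) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two token-aligned lists of arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # UAS counts tokens whose head is correct.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and relation are both correct.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / len(gold), las_hits / len(gold)


# Toy example: the parser attaches every token correctly
# but mislabels the last relation.
gold = [(2, "nsubj"), (0, "root"), (2, "advmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

Results intended for comparison with other work are better computed with an established evaluation script, which also handles differences in tokenization that this sketch ignores.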
References
Elena Volodina, Arianna Masciolini, Beáta Megyesi, Julia Prentice, Lisa Rudebeck, Gunlög Sundberg, Mats Wirén (2025): SweLL with pride: How to put a learner corpus to good use, in Huminfra handbook: Empowering digital and experimental humanities / Gerlof Bouma, Dana Dannélls, Dimitrios Kokkinakis, Elena Volodina (eds.), pages 251-306
Guidelines for the annotation of interlanguage phenomena in UD_Swedish-SweLL
Arianna Masciolini, Aleksandrs Berdicevskis, Maria Irena Szawerna, Elena Volodina (2025): Annotating Second Language in Universal Dependencies: a Review of Current Practices and Directions for Harmonized Guidelines, in Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025), August 27, Ljubljana, Slovenia / Gosse Bouma, Çağrı Çöltekin (eds.), pages 153-163
Caroline Grand-Clement and Arianna Masciolini: Sharing is Caring: Advantages of Sharing a Language Background with Learners as an Annotator of Learner Data in UD (upcoming)
Accessible through
| Access | Platform | Licence |
|---|---|---|
| | | CC-BY-4.0 |
| | | CC-BY-4.0 |
| | | CC-BY-SA-4.0 |
Download
| File | Size | Modified | Licence |
|---|---|---|---|
| | 257.02 KB | 2026-06-15 | CC-BY-4.0 |
| | 262.28 KB | 2026-06-15 | CC-BY-4.0 |
| | 1.64 MB | 2026-05-20 | CC-BY-SA-4.0 |
| stats_ud218_swedish-swell.csv.zip: Token frequency list in CSV format (CSV) | 30.73 KB | 2026-06-15 | CC-BY-4.0 |
| stats_ud218_swedish-swell-target.csv.zip: Token frequency list for corrected sentences in CSV format (CSV) | 30.23 KB | 2026-06-15 | CC-BY-4.0 |