Swe-MWELex is a list of multiword expressions that are used productively or receptively in teaching Swedish as a second language. The list is based on two corpora: SweLL-pilot, containing essays from language learners, and COCTAILL, containing texts from coursebooks used in courses for language learners. Texts in the two corpora were manually annotated with CEFR levels, and these levels have been projected onto each vocabulary item observed in the texts. The list is therefore descriptive rather than prescriptive.
Every item in the list carries linguistic information. Part of it was assigned automatically, and certain categories were assigned manually.
Frequencies in the list also come from the two corpora, COCTAILL and SweLL-pilot; see the articles below:
- Elena Volodina, Ildikó Pilán, Stian Rødven Eide and Hannes Heidarsson. 2014. You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language. Proceedings of the third workshop on NLP for computer-assisted language learning. NEALT Proceedings Series 22 / Linköping Electronic Conference Proceedings 107: 128–144.
- Elena Volodina. 2024. On two SweLL learner corpora – SweLL-pilot and SweLL-gold. Huminfra Conference, pp. 83–94.
- Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell. 2016. SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies. Proceedings of LREC 2016, Slovenia.
The list can be browsed interactively on the Lärka platform (https://spraakbanken.gu.se/larka/svlp) under Swedish L2 profiles -> Lexical profile -> Multi Word Expressions. There, the list can be filtered by category and downloaded in full or as a selection.