Språkbanken Text is a part of Språkbanken.

Svenska MWELex

Standard reference

Therese Lindström Tiedemann, David Alfter, Yousuf Ali Mohammed, Daniela Piipponen, Beatrice Silén, Elena Volodina (2024): Multiword expressions in Swedish as a second language: Taxonomy, annotation, and initial results, in Multiword Expressions in Lexical Resources: Linguistic, Lexicographic, and Computational Perspectives / edited by Voula Giouli and Verginica Barbu Mititelu, pages 309-348

Data citation

Lindström Tiedemann Therese, Alfter David, & Volodina Elena (2023). Svenska MWELex (updated: 2023-04-20). [Data set]. Språkbanken Text. https://doi.org/10.23695/352q-wa92
Swe-MWELex is a sense-based list of multi-word expressions (MWEs) that learners of Swedish as a second language can handle at the different levels of proficiency on the CEFR scale. For each MWE, the list gives its frequencies in learner essays (productive vocabulary, based on SweLL-pilot) and in course books (receptive vocabulary, based on COCTAILL). In addition, each MWE is classified by type, based on its syntactic and lexical characteristics, and verbal MWEs are further assigned to a subgroup.

Swe-MWELex is a list of multi-word expressions that learners of Swedish as a second language use productively (in their own writing) or encounter receptively (in teaching materials). The list is based on two corpora: SweLL-pilot, containing essays written by language learners, and COCTAILL, containing texts from course books used in courses for language learners. Texts in both corpora were manually annotated with CEFR levels, and these levels have been projected onto each vocabulary item observed in the texts. The list is therefore descriptive rather than prescriptive.
Every item in the list carries linguistic information, partly assigned automatically and, for certain categories, assigned manually.
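The projection of text-level CEFR labels onto individual items can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used to build Swe-MWELex: it assumes each text carries a single CEFR label and a list of already-identified MWEs, treats coursebook occurrences as receptive and essay occurrences as productive, and runs on hypothetical toy texts. The names `TEXTS`, `USE`, and `project_levels` are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy texts: each has one CEFR level (manually assigned),
# a source type, and the MWEs identified in it. Not taken from the corpora.
TEXTS = [
    {"level": "A1", "source": "coursebook", "mwes": ["till_exempel", "hålla_med"]},
    {"level": "A2", "source": "coursebook", "mwes": ["till_exempel"]},
    {"level": "A2", "source": "essay", "mwes": ["till_exempel", "till_exempel"]},
]

# Assumed mapping: coursebooks give receptive use, learner essays productive use.
USE = {"coursebook": "receptive", "essay": "productive"}


def project_levels(texts):
    """Give every MWE occurrence the CEFR level of the text it occurs in.

    Returns a dict mapping (mwe, use) to a Counter of level -> frequency.
    """
    table = defaultdict(Counter)
    for text in texts:
        use = USE[text["source"]]
        for mwe in text["mwes"]:
            table[(mwe, use)][text["level"]] += 1
    return table


profile = project_levels(TEXTS)
for (mwe, use), by_level in sorted(profile.items()):
    print(mwe, use, dict(sorted(by_level.items())))
# hålla_med receptive {'A1': 1}
# till_exempel productive {'A2': 2}
# till_exempel receptive {'A1': 1, 'A2': 1}
```

Because the levels come from the texts rather than from a judgement about each item, the resulting counts describe observed use, which is what makes the list descriptive.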

The frequencies in the list also come from these two corpora, COCTAILL and SweLL-pilot, which are described in the following articles:

  • Elena Volodina, Ildikó Pilán, Stian Rødven Eide and Hannes Heidarsson. 2014. You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language. Proceedings of the third workshop on NLP for computer-assisted language learning. NEALT Proceedings Series 22 / Linköping Electronic Conference Proceedings 107: 128–144.
  • Elena Volodina. 2024. On two SweLL learner corpora–SweLL-pilot and SweLL-gold. Huminfra Conference, pp. 83–94.
  • Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell. 2016. SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies. Proceedings of LREC 2016, Slovenia.

The list can be browsed interactively on the Lärka platform (https://spraakbanken.gu.se/larka/svlp) under Swedish L2 profiles -> Lexical profile -> Multi Word Expressions. There, it can be filtered by category and downloaded in full or as a selection.

Annotation

CEFR levels, lemmatization, sense disambiguation, POS-tagging, frequency, manual MWE classification by type
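The lemmatization and sense annotation surface in the Lemgram and Sense columns of the downloadable files. A small parser can split such identifiers into their parts. The sketch below assumes SALDO-style formats, `lemma..pos.n` for lemgrams and `lemma..n` for senses, with underscores joining the words of an MWE as in "till_exempel"; the exact strings in the files may differ, and the identifiers used here (including the `vbm` code) are hypothetical. `Lemgram`, `parse_lemgram`, and `parse_sense` are invented names.

```python
import re
from typing import NamedTuple, Tuple

# Assumed SALDO-style identifier formats (verify against the actual files):
#   lemgram: "<lemma>..<pos>.<n>"   sense: "<lemma>..<n>"
# Words inside a multi-word lemma are assumed to be joined by "_".
LEMGRAM_RE = re.compile(r"(?P<lemma>.+)\.\.(?P<pos>[a-z]+)\.(?P<n>\d+)")
SENSE_RE = re.compile(r"(?P<lemma>.+)\.\.(?P<n>\d+)")


class Lemgram(NamedTuple):
    words: Tuple[str, ...]  # the MWE split into its component words
    pos: str                # word-class code
    number: int             # numeric index distinguishing identical lemmas


def parse_lemgram(identifier: str) -> Lemgram:
    match = LEMGRAM_RE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a lemgram: {identifier!r}")
    return Lemgram(tuple(match["lemma"].split("_")), match["pos"], int(match["n"]))


def parse_sense(identifier: str) -> Tuple[Tuple[str, ...], int]:
    match = SENSE_RE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a sense id: {identifier!r}")
    return tuple(match["lemma"].split("_")), int(match["n"])


# Hypothetical identifiers, for illustration only.
print(parse_lemgram("hålla_med..vbm.1"))  # Lemgram(words=('hålla', 'med'), pos='vbm', number=1)
print(parse_sense("hålla_med..1"))        # (('hålla', 'med'), 1)
```

Using `fullmatch` keeps the two patterns from accepting each other's format: a sense id has no word-class code, and a lemgram does not end in `..<n>`.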

Caveats

Swe-MWELex lists the same item once for each combination of proficiency level and type of use (productive vs. receptive) at which it occurs. For example, the expression "till_exempel" (Eng. 'for example') appears six times, once for each level at which it occurs. As a result, the number of unique items is much smaller than the number of entries.
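The gap between entries and unique items can be checked by counting distinct items. The rows below are hypothetical and only mimic the layout this caveat describes (one row per item, type of use, and level); they are not taken from the dataset, and `ENTRIES` is an invented name.

```python
from collections import Counter

# Hypothetical rows mimicking the layout described above:
# one entry per (item, type of use, CEFR level) combination.
ENTRIES = [
    ("till_exempel", "receptive", "A1"),
    ("till_exempel", "receptive", "A2"),
    ("till_exempel", "productive", "A2"),
    ("hålla_med", "receptive", "B1"),
]

# Count how many entries each distinct item accounts for.
entries_per_item = Counter(item for item, _use, _level in ENTRIES)

print("entries:", len(ENTRIES))                # entries: 4
print("unique items:", len(entries_per_item))  # unique items: 2
print(entries_per_item.most_common(1))         # [('till_exempel', 3)]
```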

Intended uses

teaching L2 Swedish, developing CALL and ICALL systems, providing features for classification, profiling Swedish as a second language

Accessible through

Access Platform Licence
CC BY 4.0

Download

  • swe-mwelex.xlsx (xlsx): 184.75 KB, modified 2025-03-12, licence CC BY 4.0
  • swe-mwelex.csv (csv): 414.88 KB, modified 2025-02-20, licence CC BY-NC-SA 4.0

Both files contain the following columns:

  • Word: the MWE in its dictionary form
  • Lemgram: the MWE plus its word class
  • Sense: the sense according to Saldo
  • POS: the word class according to SUC
  • SaldoPOS: the word class according to the Saldo taxonomy
  • Type1:Syntactic-contiguity: subgroups of MWEs
  • Type2:Lexical-categories
  • Type3: Verbal-subcategory
  • Receptive: absolute frequencies in coursebooks
  • Productive: absolute frequencies in learner essays
  • Receptive TTR: relative frequencies per level and in total in coursebooks
  • Productive TTR: relative frequencies per level and in total in learner essays
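Outside Lärka, the CSV file can be filtered with Python's standard `csv` module. This is a sketch under assumptions: it expects a comma delimiter, a header row containing the column names listed above (in particular "Word" and "POS"), and UTF-8 encoding, none of which has been checked against the actual file. The three-row excerpt is hypothetical, and `filter_by_column` is an invented helper.

```python
import csv
import io
from typing import Iterator, TextIO


def filter_by_column(fileobj: TextIO, column: str, value: str) -> Iterator[str]:
    """Yield the Word value of each row whose `column` equals `value`.

    Hypothetical helper: assumes a comma-separated file with a header row
    that contains a "Word" column.
    """
    for row in csv.DictReader(fileobj):
        # Missing cells come back as None; treat them as "" so .strip() never fails.
        if (row.get(column) or "").strip() == value:
            yield row["Word"]


# Hypothetical excerpt, NOT copied from swe-mwelex.csv.
SAMPLE = io.StringIO(
    "Word,POS\n"
    "hålla_med,VB\n"
    "till_exempel,AB\n"
    "ta_hand_om,VB\n"
)
print(list(filter_by_column(SAMPLE, "POS", "VB")))  # ['hålla_med', 'ta_hand_om']

# For the downloaded file (delimiter and encoding are assumptions):
# with open("swe-mwelex.csv", newline="", encoding="utf-8") as f:
#     verbs = list(filter_by_column(f, "POS", "VB"))
```

Returning a generator keeps memory use flat for larger files; wrap it in `list()` when the whole selection is needed at once.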

Type

  • Lexicon

Language

Swedish

Size

Entries: 2,791

Keywords

  • multi-word expressions
  • second language wordlist
  • L2
  • receptive vocabulary
  • productive vocabulary
  • CEFR levels

Creators

  • Lindström Tiedemann Therese
  • Alfter David
  • Volodina Elena

Created

2023-04-20

Updated

2023-04-20

Contact

sb-info@svenska.gu.se