Språkbanken Text is a part of Språkbanken.

SVALex

Standard reference

Thomas François, Elena Volodina, Ildikó Pilán, Anaïs Tack (2016): SVALex: a CEFR-graded lexical resource for Swedish foreign and second language learners. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 23-28, 2016, Portorož, Slovenia.

Data citation

Volodina Elena, Pilán Ildikó, & François Thomas (2016). SVALex (updated: 2016-06-09). [Data set]. Språkbanken Text. https://doi.org/10.23695/rwrq-1w38

SVALex is a lexicon of receptive vocabulary for Swedish as a second/foreign language (SVA). Like its sister resource, SweLLex, it reports the normalized frequencies of words (lemmas) across six levels of the CEFR (Common European Framework of Reference for Languages). In the same fashion as SweLLex, it covers both single words and multi-word expressions and reports their usage at the different levels, something that is rarely present in resources of this kind.

The frequencies have been estimated on a corpus of course books, COCTAILL, described in the article:

  • Elena Volodina, Ildikó Pilán, Stian Rødven Eide and Hannes Heidarsson 2014. You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language. Proceedings of the third workshop on NLP for computer-assisted language learning. NEALT Proceedings Series 22 / Linköping Electronic Conference Proceedings 107: 128–144.

More details on the SVALex resource are provided on this webpage and in the article by Volodina et al. (2016).

Annotation

CEFR levels, lemmatization, POS-tagging, frequency

Intended uses

teaching L2 Swedish, developing CALL and ICALL systems, using as features in classification, profiling Swedish as a second language

Accessible through

Licence: CC BY-NC-SA 4.0 (attribution)

Download

File: svalex_xlsx.tar.bz2 (xlsx)
Size: 2.16 MB
Modified: 2025-01-24
Licence: CC BY-NC-SA 4.0 (attribution)
Columns: word (lemma), word class (SUC tags), normalized frequencies per level and total (format: level_freq@a1), number of documents per level (format: nb_doc@a1), frequencies in unique learner essays by level and subcorpus (format: c1_TISUS1, a1_SpIn31)

File: svalex_tsv.tar.bz2 (tsv)
Size: 203.25 KB
Modified: 2025-01-24
Licence: CC BY-NC-SA 4.0 (attribution)
Columns: word (lemma), word class (SUC tags), normalized frequencies per level and total (format: level_freq@a1), number of documents (= texts) per level (format: nb_doc@a1), frequencies in unique coursebooks by level and in total (format: B2_Nya_Mål_3@b2, C1_Språkporten_123@total)
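
As an illustration of how the per-level frequency columns might be used, the following Python sketch reads a tab-separated table and reports the lowest CEFR level at which each lemma has a non-zero normalized frequency. It rests on several assumptions that are not confirmed by this page: the header names `word` and `tag` are hypothetical, the column names for levels other than a1 are inferred by analogy with `level_freq@a1`, and the sample rows are made up rather than taken from SVALex.

```python
import csv
import io

# CEFR levels in ascending order. Only "level_freq@a1" is documented on the
# page; the other column names are assumed to follow the same pattern.
LEVELS = ["a1", "a2", "b1", "b2", "c1", "c2"]


def first_level(row):
    """Return the lowest CEFR level with a non-zero frequency, or None."""
    for level in LEVELS:
        # Missing or empty cells are treated as a frequency of zero.
        value = row.get(f"level_freq@{level}", "") or "0"
        if float(value) > 0:
            return level
    return None


def read_entries(handle):
    """Yield (lemma, word class, first level) from a tab-separated file object."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # "word" and "tag" are hypothetical header names; adjust them to the
        # actual header of the downloaded file.
        yield row["word"], row["tag"], first_level(row)


# Made-up sample data in the assumed layout, NOT real SVALex values.
HEADER = ["word", "tag"] + [f"level_freq@{lv}" for lv in LEVELS]
ROWS = [
    ["hus", "NN", "120.5", "80.0", "40.0", "10.0", "5.0", "1.0"],
    ["abstrakt", "JJ", "0", "0", "0", "12.3", "20.1", "8.0"],
]
SAMPLE = "\n".join("\t".join(r) for r in [HEADER] + ROWS) + "\n"

entries = list(read_entries(io.StringIO(SAMPLE)))
for lemma, tag, level in entries:
    print(lemma, tag, level)
```

To apply the same functions to the real data, unpack svalex_tsv.tar.bz2 and pass the extracted file, opened with `newline=""` and a suitable text encoding (UTF-8 is a likely but unverified choice), to `read_entries` in place of the in-memory sample.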

Collection

Type

  • Lexicon

Language

Swedish

Size

Entries: 15,681

Keywords

  • second language wordlist
  • L2
  • receptive vocabulary

Creators

  • Volodina Elena
  • Pilán Ildikó
  • François Thomas

Created

2016-06-09

Updated

2016-06-09

Contact

Språkbanken
sb-info@svenska.gu.se