L2 profiles for Swedish

Full name: Development of lexical and grammatical competences in immigrant Swedish, RJ, 2018-2020


As the number of immigrants in Sweden grows, the need for courses and coursebooks in Swedish as a second language (L2) is increasing, as is the demand for standardized tests and qualifications. This project intends to study the development of lexical and grammatical competences in L2 learners of Swedish.

General description

We intend to perform the study using two corpora, coursebook texts and learner essays, both marked up for proficiency levels according to the Common European Framework of Reference for Languages (CEFR). The corpora will be processed by computational methods, after which the results will be analysed by linguists, lexicographers, grammarians, teachers and language assessors, both linguistically and from the perspective of teaching theory. The goal of this analysis is to identify the minimal or central (need-to-know) vocabulary and grammar scopes, as well as the peripheral (good-to-know) grammar and vocabulary, at each level of proficiency, as a way to support teachers, test-makers, assessors and learners. The aim of the project is thus to provide an extensive description of the lexical and grammatical competence that learners at each level possess, both receptively and productively, and to explore the relation between the receptive and productive scopes. The project will result in a number of practical digital tools: online sites for browsing and downloading lexical and grammatical inventories, and a set of algorithms and tools that can be reused on other corpora to extract similar types of resources.
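As a rough illustration of how receptive and productive scopes could be compared across levels, the sketch below maps each lemma to the lowest CEFR level at which it is observed in a coursebook corpus (receptive) and in an essay corpus (productive). It is only a minimal sketch. The function names, the toy data, and the "first level of occurrence" heuristic are assumptions made for this example and are not the project's actual method or data.

```python
from collections import defaultdict

# CEFR levels in ascending order of proficiency (assumed subset for this sketch).
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1"]


def first_level(observations):
    """Map each lemma to the lowest CEFR level at which it is observed."""
    rank = {level: i for i, level in enumerate(CEFR_ORDER)}
    first = {}
    for lemma, level in observations:
        if lemma not in first or rank[level] < rank[first[lemma]]:
            first[lemma] = level
    return first


def compare_scopes(receptive, productive):
    """Group lemmas by receptive entry level and pair each with its
    productive entry level (None if never produced by learners)."""
    rec = first_level(receptive)
    prod = first_level(productive)
    by_level = defaultdict(list)
    for lemma, level in sorted(rec.items()):
        by_level[level].append((lemma, prod.get(lemma)))
    return dict(by_level)


# Hypothetical (lemma, level) observations, invented for illustration only.
coursebook = [("hus", "A1"), ("bil", "A1"), ("utveckling", "B1"), ("hus", "A2")]
essays = [("hus", "A2"), ("utveckling", "B2")]

profile = compare_scopes(coursebook, essays)
for level, items in profile.items():
    print(level, items)
```

In this toy data, "hus" is met receptively at A1 but first produced at A2, while "bil" is never produced. Gaps of this kind between the two scopes are what such a comparison would surface.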



The project is financed by Riksbankens Jubileumsfond during 2018-2020 through grant P17-0716:1.

Visions and plans


  • Lexical profile: sense-based SenSVALex, SenSweLLex
  • Resource preparation/curation: Transcription and anonymization of SweLL-pilot
  • MWE pilot experiment for level-linking using crowdsourcing (for L2 English)


  • Legato tool for lexicographic annotation
  • Lexicographic annotation (guidelines, assistant/lexicographer work)
  • Level-linking experiment (MWE-based), incl. chapter in a book
  • Pilot on definiteness (as a preparation step for developing gram profiles)
  • Annotation quality check of SweLL-pilot & COCTAILL


  • Grammar profiles: receptive, productive
  • International workshop on CEFR grammatical profiles/criterial features
  • Focus areas for grammar profiles: definiteness, passive, prepositions, verb phrases, noun phrases, ...
  • Lexical profiles - finishing off annotation via Legato
  • Complex network analysis of lexical profiles
  • Empirical analysis and evaluation of lexical profiles
  • Setting up user interface for lexical profile browsing
  • Linking to (target) levels
  • International CEFRLex workshop
  • Integration of L2 profiles outcomes into Reference Level Descriptions (EU generic initiative) (?)


  • End-user evaluation
  • Integration of parsing algorithms into L2-specific searches in Korp/Strix
  • User interface for browsing grammatical profiles
  • Visualization of CA(F) model (?)


  • Elena Volodina, Therese Lindström Tiedemann (September, 2019) L2 profiles: Half-way report [slides]
  • David Alfter. (September, 2019) The LEGATO annotation tool. Presentation at Språkbanken's kick-off. Gothenburg, Sweden. [slides]
  • David Alfter. (March, 28, 2019). Idiomaticity and complexity - An L2-oriented perspective. A regular talk at Språkbanken-Text, Gothenburg, Sweden.
  • Therese Lindström Tiedemann. 2019. Conference presentation. Prepositionernas frekvens i L2 svenska – utvecklingen över CEFR-nivåer, Svenskans beskrivning 37, Åbo akademi, Åbo (Turku), Finland [Book of abstracts]
  • Elena Volodina (April, 11, 2019). Crowdsourcing for language learning: looking for potential. A regular research seminar at Språkbanken-Text, Gothenburg, Sweden.
  • Elena Volodina (April, 3, 2019). Crowdsourcing for language learning: looking for potential. Louvain-la-Neuve, Belgium, guest talk. [pdf]
  • Jaka Čibej (March, 14, 2019). MWEs and crowdsourcing. Outline and results. Lisbon, Portugal, A talk at enet-Collect annual meeting, Work Group 1. [pdf]
  • Eeva-Liisa Nyqvist & Therese Lindström Tiedemann. 2019, January 30. Forskningsseminariet i nordiska språk, Åbo universitet. Hur behärskar finska språkbadselever passiv i åk 6 (12 år) och åk 9 (15 år)? [Abstract]
  • Elena Volodina (December, 6, 2018) Introduction to the pre-workshop MWE experiment [slides]
  • Jaka Čibej & David Alfter (December 6, 2018) Experiment setup and results of the MWE experiment [Slides Jaka] [Slides David]
  • Eeva-Liisa Nyqvist & Therese Lindström Tiedemann. 2018, November 28. Forskningsseminariet i nordiska språk och svensk översättning, Helsingfors universitet. Hur behärskar finska språkbadselever passiv i åk 6 (12 år) och åk 9 (15 år)?
  • Eeva-Liisa Nyqvist & Therese Lindström Tiedemann. 2018, May, 15. Hur behärskar finska språkbadselever passiv i åk 6 (12 år) och åk 9 (15 år)? Oslo, Gramino conference. [abstract] [slides]
  • Therese Lindström Tiedemann. 2018, May 3. Profiling L2 Swedish: need-to-know and good-to-know competences. Presentation at a work-in-progress seminar on first-language and second-language writing. In connection to Symposium on language learning and use. Uppsala, Sweden.
  • Therese Lindström Tiedemann. 2018, February. A linguist’s use of L2 corpora – The Swedish passive, a case study. SB-talk, University of Gothenburg [Slides]
  • Therese Lindström Tiedemann. 2017, December. Case study: Studying the Swedish passive. Clarin workshop on Interoperability of L2 resources and tools, Gothenburg. [Slides]
  • Therese Lindström Tiedemann. 2017, October. Att bli aktiv i svenska med passiva verb. Svenskans Beskrivning 2017, Uppsala [abstract]
  • Therese Lindström Tiedemann. 2017, May. När passiv lärs in. Svenskan i Finland. [abstracts]



  • David Alfter and Johannes Graën. 2019. Interconnecting lexical resources and word alignment: How do learners get on with particle verbs? Proceedings of Nodalida 2019, Turku, Finland. LiUP Press.
  • David Alfter and Elena Volodina. 2019. From river to bank: The importance of sense-based graded word lists. Proceedings of EuroCALL 2019
  • David Alfter, Therese Lindström Tiedemann and Elena Volodina. 2019. LEGATO: A flexible lexicographic annotation tool. Nodalida 2019, Turku, Finland. LiUP Press.
  • David Alfter, Lars Borin, Ildikó Pilán, Therese Lindström Tiedemann, Elena Volodina. 2019. From Language Learning Platform to Infrastructure for Research on Language Learning. CLARIN-2018 post-conference volume. LiUP Press. [pdf]
  • Stemle Egon W, Boyd Adrian, Janssen Maarten, Lindström Tiedemann Therese, Mikelić Preradović Nives, Rosen Alexandr, Rosén Dan & Volodina, Elena. (2019) Working together towards an ideal infrastructure for language learner corpora. 4th Learner Corpus Research Conference. Post-conference volume.


  • David Alfter and Elena Volodina. 2018. Is the whole greater than the sum of its parts? A corpus-based pilot study of the lexical complexity in multi-word expressions. Proceedings of SLTC, Stockholm, November 2018.
  • Lindström Tiedemann, Therese, Lenardic Jakob & Fiser Darja. 2018. L2 learner corpus survey – towards improved verifiability, reproducibility and inspiration in learner corpus research. Annual Clarin 2018.
  • Alfter, David, & Volodina, Elena (2018) Towards Single Word Lexical Complexity Prediction. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 79-88) at NAACL-2018 [pdf]


Project-related events

  • Internal meeting with Thomas Francois, on the aspects of CEFR-based vocabulary profiling, followed by an invited talk at 8th NLP4CALL workshop. September, 29 - October 2, 2019.
  • Internal workshop with Jan Hulstijn, framed around the project aims (grammar aspect) and the Basic Language Cognition Model, followed by an invited talk at 7th NLP4CALL workshop. November, 6-7, 2018.
  • International workshop (and a large-scale experiment) on linking Multi-word expressions to levels of L2 proficiency. June-December 2018. [Website]




