
L2 linguistic complexity

A number of larger and smaller projects are run under the heading of linguistic complexity in a second language learning context, among others:


1. L2 Lexical Complexity


(PhD project by David Alfter, 2016-2021; PhD supervisors: Elena Volodina and Lars Borin)

Partly overlaps with the L2 profiles project and its experiment on ranking multi-word expressions.


Computer technology has found its way into various areas such as language assessment and language teaching (Chapelle 2006). There has also been a rise in computer-assisted language learning (CALL) and intelligent computer-assisted language learning (ICALL) platforms. ICALL platforms incorporate natural language processing (NLP), opening up a whole new field of opportunities. By using natural language processing, it is possible to enrich textual data for example by automatically identifying part-of-speech classes, syntactic relationships, named entities or compounds. Given this plethora of information, it is possible to devise exercises that are dynamically generated.

Automatically generated exercises can (and should) also be adapted to the needs of different learners. This research focuses on vocabulary, as vocabulary plays a major role in language learning; "while without grammar very little can be conveyed, without vocabulary nothing can be conveyed" (Wilkins 1972: pp. 111-112).

Research issues

Word lists are one of the practical outcomes of vocabulary research. However, many vocabulary lists are created from native speaker material and thus might not reflect a learner's reality or needs (François et al. 2014: p. 3767). Two of the main questions in this research are:

  • How can we assign target proficiency levels to words?

  • How can we assign target proficiency levels to unseen words?

While research on text complexity abounds, research on sentence complexity is much scarcer, and research on lexical complexity is scarcer still. In this research, we use graded textbooks aimed at learners of Swedish as a source of "graded" words, i.e. words paired with an approximate level of proficiency based on their occurrences in different textbooks. We then extract different features from the words (length, gender, degree of polysemy) and train a machine learning model to predict the proficiency level of unseen words.
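The idea of grading words from textbook occurrences and predicting levels for unseen words can be illustrated with a minimal sketch. It is not the project's actual pipeline: taking the lowest textbook level at which a word occurs as its target level, the hypothetical `lexicon` dictionary (word to gender and number of senses), and the 1-nearest-neighbour predictor standing in for the machine learning model are all assumptions made for illustration.

```python
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def grade_words(textbooks):
    """Map each word to the lowest CEFR level at which it occurs.

    `textbooks` is a list of (level, tokens) pairs. Using the first
    (lowest) occurrence as the target level is an assumption of this sketch.
    """
    graded = {}
    for level, tokens in textbooks:
        rank = CEFR_LEVELS.index(level)
        for token in tokens:
            word = token.lower()
            graded[word] = min(rank, graded.get(word, rank))
    return {word: CEFR_LEVELS[rank] for word, rank in graded.items()}


def word_features(word, lexicon):
    """Toy feature vector: length, neuter gender flag, number of senses.

    `lexicon` is a hypothetical dict: word -> (gender, number_of_senses).
    """
    gender, senses = lexicon.get(word, ("unknown", 1))
    return (len(word), 1 if gender == "neuter" else 0, senses)


def predict_level(word, graded, lexicon):
    """Predict the level of an unseen word from its nearest graded word."""
    target = word_features(word, lexicon)

    def distance(known):
        features = word_features(known, lexicon)
        return sum((a - b) ** 2 for a, b in zip(features, target))

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return graded[min(sorted(graded), key=distance)]
```

In practice the feature set would be richer and the nearest-neighbour lookup would be replaced by a trained classifier; the sketch only shows how graded words and word-level features fit together.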


  • David Alfter, Therese Lindström Tiedemann and Elena Volodina. 2019. LEGATO: A flexible lexicographic annotation tool. Nodalida 2019, Turku, Finland. LiUP Press. [pdf]
  • David Alfter, Lars Borin, Ildikó Pilán, Therese Lindström Tiedemann, Elena Volodina. 2019. From Language Learning Platform to Infrastructure for Research on Language Learning. CLARIN-2018 post-conference volume. LiUP Press. [pdf]
  • David Alfter & Elena Volodina. (2018). Is the whole greater than the sum of its parts? A corpus-based pilot study of the lexical complexity in multi-word expressions. Proceedings of SLTC-2018, Stockholm, Sweden
  • Alfter, David, & Volodina, Elena (2018) Towards Single Word Lexical Complexity Prediction. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 79-88) at NAACL-2018 [pdf]


2. L2 Sentence and Text Readability

(aka Linguistic Complexity)


Linguistic complexity on sentence and text levels (PhD project by Ildikó Pilán, 2013-2018; PhD supervisors: Elena Volodina and Lars Borin)


Selecting authentic examples that appropriately demonstrate vocabulary items of interest is a vital question for lexicographers and second/foreign language (L2) teachers. At present it is often unknown, for instance, on what principles dictionary examples are selected or where examples illustrating new vocabulary for L2 learners come from. One way of providing examples is to make them up; they are then as typical as the person who comes up with them thinks they should be, but they lack authenticity. Another way is to use a source of authentic texts, e.g. a linguistic corpus, and select examples using concordance software. The only constraint set on the corpus hits is then the occurrence of the target word in the text span (as opposed to sentence), which often makes the number of hits unmanageably large. In this case the examples are authentic, but the selection process can be very tedious and the quality of “candidate” examples can vary greatly.

A third option is to pre-select sentences automatically using a number of constraints that downgrade inappropriate samples. The user is then offered the top candidate samples to choose from. The resulting list of ranked candidate sentences can be used for further manual or automatic selection (or editing) of high-quality sentences, reducing the cost and time spent on manual pre-selection.

The candidate examples can be used for dictionary entries; to illustrate language features for students of linguistics; to exemplify vocabulary for language learners; to create test items for L2 learners; and to accompany electronic texts (e.g. by clicking on an unknown word, the user can see another example of its usage). The ranking algorithm can eventually be used to test web texts for appropriateness for inclusion in a corpus.

The target user groups are therefore lexicographers, L2 teachers, teachers of Linguistics, test item creators, designers of electronic course materials and corpus linguists.
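The automatic pre-selection option described above, in which constraints downgrade inappropriate corpus hits before a ranked list is offered to the user, might look roughly like the following sketch. The three constraints (sentence length, completeness, and a small set of Swedish words that often signal dependence on previous context) and their equal weights are illustrative assumptions, not the parameters used in the project.

```python
import re

# Hypothetical set of sentence-initial words that may signal context dependence.
CONTEXT_WORDS = {"han", "hon", "den", "de", "detta", "dessa"}


def score_sentence(sentence, target, min_len=5, max_len=20):
    """Return 0 minus one point per violated constraint, or None if the
    target word does not occur in the sentence at all."""
    tokens = re.findall(r"\w+", sentence.lower())
    if target.lower() not in tokens:
        return None
    penalty = 0
    if not min_len <= len(tokens) <= max_len:
        penalty += 1  # too short or too long
    stripped = sentence.strip()
    if not stripped[:1].isupper() or stripped[-1:] not in ".!?":
        penalty += 1  # probably not a complete sentence
    if tokens[0] in CONTEXT_WORDS:
        penalty += 1  # likely depends on the preceding context
    return -penalty


def rank_candidates(sentences, target, top_n=3):
    """Return up to `top_n` sentences containing `target`, best first."""
    scored = [(score_sentence(s, target), s) for s in sentences]
    scored = [(score, s) for score, s in scored if score is not None]
    # list.sort is stable, so equally scored sentences keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_n]]
```

Downgrading rather than filtering keeps weaker candidates available when few good hits exist, which matches the idea of offering the user a ranked list to choose from.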

Research issues

The question arising in this connection is whether we can comprehensively describe and model “good examples”. This question has been addressed in different studies (Kilgarriff et al. 2008, Husák 2008, Kosem et al. 2011, Segler 2007, etc.), and more recently also for Swedish as a target language (Borin et al. 2012a; Volodina et al. 2012; 2013). Our starting point is that the parameters of good examples are language dependent and need to be tested for each language separately.

The issue of sentence readability, as opposed to text readability, has not been a topic of any systematic research so far. Within lexicography the quality of examples has been well documented, but the parameters described there are often difficult to model for computer applications. In this research we plan to single out the parameters defining sentence readability for three user groups (lexicographers, L2 teachers and teachers of linguistics) and to suggest a readability measure for testing whether sentences are appropriate for these user groups.



  • Ildikó Pilán, Elena Volodina, Lars Borin. 2017. Candidate sentence selection for language learning exercises: from a comprehensive framework to an empirical evaluation. TAL Journal: Special issue NLP for learning and Teaching. Volume 57, Number 3. [pre-print]
  • Ildikó Pilán. 2016. Detecting Context Dependence in Exercise Item Candidates Selected from Corpora. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications 2016, NAACL, San Diego. [pdf]
  • Pilán, Ildikó, Sowmya Vajjala, Elena Volodina. 2015. A readable read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity. To appear in International Journal of Computational Linguistics and Applications (IJCLA). [pdf]
  • Ildikó Pilán, Elena Volodina and Richard Johansson. 2014. Rule-based and machine learning approaches for second language sentence-level readability. Proceedings of the 9th workshop on Building Educational Applications Using NLP, ACL 2014. [pdf]
  • Pilán, I. Volodina, E. and Johansson, R. (2013). Automatic selection of suitable sentences for language learning exercises. In: 20 Years of EUROCALL: Learning from the Past, Looking to the Future. 2013 EUROCALL Conference, Évora, Portugal, Proceedings [pdf]
  • Elena Volodina, Richard Johansson, Sofie Johansson Kokkinakis. 2012. Semi-automatic selection of best corpus examples for Swedish: Initial algorithm evaluation. Workshop on NLP in Computer-Assisted Language Learning. Proceedings of the SLTC 2012 workshop on NLP for CALL. Linköping Electronic Conference Proceedings 80: 59–70. [pdf]

3. Automatic L2 Essay Grading and Assessment


A project by Ildikó Pilán (as part of the PhD project), David Alfter and Elena Volodina, in collaboration with Torsten Zesch


Learner essay grading presents many challenges, especially in terms of manual assessment time and the qualifications of assessors. Human assessment is precise and reliable provided that assessors are well trained. However, their judgements can also be subject to outside factors such as hunger, bad mood, a negative attitude towards a learner, or boredom. The same essay can thus be graded differently depending on outside influences on an assessor's mood. To avoid misjudgements and to ensure objectivity, certain institutions, e.g. ETS, have started to complement human grading with automatic assessment as a reference point (Burstein 2003, Burstein & Chodorow 2010).


Developing an automatic essay grading (AEG) system is a non-trivial task. It needs to rely on data consisting of essays that have been manually graded by human assessors, a set of rules and specific features that can be used to predict grades or levels, and a classification algorithm. AEG tasks have been addressed previously in a number of projects, e.g. by Östling et al. (2013) for Swedish, Hancke & Meurers (2013) for German, Burstein & Chodorow (2010) for English, Vajjala & Lõo (2014) for Estonian, etc. Östling et al. (2013) looked at Swedish upper secondary school essays (mostly L1) and assessed them in terms of performance grades (VG, G, IG), whereas we aim to assess the proficiency levels reached in L2 Swedish. However, only in a few cases are such systems used for real-life assessment, going beyond prototype development.

Project description

Availability of data is critical for AEG experiments. We are using SweLL-pilot (Volodina et al., 2016), a corpus of second language (L2) Swedish essays linked to the levels reached as defined by the Common European Framework of Reference (CEFR; COE 2001). Our experiments cover automatic ranking of SweLL essays to predict at which CEFR level (A1, A2, B1, B2, C1) an essay is written. The CEFR levels have been selected because the CEFR is very influential in Europe and beyond, with numerous projects targeting the interpretation of CEFR scales (e.g. Hancke & Meurers, 2013; Vajjala & Lõo, 2014). However, very little work has been done on CEFR-based L2 Swedish.

Selection of features is the most important and time-consuming part of AEG projects. Features can be language independent, such as n-grams, sentence length and word length, or language specific, such as language models, out-of-vocabulary words (where the vocabulary is defined by some lexicon or word list), etc. Our experiments include empirical analysis of the data, extraction of relevant features for machine learning experiments or heuristic rules, and experimentation with those to select the most predictive ones. Our intention is to test language-independent versus language-specific models to see how language-specific features change the quality of predictions. In this project we collaborate with the University of Duisburg-Essen, where Prof. Zesch's group is testing their language-independent model on our data, while we develop language-specific approaches.
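As a rough illustration of the two feature families mentioned above, the sketch below computes a few language-independent features (average sentence length, average word length, number of distinct bigrams) and one language-specific feature (the out-of-vocabulary ratio against a word list). The function name `essay_features`, the naive tokenisation and the choice of features are assumptions made for this example; they are not the feature set used in the project.

```python
import re


def essay_features(text, lexicon):
    """Extract a small feature dictionary from an essay.

    `lexicon` is a set of known lower-cased word forms (a stand-in for
    a real lexicon or graded word list).
    """
    # Naive sentence and word segmentation; a real system would use a
    # proper tokeniser for the target language.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
            "distinct_bigrams": 0,
            "oov_ratio": 0.0,
        }
    return {
        # Language-independent features
        "avg_sentence_length": len(tokens) / len(sentences),
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        "distinct_bigrams": len(set(zip(tokens, tokens[1:]))),
        # Language-specific feature: share of tokens missing from the lexicon
        "oov_ratio": sum(t not in lexicon for t in tokens) / len(tokens),
    }
```

Feature dictionaries of this kind, one per essay and paired with the manually assigned CEFR level, are the typical input to a classifier; comparing a model trained with and without `oov_ratio` would be one simple way to see what a language-specific feature contributes.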

For users, we intend to set up an interface for assessing new essays and for providing feedback on certain groups of features (e.g. lexical, grammatical, readability, etc.). The first prototype is already under development (see the EuroCALL article and the article by Pilán, Volodina and Zesch).


The Department of Swedish (UGOT), the Swedish Language Bank (Språkbanken, UGOT) and Swe-CLARIN are co-financing work on this project.


  • Ildikó Pilán, Elena Volodina and Torsten Zesch. 2016. Predicting proficiency levels in learner writings by transferring a linguistic complexity model from expert-written coursebooks. To appear in Proceedings of the 26th International Conference on Computational Linguistics (COLING), 2016, Osaka, Japan. [pdf]
  • Ildikó Pilán, Elena Volodina and David Alfter. 2016. Coursebook texts as a helping hand for classifying linguistic complexity in language learners' writings. To appear in Proceedings of the workshop on Computational Linguistics for Linguistic Complexity (CL4LC), COLING 2016, Osaka, Japan.
  • Ildikó Pilán, Elena Volodina. 2016. Classification of Language Proficiency Levels in Swedish Learners' Texts. Proceedings of SLTC 2016, Umeå, Sweden.
  • Elena Volodina, Ildikó Pilán, David Alfter. 2016. Classification of Swedish learner essays by CEFR levels. To appear in Proceedings of EuroCALL 2016, Cyprus.