
Our lives today are deeply intertwined with technology, which allows us to perform daily tasks more simply, better and faster. The broader aim of my research is to explore how the advantages offered by Natural Language Processing (NLP) technologies can be exploited for teaching and learning foreign or second (L2) languages. My specific research interests include the automatic assessment of proficiency levels and linguistic complexity in texts written by and for language learners, as well as automatic exercise generation, especially the selection of exercise item candidates from corpora. During my PhD studies I collaborated on the creation of several new Swedish lexical resources and corpora with an L2 focus. I am also actively involved in a number of communities and in organizing academic events related both to my narrower research interests and to NLP in general. For more information see the "Research" tab. In June 2018, I successfully defended my PhD thesis, entitled Automatic proficiency level prediction for Intelligent Computer-Assisted Language Learning.
Important information: Please note that my Gothenburg university email address will no longer be active after December 2018. I will be reachable at the following email address instead: ildiko.pilan [at] gmail.com.
I devote considerable effort to making the outcomes of my research freely available to the general public through backend programming for the online learning platform Lärka. Some examples:
Ildikó Pilán and Elena Volodina. 2018. Exploring word embeddings and phonological similarity for the unsupervised correction of language learner errors. In Proceedings of the COLING 2018 SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pp. 119-128.
Ildikó Pilán and Elena Volodina. 2018. Investigating the importance of linguistic complexity features across different datasets related to language learning. In Proceedings of the COLING 2018 Workshop on Linguistic Complexity and Natural Language Processing (LC&NLP), pp. 49-58.
Ildikó Pilán. 2018. Automatic proficiency level prediction for Intelligent Computer-Assisted Language Learning. Doctoral Thesis. Data linguistica 29. University of Gothenburg.
Ildikó Pilán, Elena Volodina and Lars Borin. 2016. Candidate sentence selection for language learning exercises: from a comprehensive framework to an empirical evaluation. Traitement Automatique des Langues (TAL), Special issue on NLP for learning and teaching, 57 (3).
Ildikó Pilán, Elena Volodina and Torsten Zesch. 2016. Predicting proficiency levels in learner writings by transferring a linguistic complexity model from expert-written coursebooks. In Proceedings of the 26th International Conference on Computational Linguistics (COLING), pp. 2101-2110, Osaka, Japan.
Ildikó Pilán, Elena Volodina and David Alfter. 2016. Coursebook texts as a helping hand for classifying linguistic complexity in language learners' writings. In Proceedings of the workshop on Computational Linguistics for Linguistic Complexity (CL4LC), COLING 2016, Osaka, Japan.
Ildikó Pilán and Elena Volodina. 2016. Classification of Language Proficiency Levels in Swedish Learners' Texts. In Proceedings of the Swedish Language Technology Conference (SLTC), Umeå, Sweden.
Ildikó Pilán. 2016. Detecting Context Dependence in Exercise Item Candidates Selected from Corpora. In Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL 2016, San Diego.
Additional publications can be found under the 'Publications' tab.
Supervisors: Lars Borin, Elena Volodina
The main goal of my PhD project is to explore methods based on Natural Language Processing for automatically assessing proficiency levels, as defined by the Common European Framework of Reference for Languages (CEFR), for Swedish as a second language (L2). To this end, different linguistic features and machine learning methods are tested for modelling linguistic complexity in two types of L2 texts: texts written by experts for learners (e.g. reading comprehension texts) and texts written by learners themselves (e.g. essays).
Automatic proficiency level prediction has several potential applications in the field of Intelligent Computer-Assisted Language Learning, of which we investigate two. Firstly, it can facilitate L2 learners' interaction with authentic corpora by enabling the selection of learning material suitable for their level. We focus on selecting individual sentences from texts and propose a hybrid system that combines heuristic rules with machine learning methods. Secondly, linguistic complexity analysis enables the automatic evaluation of L2 texts, which can be useful for teaching professionals when preparing learning materials for students or when assessing their writing. Furthermore, in a self-study scenario, such methods allow learners to locate learning material at an appropriate level and to establish their own level based on texts they have written.
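To give a rough idea of what a hybrid of heuristic rules and a learned model could look like for sentence selection, here is a minimal Python sketch. It is not the system described in the publications above: the rules, the two features, the nearest-centroid classifier, the English toy sentences and their level labels are all assumptions invented for illustration.

```python
from statistics import mean

# Hypothetical rule: sentences opening with these words are treated as
# depending on earlier context and are filtered out.
CONTEXT_DEPENDENT_STARTS = {"he", "she", "it", "they", "this", "that",
                            "however", "therefore"}


def tokenize(sentence):
    """Whitespace tokenization after dropping final punctuation (a simplification)."""
    return sentence.rstrip(".!?").split()


def passes_rules(sentence, min_tokens=4, max_tokens=20):
    """Heuristic filter: length limits plus a crude context-dependence check."""
    tokens = tokenize(sentence)
    if not min_tokens <= len(tokens) <= max_tokens:
        return False
    return tokens[0].lower() not in CONTEXT_DEPENDENT_STARTS


def features(sentence):
    """Two toy complexity features: token count and mean token length."""
    tokens = tokenize(sentence)
    return (float(len(tokens)), mean(len(t) for t in tokens))


class NearestCentroid:
    """Tiny stand-in for a trained proficiency-level classifier."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = tuple(mean(col) for col in zip(*rows))
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def select_candidates(sentences, model, target_level):
    """Keep sentences that pass the rules AND are predicted at the target level."""
    return [s for s in sentences
            if passes_rules(s) and model.predict(features(s)) == target_level]


# Invented toy training data; the level labels are illustrative only.
TRAIN = [
    ("The cat sits on the mat.", "A1"),
    ("I like to eat red apples.", "A1"),
    ("Notwithstanding considerable methodological disagreements, researchers "
     "nevertheless established comprehensive evaluation frameworks.", "C1"),
    ("Institutional accountability necessitates transparent administrative "
     "procedures throughout governmental organisations.", "C1"),
]
model = NearestCentroid().fit([features(s) for s, _ in TRAIN],
                              [lab for _, lab in TRAIN])

corpus = [
    "We walk to the old town.",
    "She walks to the old town.",   # rejected by the context rule
    "Go home.",                     # rejected by the length rule
    "Comprehensive institutional evaluation necessitates considerable "
    "administrative transparency.",  # passes the rules, predicted C1
]
candidates = select_candidates(corpus, model, "A1")
print(candidates)
```

The rules act as a cheap filter for properties that are easy to state explicitly, while the classifier handles the graded notion of difficulty. A realistic setup would use far richer features and a model trained on level-annotated data.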
My research questions include:
Introduction to programming, Master's Programme in Language Technology, teaching assistant
Introduction to programming, Master's Programme in Language Technology, teaching assistant
Master's thesis supervision: Lorena Llozhi. (2015). A list of productive vocabulary generated from second language learners' essays. Master's Programme in Language Technology, co-supervisor
Introduction to programming, Master's Programme in Language Technology, teaching assistant
Language Technology for linguists, teaching assistant