8th NLP4CALL, Nodalida, Turku, Finland

NoDaLiDa workshop, Turku, Finland, September 30 2019

Linköping University Press proceedings: Click here

ACL proceedings: Click here

Best presentation goes to: Integrating large-scale web data and curated corpus data in a search engine supporting German literacy education. Sabrina Dittrich, Zarah Weiss, Hannes Schröter, Detmar Meurers

Venue

The NLP4CALL workshop is co-located with NoDaLiDa 2019 in Turku.

Registration information

At least one author of each accepted paper should be registered for the workshop. To register, please go to this page and follow the instructions.

Program

Room: PUB5


08:30 - 09:00		Registration
09:00 - 09:10		Opening session Chair: Elena Volodina
		Session 1
		Chair: Thomas François
09:10 - 09:35		Predicting learner knowledge of individual words using machine learning. Drilon Avdiu, Vanessa Bui, Klára Ptačinová Klimčíková [slides]
09:35 - 10:00		Understanding Vocabulary Growth Through An Adaptive Language Learning System. Elma Kerz, Andreas Burgdorf, Daniel Wiechmann, Stefan Meeger, Yu Qiao, Christian Kohlschein, Tobias Meisen [slides]
10:00 - 10:30		Coffee break
		Session 2
		Chair: Egon Stemle
10:30 - 10:50		Formalism for a language agnostic language learning game and productive grid generation. Sylvain Hatier, Arnaud Bey, Mathieu Loiseau [slides]
10:50 - 11:10		Summarization Evaluation meets Short-Answer Grading. Margot Mieskes, Ulrike Padó [slides]
11:10 - 12:00		Invited talk 1: Assessing language complexity for L2 readers with NLP techniques and corpora. Thomas François [slides]. Chair: Elena Volodina
12:00 - 13:15		Lunch
		Session 3
		Chair: Herbert Lange
13:15 - 13:40		Toward automatic improvement of language produced by non-native language learners. Mathias Creutz, Eetu Sjöblom [slides]
13:40 - 14:05		Linguistic features and proficiency classification in L2 Spanish and L2 Portuguese. Iria del Río (Video presentation) [slides]
14:05 - 14:30		Integrating large-scale web data and curated corpus data in a search engine supporting German literacy education. Sabrina Dittrich, Zarah Weiss, Hannes Schröter, Detmar Meurers
14:30 - 14:55		Automatic Generation and Semantic Grading of Esperanto Sentences in a Teaching Context. Eckhard Bick [slides]
15:00 - 15:30		Coffee break
		Session 4
		Chair: David Alfter
15:30 - 15:50		Experiments on Non-native Speech Assessment and its Consistency. Ziwei Zhou, Sowmya Vajjala, Seyed Vahid Mirnezami (Video presentation) [slides]
15:50 - 16:10		The Impact of Spelling Correction and Task Context on Short Answer Assessment for Intelligent Tutoring Systems. Ramon Ziai, Florian Nuxoll, Kordula De Kuthy, Björn Rudzewitz, Detmar Meurers [slides]
16:10 - 17:00		Invited talk 2: Towards an infrastructure for FAIR language learner corpora. Egon Stemle [slides]. Chair: Elena Volodina
17:00 - 17:30		Talk from the organization and closing session: SVALA - pseudonymization service for L2 Swedish [slides] Elena Volodina
17:30 - 19:00		Free time
19:00 - 20:30		Welcome reception (Old Town Hall, Aurakatu 2)

Invited speakers

This year we have the pleasure to welcome two invited speakers:

Thomas François, UCLouvain

Thomas François is Assistant Professor in Applied Linguistics and Natural Language Processing at UCLouvain (Cental). His work focuses on automatic assessment of text readability, automatic text simplification, complex word identification, efficient communication in business, and the use of French as a professional language. He was an invited researcher at IRCS (University of Pennsylvania) as a Fulbright and BAEF fellow and, later, an FNRS post-doctoral researcher. He has led research projects such as CEFRLex (http://cental.uclouvain.be/cefrlex/), a CEFR-graded lexicon for foreign language learning, and AMesure (http://cental.uclouvain.be/amesure/), a platform to support simple writing. His work on readability for French as a foreign language received the best thesis award from ATALA in 2012 and the best paper award at the TALN 2016 conference.

Title: Assessing language complexity for L2 readers with NLP techniques and corpora

Assessing language complexity for both native (L1) and foreign language (L2) readers has been at the core of the field of readability for nearly a century. This research field has greatly contributed to improving the comprehensibility of written communication, for example by helping to improve the readability of major newspapers, technical manuals, and administrative documents. The limitations of readability models were, however, pointed out as early as the end of the 1970s. This eventually led to the investigation of new research avenues based on computational linguistics as well as machine learning techniques to improve traditional approaches. Such advances, combined with automatic investigations of large corpora and automatic approaches to text simplification, made it possible to develop a range of computer-based tools to enhance L2 learners’ access to texts as well as to pinpoint complex linguistic forms in a text.

In this presentation, I will summarize the main trends regarding the automatic assessment of language complexity for L2 readers and focus on three research projects. To illustrate the readability approach, the DMesure project will be presented. It is the first computational readability formula specialized for readers of French as a foreign language. Secondly, the talk will discuss the use of corpora to assess language complexity through CEFRLex, an international project providing, for some of the main European languages, lexical resources describing the frequency distributions of words across the six levels of competence of the Common European Framework of Reference for Languages (CEFR). These distributions have been estimated on corpora of pedagogical materials intended for L2 purposes, such as textbooks and simplified readers. The resulting resources have been manually checked and are machine-readable and open-licensed. The project also offers an interface for automatically identifying difficult words in a text on the basis of CEFRLex knowledge. Thirdly, the Predicomplex project will illustrate the use of learner data. It takes a personalized approach to vocabulary knowledge prediction using machine learning algorithms. I will conclude by highlighting some of the current challenges and research opportunities in language difficulty assessment for L2 learners.

Keywords: readability, graded lexical resource, computer-assisted language learning, natural language processing, complex word identification.

Egon Stemle, Eurac

Egon Stemle is a researcher in the Institute for Applied Linguistics at Eurac Research, Bolzano, Italy. He is a cognitive scientist with a focus on the area where computational linguistics and artificial intelligence converge. He works on the creation, standardisation, and interoperability of tools for editing, processing, and annotating linguistic data. He enjoys working together with other scientists on their data, and also collects or helps to collect new data from the Web, from computer-mediated communication and social media, and from language learners. He is an advocate of open science, which makes research and data available for others to consult or reuse in new research.

Title: Towards an infrastructure for FAIR language learner corpora

In recent years, the reproducibility of scientific research has become increasingly important, both for external stakeholders and for the research communities themselves. They all demand that empirical data collected and used for scientific research be managed and preserved in such a way that research results are reproducible. To account for this, the FAIR guiding principles for data stewardship have been established as a framework for good data management aiming at the findability, accessibility, interoperability, and reusability of research data. A special role is played by natural language processing and its methods, which are an integral part of many other disciplines working with language data: language corpora are often living objects – they are constantly being improved and revised, and at the same time the processing tools are also regularly updated, which can lead to different results for the same processing steps. In this presentation I will first investigate CMC corpora, which resemble language learner corpora in some core aspects, with regard to their compliance with the FAIR principles, and discuss to what extent depositing research data in the repositories of data preservation initiatives such as CLARIN, Zenodo or META-SHARE can assist in the provision of FAIR corpora. Second, I will show some modern software technologies and how they make the process of software packaging, installation, and execution and, more importantly, the tracking of corpora throughout their life cycle reproducible. This in turn makes changes to raw data reproducible for many subsequent analyses.

Keywords: research data management, language learner corpora, reusability, FAIR principles

Description of the workshop

The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection.

The latter include, among others, insights from Second Language Acquisition (SLA) research, on the one hand, and the promotion of “Computational SLA” through setting up Second Language research infrastructure(s), on the other.

The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This fact has given the name for this area of research – Intelligent CALL, ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.

The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.

We welcome papers:

that describe research directly aimed at ICALL;
that demonstrate the actual use, or discuss the potential use, of existing Language and Speech Technologies or resources for language learning;
that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts/responses, individualized learning solutions, provision of feedback;
that discuss challenges and/or research agenda for ICALL;
that describe empirical studies on language learner data.

A special focus is given to the established and upcoming infrastructures aimed at SLA and learner corpus research, covering questions such as data collection, legal issues, reliability of annotation, annotation tool development, search environments for SLA-relevant data, etc.

We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.

Submission information

We will be using the NoDaLiDa 2019 template for the workshop this year. The author kit, containing LaTeX templates as well as a Word template, can be downloaded from here:

Author kit

IMPORTANT: For submission, please leave the placeholder authors in the LaTeX template, as this template does not automatically anonymize author names.

Authors are invited to submit long papers (8-12 pages) or, alternatively, short or demo papers (4-7 pages); page counts do not include references. Please indicate one relevant paper type at submission time. Only PDF files will be accepted. Submissions will be managed through the electronic conference management system EasyChair. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.

Papers should describe original unpublished work or work-in-progress. Every paper will be reviewed by at least 2 members of the program committee. As reviewing will be blind, please ensure that papers are anonymous. Self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", should be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Submissions will be judged on appropriateness, clarity, originality/innovativeness, correctness/soundness, meaningful comparison, significance and impact of ideas or results.

All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL Anthology, following experiences from previous workshops, e.g. the 7th NLP4CALL.

IMPORTANT: For licensing reasons, all camera-ready papers should include the following sentence as an unmarked (unnumbered) footnote on the first page of the paper: This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/.

Important dates:

25 March, Monday: first call for papers
29 April, Monday: second call for papers
20 May, Monday: third call for papers
24 June, Monday: final call for papers
7 July, Sunday (extended from 30 June, Sunday): paper submission deadline (long, short and demo)
18 August, Sunday: notification of acceptance
6 September, Friday: camera-ready papers for publication
30 September, Monday: workshop date

Program committee (preliminary):

Lars Ahrenberg, Linköping University, Sweden
David Alfter, University of Gothenburg, Sweden
Lisa Beinborn, University of Amsterdam, Netherlands
Eckhard Bick, University of Southern Denmark, Denmark
Lars Borin, University of Gothenburg, Sweden
António Branco, University of Lisbon, Portugal
Jill Burstein, Educational Testing Service, USA
Andrew Caines, University of Cambridge, UK
Simon Dobnik, University of Gothenburg, Sweden
Thomas François, UCLouvain, Belgium
Johannes Graën, University of Gothenburg, Sweden
Andrea Horbach, University of Duisburg-Essen, Germany
Herbert Lange, University of Gothenburg and Chalmers University of Technology, Sweden
John Lee, City University of Hong Kong, China
Peter Ljunglöf, University of Gothenburg and Chalmers University of Technology, Sweden
Montse Maritxalar, University of the Basque Country, Spain
Beata Megyesi, Uppsala University, Sweden
Detmar Meurers, University of Tübingen, Germany
Ildikó Pilán, City University of Hong Kong, China
Martí Quixal, Universitat Oberta de Catalunya, Spain
Robert Reynolds, Brigham Young University, USA
Gerold Schneider, University of Zurich, Switzerland
Irina Temnikova, Sofia University, Bulgaria
Cornelia Tschichold, Swansea University, UK
Francis M. Tyers, Indiana University Bloomington, USA
Sowmya Vajjala, National Research Council Canada, Canada
Elena Volodina, University of Gothenburg, Sweden
Mats Wirén, Stockholm University, Sweden
Victoria Yaneva, University of Wolverhampton, UK
Torsten Zesch, University of Duisburg-Essen, Germany
Robert Östling, Stockholm University, Sweden

Workshop organizers

David Alfter, Språkbanken, Department of Swedish, University of Gothenburg; david dot alfter at svenska dot gu dot se (Organizing chair)
Elena Volodina, Språkbanken, Department of Swedish, University of Gothenburg; elena dot volodina at svenska dot gu dot se
Ildikó Pilán, City University of Hong Kong; ildiko dot pilan at gmail dot com
Herbert Lange, Department of Computer Science and Engineering, University of Gothenburg and Chalmers University of Technology, Sweden; herbert dot lange at cse dot gu dot se
Lars Borin, Språkbanken, Department of Swedish, University of Gothenburg; lars dot borin at svenska dot gu dot se

This workshop follows a series of workshops on NLP for CALL organized by a Special Interest Group in Intelligent Computer-Assisted Language Learning (SIG-ICALL of NEALT). The workshop series has previously been financed by the Center for Language Technology at the University of Gothenburg.

We intend to continue this workshop series, which to date has been the only ICALL-relevant recurring event based in the Nordic countries. Our intention is to co-locate the workshop series with the two major LT events in Scandinavia, SLTC and Nodalida, thus making this workshop an annual event. Through this workshop, we intend to profile ICALL research in the Nordic countries and to provide a dissemination venue for researchers active in this area.

Related links

ICALL-related mailing lists

There are two mailing lists that spread ICALL-relevant information: one run by the EuroCALL/CALICO SIG-ICALL group (nlpcall@artsservices.uwaterloo.ca // nlpcall@watarts.uwaterloo.ca) and the other run by the BEA workshop organizers (bea.nlp.workshop@gmail.com). We encourage you to join them to stay updated on events, publications and discussions in the area.

To join the EuroCALL/CALICO list, contact Mathias Schulze (mschulze@uwaterloo.ca). You can freely write to the EuroCALL/CALICO list when you want to disseminate a call for papers or other information, or to ask questions.
To join the BEA list, contact Ekaterina Kochmar (Ekaterina.Kochmar@cl.cam.ac.uk). The BEA mailing list spreads information in digest form approximately four times a year.

For NLP4CALL inquiries, please email David Alfter (david dot alfter at svenska dot gu dot se)