10th Workshop on NLP for CALL

NoDaLiDa workshop, Online, May 31 2021

Proceedings

The proceedings are online: Proceedings

Venue

The NLP4CALL workshop is co-located with NoDaLiDa 2021. IMPORTANT: Just like the main conference, the workshop will be organized as an online event.

Registration information

At least one author of each accepted paper should be registered for the workshop. To register, please go to the following page and follow the instructions. IMPORTANT: Registration is free of charge. All participants must register before the deadline on May 21, 2021 (extended from May 17, 2021), as the link to the workshop will be sent out to registered participants only.
Registration page

Program

Please note that all times are given in CEST (UTC+2).

09:00 - 09:10		Opening session
09:10 - 10:00		Invited talk 1: What is an NLP NLP? Considerations from an L2 Assessment Perspective. Mark Brenchley, Kevin Cheung. Chair: Elena Volodina
10:00 - 10:20		Coffee break
		Session 1
		Chair: Elena Volodina
10:20 - 10:50		Automatic annotation of curricular language targets to enrich activity models and support both pedagogy and adaptive systems. [PDF] Martí Quixal, Björn Rudzewitz, Elizabeth Bear and Detmar Meurers
10:50 - 11:20		Using Broad Linguistic Complexity Modeling for Cross-Lingual Readability Assessment. [PDF] Zarah Weiss, Xiaobin Chen and Detmar Meurers
11:20 - 11:30		Coffee break
		Session 2
		Chair: David Alfter
11:30 - 12:00		An Experiment on Implicitly Crowdsourcing Expert Knowledge about Romanian Synonyms from Language Learners. [PDF] Lionel Nicolas, Lavinia Nicoleta Aparaschivei, Verena Lyding, Christos Rodosthenous, Federico Sangati, Alexander König and Corina Forascu
12:00 - 13:00		Lunch
13:00 - 13:50		Invited talk 2: Challenges of Gamified Crowdsourcing for language learning applications. [Google slides] Johanna Monti. Chair: Johannes Graën
13:50 - 14:00		Coffee break
		Research notes session 1 *
		Chair: David Alfter. Discussion leaders: Ildikó Pilán, Zarah Weiss, Gerold Schneider
14:00 - 14:10		Automatic generation of vocabulary and grammar exercises for Finnish and Hungarian. [PDF] Zsanett Ferenczi
14:10 - 14:20		BNP readability formulas for Algerian middle school EFL learners. Younes Behira
14:20 - 14:30		Automatically individualised reading assistance for second language learning through modelling learner vocabulary. [PDF] Frankie Robertson
14:30 - 15:00		Research notes discussion
15:00 - 15:20		Coffee break
		Session 3
		Chair: Johannes Graën
15:20 - 15:40		Leveraging Task Information in Grammatical Error Correction for Short Answer Assessment through Context-based Reranking. [PDF] Ramon Ziai and Anna Karnysheva
15:40 - 16:00		Developing Flashcards for Learning Icelandic. [PDF] Xindan Xu and Anton Karl Ingason
16:00 - 16:20		DaLAJ - a dataset for linguistic acceptability judgments for Swedish. [PDF] [Google slides] Elena Volodina, Yousuf Ali Mohammed and Julia Klezl
16:20 - 16:40		Coffee break + best presentation voting
		Research notes session 2 *
		Chair: Ildikó Pilán. Discussion leaders: Lionel Nicolas, Thomas François
16:40 - 16:50		Suggestion for a Shared task at the next NLP4CALL. [Google slides] Elena Volodina
16:50 - 17:00		Swedish Profile - research and L2 teaching potential. [PDF] Therese Lindström Tiedemann, Elena Volodina, Yousuf Ali Mohammed
17:00 - 17:20		Research notes discussion
17:20 - 17:30		Closing session + best paper and presentation awards

* Note: For research notes sessions, questions/discussion will be at the end of the session, not after each presentation.

Invited speakers

Mark Brenchley and Kevin Cheung

Mark Brenchley is Senior Research Manager at Cambridge Assessment English. Mark manages research supporting the development and validation of Cambridge English products in the areas of speaking and writing, as well as vocabulary and grammar more broadly. He specialises in the application of corpus-based methodologies and is responsible for maintaining and developing the company’s internal corpus architecture, including the Cambridge Learner Corpus. His current work, in particular, focuses on the development and validation of auto-marking technologies.

Mark holds a PhD in Education from the University of Exeter, where he explored the development of spoken and written syntax within the English education system. Following his PhD, he co-developed the Growth in Grammar Corpus, a novel corpus of student writing that covers the primary and secondary phases of the English education system.

Kevin Cheung is a Chartered Psychologist, psychometrics expert and research leader, currently heading the Marking and Results department at Cambridge Assessment English. This recently formed team is responsible for R&D of technologies for marking and results determination, and their subsequent deployment; in particular, they are focused on systems that use AI and natural language processing (NLP) research. As part of this, Kevin also leads on collaboration with the University of Cambridge’s Automated Language Teaching and Assessment (ALTA) Institute. Prior to setting up the Marking and Results Capability team, he managed Cambridge English’s writing research. Kevin’s research specialisms are in academic writing, scale development and assessment.

Title: What is an NLP NLP? Considerations from an L2 Assessment Perspective

Recent years have witnessed what feels like an exponential development in the scope and performance of NLP approaches to human language. This is no less the case regarding the field of second language assessment, where NLP techniques seem likely to become ever more essential to, and integrated with, the assessment process. Indeed, surveying the recent progress of NLP, it seems hard to think of an assessment area where such techniques would not have genuine practical value. From an NLP perspective, in other words, the future of NLP-informed assessment looks extremely bright. At the same time, it remains important to keep taking stock, especially where there is always a chance that techniques and applications will advance at a faster rate than our ability to properly conceptualise them. With that in mind, this talk offers a more philosophical perspective on the role of NLP in second language assessment, focusing on the question of what it might actually mean for something to be an "NLP NLP"; that is, a natural language processed, natural language profile. In general, it will explore the relationship between NLP and L2 profiles with regard to the wider notion of validity as a key assessment concept (Messick, 1989; Bachman & Palmer, 1996; Weir, 2005; Kane, 2006). In particular, it will do so with regard to the specific validation framework utilised at Cambridge English (e.g. Shaw & Weir, 2007), and with reference to some of our current principles and practice.

Johanna Monti

Johanna Monti is currently Associate Professor and Third Mission Delegate at the L’Orientale University of Naples, where she teaches Translation Studies, Specialised Translation, Computational Linguistics for Translation, and Machine and Computer Aided Translation. She received her PhD in Theories, Methodologies and Advanced applications for Communication, Computer Science and Physics with a thesis in Computational Linguistics at the University of Salerno, Italy. She is the Chief Scientist of the UNIOR NLP Research Group, a node in Natural Language Processing and Computational Linguistics of the CINI Italian Lab on Artificial Intelligence and Intelligent Systems. Her current research activities are in the field of Machine Translation, the impact of MT on the translation process, the evaluation of new translation technologies and, finally, new methodologies in the development of linguistic data for NLP & CALL applications.

Title: Challenges of Gamified Crowdsourcing for language learning applications

Gamification is defined (Deterding et al., 2011) as "the use of game design elements in non-game contexts" and it has been employed variously in the field of language learning. Gamified crowdsourcing is an increasingly common practice, and researchers are exploring creative ways of using it in different domains (Morschheuser et al. 2017; Morschheuser and Hamari 2019; Murillo Zamorano et al. 2020), including the collection of language resources for language learning applications (Fort et al. 2018; Nicolas et al. 2020; Eryiğit et al. 2021, among others). In this talk, I will present an overview of different types of gamified crowdsourcing and discuss the emerging opportunities and challenges of using it for language learning applications.

Description of the workshop

The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter include, among others, drawing on insights from Second Language Acquisition (SLA) research, on the one hand, and promoting the development of “Computational SLA” through setting up Second Language research infrastructure(s), on the other.

The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This has given the area of research its name: Intelligent CALL (ICALL). As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.

The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.

We welcome papers:

that describe research directly aimed at ICALL;
that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
that discuss challenges and/or a research agenda for ICALL;
that describe empirical studies on language learner data.

This year a special focus is given to work done on second language vocabulary and grammar profiling, as well as the use of crowdsourcing for creating, collecting and curating data in NLP projects.

We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.

Submission information

This year we will use the NLP4CALL 2021 stylesheet for submissions. The author kit, containing LaTeX templates as well as a Word template, can be downloaded from here:

Authorkit (version 2) (LaTeX and MS Word templates)
Overleaf template

Authors are invited to submit long papers (8-12 pages) or, alternatively, short or demo papers (4-7 pages); the page count does not include references. Please indicate one relevant paper type at submission time. Only PDF files will be accepted. Submissions will be managed through the electronic conference management system EasyChair. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.

Papers should describe original unpublished work or work-in-progress. Every paper will be reviewed by at least 2 members of the program committee. As reviewing will be blind, please ensure that papers are anonymous. Self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", should be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Submissions will be judged on appropriateness, clarity, originality/innovativeness, correctness/soundness, meaningful comparison, significance and impact of ideas or results.

All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL Anthology, following experiences from previous workshops, e.g. the 9th NLP4CALL.

IMPORTANT: For licensing reasons, all camera-ready papers must include the following sentence as an unmarked (unnumbered) footnote on the first page of the paper: This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/.

NEW: Research notes session (expression of interest)

Following last year's success, we continue with a research notes session this year. The research notes session is an opportunity to present and discuss ideas, projects and work-in-progress and is aimed at two different audiences: MA/PhD students and senior researchers. Each student is paired with a senior researcher to discuss their ideas. Senior researchers are not paired.

If you want to participate in the research notes session, please send an email to David Alfter (david dot alfter at gu dot se) with the subject line NLP4CALL 2021 research notes including the title of the presentation, the presenter, whether the presenter is an MA/PhD student, and a short (50 words) description of the talk. The deadline for expressions of interest coincides with the paper submission deadline (March 26) but the notification of acceptance is later (April 27). Research notes presentations are not included in the proceedings of the workshop.

Important dates

13 January: first call for papers
8 February: second call for papers
1 March: third call for papers
11 March: final call for papers
26 March (extended from 18 March): paper submission deadline (long, short and demo papers, research notes)
19 April: notification of acceptance (regular papers)
27 April: notification of acceptance (research notes)
6 May: camera-ready deadline
31 May: workshop date

Program committee (preliminary)

David Alfter, University of Gothenburg, Sweden
Claudia Borg, University of Malta, Malta
António Branco, Universidade de Lisboa, Portugal
Andrew Caines, University of Cambridge, UK
Xiaobin Chen, Universität Tübingen, Germany
Kordula de Kuthy, Universität Tübingen, Germany
Simon Dobnik, University of Gothenburg, Sweden
Thomas François, Université catholique de Louvain, Belgium
Johannes Graën, University of Gothenburg, Sweden and Universitat Pompeu Fabra, Spain
Andrea Horbach, University of Duisburg-Essen, Germany
Ronja Laarmann-Quante, University of Duisburg-Essen, Germany
Herbert Lange, University of Gothenburg, Sweden and Chalmers University of Technology, Sweden
Peter Ljunglöf, University of Gothenburg, Sweden and Chalmers University of Technology, Sweden
Verena Lyding, EURAC research, Italy
Detmar Meurers, Universität Tübingen, Germany
Margot Mieskes, University of Applied Sciences Darmstadt, Germany
Lionel Nicolas, EURAC research, Italy
Ulrike Pado, Hochschule für Technik Stuttgart, Germany
Magali Paquot, Université catholique de Louvain, Belgium
Ildikó Pilán, Norwegian Computing Center, Norway
Gerold Schneider, University of Zurich, Switzerland
Egon Stemle, EURAC research, Italy
Anaïs Tack, Université catholique de Louvain, Belgium and KU Leuven, Belgium
Irina Temnikova, Mitra Translations, Bulgaria
Sowmya Vajjala, National Research Council, Canada
Elena Volodina, University of Gothenburg, Sweden
Zarah Weiss, Universität Tübingen, Germany
Victoria Yaneva, National Board of Medical Examiners, Philadelphia, USA
Torsten Zesch, University of Duisburg-Essen, Germany
Ramon Ziai, Universität Tübingen, Germany
Robert Östling, Stockholm University, Sweden

Workshop organizers

David Alfter, Språkbanken, Department of Swedish, University of Gothenburg (Organizing chair)
Elena Volodina, Språkbanken, Department of Swedish, University of Gothenburg
Ildikó Pilán, Norwegian Computing Center, Oslo, Norway
Johannes Graën, Institut für Computerlinguistik, Universität Zürich
Lars Borin, Språkbanken, Department of Swedish, University of Gothenburg

The workshop series has been previously financed by the Centre for Language Technology (University of Gothenburg), the SweLL project (University of Gothenburg) and the Swedish Research Council's conference grant. Currently the funding comes from Språkbanken-Text and the L2 profiling project.

For the past nine years we have successfully co-located NLP4CALL with the two major Language Technology events in Scandinavia, SLTC and NoDaLiDa, thus making this workshop an annual event. We intend to continue this tradition. Through this workshop, we intend to profile ICALL research in Nordic countries and to provide a dissemination venue for researchers active in this area.

ICALL-relevant mailing lists

There are two mailing lists that spread ICALL-relevant information: one run by the EuroCALL/CALICO SIG-ICALL group (nlpcall@artsservices.uwaterloo.ca // nlpcall@watarts.uwaterloo.ca) and the other run by the BEA workshop organizers (bea.nlp.workshop@gmail.com). We encourage you to join them to stay updated on events, publications and discussions in the area.

To join the EuroCALL/CALICO list, contact Mathias Schulze (mschulze@uwaterloo.ca). You can freely write to the EuroCALL/CALICO list when you want to disseminate a call for papers or other information, or to ask questions.
To join the BEA list, contact Ekaterina Kochmar (Ekaterina.Kochmar@cl.cam.ac.uk). The BEA mailing list spreads information in digest form approximately four times a year.

For NLP4CALL inquiries, please email David Alfter (david dot alfter at svenska dot gu dot se).