14th NLP4CALL | Språkbanken Text

A workshop co-located with NoDaLiDa/Baltic-HLT in Tallinn, Estonia on March 5th, 2025.

The proceedings are out! You can check them here.


Venue

This year's NLP4CALL workshop will be organized as a hybrid event. The workshop will take place physically in Tallinn, Estonia, on March 5th, 2025, and online via Zoom. The Zoom link will be sent to registered participants.

The physical location will be the Lääne-Euroopa room at Hestia Hotel Europa (Address: Paadi 5, Tallinn, Estonia). For on-site participants, links and maps are provided on the venue page of the main conference website.

Registration

Registration is done via the NoDaLiDa/Baltic-HLT registration website.

Program

Start	End	Event	Authors

09:00	09:10	Opening Session	Ricardo Muñoz Sánchez

		Session 1: Invited Talk	Chair: Jelena Kallas
09:10	10:10	AI-assisted (Pedagogical) Constructicography – Opportunities and Challenges	Peter Uhrig

10:10	10:30	Coffee Break

		Session 2: Long and Short Papers	Chair: Arianna Masciolini
10:30	11:00	Investigating Linguistic Abilities of LLMs for Native Language Identification [link]	Ahmet Yavuz Uluslu and Gerold Schneider
11:00	11:30	PIRLS Category-specific Question Generation for Reading Comprehension [link]	Yin Poon, Qiong Wang, John S. Y. Lee, Yu Yan Lam, and Samuel Kai Wah Chu
11:30	12:00	Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language [link]	Soroosh Akef, Detmar Meurers, Amália Mendes, and Patrick Rebuschat

12:00	12:15	Break

		Session 3: Work in Progress	Chair: Ricardo Muñoz Sánchez
12:15	12:45	A prototype authoring tool for editing authentic texts using LLMs to increase support for contextualised L2 grammar practice [link]	Stephen Bodnar
12:45	13:15	Corpus-Based Ukrainian Vocabulary List for Foreign Learners on the PULS Platform	Olena Synchak, Vasyl Starko, and Mariana Burak

13:15	14:30	Lunch Break

		Session 4: Invited Talk	Chair: Elena Volodina
14:30	15:30	The Potential and the Pitfalls of Very Large Language Models for Language Learning Applications	Andrew Caines

		Session 5: The MultiGEC Shared Task	Chair: Maria Irena Szawerna
15:30	16:00	The MultiGEC-2025 Shared Task on Multilingual Grammatical Error Correction at NLP4CALL [link]	Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, and Robert Östling
16:00	16:30	Lattice @MultiGEC-2025: A Spitful Multilingual Language Error Correction System Using LLaMA [link]	Olga Seminck, Yoann Dupont, Mathieu Dehouck, Qi Wang, Noé Durandard, and Margo Novikov
16:30	17:00	UAM-CSI at MultiGEC-2025: Parameter-efficient LLM Fine-tuning for Multilingual Grammatical Error Correction [link]	Ryszard Staruch

17:00	17:10	Closing Session	Ricardo Muñoz Sánchez

19:00		Dinner

Lunch will be at Kochi Aidad (Lootsi 10) and dinner will be at Frenchy Bistro (Telliskivi 60a/5). Note that meals are not included in the workshop registration, so everyone joining will have to cover their own expenses.

Shared Task

This year we are offering the MultiGEC shared task on multilingual grammatical error correction for second-language (L2) learners. The shared task covers 12 target languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish, and Ukrainian. It is organized by the Computational SLA working group.

For more information, please see the Shared Task website: https://github.com/spraakbanken/multigec-2025/.

Invited Speakers

This year we are pleased to announce two invited speakers:

Andrew Caines: The Potential and the Pitfalls of Very Large Language Models for Language Learning Applications

Abstract: Very large language models with billions or trillions of parameters (LLMs) currently dominate the NLP landscape, and recently they have been used for many tasks in CALL, such as essay grading, error correction, and content creation. Since LLMs now outperform earlier models on many NLP tasks, it can be tempting to view them as a ready solution for any domain, including education. But it is important to evaluate precisely how good they are on CALL-related tasks and understand where they fail or can be improved. I will give an overview of the work on CALL for English by the ALTA group in Cambridge: comparing the performance of LLMs and supervised models on various tasks, pointing out some of their strengths and weaknesses, and giving an overview of our research towards ‘baby’ LMs trained at lower costs but still maintaining good levels of performance. I will also look at the multilingual work of the CompSLA group and end by listing some promising areas for future research and collaboration.

Bio: Andrew Caines is a Senior Research Associate based in the Computer Laboratory at the University of Cambridge, U.K. He has been a member of the Institute for Automated Language Teaching & Assessment (ALTA) since its inception in 2013. His research interests relate to education technology for language learning, including corpus creation, automated essay scoring, grammatical error detection and correction, adaptive learning, content creation and the training of smaller, domain-specific language models.

Peter Uhrig: AI-assisted (Pedagogical) Constructicography – Opportunities and Challenges

Abstract: With the rise of large machine-readable computer corpora, lexicographers found themselves in the (un-)comfortable situation of having enough data on words and their combinations, but often at such a volume that it became difficult to select the most relevant information to include in the dictionary. We have shown in previous work (Uhrig & Proisl 2012, Evert et al. 2017, Uhrig et al. 2018) how the application and the understanding of NLP and statistics can improve the extraction of collocation candidates from corpora and thus provide the lexicographer with more accurate and relevant lists of collocation candidates. Given that current large language models distill linguistic knowledge from corpora that are much larger than the ones we used in our previous research, it appears only logical to turn to them for even better input on collocations. In this talk, I will explore prompting strategies on standard models and show how fine-tuning can be used to turn such models into constructicographers that write full collocations dictionary entries. I will include a cursory evaluation of the resulting entries and a brief discussion about the usefulness (or lack thereof) of collocations dictionaries in the age of LLMs.

Bio: Peter Uhrig is professor of Digital Linguistics with a focus on Big Data at Friedrich-Alexander-Universität Erlangen-Nürnberg. His research interests include cognitive linguistics, especially Construction Grammar, collo-phenomena (collocation, collostruction), computational and corpus linguistics, and lexicography. He is particularly interested in using large multimodal datasets, data science methods, and machine learning in his work. In addition to his research, Peter Uhrig is committed to creating research infrastructures and open datasets, supporting the broader linguistic community. His work aims to integrate technology with linguistic research, contributing to the evolving field of Digital Linguistics.

References:

Stefan Evert, Peter Uhrig, Sabine Bartsch, Thomas Proisl (2017): “E-VIEW-alation – a large-scale evaluation study of association measures for collocation identification.” In: Electronic lexicography in the 21st century. Proceedings of the eLex 2017 conference, Leiden, The Netherlands.
Peter Uhrig, Stefan Evert, Thomas Proisl (2018): “Collocation candidate extraction from dependency-annotated corpora: Exploring the differences between parsers and dependency annotation schemes.” In: Pascual Cantos Gómez and Moisés Almela Sánchez (eds.) Lexical Collocation Analysis: Advances and Applications. Berlin: Springer.
Peter Uhrig, Thomas Proisl (2012): “Less hay, more needles – using dependency-annotated corpora to provide lexicographers with more accurate lists of collocation candidates.” Lexicographica 28.

Description of the Workshop

The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on integrating Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter include, among others, the integration of insights from Second Language Acquisition (SLA) research, and the promotion of “Computational SLA” by setting up second language research infrastructures.

The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This gives the area of research its name: Intelligent CALL, or ICALL for short. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices, or empirical data are modeled in ICALL tools. The NLP4CALL workshop series aims to bring together competences from these areas for sharing experiences and brainstorming about the future of the field.

We welcome papers:

that describe research directly aimed at ICALL;
that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application, or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
that discuss challenges and/or research agendas for ICALL;
that describe empirical studies on language learner data; or
that explore the use of LLMs and Generative AI to develop ICALL tools.

In this edition of the workshop a special focus is given to:

Grammatical error correction, with a special track for the MultiGEC shared task.
The use of pedagogically oriented constructicographic resources (constructicons), with an emphasis on their practical application in ICALL. By constructicographic resources, we refer to resources that describe various types of constructions associated with specific meanings or functions, ranging from fully schematic and semi-schematic constructions (e.g., those with both fixed and variable elements) to specific lexical expressions.

We particularly encourage software demonstrations showcasing the potential use of existing Language and Speech Technologies or resources in ICALL applications for Nordic and Finno-Ugric languages.

Submission Information

We accept short, long, and demo papers. The submissions must describe original and unpublished work.

Paper length:

Papers sent to the NLP4CALL workshop must adhere to the following page limits:

Short and demo papers must be between 4 and 7 pages.
Long papers must be between 8 and 12 pages.
Shared task papers must be between 4 and 12 pages.

Other considerations regarding the paper length are as follows:

Papers can have an unlimited number of pages for references.
Appendices are allowed and are not counted in the page count. However, the main body of the paper has to be self-contained, as reviewers are not expected to read the appendices.
Camera-ready versions of accepted papers will be given an additional page to address reviewer comments.

Papers should describe original unpublished work or work in progress and will be peer-reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be published both in the NEALT Proceedings Series and in the ACL Anthology.

The submission will be through EasyChair: https://easychair.org/my/conference?conf=nlp4call2025

The links to the LaTeX and Word templates can be found here: https://github.com/NLP4CALL/current/blob/website/_includes/other_info/submission_information.md

Important Dates

All deadlines are Anywhere on Earth (AoE).

Submission date: December 19th, 2024 (extended from December 16th)
Acceptance notification: January 20th, 2025
Camera-ready papers: February 2nd, 2025
Workshop date: March 5th, 2025

Workshop Organizers

Ricardo Muñoz Sánchez, Språkbanken Text, University of Gothenburg, Sweden
David Alfter, Gothenburg Research Infrastructure in Digital Humanities (GRIDH), University of Gothenburg, Sweden
Elena Volodina, Språkbanken Text, University of Gothenburg, Sweden
Jelena Kallas, Institute of the Estonian Language, Estonia

Information about Sponsors and Other Supporters

This workshop is supported jointly by:

The project Expanding the scope of a multi-purpose lexicographic resource to grammar and L2 competence, funded by the Estonian Research Council (grant PRG 1978).
The project Grandma Karl is 27 years old: Automatic pseudonymization of research data, funded by the Swedish Research Council (grant 2022-02311).
The research infrastructure Språkbanken, jointly funded by its 10 partner institutions and the Swedish Research Council (2018–2024; dnr 2017-00626).