PhD project by Arianna Masciolini

UD as an annotation standard for learner language: from guidelines to applications

Summary

The project leverages the Universal Dependencies (UD) framework to develop cross-lingually applicable approaches to Second Language Acquisition (SLA) research and Computer-Assisted Language Learning (CALL).

Seminars

  1. ideas seminar: UD-based analysis of grammatical errors in L2 texts (April 3, 2023) [slides]
  2. halfway seminar: Cross-lingual approaches to computational SLA: the potential of Universal Dependencies (October 21, 2024) [slides]
  3. final seminar: planned for autumn 2026

Publications

The following publications are planned for inclusion in the thesis and constitute its core:

  1. Arianna Masciolini, Aleksandrs Berdicevskis, Maria Irena Szawerna, and Elena Volodina. Annotating second language in Universal Dependencies: a review of current practices and directions for harmonized guidelines. In Proceedings of the Eighth Workshop on Universal Dependencies @ SyntaxFest, 2025 [full text] [bibtex] [slides]
  2. Arianna Masciolini, Aleksandrs Berdicevskis, Caroline Grand-Clement, Maria Irena Szawerna, and Elena Volodina. UD Swedish-SweLL: a growing treebank of L2 Swedish (WIP)
  3. Arianna Masciolini, Herbert Lange, and Márton András Tóth. Exploring parallel corpora with STUnD: A Search Tool for Universal Dependencies. In Huminfra handbook: Empowering digital and experimental humanities, 2025 [full text] [bibtex] [code]
  4. Arianna Masciolini, Elena Volodina, and Dana Dannélls. Towards automatically extracting morphosyntactical error patterns from L1-L2 parallel dependency treebanks. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 2023 [full text] [bibtex] [code] [poster] [slides] [video] (possibly to be substituted by a more up-to-date paper)

Potential additions

The following publications describe preliminary experiments in automatic parsing of L2 texts:

  • Arianna Masciolini, Emilie Francis, and Maria Irena Szawerna. Synthetic Error-Augmented Parsing of Swedish as a Second Language: Experiments with Word Order. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING, 2024 [full text] [bibtex] [code] [poster]
  • Arianna Masciolini. Bootstrapping the Annotation of UD Learner Treebanks. In Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING, 2024 [full text] [bibtex] [poster]

The following publications concern automatic Grammatical Error Correction, a prerequisite for building CALL applications that leverage parallel learner treebanks:

  • Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, Robert Östling, Kais Allkivi, Špela Arhar Holdt, Ilze Auzina, Roberts Darģis, Elena Drakonaki, Jennifer-Carmen Frey, Isidora Glišić, Pinelopi Kikilintza, Lionel Nicolas, Mariana Romanyshyn, Alexandr Rosen, Alla Rozovskaya, Kristjan Suluste, Oleksiy Syvokon, Alexandros Tantos, Despoina-Ourania Touriki, Konstantinos Tsiotskas, Eleni Tsourilla, Vassilis Varsamopoulos, Katrin Wisniewski, Aleš Žagar, and Torsten Zesch. Towards better language representation in Natural Language Processing - a multilingual dataset for text-level Grammatical Error Correction. International Journal of Learner Corpus Research, 2025 [full text] [bibtex] [website] [dataset] [supplementary report]
  • Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, and Robert Östling. The MultiGEC-2025 shared task on Multilingual Grammatical Error Correction at NLP4CALL. In Ricardo Muñoz Sánchez, David Alfter, Elena Volodina, and Jelena Kallas, editors, Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning, 2025 [full text] [bibtex] [slides] [website]