Project description
This PhD project is part of the VR-funded research environment project [Mormor Karl](https://mormor-karl.github.io/). It focuses on the algorithmic side of the work: detecting and labeling personal information in research data, and automatically generating pseudonyms to replace it.
Half-way seminar
Date: April 20, 2026
Discussant: Niklas Zechner
Included publications (preliminary):
- Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, and Elena Volodina. 2025. The Devil’s in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025).
- Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Therese Lindström Tiedemann, and Elena Volodina. 2024. Detecting Personal Identifiable Information in Swedish Learner Essays. In Proceedings of the EACL workshop Computational Approaches to Language Data Pseudonymization (CALD-pseudo-2024). EACL, Malta, 2024. Association for Language Technology.
- Maria Irena Szawerna, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Xuan-Son Vu, and Elena Volodina. 2024. Pseudonymization Categories across Domain Boundaries. In Proceedings of LREC-Coling 2024.
- ...more to be decided...