Project description
This PhD project is part of the VR-funded research environment project Mormor Karl. It focuses on the algorithmic side of pseudonymization: detecting and labeling personally identifiable information (PII) in research data, and automatically generating pseudonyms to replace that PII.
The context for the project is set by these two papers:
- Research agenda for the field of pseudonymization and for the Mormor Karl project: Elena Volodina, Simon Dobnik, Therese Lindström Tiedemann and Xuan-Son Vu. 2023. Grandma Karl is 27 Years old – Research Agenda for Pseudonymization of Research Data. Proceedings of the 2023 IEEE Ninth International Conference on Big Data Computing Service and Applications (BigDataService), Workshop on Big Data and Machine Learning with Privacy Enhancing Tech. Athens, Greece.
- Setting standards within the field of pseudonymization: Elena Volodina, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Maria Irena Szawerna, Lisa Södergård and Xuan-Son Vu. 2025. Towards shared standards for pseudonymization of research data. In Proceedings of Huminfra Conference 2025 (HiC 2025), Stockholm, 12–13 November 2025.
Half-way seminar
Date: April 20, 2026
Discussant: Niklas Zechner
Included publications (preliminary):
- Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez and Elena Volodina. 2025. The Devil’s in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025).
- Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Therese Lindström Tiedemann and Elena Volodina. 2024. Detecting Personal Identifiable Information in Swedish Learner Essays. In Proceedings of the EACL workshop Computational Approaches to Language Data Pseudonymization (CALD-pseudo-2024). EACL, Malta, 2024. Association for Language Technology.
- Maria Irena Szawerna, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Xuan-Son Vu and Elena Volodina. 2024. Pseudonymization Categories across Domain Boundaries. In Proceedings of LREC-Coling 2024.
- Maria Irena Szawerna, David Alfter and Elena Volodina. 2025. Annotating Personal Information in Swedish Texts with SPARV. In Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities. RANLP, Bulgaria, 2025.
- ...more to come...