A model based on KB/bert-base-swedish-cased trained to detect personal information, especially in learner essays. This variant differentiates between 38 detailed categories.
Standard reference
Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, and Elena Volodina. 2025. The Devil’s in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), pages 697–708, Tallinn, Estonia. University of Tartu Library. https://aclanthology.org/2025.nodalida-1.70/
Data citation
Szawerna, Maria Irena. sbx/KB-bert-base-swedish-cased_PI-detection-detailed [Data set]. Språkbanken Text. https://doi.org/10.23695/w9zv-9s54

A model based on KB/bert-base-swedish-cased trained to detect personal information, especially in student essays.
Caveats
This model does not guarantee the detection of all personal information in the text. Never use it without human supervision (human-in-the-loop). The model performs noticeably worse on texts that are not student essays.
Intended uses
Personal information detection
Download
File | Size | Modified | Licence |
---|---|---|---|
KB-bert-base-swedish-cased_PI-detection-detailed: the model is hosted on Hugging Face and can be easily accessed, e.g. using their Python library. | 109.84 KB | | GPL-3.0 |
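
As a rough illustration of access through the Hugging Face Python library, the sketch below loads the model with the `transformers` token-classification pipeline and lists the detected spans for a human reviewer, in line with the caveat above. It rests on assumptions: that the repository id matches the name in the data citation (`sbx/KB-bert-base-swedish-cased_PI-detection-detailed`), and that the model is published as a token-classification model. The helper names `load_detector`, `spans_for_review`, and `demo`, as well as the sample sentence, are made up for this example.

```python
# Assumption: the Hugging Face repository id is the name given in the data citation.
MODEL_ID = "sbx/KB-bert-base-swedish-cased_PI-detection-detailed"


def load_detector(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups word pieces into whole labeled spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def spans_for_review(text, predictions):
    """Turn pipeline output into (surface text, label, score) rows for a reviewer."""
    rows = []
    for pred in predictions:
        surface = text[pred["start"]:pred["end"]]
        rows.append((surface, pred["entity_group"], round(float(pred["score"]), 3)))
    return rows


def demo():
    """Hypothetical usage: print every detected span so a human can check it."""
    detector = load_detector()
    text = "Jag heter Anna och bor i Göteborg."
    for row in spans_for_review(text, detector(text)):
        print(row)
```

Calling `demo()` requires the `transformers` package, a backend such as PyTorch, and network access for the first download. The printed rows are meant as input to manual review, not as a final anonymization decision.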