A model based on KB/bert-base-swedish-cased, trained to detect personal information, especially in learner essays. This variant distinguishes 38 detailed categories and uses IOB tagging, so each label also marks whether a token begins an entity or is inside one.
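The beginning/inside distinction matters when two entities of the same category sit next to each other. The sketch below shows one common way to group IOB-tagged tokens into spans. The label names (`given_name`, `city`) and the helper `iob_to_spans` are hypothetical illustrations, not the model's actual label set or the authors' code.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (category, text) spans.

    Tags are expected as "B-<category>", "I-<category>" or "O".
    The category names used in the example are hypothetical.
    """
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the span collected so far, if there is one.
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag (or an I- tag that does not continue the open span)
            # starts a new span.
            flush()
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the open span.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity.
            flush()
            current_label, current_tokens = None, []
    flush()
    return spans


if __name__ == "__main__":
    tokens = ["Anna", "Maria", "bor", "i", "New", "York"]
    tags = ["B-given_name", "B-given_name", "O", "O", "B-city", "I-city"]
    print(iob_to_spans(tokens, tags))
```

Here the two adjacent `B-` tags keep "Anna" and "Maria" as separate entities, while `B-city` followed by `I-city` joins "New York" into one span.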
Standard reference
Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Elena Volodina (2025): The Devil’s in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling, in Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), March 3–4, 2025, Tallinn, Estonia / Richard Johansson and Sara Stymne (eds.), pages 697–708
Data citation
Szawerna, Maria Irena. sbx/KB-bert-swedish_PI-detection-detailed-iob [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/w50j-tf54
Caveats
This model does not guarantee the detection of all personal information in the text. Never use it without human supervision (human-in-the-loop). The model performs noticeably worse on texts that are not student essays.
Intended uses
Personal information detection
Download
| File | Size | Modified | Licence |
|---|---|---|---|
| KB-bert-swedish_PI-detection-detailed-iob. The model is hosted on HuggingFace and can be easily accessed, e.g. using their Python library. | 146.5 KB | | GPL-3.0 |
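A minimal sketch of how the model might be loaded with the Hugging Face `transformers` library and its output prepared for human review. It assumes that the Hub identifier matches the name in the data citation (`sbx/KB-bert-swedish_PI-detection-detailed-iob`) and that `transformers` and a backend such as PyTorch are installed. The helpers `load_detector` and `to_review_rows`, the example sentence, and the printed fields are illustrative choices, not part of the published model.

```python
from typing import Any, Dict, List

# Assumption: the Hub id is the same as the name in the data citation.
MODEL_ID = "sbx/KB-bert-swedish_PI-detection-detailed-iob"


def load_detector(model_id: str = MODEL_ID):
    """Create a token-classification pipeline (downloads the model on first use)."""
    # Imported here so the rest of the module works without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces and B-/I- tags
    # into entity groups with character offsets.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def to_review_rows(text: str, predictions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn pipeline output into rows that a human reviewer can check."""
    rows = []
    for pred in predictions:
        rows.append(
            {
                "text": text[pred["start"]:pred["end"]],
                "label": pred["entity_group"],
                "score": float(pred["score"]),
            }
        )
    return rows


if __name__ == "__main__":
    detector = load_detector()
    essay = "Jag heter Anna och jag bor i Göteborg."
    for row in to_review_rows(essay, detector(essay)):
        print(row)
```

Because the model does not guarantee that all personal information is found, the rows are meant as suggestions for a human reviewer, not as a final anonymization decision.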