A model based on KB/bert-base-swedish-cased, trained to detect personal information (PI), especially in learner essays. This variant distinguishes only between PI and non-PI, and additionally marks whether a PI token is at the beginning of or inside a PI span (IOB tagging).
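To illustrate what the beginning/inside distinction gives you, here is a minimal sketch of grouping IOB tags into spans. The tag strings `"B"`, `"I"` and `"O"`, the helper name `iob_to_spans`, and the example sentence are assumptions made for illustration; the label names actually stored in the model's configuration may differ.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Group IOB tags into (start, end) token index spans, end exclusive.

    Hypothetical helper: assumes tags start with "B" (beginning of a PI span),
    "I" (inside a PI span), or anything else (not PI).
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("B"):
            # A new span begins; close the previous one if it is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag.startswith("I"):
            # A stray "I" without a preceding "B" is treated as opening a span.
            if start is None:
                start = i
        else:
            # Non-PI token: close any open span.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Made-up example sentence and tags, not actual model output.
tokens = ["Jag", "heter", "Anna", "Svensson", "och", "bor", "i", "Lund"]
tags = ["O", "O", "B", "I", "O", "O", "O", "B"]

spans = iob_to_spans(tags)
pi_phrases = [" ".join(tokens[s:e]) for s, e in spans]
```

Because "B" marks where each span starts, two adjacent PI items (for example two names in a row) can be kept apart, which a plain PI/non-PI labeling could not do.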
Standard reference
Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Elena Volodina (2025): The Devil’s in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling, in Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), March 3–4, 2025, Tallinn, Estonia / Richard Johansson and Sara Stymne (eds.), pages 697–708
Data citation
Szawerna, Maria Irena. sbx/KB-bert-swedish_PI-detection-basic-iob [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/wb6y-sv35
Caveats
This model does not guarantee the detection of all personal information in the text. Never use it without human supervision (human-in-the-loop). The model performs noticeably worse on texts that are not student essays.
Intended uses
Personal Information detection
Download
| File | Size | Modified | Licence |
|---|---|---|---|
| KB-bert-swedish_PI-detection-basic-iob. The model is hosted on HuggingFace and can be easily accessed, e.g. using their Python library. | 146.01 KB | | GPL-3.0 |
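As a rough sketch of accessing the model through the HuggingFace `transformers` Python library, the snippet below wraps a token-classification pipeline. It assumes that the repository ID on HuggingFace matches the identifier given in the data citation (`sbx/KB-bert-swedish_PI-detection-basic-iob`); the function names `load_detector` and `detect_pi` are hypothetical.

```python
# Assumption: the HuggingFace repository ID equals the identifier
# in the data citation; adjust it if the hosted name differs.
MODEL_ID = "sbx/KB-bert-swedish_PI-detection-basic-iob"


def load_detector(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import pipeline

    return pipeline("token-classification", model=model_id)


def detect_pi(text: str, detector=None):
    """Return (word, label, score) triples for tokens the detector flags.

    `detector` is any callable that maps a string to a list of dicts with the
    keys "word", "entity" and "score", as the transformers pipeline does
    when no aggregation strategy is set.
    """
    detector = detector or load_detector()
    return [(p["word"], p["entity"], float(p["score"])) for p in detector(text)]
```

Calling `detect_pi("Jag heter Anna och bor i Lund.")` would download the model on first use and return the flagged tokens. In line with the caveats above, treat the output as suggestions for a human reviewer, not as a complete list of personal information.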