sbx/KB-bert-swedish_PI-detection-detailed-iob

Standard reference

Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Elena Volodina (2025): The Devil’s in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling, in Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), March 3–4, 2025, Tallinn, Estonia / Richard Johansson and Sara Stymne (eds.), pages 697–708

Data citation

Szawerna, Maria Irena. sbx/KB-bert-swedish_PI-detection-detailed-iob [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/w50j-tf54
A model based on KB/bert-base-swedish-cased, trained to detect personal information, especially in learner essays. This variant distinguishes 38 detailed categories and uses IOB tagging, so each detected token is additionally marked as either beginning a span of personal information or lying inside one.
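The beginning/inside distinction can be illustrated with a minimal sketch of how IOB tags are grouped into spans. The label names `PERSON` and `PLACE` are hypothetical placeholders, not the model's actual 38 categories, and the function `iob_to_spans` is an illustration rather than part of the model's distribution.

```python
def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []

    def flush():
        # Store the span collected so far, if there is one.
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag always opens a new span; an I- tag whose label does
            # not match the open span is treated as opening one as well.
            flush()
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the currently open span.
            current_tokens.append(token)
        else:
            # "O": the token is not personal information.
            flush()
            current_label, current_tokens = None, []
    flush()
    return spans


# Hypothetical labels: two adjacent names stay separate because each
# carries its own B- tag, while "Nya Zeeland" is one two-token span.
tokens = ["Jag", "heter", "Anna", "Lena", "och", "bor", "i", "Nya", "Zeeland"]
tags = ["O", "O", "B-PERSON", "B-PERSON", "O", "O", "O", "B-PLACE", "I-PLACE"]
print(iob_to_spans(tokens, tags))
```

Without the B-/I- prefixes, adjacent spans of the same category (here the two names) could not be told apart from a single longer span.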

Caveats

This model does not guarantee the detection of all personal information in the text. Never use it without human supervision (human-in-the-loop). The model performs noticeably worse on texts that are not student essays.

Intended uses

Personal Information detection

Download

File: KB-bert-swedish_PI-detection-detailed-iob
Size: 146.5 KB
Licence: GPL-3.0

The model is hosted on Hugging Face and can be accessed, for example, using the Hugging Face Python library.
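As a rough sketch of what access through the Python library might look like, the snippet below uses the `transformers` token-classification pipeline. It assumes that the Hub repository id is identical to the resource name shown on this page and that the model loads with the standard pipeline; neither is confirmed here. The helper names `build_tagger` and `find_candidates` are hypothetical.

```python
# Assumption: the Hub repository id equals the resource name on this page.
MODEL_ID = "sbx/KB-bert-swedish_PI-detection-detailed-iob"


def build_tagger(model_id=MODEL_ID):
    """Create a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces and B-/I- tokens
    # into entity groups.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def find_candidates(text, tagger):
    """Return (label, matched text, score) triples for a human reviewer to check."""
    return [
        (hit["entity_group"], hit["word"], float(hit["score"]))
        for hit in tagger(text)
    ]
```

A call such as `find_candidates("Jag heter Anna och bor i Göteborg.", build_tagger())` would then list candidate spans. In line with the caveats above, the output is meant as input for human review, not as a final decision.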

Type

  • Model

Language

Swedish

Keywords

  • PI detection
  • BERT

Creators

  • Szawerna, Maria Irena

Created

2024-07-05

Contact

sb-info@svenska.gu.se