PGV-PII

Standard reference

Maria Irena Szawerna, Jacob Lee Suchardt (2026): Fill-in-the-Blanks: Automatic Generation and Evaluation of Language Models' Pseudonyms for English and Swedish Texts, in Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 11–16 May 2026, Palma, Mallorca, Spain, pages 1155–1169.

Data citation

Szawerna, Maria Irena, & Suchardt, Jacob Lee. PGV-PII [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/qcqg-3613

A small collection of 10 pairs of parallel texts in Swedish and English annotated with personal information categories.

This is a small corpus of 10 pairs of parallel texts in Swedish and English, annotated with personal information categories. The annotation largely follows that of the TAB corpus (https://aclanthology.org/2022.cl-4.19/). All twenty texts were sourced from the Parallel Global Voices corpus (https://nlp.ilsp.gr/pgv/, CC BY 4.0) and manually annotated. That corpus, in turn, collected its texts from the Global Voices websites (https://globalvoices.org/, CC BY 3.0).

Annotation

The texts are annotated with personal information categories following the TAB guidelines (https://aclanthology.org/2022.cl-4.19/).

Intended uses

This corpus can be used to evaluate systems for personal information detection and labeling, or for pseudonym generation.

Download

File	Size	Modified	Licence
gv-pii.bz2 corpus (jsonl)	49.75 KB	2026-02-27	CC-BY-4.0
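
Because the download is listed as a bz2-compressed JSONL file, it can presumably be read without unpacking it first. The following is a minimal sketch using only the Python standard library. It assumes one JSON value per line and takes the file name gv-pii.bz2 from the table above; the helper name read_jsonl_bz2 is hypothetical, and no field names are assumed, since the record structure is not described here.

```python
import bz2
import json
from pathlib import Path
from typing import Any, Iterator, Union


def read_jsonl_bz2(path: Union[str, Path]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a bz2-compressed file."""
    # Text mode ("rt") decompresses and decodes to str on the fly.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


if __name__ == "__main__":
    # Assumption: the archive was saved under the name shown in the table.
    corpus_path = Path("gv-pii.bz2")
    if corpus_path.exists():
        records = list(read_jsonl_bz2(corpus_path))
        print(f"Loaded {len(records)} records")
        # Inspect the top-level keys rather than assuming a schema.
        if records and isinstance(records[0], dict):
            print(sorted(records[0].keys()))
    else:
        print(f"{corpus_path} not found; download it first")
```

Printing the keys of the first record is a cautious way to discover the actual layout before writing any code that depends on specific fields.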
