POS-tagging model: Stanza

Data citation

Språkbanken (2020). POS-tagging model: Stanza (updated: 2020-12-09). [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/ygw3-gf17

Additional ways to cite the dataset.

Pretrained models for POS-tagging.

Models

Stanza is currently the default annotation tool used by Sparv. We provide two Stanza POS-tagging models.

stanza_eval is trained on SUC3, with Talbanken_SBX_dev as the development set. The advantage of this model is that it can be evaluated on data it was not trained on, namely Talbanken_SBX_test or SIC2. The evaluation results are reported in the table below.

Test set	Exact match	POS	MSD
Talbanken_SBX_test	0.973	0.983	0.988
SIC2	0.918	0.932	0.957
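
As a rough illustration of how scores like these can be computed, the sketch below calculates token-level accuracy for the three columns. It assumes, without confirmation from the source, that "POS" counts tokens whose part-of-speech tag is correct, "MSD" counts tokens whose morphosyntactic features are correct, and "Exact match" counts tokens where both are correct. The function name `tagging_accuracy` and the `(pos, msd)` tuple layout are hypothetical and only serve the example.

```python
def tagging_accuracy(gold, predicted):
    """Return token-level accuracy for POS, MSD and exact match.

    Both arguments are equally long lists of (pos, msd) tuples, one per token.
    The column definitions are an assumption, not taken from the source.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty token list")

    total = len(gold)
    pairs = list(zip(gold, predicted))

    # A token counts for "pos" if the part-of-speech tag matches,
    # for "msd" if the feature string matches, and for "exact_match"
    # only if both match.
    pos_ok = sum(g[0] == p[0] for g, p in pairs)
    msd_ok = sum(g[1] == p[1] for g, p in pairs)
    exact_ok = sum(g == p for g, p in pairs)

    return {
        "exact_match": exact_ok / total,
        "pos": pos_ok / total,
        "msd": msd_ok / total,
    }
```

Called on a gold and a predicted tag sequence of the same length, the function returns a dictionary with one fraction per column of the table.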

Using the models on your own

Unzip the model you want to use together with the "pretrain" file, which contains word2vec embeddings encoded in the format required by Stanza. Then follow the instructions provided by Stanza for loading custom models.
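
A minimal sketch of what loading an unzipped model might look like with the Stanza Python package is shown below. It relies on Stanza's `pos_model_path` and `pos_pretrain_path` pipeline options. The directory and file names passed in are hypothetical placeholders, since the names of the files inside the zip archives are not given here; substitute whatever the archives actually contain. The sketch also assumes pre-tokenized input (`tokenize_pretokenized=True`) so that it does not depend on a separate tokenizer model.

```python
from pathlib import Path


def build_pos_config(model_dir, model_file, pretrain_file):
    """Build keyword arguments for stanza.Pipeline.

    model_file and pretrain_file are hypothetical names: use the names of
    the files you actually find after unzipping the downloads.
    """
    model_dir = Path(model_dir)
    return {
        "lang": "sv",
        "processors": "tokenize,pos",
        # Input is given as lists of tokens, so no tokenizer model is used.
        "tokenize_pretokenized": True,
        "pos_model_path": str(model_dir / model_file),
        "pos_pretrain_path": str(model_dir / pretrain_file),
    }


def tag(sentences, config):
    """Tag pre-tokenized sentences; returns (text, upos, xpos, feats) per word."""
    import stanza  # imported here so the config helper works without Stanza installed

    nlp = stanza.Pipeline(**config)
    doc = nlp(sentences)
    return [
        [(word.text, word.upos, word.xpos, word.feats) for word in sentence.words]
        for sentence in doc.sentences
    ]
```

With the archives unzipped into a local directory, one would call `build_pos_config` with that directory and the two file names, then pass the result and a list of token lists, for example `[["Det", "här", "är", "en", "mening", "."]]`, to `tag`.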

Download

File	Size	Modified	Licence
morph_stanza_eval.zip	19.94 MB	2020-12-09	CC-BY-4.0
morph_stanza_full2.zip	20.19 MB	2020-12-09	CC-BY-4.0
stanza_pretrain.zip	91.7 MB	2025-02-20	CC-BY-4.0
