Språkbanken Text is a part of Språkbanken.

POS-tagging model: Stanza

Data citation Information

Språkbanken Text (2020). POS-tagging model: Stanza (updated: 2020-12-09). [Data set]. Språkbanken Text. https://doi.org/10.23695/ygw3-gf17
Pretrained models for POS-tagging.

Models

Stanza is currently the default annotation tool used by Sparv. We provide two Stanza POS-tagging models.

stanza_eval is trained on SUC3, with Talbanken_SBX_dev as the dev set. Because neither Talbanken_SBX_test nor SIC2 is used in training, this model can be evaluated on either of them. The evaluation results are reported in the table below.

Test set              Exact match   POS     MSD
Talbanken_SBX_test    0.973         0.983   0.988
SIC2                  0.918         0.932   0.957

Read more about the evaluation here.
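This page does not define the three columns, so the sketch below assumes they are per-token accuracies: POS counts tokens whose part-of-speech tag is correct, MSD counts tokens whose morphosyntactic description is correct, and exact match counts tokens where both are correct. Under that assumption, exact match can never exceed either of the other two columns, which agrees with the table. The `Tag` type, `tagging_accuracy` function and the example tags are hypothetical and for illustration only.

```python
from typing import NamedTuple, Sequence


class Tag(NamedTuple):
    """One token's annotation: part of speech plus morphosyntactic description."""
    pos: str
    msd: str


def tagging_accuracy(gold: Sequence[Tag], pred: Sequence[Tag]) -> dict[str, float]:
    """Per-token accuracy for POS, MSD and the full tag (exact match).

    Assumes gold and predicted tags are aligned one-to-one over the same tokens.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted tags must be aligned token by token")
    if not gold:
        raise ValueError("need at least one token to compute accuracy")
    n = len(gold)
    pairs = list(zip(gold, pred))
    return {
        "exact_match": sum(g == p for g, p in pairs) / n,  # POS and MSD both correct
        "pos": sum(g.pos == p.pos for g, p in pairs) / n,
        "msd": sum(g.msd == p.msd for g, p in pairs) / n,
    }


# Illustrative tags: token 2 has a wrong MSD, token 4 has a wrong POS.
gold = [Tag("DT", "UTR|SIN|IND"), Tag("NN", "UTR|SIN|IND|NOM"), Tag("VB", "PRS|AKT"), Tag("PL", "_")]
pred = [Tag("DT", "UTR|SIN|IND"), Tag("NN", "NEU|SIN|IND|NOM"), Tag("VB", "PRS|AKT"), Tag("AB", "_")]
print(tagging_accuracy(gold, pred))
# → {'exact_match': 0.5, 'pos': 0.75, 'msd': 0.75}
```

Each error lowers exactly one of the POS and MSD columns but always lowers exact match, which is why exact match is the strictest of the three numbers.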

stanza_full is trained on SUC3 + Talbanken_SBX_test + SIC2, with Talbanken_SBX_dev as the dev set. Since both test sets are part of its training data, this model cannot be evaluated on them, but we expect it to perform at least as well as stanza_eval. This is the model used by Sparv.
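The rule behind the two descriptions above is that a model can only be evaluated on data it did not see during training or model selection. The sketch below encodes the training and dev sets listed on this page and derives which test sets remain usable for each model; the dictionary layout and the `usable_test_sets` helper are hypothetical, and the dev set is counted as "seen" because it is used during training.

```python
# Data seen by each model during training or model selection, as listed on this page.
TRAINING_DATA = {
    "stanza_eval": {"SUC3", "Talbanken_SBX_dev"},
    "stanza_full": {"SUC3", "Talbanken_SBX_test", "SIC2", "Talbanken_SBX_dev"},
}

# Candidate evaluation sets mentioned on this page.
TEST_SETS = ["Talbanken_SBX_test", "SIC2"]


def usable_test_sets(model: str) -> list[str]:
    """Return the test sets that the given model has not seen."""
    seen = TRAINING_DATA[model]
    return [test_set for test_set in TEST_SETS if test_set not in seen]


for name in TRAINING_DATA:
    print(name, usable_test_sets(name))
# → stanza_eval ['Talbanken_SBX_test', 'SIC2']
# → stanza_full []
```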

We updated the "pretrain" file in spring 2025. This was a minor format change.

Using the models on your own

Unzip the model you want to use and the "pretrain" file (which contains word2vec embeddings encoded in the format required by Stanza). Then follow the instructions in the Stanza documentation.
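A minimal sketch of these two steps, assuming each downloaded archive contains exactly one `.pt` file; the archive names in the usage comment are hypothetical, since the download list does not preserve them. `pos_model_path` and `pos_pretrain_path` are Stanza's pipeline options for pointing the POS processor at local model and pretrain files. The sketch feeds pre-split sentences (`tokenize_pretokenized=True`) so that no tokenizer model is needed, although Stanza may still download its resource index and default Swedish files on first run. If the pipeline asks for further files, the Stanza documentation is the authority.

```python
import zipfile
from pathlib import Path


def unzip_single_pt(archive: Path, target_dir: Path) -> Path:
    """Extract `archive` into `target_dir` and return the single .pt file inside it."""
    with zipfile.ZipFile(archive) as zf:
        pt_names = [name for name in zf.namelist() if name.endswith(".pt")]
        if len(pt_names) != 1:
            raise ValueError(f"expected exactly one .pt file in {archive}, found {pt_names}")
        zf.extractall(target_dir)
    return target_dir / pt_names[0]


def pipeline_kwargs(model_path: Path, pretrain_path: Path) -> dict:
    """Keyword arguments for stanza.Pipeline that use the local POS model files."""
    return {
        "lang": "sv",
        "processors": "tokenize,pos",
        "tokenize_pretokenized": True,  # input is already split into tokens
        "pos_model_path": str(model_path),
        "pos_pretrain_path": str(pretrain_path),
    }


def load_pipeline(model_path: Path, pretrain_path: Path):
    """Build a Stanza pipeline from the local files (requires `pip install stanza`)."""
    import stanza  # imported lazily so the helpers above work without Stanza installed

    return stanza.Pipeline(**pipeline_kwargs(model_path, pretrain_path))


# Example usage (hypothetical archive names; use the files you downloaded):
#   model = unzip_single_pt(Path("stanza_full.zip"), Path("models"))
#   pretrain = unzip_single_pt(Path("pretrain.zip"), Path("models"))
#   nlp = load_pipeline(model, pretrain)
#   doc = nlp([["Det", "här", "är", "en", "mening", "."]])  # one pre-split sentence
#   for word in doc.sentences[0].words:
#       print(word.text, word.upos, word.xpos, word.feats)
```

Keeping the unzip and argument-building steps separate from the Stanza call makes it easy to swap stanza_eval for stanza_full by changing a single path.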

Download

File    Size       Modified     Licence
        19.94 MB   2020-12-09   CC BY 4.0
        20.19 MB   2020-12-09   CC BY 4.0
        91.7 MB    2025-02-20   CC BY 4.0

Type

  • Model

Language

Swedish

Updated

2020-12-09

Contact

sb-info@svenska.gu.se