Språkbanken Text is a division of Språkbanken.

Part-of-speech tagging model: Stanza

Data citation

Språkbanken Text (2020). Ordklasstaggningsmodell: Stanza (updated: 2020-12-09). [Data set]. Språkbanken Text. https://doi.org/10.23695/ygw3-gf17

Pretrained models for part-of-speech tagging.

Models

Stanza is currently the default annotation tool used by Sparv. We provide two Stanza POS-tagging models.

stanza_eval is trained on SUC3 with Talbanken_SBX_dev as the dev set. Because Talbanken_SBX_test and SIC2 are held out from training, this model can be evaluated on them. The evaluation results are reported in the table below.

Test set            Exact match  POS    MSD
Talbanken_SBX_test  0.973        0.983  0.988
SIC2                0.918        0.932  0.957

Read more about the evaluation here.
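
To make the three columns concrete, here is a minimal sketch of how per-token accuracies of this kind could be computed. It assumes that "Exact match" means both the POS tag and the MSD features of a token are correct; that reading is an assumption, and the linked evaluation report is the authority on the actual definitions. The function name and the toy tags are made up for illustration and are not taken from the evaluation data.

```python
from typing import Dict, List, Tuple

Tag = Tuple[str, str]  # (POS, MSD) for one token


def tagging_accuracy(gold: List[Tag], predicted: List[Tag]) -> Dict[str, float]:
    """Return per-token accuracy for POS, MSD, and both together."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    total = len(gold)
    pairs = list(zip(gold, predicted))
    pos_ok = sum(g[0] == p[0] for g, p in pairs)    # POS tag correct
    msd_ok = sum(g[1] == p[1] for g, p in pairs)    # MSD features correct
    exact_ok = sum(g == p for g, p in pairs)        # assumed meaning of "exact match"
    return {
        "exact_match": exact_ok / total,
        "pos": pos_ok / total,
        "msd": msd_ok / total,
    }


# Toy example with invented SUC-style tags (not real evaluation data).
gold = [("NN", "UTR.SIN.IND.NOM"), ("VB", "PRS.AKT"), ("PP", ""), ("NN", "NEU.SIN.DEF.NOM")]
pred = [("NN", "UTR.SIN.IND.NOM"), ("VB", "PRT.AKT"), ("PP", ""), ("JJ", "NEU.SIN.DEF.NOM")]
scores = tagging_accuracy(gold, pred)
print(scores)
```

Under this reading, the exact-match score can never exceed the POS or MSD score, which is consistent with the figures in the table.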

stanza_full is trained on SUC3 + Talbanken_SBX_test + SIC2 with Talbanken_SBX_dev as the dev set. Since both test sets are part of its training data, we cannot evaluate the performance of this model, but we expect it to perform better than stanza_eval, or at least no worse. This is the model used by Sparv.

We updated the "pretrain" file in spring 2025. This was a minor format change.

Using the models on your own

Unzip the model you want to use and the "pretrain" file (which contains word2vec embeddings encoded in a format required by Stanza), then follow the instructions provided by Stanza.
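
As a rough illustration of the step above, the following sketch points a Stanza pipeline at locally unzipped files. It is not the procedure Sparv itself uses. The file paths are hypothetical placeholders (the actual file names depend on what you unzipped), the helper function names are invented for this example, and running on pre-tokenized input is an assumption made here so that no separate tokenizer model is needed. Depending on your Stanza version, the pipeline may still try to fetch its resource index on start-up; consult the Stanza documentation for the options that apply to your installation.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def build_pipeline_kwargs(pos_model: Path, pretrain: Path) -> Dict[str, object]:
    """Collect the keyword arguments passed to stanza.Pipeline."""
    return {
        "lang": "sv",
        "processors": "tokenize,pos",
        # Assumption: input is given as lists of tokens, so Stanza does not tokenize.
        "tokenize_pretokenized": True,
        "pos_model_path": str(pos_model),
        "pos_pretrain_path": str(pretrain),
    }


def tag_sentences(
    sentences: List[List[str]], kwargs: Dict[str, object]
) -> List[List[Tuple[str, str, str, str]]]:
    """Tag pre-tokenized sentences; return (token, upos, xpos, feats) per word."""
    import stanza  # imported lazily so the configuration can be built without Stanza

    nlp = stanza.Pipeline(**kwargs)
    doc = nlp(sentences)
    return [
        [(word.text, word.upos, word.xpos, word.feats) for word in sent.words]
        for sent in doc.sentences
    ]


# Hypothetical paths: replace them with the files you actually unzipped.
kwargs = build_pipeline_kwargs(Path("models/stanza_full.pt"), Path("models/pretrain.pt"))

# Only run the tagger when both files are really present on disk.
if Path(kwargs["pos_model_path"]).is_file() and Path(kwargs["pos_pretrain_path"]).is_file():
    for sentence in tag_sentences([["Det", "här", "är", "en", "mening", "."]], kwargs):
        for token in sentence:
            print(token)
```

Which of the returned fields carries the POS tag and which carries the MSD features depends on how the model was trained, so inspect the output on a sentence you know before relying on a particular field.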

Download

File  Size      Modified    License
      19.94 MB  2020-12-09  CC BY 4.0
      20.19 MB  2020-12-09  CC BY 4.0
      91.7 MB   2025-02-20  CC BY 4.0

Type

  • Model

Language

Swedish

Size

Updated

2020-12-09

Contact

sb-info@svenska.gu.se