Part-of-speech tagging model: Stanza

Data citation

Språkbanken Text (2020). Ordklasstaggningsmodell: Stanza (updated: 2020-12-09). [Data set]. Språkbanken Text. https://doi.org/10.23695/ygw3-gf17

Pretrained models for part-of-speech tagging.

Models

Stanza is currently the default annotation tool used by Sparv. We provide two Stanza POS-tagging models.

stanza_eval is trained on SUC3, with Talbanken_SBX_dev as the development set. The advantage of this model is that it can be evaluated using Talbanken_SBX_test or SIC2. The evaluation results are reported in the table below.

Test set	Exact match	POS	MSD
Talbanken_SBX_test	0.973	0.983	0.988
SIC2	0.918	0.932	0.957
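
The page does not define the three score columns. The sketch below shows one plausible reading, purely as an assumption: each tag is a SUC-style string such as NN.UTR.SIN.IND.NOM, "POS" compares only the part before the first dot, "MSD" compares only the remaining features, and "Exact match" requires the whole tag to agree. The function names and the toy tags are illustrative and are not taken from the actual evaluation.

```python
def split_tag(tag):
    """Split a SUC-style tag into (pos, msd); msd is '' when there are no features."""
    pos, _, msd = tag.partition(".")
    return pos, msd


def tagging_accuracy(gold, predicted):
    """Token-level accuracy under the ASSUMED column definitions described above."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    exact = pos = msd = 0
    for gold_tag, pred_tag in zip(gold, predicted):
        gold_pos, gold_msd = split_tag(gold_tag)
        pred_pos, pred_msd = split_tag(pred_tag)
        exact += gold_tag == pred_tag   # whole tag agrees
        pos += gold_pos == pred_pos     # part before the first dot agrees
        msd += gold_msd == pred_msd     # feature part agrees

    n = len(gold)
    return {"exact_match": exact / n, "pos": pos / n, "msd": msd / n}


# Toy data for illustration only (not from Talbanken_SBX_test or SIC2).
gold = ["NN.UTR.SIN.IND.NOM", "VB.PRS.AKT", "PN.UTR.SIN.DEF.SUB", "MAD"]
predicted = ["NN.UTR.SIN.DEF.NOM", "VB.PRS.AKT", "PN.UTR.SIN.DEF.SUB", "MID"]
scores = tagging_accuracy(gold, predicted)
```

Under this reading, the exact-match score can never exceed the POS or MSD score, which is consistent with the ordering of the figures in the table.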

Using the models on your own

Unzip the model you want to use and the "pretrain" file (which contains word2vec embeddings encoded in a format required by Stanza), then follow the instructions provided by Stanza.
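
As a rough illustration of the step above, the following sketch builds a Stanza pipeline from locally unzipped files. It assumes Stanza's `pos_model_path`, `pos_pretrain_path` and `tokenize_pretokenized` pipeline options. The directory and file names passed in are hypothetical, since the page does not list the contents of the archives, so substitute the names you find after unzipping. Depending on the Stanza version, creating the pipeline may also try to fetch Stanza's own resource index.

```python
from pathlib import Path


def build_pos_config(model_dir, model_file, pretrain_file):
    """Return keyword arguments for stanza.Pipeline that point at local files.

    model_file and pretrain_file are whatever the unzipped archives contain;
    the names are supplied by the caller because this page does not state them.
    """
    model_dir = Path(model_dir)
    return {
        "lang": "sv",
        "processors": "tokenize,pos",
        # Input is already split into sentences and tokens, so no tokenizer model is used.
        "tokenize_pretokenized": True,
        "pos_model_path": str(model_dir / model_file),
        "pos_pretrain_path": str(model_dir / pretrain_file),
    }


def tag_sentences(config, sentences):
    """Tag pretokenized sentences (a list of token lists) with the configured model."""
    import stanza  # imported lazily so the config can be built without Stanza installed

    nlp = stanza.Pipeline(**config)
    doc = nlp(sentences)
    return [
        [(word.text, word.upos, word.xpos, word.feats) for word in sentence.words]
        for sentence in doc.sentences
    ]


# Hypothetical paths: replace with the actual unzipped file names.
config = build_pos_config("models", "stanza_eval.pt", "stanza_pretrain.pt")
```

Calling `tag_sentences(config, [["Det", "här", "är", "en", "mening", "."]])` would then return one list of (text, upos, xpos, feats) tuples per sentence, provided the files exist at the given paths.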

Download

File	Size	Modified	License
morph_stanza_eval.zip	19.94 MB	2020-12-09	CC BY 4.0
morph_stanza_full2.zip	20.19 MB	2020-12-09	CC BY 4.0
stanza_pretrain.zip	91.7 MB	2025-02-20	CC BY 4.0

