Dependency parsing model: Stanza

Data citation

Språkbanken Text (2020). Dependensparsningsmodell: Stanza (updated: 2020-12-09). [Data set]. Språkbanken Text. https://doi.org/10.23695/wh3y-2y24


Pretrained models for dependency parsing.

Models

Stanza is currently the default annotation tool used by Sparv. We provide two models that enable dependency parsing of Swedish (in the Mamba-Dep format, the format of TalbankenSBX).

stanza_eval is trained on Talbanken_SBX_train, with Talbanken_SBX_dev as the dev set, and evaluated on Talbanken_SBX_test. The evaluation results are reported in the table below. The LAS (when trained with gold POS and MSD tags) is 84.48. We used Word2Vec embeddings trained on the CoNLL17 corpus (Word2Vec embeddings trained on a Göteborgs-Posten corpus yield a very similar LAS of 84.43; see more about embeddings here).
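LAS (labeled attachment score) is the percentage of words that receive both the correct head and the correct dependency label. The following is a minimal sketch of how it can be computed from a gold and a predicted CoNLL-U file, assuming both are token-aligned (same sentences, same tokenization). It is not necessarily the scorer behind the figures above, and the function names are our own.

```python
from __future__ import annotations


def read_arcs(conllu_text: str) -> list[tuple[str, str]]:
    """Return the (HEAD, DEPREL) pair of every syntactic word, in file order."""
    arcs = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separators and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges (1-2) and empty nodes (3.1)
        arcs.append((cols[6], cols[7]))  # CoNLL-U columns 7 and 8: HEAD, DEPREL
    return arcs


def labeled_attachment_score(gold_text: str, pred_text: str) -> float:
    """Percentage of words whose predicted head AND label both match the gold."""
    gold = read_arcs(gold_text)
    pred = read_arcs(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data are not token-aligned")
    if not gold:
        raise ValueError("no tokens to score")
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)
```

For example, if a three-word sentence has every head right but one label wrong, the LAS for that sentence is 2/3, i.e. about 66.67. Counting head and label jointly is what distinguishes LAS from the unlabeled attachment score (UAS), which checks the head only.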

stanza_full is trained on Talbanken_SBX_train + Talbanken_SBX_dev, with Talbanken_SBX_test as the dev set. Because the test set serves as the dev set during training, there is no held-out data left to evaluate this model on, but we expect it to perform at least as well as stanza_eval.
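If you want to build a comparable train + dev training set from your own CoNLL-U files, the concatenation step can be sketched as below. This is an illustration under our own assumptions, not the procedure used to produce stanza_full; the function names and any file names are hypothetical. The main pitfall it guards against is a file that lacks a trailing blank line, which would otherwise glue its last sentence onto the first sentence of the next file.

```python
from __future__ import annotations

from collections.abc import Iterator
from pathlib import Path


def iter_sentences(conllu_text: str) -> Iterator[str]:
    """Yield each sentence block (comment + token lines) without blank lines."""
    block: list[str] = []
    for line in conllu_text.splitlines():
        if line.strip():
            block.append(line)
        elif block:  # a blank line closes the current sentence
            yield "\n".join(block)
            block = []
    if block:  # last sentence of a file without a trailing blank line
        yield "\n".join(block)


def merge_conllu(texts: list[str]) -> tuple[str, int]:
    """Concatenate CoNLL-U documents; return (merged text, sentence count)."""
    sentences = [s for text in texts for s in iter_sentences(text)]
    # CoNLL-U expects every sentence, including the last, to end with a blank line.
    return "".join(s + "\n\n" for s in sentences), len(sentences)


def merge_conllu_files(inputs: list[Path], output: Path) -> int:
    """File wrapper around merge_conllu; returns the number of sentences written."""
    merged, count = merge_conllu([p.read_text(encoding="utf-8") for p in inputs])
    output.write_text(merged, encoding="utf-8")
    return count
```

A call such as merge_conllu_files([Path("train.conllu"), Path("dev.conllu")], Path("train_dev.conllu")) would then produce the combined file; the paths here are placeholders for wherever your copy of the data lives.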

We updated the "pretrain" file in spring 2025. This was a minor format change.

Using the models on your own

Unzip the model you want to use and the "pretrain" file (which contains word2vec embeddings encoded in the format required by Stanza). Place the two .pt files in stanza/saved_models/depparse. Run bash scripts/parse.sh UD_Swedish-Talbanken to parse a test set with the pretrained model; the output file will be created in the stanza/corpora folder. If you use a treebank name other than UD_Swedish-Talbanken, you will have to rename the model files accordingly. The script assumes that the test set already contains POS tags.
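The preparation steps above (extracting the .pt files into stanza/saved_models/depparse and making sure the test set is already POS-tagged) can be sketched in Python as follows. The target directory comes from the instructions above; the function names are hypothetical, the sketch does not rename any files, and it assumes the POS tags sit in the UPOS (4th) or XPOS (5th) CoNLL-U column. Which column scripts/parse.sh actually reads is not stated here, so treat the check as a rough sanity test.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Directory named in the instructions above.
DEPPARSE_DIR = Path("stanza/saved_models/depparse")


def install_pt_files(zip_paths: list[Path], target: Path = DEPPARSE_DIR) -> list[Path]:
    """Extract every .pt file from the given archives directly into `target`.

    Folders inside the archives are flattened, and macOS metadata entries
    (__MACOSX/...) are skipped. Returns the paths of the extracted files.
    """
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for name in archive.namelist():
                if name.endswith(".pt") and not name.startswith("__MACOSX/"):
                    dest = target / Path(name).name
                    dest.write_bytes(archive.read(name))
                    written.append(dest)
    return written


def missing_pos_lines(conllu_text: str) -> list[int]:
    """Return 1-based line numbers of token lines whose UPOS and XPOS are both '_'."""
    missing = []
    for lineno, line in enumerate(conllu_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # blank separators and comments carry no tags
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword ranges and empty nodes are not tagged words
        if cols[3] == "_" and cols[4] == "_":
            missing.append(lineno)
    return missing
```

With the archives from the download table, a call might look like install_pt_files([Path("synt_stanza_eval.zip"), Path("stanza_pretrain.zip")]); an empty result from missing_pos_lines on your test file suggests every token carries a tag.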

Training your own models

Unzip the model you want to use and the "pretrain" file (which contains word2vec embeddings encoded in a format required by Stanza). Follow the instructions provided by Stanza. If you need a pretrained part-of-speech model, you will find it here.

Download

File	Size	Modified	License
synt_stanza_eval.zip	99.05 MB	2020-12-09	CC-BY-4.0
synt_stanza_full2.zip	99.17 MB	2020-12-09	CC-BY-4.0
stanza_pretrain.zip	91.7 MB	2025-02-20	CC-BY-4.0
