Part-of-speech tagging model: Flair

Data citation

Språkbanken Text (2020). Ordklasstaggningsmodell: Flair (updated: 2020-06-18). [Data set]. Språkbanken Text. https://doi.org/10.23695/4a7m-mk50

Pretrained models for part-of-speech tagging.

Models

We provide two models, flair_eval and flair_full.

flair_eval is trained on SUC3 with Talbanken_SBX_dev as the dev set. The advantage of this model is that it can be evaluated on Talbanken_SBX_test or SIC2. The evaluation results are reported in the table below.

Test set	Exact match	POS	MSD
Talbanken_SBX_test	0.978	0.987	0.990
SIC2	0.926	0.940	0.964
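The three scores are not defined here. One plausible reading is sketched below, assuming SUC-style tags such as NN.UTR.SIN.IND.NOM, where the part before the first dot is the POS and the remainder is the MSD. The function names and the splitting rule are illustrative assumptions, not the evaluation code behind the table.

```python
def split_tag(tag):
    """Split a tag like 'NN.UTR.SIN.IND.NOM' into ('NN', 'UTR.SIN.IND.NOM').

    Assumption: POS is everything before the first dot, MSD is the rest
    (empty for tags without features, such as 'MAD').
    """
    pos, _, msd = tag.partition(".")
    return pos, msd


def score(gold_tags, predicted_tags):
    """Return the share of tokens whose full tag, POS part and MSD part match."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        raise ValueError("nothing to score")

    exact = pos_ok = msd_ok = 0
    for gold, predicted in zip(gold_tags, predicted_tags):
        gold_pos, gold_msd = split_tag(gold)
        pred_pos, pred_msd = split_tag(predicted)
        exact += gold == predicted        # whole tag identical
        pos_ok += gold_pos == pred_pos    # POS part identical
        msd_ok += gold_msd == pred_msd    # feature part identical

    n = len(gold_tags)
    return {"exact": exact / n, "pos": pos_ok / n, "msd": msd_ok / n}
```

Under this reading a token counts as an exact match only when both parts are right, which is why the exact-match score can be lower than both of the other two.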

Tagging and training

Install Flair and its dependencies, then download our scripts from this repository. If necessary, change the path to your data and the names of the training, dev and test sets in the scripts. The scripts use a tab-separated two-column format: token, POS. Use conllu_to_tab.rb to convert CoNLL(-U) files to the two-column format: install Ruby 1.9+ and run ruby conllu_to_tab.rb 2 n, where n is the number of the column you want to use (4 if you are converting our CoNLL-U files). Run the scripts with Python 3. A GPU is strongly recommended, since training is extremely slow on CPU.
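For readers without Ruby, the conversion to the two-column format can be sketched in Python. This is not a port of the Ruby script: the function name is hypothetical, the column number is 1-based in this sketch (how the Ruby script counts columns is not stated above), and skipping comment lines, multiword-token ranges and empty nodes is an assumption about what the converter should do.

```python
def conllu_to_two_column(lines, tag_column):
    """Yield 'token<TAB>tag' lines from an iterable of CoNLL-U lines.

    tag_column is 1-based here: column 2 is the word form, and in standard
    CoNLL-U column 4 is UPOS and column 5 is XPOS.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            yield ""          # keep blank lines as sentence boundaries
            continue
        if line.startswith("#"):
            continue          # sentence-level comment
        fields = line.split("\t")
        token_id = fields[0]
        if "-" in token_id or "." in token_id:
            continue          # multiword-token range or empty node
        yield f"{fields[1]}\t{fields[tag_column - 1]}"
```

A possible use, with hypothetical file names:

```python
with open("corpus.conllu", encoding="utf-8") as src, \
        open("corpus.tab", "w", encoding="utf-8") as dst:
    for row in conllu_to_two_column(src, tag_column=4):
        dst.write(row + "\n")
```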

Tagging

Use tag_flair_p.py to tag a corpus using a pretrained model. By default, the flair_full model will be used. The output corpus will be created as a tab-separated three-column file: token, POS, confidence score.
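What such a tagging run involves can be sketched with Flair's public API. This is a minimal sketch and not the contents of tag_flair_p.py: the function names are hypothetical, the four-decimal score format is an assumption, and the model path depends on how the downloaded archive unpacks. The label accessor also differs between Flair releases (token.get_label in recent ones, token.get_tag in older ones).

```python
def format_rows(tagged_tokens):
    """Render (token, tag, score) triples as tab-separated three-column lines."""
    return [f"{token}\t{tag}\t{score:.4f}" for token, tag, score in tagged_tokens]


def tag_sentences(model_path, sentences):
    """Tag pre-tokenized sentences (lists of tokens) with a Flair SequenceTagger.

    Returns one list of (token, tag, score) triples per sentence.
    """
    # Imported lazily so that format_rows works without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_path)
    results = []
    for tokens in sentences:
        # use_tokenizer=False makes Flair split on whitespace only,
        # which keeps the existing tokenization.
        sentence = Sentence(" ".join(tokens), use_tokenizer=False)
        tagger.predict(sentence)
        rows = []
        for token in sentence:
            # Recent Flair releases; older ones use token.get_tag(tagger.tag_type).
            label = token.get_label(tagger.tag_type)
            rows.append((token.text, label.value, label.score))
        results.append(rows)
    return results
```

With a hypothetical model path, tag_sentences("path/to/model.pt", [["Hon", "läser", "."]]) would return one list of triples, which format_rows turns into output lines.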

Training your own models

Use train_flair_p.py. Replace user_model with the name for your model. By default, Flair's own embeddings will be used (our experiments show that they provide the best results), but you may use other embeddings instead (or combine them).
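The general shape of a Flair training run on two-column data is sketched below. It is not the contents of train_flair_p.py. The Swedish embedding names sv-forward and sv-backward, the hidden size and the epoch count are illustrative assumptions rather than the settings used for the released models. The dictionary call is make_label_dictionary in recent Flair releases and make_tag_dictionary in older ones.

```python
# Column layout of the two-column training files: token, then POS tag.
COLUMNS = {0: "text", 1: "pos"}
TAG_TYPE = "pos"


def train_tagger(data_folder, model_dir, train_file, dev_file, test_file):
    """Train a Flair sequence tagger on tab-separated token/POS files."""
    # Imported lazily so the constants above can be used without Flair installed.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    corpus = ColumnCorpus(
        data_folder,
        COLUMNS,
        train_file=train_file,
        dev_file=dev_file,
        test_file=test_file,
    )
    tag_dictionary = corpus.make_label_dictionary(label_type=TAG_TYPE)

    # Assumed embedding choice: Flair's forward and backward Swedish models.
    # Other embeddings can be added to this list to combine them.
    embeddings = StackedEmbeddings(
        [FlairEmbeddings("sv-forward"), FlairEmbeddings("sv-backward")]
    )

    tagger = SequenceTagger(
        hidden_size=256,          # illustrative value
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type=TAG_TYPE,
    )
    # model_dir plays the role of user_model: the folder the model is saved to.
    ModelTrainer(tagger, corpus).train(model_dir, max_epochs=10)  # illustrative
```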

Download

File	Size	Modified	License
flair_eval.zip	1.37 GB	2020-06-18	CC BY 4.0
flair_full.zip	1.37 GB	2020-06-18	CC BY 4.0
