POS-tagging model: Flair

Data citation

Språkbanken Text (2020). POS-tagging model: Flair (updated: 2020-06-18). [Data set]. Språkbanken Text. https://doi.org/10.23695/4a7m-mk50

Pretrained models for POS-tagging.

Models

We provide two models, flair_eval and flair_full.

flair_eval is trained on SUC3, with Talbanken_SBX_dev as the dev set. The advantage of this model is that it can be evaluated on data it was not trained on, namely Talbanken_SBX_test or SIC2. The evaluation results are reported in the table below.

Test set	Exact match	POS	MSD
Talbanken_SBX_test	0.978	0.987	0.990
SIC2	0.926	0.940	0.964
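
The three scores can be read as per-token accuracies. The page does not say how they were computed, so the following is only a minimal sketch under stated assumptions: tags are taken to look like NN.UTR.SIN.IND.NOM, with the POS before the first dot and the MSD after it, and the function names are hypothetical.

```python
def split_tag(tag):
    """Split a full tag into (POS, MSD).

    Assumption: the POS is everything before the first dot and the MSD
    is everything after it; a tag without a dot has an empty MSD.
    """
    pos, _, msd = tag.partition(".")
    return pos, msd


def accuracies(gold, predicted):
    """Return per-token accuracy for the full tag, the POS part and the MSD part."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    exact = pos_ok = msd_ok = 0
    for gold_tag, predicted_tag in zip(gold, predicted):
        gold_pos, gold_msd = split_tag(gold_tag)
        pred_pos, pred_msd = split_tag(predicted_tag)
        exact += gold_tag == predicted_tag
        pos_ok += gold_pos == pred_pos
        msd_ok += gold_msd == pred_msd

    total = len(gold)
    return {"exact": exact / total, "pos": pos_ok / total, "msd": msd_ok / total}
```

Under these assumptions a token counts as an exact match only when both parts are right, so the exact-match score can never exceed the POS or MSD score, which agrees with the figures in the table.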

Tagging and training

Install Flair and its dependencies, then download our scripts from this repository. If necessary, change the path to your data and the names of the training, dev and test sets in the scripts. Run the scripts with Python 3. A GPU is strongly recommended, since training is extremely slow on CPU.

The scripts use a tab-separated two-column format: token, POS. Use conllu_to_tab.rb to convert CONLL(U) to the two-column format: install Ruby 1.9+ and run ruby conllu_to_tab.rb 2 n, where n is the number of the column you want to use. If you are converting our CONLLU files, use 4.
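
For readers without Ruby, the conversion to the two-column format can be approximated in Python. This is a minimal sketch, not a port of conllu_to_tab.rb: the function name is hypothetical, the column is a 0-based index into the CoNLL-U fields (whether the Ruby script counts columns from 0 or 1 is not stated here), and separating sentences with a blank line is an assumption. The sample tokens and tags are purely illustrative.

```python
def conllu_to_two_columns(lines, column):
    """Yield 'token<TAB>tag' rows from CoNLL-U lines.

    column is a 0-based index into the tab-separated CoNLL-U fields
    (an assumption of this sketch, not the convention of the Ruby script).
    A blank input line is passed through as a sentence separator.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            yield ""  # sentence boundary
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        fields = line.split("\t")
        token_id = fields[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges and empty nodes
        yield f"{fields[1]}\t{fields[column]}"


# Illustrative input: one three-token sentence followed by a blank line.
SAMPLE = [
    "# text = Hon sover.",
    "\t".join(["1", "Hon", "hon", "PRON", "PN", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "sover", "sova", "VERB", "VB", "_", "0", "root", "_", "_"]),
    "\t".join(["3", ".", ".", "PUNCT", "MAD", "_", "2", "punct", "_", "_"]),
    "",
]

for row in conllu_to_two_columns(SAMPLE, column=3):
    print(row)
```

Check a few converted rows by hand against the source file before training, since an off-by-one in the column index silently yields the wrong tag set.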

Tagging

Use tag_flair_p.py to tag a corpus with a pretrained model. By default, the flair_full model is used. The output is a tab-separated three-column file: token, POS, confidence score.
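
To show roughly what such a tagging step involves, here is a minimal sketch that uses Flair's SequenceTagger directly. It is not the content of tag_flair_p.py. The function names, the model_path argument and the four-decimal score format are assumptions, and it has not been verified that models saved in 2020 load in current Flair releases.

```python
def format_row(token, tag, score):
    """Format one output row: token, POS, confidence score (tab-separated)."""
    return f"{token}\t{tag}\t{score:.4f}"


def tag_sentences(model_path, sentences):
    """Yield formatted rows for pre-tokenized sentences (lists of strings).

    model_path is a placeholder for the model file extracted from the
    downloaded archive. Flair is imported here so that format_row can be
    used without Flair installed.
    """
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_path)
    for tokens in sentences:
        # use_tokenizer=False keeps the given tokens (split on whitespace).
        sentence = Sentence(" ".join(tokens), use_tokenizer=False)
        tagger.predict(sentence)
        for token in sentence.tokens:
            labels = token.get_labels(tagger.tag_type)
            if labels:
                yield format_row(token.text, labels[0].value, labels[0].score)
            else:
                yield format_row(token.text, "_", 0.0)  # no prediction stored
        yield ""  # blank line between sentences (an assumption)
```

A call such as tag_sentences(model_path, [["Hon", "sover", "."]]) would then produce one row per token. The confidence score is the probability the model assigns to its chosen tag, which is useful for flagging tokens that need manual review.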

Training your own models

Use train_flair_p.py, replacing user_model with the name of your model. By default, Flair's own embeddings are used, since they gave the best results in our experiments, but you may use other embeddings instead or combine several.

Download

File	Size	Modified	Licence
flair_eval.zip	1.37 GB	2020-06-18	CC BY 4.0
flair_full.zip	1.37 GB	2020-06-18	CC BY 4.0
