We provide two models.
flair_eval is trained on SUC3 with Talbanken_SBX_dev as the dev set. The advantage of this model is that it can be evaluated using Talbanken_SBX_test or SIC2. The evaluation results are reported in the table below.
| Test set | Exact match | POS | MSD |
Read more about the evaluation here.
flair_full is trained on SUC3 + Talbanken_SBX_test + SIC2, with Talbanken_SBX_dev as the dev set. Because both test sets are part of its training data, we cannot evaluate the performance of this model, but we expect it to perform better than flair_eval, or at least no worse.
Tagging and training
Install Flair and the necessary dependencies.
Download our scripts from this repository.
If necessary, change the path to your data and the names of the training, dev and test sets in the scripts.
The scripts use a tab-separated two-column format: token, POS. Use conllu_to_tab.rb to convert CoNLL(-U) files to the two-column format: install Ruby 1.9+ and run ruby conllu_to_tab.rb 2 n, where n is the number of the column you want to use (4 if you are converting our CoNLL-U files).
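For readers who prefer to stay in Python, the conversion can be sketched roughly as follows. This is not the repository's Ruby script, only an illustration of the same idea. The function name conllu_to_two_columns and its parameters are hypothetical, and the sketch assumes that the first argument of the Ruby command (2) refers to the token column, that column numbers are 1-based, and that blank lines between sentences should be kept in the output.

```python
def conllu_to_two_columns(lines, token_col=2, tag_col=4):
    """Yield 'token<TAB>tag' rows from CoNLL-U lines.

    Column numbers are 1-based (an assumption, mirroring the
    'ruby ... 2 n' call described above).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # sentence-level comment, not a token
        if not line.strip():
            yield ""  # keep the blank line that separates sentences
            continue
        fields = line.split("\t")
        if "-" in fields[0] or "." in fields[0]:
            continue  # multiword-token range or empty node in CoNLL-U
        yield fields[token_col - 1] + "\t" + fields[tag_col - 1]


if __name__ == "__main__":
    # Tiny hand-made example, not taken from any of the corpora.
    sample = [
        "# text = Hej då\n",
        "1\tHej\thej\tINTJ\tIN\t_\t0\troot\t_\t_\n",
        "2\tdå\tdå\tADV\tAB\t_\t1\tadvmod\t_\t_\n",
        "\n",
    ]
    for row in conllu_to_two_columns(sample):
        print(row)
```

Passing tag_col=5 instead would pick the fifth column of each row; which column holds the tags you want depends on your files.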
Run the scripts with Python 3; a GPU is strongly recommended (training is extremely slow on CPU).
Use tag_flair_p.py to tag a corpus using a pretrained model. By default, the flair_full model will be used.
The output corpus will be created as a tab-separated three-column file: token, POS, confidence score.
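The three-column output is easy to post-process. The sketch below reads such a file and lists the tokens whose confidence falls under a threshold, which may help when spot-checking the tagging. The function names, the 0.9 threshold and the assumption that the confidence score is a plain decimal number are illustrative choices, not part of the repository's scripts.

```python
def read_tagged(lines):
    """Parse 'token<TAB>POS<TAB>confidence' rows into (token, tag, score) tuples."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines, if the file contains any
        token, tag, score = line.split("\t")
        rows.append((token, tag, float(score)))
    return rows


def low_confidence(rows, threshold=0.9):
    """Return the rows whose confidence score is below the threshold.

    The default threshold is arbitrary; pick one that suits your data.
    """
    return [row for row in rows if row[2] < threshold]


if __name__ == "__main__":
    # Hand-made rows for illustration only, not real tagger output.
    example = ["Hej\tIN\t0.9987\n", "\n", "då\tAB\t0.6120\n"]
    for token, tag, score in low_confidence(read_tagged(example)):
        print(token, tag, score)
```

To use it on a real file, open the tagged corpus with encoding="utf-8" and pass the file object to read_tagged.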
Training your own models
Replace user_model with the name for your model. By default, Flair's own embeddings will be used (our experiments show that they provide the best results), but you may use other embeddings instead (or combine them).