A new blogpost is out.
(This blog is based on a joint research and publication in collaboration with David Alfter, Therese Lindström Tiedemann, Maisa Laurialla and Daniela Piipponen) At our department, and outside, we are used to search Korp corpora using the linguistic categories available there. Some of us know that these linguistic categories come as a result of automatic annotation by the Sparv-pipeline. The pipeline automatically splits raw text into tokens, sentences, finds a base form to each of the running (inflected) words, assigns word classes, …