The SwedishGLUE project

Artificial intelligence systems that deal with (human) natural language rely on language models, that is, predictions of which words occur together. To better understand how such models work, and where they fail, when applied to Swedish texts, we need Swedish test data. A collection of test data addressing various aspects of understanding and generating text allows us to evaluate and compare models. In the autumn of 2020 we started developing evaluation data for Swedish language models at Språkbanken Text. This …

How reliable is sense disambiguation in texts by native and non-native speakers?

(This blog post is based on joint research and a publication in collaboration with David Alfter, Therese Lindström Tiedemann, Maisa Lauriala and Daniela Piipponen.) At our department, and outside it, we are used to searching Korp corpora using the linguistic categories available there. Some of us know that these linguistic categories are the result of automatic annotation by the Sparv pipeline. The pipeline automatically splits raw text into tokens and sentences, finds a base form for each of the running (inflected) words, assigns word classes, …