Election manifestos 2022 - A quick language technology analysis

A little over a week before the election, the manifestos of all the parliamentary parties were finally available. We thought it would be interesting to run a few simple analyses of the texts and see what we could find out with language technology tools.

The first thing we did was run the documents through Sparv, Språkbanken's annotation tool. This gave us, among other things, readability scores and sentiment analysis. The manifestos average about 6,000 words in length but vary considerably, from Kristdemokraterna's 1,623 words to Moderaterna's 11,139.
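To give a feel for what a word count and a readability score involve, here is a minimal sketch in plain Python. It does not use Sparv. The tokenizer, the sentence splitter and the choice of LIX (a readability index commonly used for Swedish) are assumptions made for illustration, and the sample sentences are invented, not taken from any manifesto.

```python
import re


def tokenize(text):
    """Return the word tokens in text (Unicode-aware runs of letters and digits)."""
    return re.findall(r"\w+", text)


def lix(text):
    """LIX = words per sentence + percentage of words longer than six letters."""
    words = tokenize(text)
    if not words:
        return 0.0
    # Naive sentence splitting: every run of '.', '!' or '?' ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    long_words = [w for w in words if len(w) > 6]
    return len(words) / max(len(sentences), 1) + 100 * len(long_words) / len(words)


# Invented sample text, used only to exercise the functions.
sample = "Vi vill ha fler poliser. Skolan ska bli bättre för alla elever."
print(len(tokenize(sample)), round(lix(sample), 1))
```

A real pipeline would use a proper tokenizer and sentence segmenter, so counts from this sketch will not match Sparv's output exactly.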

The SwedishGLUE project

Artificial intelligence systems that deal with (human) natural language rely on language models, which predict which words occur together. To better understand how such models work, and where they fail, when applied to Swedish texts, we need Swedish test data. A collection of test data addressing various aspects of understanding and generating text allows us to evaluate and compare models.
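The idea of predicting which words occur together can be illustrated with a toy bigram model. This is a deliberately simplified, count-based stand-in: the models evaluated in practice are typically large neural networks, and the function names and the tiny Swedish sample below are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Return the estimated probability of each word that followed `word`."""
    following = counts.get(word)
    if not following:
        return {}  # word never seen with a successor in the training data
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}


# Invented toy corpus, for illustration only.
tokens = "jag gillar kaffe och jag gillar te och jag dricker kaffe".split()
model = train_bigrams(tokens)
probs = next_word_probs(model, "jag")
print(probs)
```

The empty result for unseen words hints at why test data matters: a model can only be trusted on the kinds of text it has been evaluated on.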

Argumentation Mining

What if you could find all arguments in a text without having to read it? Or what if you could search a database for a controversial topic and immediately get arguments for and against it, gathered from text all around the internet? Or imagine that, while writing an essay, you automatically got an estimate of how persuasive your arguments are.

The Kubhist corpus of Swedish newspapers

Among Språkbanken’s many historical resources we find the Kubhist corpus, a diachronic collection of historical newspaper texts, in two versions: Kubhist 1, spanning 1750–1950, and Kubhist 2, spanning 1645–1926. Historical corpora of this kind, especially when available in a searchable format, are valuable sources of information for learning about our history, language and culture.

