Research

Språkbanken's research unit develops state-of-the-art language technology and pursues theoretical and practical aims within different research areas. Our research focuses both on language technology itself (creating comprehensive, high-quality resources that are needed to develop tools and algorithms) and on questions from other disciplines.

AI-driven language biomarkers for early detection and progression of cognitive decline

This project investigates speech and language as early markers of cognitive decline by integrating linguistic analysis with neuropsychological tests and biomarkers. Using large-scale, clinically validated datasets and state-of-the-art AI methods, it aims to identify, combine, and track linguistic, cognitive, and behavioral indicators to improve early diagnosis, monitoring, and prognosis of dementia.

Dimitrios Kokkinakis
Charalambos Themistocleous
Lina Rydén
Johan Skoog

cognitive decline
linguistic biomarkers
language disorders
neuropsychological tests

Cassandra: Explaining and predicting short-term language change in Contemporary Swedish

Linguists often try to explain language change, but all our explanations are necessarily post hoc and thus difficult to evaluate. What happens if we turn to the future instead of the past and try to predict language change?

Aleksandrs (Sasha) Berdicevskis
Yvonne Adesam
Evie Coussé
Nina Tahmasebi

linguistics
computational linguistics
language change
language evolution
sociolinguistics

Voices of the Enslaved: Corpus-Based Discourse Analysis of Historical Slave Narratives

Historical living standards are a central research field within economic history. In this field, enslaved individuals have often been rendered invisible because they are missing from the sources commonly used. This project will study how enslaved people in the nineteenth-century USA described their living standards, based on a large number of “slave narratives”: autobiographical texts by, or interviews with, formerly enslaved people. Previous research estimates that there are around 5,000 such narratives from the USA in various archival collections. The narratives will be collected in an annotated text corpus, which will eventually be made freely available for further research, whether historical, linguistic or other. Once the corpus has taken shape, we will study how these individuals described both their material living standards (in terms of ownership of material things) and their non-material living conditions (with a focus on the trauma caused by the violence and coercion of slavery). The text analysis will comprise both computer-assisted analysis and researcher-driven, corpus-based discourse analysis of the narratives. This approach makes possible a more comprehensive picture of the many different voices of the enslaved than previous (mainly anecdotal) research in the field has been able to give. We will analyse whether these two aspects varied with social, cultural and geographical factors, and whether they changed over time, above all when the individuals freed themselves or were freed from slavery. The project is carried out by Klas Rönnbäck (economic history), Irene Elmerot (corpus-linguistic discourse analysis) and Leif-Jöran Olsson (language technology), in collaboration with Morgan State University in the USA and a broad international reference group of distinguished researchers from various disciplines.

Leif-Jöran Olsson
Klas Rönnbäck
Irene Elmerot

Economic History
digital humanities
Corpus-Assisted Discourse Studies
computational linguistics
historical material
cultural heritage

HUMINFRA

HUMINFRA is a new distributed, national infrastructure for research in the humanities, arts and social sciences.

Gerlof Bouma
Dana Dannélls
Markus Forsberg
Dimitrios Kokkinakis
Elena Volodina

Linguistic networks: Connecting constructions within and between languages

This project uses Construction Grammar to develop a linguistic network that (a) accounts for Swedish grammatical constructions and (b) connects them to constructions in other languages.

Benjamin Lyngfelt
Maia Andreasson
Kristian Blensenius
Linnea Bäckström
Steffen Höder
Peter Ljunglöf
Jonatan Uppström

linguistic typology

Mapping Social Stratification in the Making of Modern Argentina, 1850–1900: a Micro-Level Analysis

Increased social stratification is associated with societal problems, most clearly in the Global South. Research on the origins of high social stratification has therefore grown, but empirical and methodological challenges complicate the work. Our project aims to investigate the origins of the high social stratification in Argentina (1850–1900), once a wealthy country but now afflicted by crises and inequality – commonly referred to as the “Argentine paradox.” We plan to use advanced OCR technology to digitize a rich body of source material at the individual level. By examining measures of occupational structure, literacy, and social mobility, we will provide new insights into the historical origins of the “Argentine paradox.”

Stefania Galli
Dana Dannélls
Juan Pablo Juliá Ciarelli

digital humanities
historical material
multilingual
Economic History

Grandma Karl

Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the presence of personal and sensitive information, e.g. names or political opinions. GDPR suggests pseudonymization as a solution, but we need to learn more about it before adopting it for the manipulation of research data.

Elena Volodina
Simon Dobnik
Xuan-Son Vu
Therese Lindström Tiedemann
Maria Irena Szawerna
Lisa Södergård

pseudonymization
research data
language technology
general linguistics
Swedish as a second language
data privacy

The Swedish Academy's contemporary dictionaries

Within the project, the Swedish Academy's lexical database (Salex) is maintained and further developed. Work is also carried out on the Swedish Academy's two contemporary dictionaries, Svenska Akademiens ordlista (SAOL) and Svensk ordbok utgiven av Svenska Akademien (SO). The work is commissioned by, and carried out in collaboration with, the Swedish Academy.

Kristian Blensenius
Markus Forsberg
Louise Holmer
Hans Landqvist
Stellan Petersson
Emma Sköldberg
Jonatan Uppström
Ann Lillieström

Swedish Constructicon

This is an umbrella project for the institution's work on creating a Swedish constructicon.

Benjamin Lyngfelt
Maia Andreasson
Kristian Blensenius
Linnea Bäckström
Steffen Höder
Peter Ljunglöf
Jonatan Uppström

A System Architecture for ICALL

ICALL - Intelligent Computer Assisted Language Learning. The aim of the project is to develop an open-source system architecture for supporting ICALL, i.e. CALL that reuses NLP tools and NL resources, with emphasis on the Nordic languages.

Lars Borin
Elena Volodina
Hrafn Loftsson
Birna Arnbjörnsdóttir

ICALL
NLP4CALL
Swedish as a second language
Second language infrastructure
second language learning

Academic word lists

The Swedish academic word list has been developed by researchers affiliated with the research areas of language technology, lexicology/lexicography and Swedish as a second language.

Lexicography
second language learning
NLP4CALL

Argumentation analysis and technology

A joint project between Språkbanken Text, FLoV and CLASP, with the purpose of creating and exploring methods for argumentation technology.

Anna Lindahl
Stian Rødven-Eide
Axel Almquist
Bill Noble
Christine Howes
Ellen Breitholtz
Vladislav Maraev
Martin Kaså

linguistics
computational linguistics
argumentation
text
dialogue
pragmatics
semantics
politics
forum
online discussion
argumentation technology
argument mining

Catta

Developing tools for systematic studies of text classification

Niklas Zechner

Catta

Change is Key!

This program has two main aims. The first is to develop corpus-based methods for detecting semantic change (over time) and variation (across social groups and media). This will create general tools for the large-scale study and detection of language change and directly benefit historical linguistics and lexicography. The second is to collaborate with researchers from the social sciences, gender studies, and literature to answer their research questions. We will develop tools, evaluation data, and research methodology for their specific needs.

Nina Tahmasebi
Simon Hengchen
Haim Dubossarsky
Dominik Schlechtweg
Shafqat Virk
Emma Sköldberg
Mats Malm
Mia Liinason
Sarah Valdez
Dirk Geeraerts
Stefano de Pascale

lexical-semantic-change

Computational SLA

CompSLA (Computational Second Language Acquisition) is a cooperation whose primary aim is to encourage the development of datasets and tools related to L2 (second language) learning for lower-resourced languages.

Elena Volodina
David Alfter
Arianna Masciolini
Yousuf Ali Mohammed
Ricardo Muñoz Sánchez
Maria Irena Szawerna

CONPLISIT

Consumption patterns and life-style in Swedish literature – novels 1830-1860

Lars Borin
Markus Forsberg
Christer Ahlberger

Corpus-driven induction of linguistic knowledge

We will apply corpus-driven methods as a way to expand and correct existing hand-crafted linguistic resources, and conversely we will use hand-crafted resources as additional sources of supervision when learning meaning representations automatically.

Richard Johansson
Luis Nieto Piña

Culture-Sensitive Assessment and Adjustment of Large Language Models – Adaptation to the Nordic-Baltic Societies

This project aims at optimizing current and future LLMs towards a more responsible coverage and functionality that better encompass the linguistic and cultural diversity of the Nordic and Baltic regions, and that are thereby much more inclusive in relation to the societies in which they will be used.

Dana Dannélls
Kristian Blensenius
Bolette Sandford Pedersen
Sanni Nimb
Ali Basirat
Lilja Øvrelid
Erik Velldal
Inguna Skadina
Normunds Grūzītis
Iben Nyholm Debess
Barbara Scalvini

multilingual
computational linguistics

Digital areal linguistics

The goal of this project is to create a database of comparable lexical items in a number of representative languages spoken in the Himalayan region in India and to use this database for investigating the Himalayas as a linguistic area.

Lars Borin
Taraka Rama
Anju Saxena
Bernard Comrie

language technology
areal linguistics
linguistic typology
computational linguistics
Lexicography

Digital LSI

Digitization of Grierson’s Linguistic Survey of India (LSI; 1903-1927)

Lars Borin
Shafqat Virk
Anju Saxena
Bernard Comrie

DReaM: The Dictionary/Grammar Reading Machine

A Multilingual Annotated Corpus of Grammars for the World's Languages

Shafqat Virk
Markus Forsberg
Harald Hammarström

A free cloud service for OCR

The project aims to create a prototype Optical Character Recognition (OCR) web service for processing old Swedish texts that are printed in a blackletter (fraktur) or roman typeface, using one of two open source OCR engines. Our ultimate goal is to provide a service that lets libraries, museums and archives upload any digitized document and retrieve high-quality OCRed text, regardless of the quality of the print.

Dana Dannélls
Lars Borin
Gerlof Bouma

OCR
historical material

European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect)

EnetCollect (the European Network for Combining Language Learning with Crowdsourcing Techniques) was a large network project funded as a COST Action that ran from March 2017 until September 2021. It involved stakeholders from more than 40 countries and was the catalyst for numerous collaborative research efforts, achievements and publications.

Elena Volodina

Evaluation and refinement of an enhanced OCR-process for mass digitisation

The purpose of this project is to fine-tune and evaluate a test platform for OCR-production that was developed by Kungliga biblioteket (KB) in cooperation with the Norwegian software company Zissor in 2017.

Dana Dannélls
Lars Björk
Torsten Johansson

OCR
digital humanities
historical material
cultural heritage
language technology

Functional somatic symptoms

Interpretation and understanding of functional symptoms in primary care

Dimitrios Kokkinakis
Eva Lidén
Elisabeth Björk Brämberg
Sylvia Määttä
Staffan Svensson

Threats and hate against journalists

Under what conditions are freedom of expression and democracy weakened by online hate and threats against journalists?

Peter Ljunglöf
Oscar Björkenfeldt
Måns Svensson

ICALL - Intelligent Computer-Assisted Language Learning

Språkbanken is working on integrating available language technology for Swedish into language learning, so-called ICALL.

Elena Volodina
Arianna Masciolini
Ricardo Muñoz Sánchez
Celine Leuzinger
Yousuf Ali Mohammed
Arild Matsson
Maria Irena Szawerna
David Alfter
Ildikó Pilán

Jubileumsarkivet: Initial inventory, digitization and analysis of Gothenburg University Library's collection on the city's 300th anniversary in 1923

This one-year pilot project inventories and computationally analyses a unique collection of press material from the 1923 Jubilee Exhibition in Gothenburg. The project is a collaboration between GPS400, Språkbanken Text, Gothenburg University Library and Riksarkivet (the Swedish National Archives).

Lars Borin
Dana Dannélls
Markus Forsberg

Kelly - KEywords for Language Learning for Young and adults alike

EU project Kelly - lists of useful vocabulary for language learners in 9 languages

Elena Volodina
Sofie Johansson Kokkinakis

ICALL
NLP4CALL
second language learning
CEFR profiles
Swedish as a second language

Koala – Korp's linguistic annotations

Improved annotations for the Korp corpus infrastructure.

Yvonne Adesam
Lars Borin
Gerlof Bouma
Markus Forsberg
Richard Johansson

L2 profiles for Swedish

The L2 P project studies lexical and grammatical competence in second language learner Swedish through two corpora: coursebooks and essays.

Elena Volodina
Therese Lindström Tiedemann
Yousuf Ali Mohammed
David Alfter

ICALL
NLP4CALL
linguistic complexity
SLA
second language learning
CEFR profiles

Lexical semantic change

At Språkbanken Text, we have an ongoing research theme that relates to computational modeling of word meaning called (computational) lexical semantic change. In this theme, we have both research projects and a larger research program.

Nina Tahmasebi

lexical-semantic-change
lexical semantics

Market Language

Market Language is a project funded primarily by MAW, in which we study the changing concepts around “the market”. These concepts have shifted from denoting a concrete physical market to increasingly abstract markets such as Europe-wide iron markets, as well as marriage and dating markets. Markets have also increasingly come to be described as actors in our lives, as in “the market reacted badly to the new corona restrictions”. We will complement the conceptual historians' in-depth analyses with computational models of change. The project runs from 2022 to 2025.

Henrik Björck
Shafqat Virk
Claes Ohlsson

MAÞiR

Developing automatic annotation tools for Old Swedish texts.

Gerlof Bouma
Yvonne Adesam

MedEval

A Swedish medical test collection

Karin Friberg Heppin
Anni Järvelin

META-NORD

The META-NORD project aims to establish an open linguistic infrastructure in the Baltic and Nordic countries.

Lars Borin
Markus Forsberg

Milage: Multilingual Automated Grammar Extraction

In this project we want to utilize a useful collection of 9000 digitized grammatical descriptions covering over a thousand languages in order to significantly expand the ability to make major language comparisons.

Shafqat Virk
Markus Forsberg
Harald Hammarström

MOLTO - Multilingual Online Translation

MOLTO's goal is to develop a set of tools for translating texts between multiple languages in real time with high quality. Languages are separate modules in the tool and can be varied; prototypes covering a majority of the EU's 23 official languages will be built.

Dana Dannélls

Generation
translation
multilingual
cultural heritage
GF

PINCORE

Person-Centred Information and Communication for patients undergoing Colo-Rectal Cancer Surgery

Dimitrios Kokkinakis

Rumour mining

The aim of the project is to investigate the role and importance of rumouring for the vaccination skepticism growing on the internet, and how it can be understood as an expression of civic engagement in the present digital times entailing crucial transformations for everyday civic culture.

Dimitrios Kokkinakis
Lars Borin
Mia-Marie Hammarlin
Fredrik Miegel

digital humanities

Språkbanken’s lexical research infrastructure

The aim of this project is to build a versatile lexical infrastructure for Swedish language technology (LT).

Lars Borin
Markus Forsberg
Dana Dannélls
Niklas Zechner
Jonatan Uppström
Ann Lillieström
Maria Öhrman
Nick Smallbone

Lexicography
lexicon
lexical semantics
integrated lexical resource

Linguistic and extra-linguistic parameters for early detection of cognitive impairment

As the population ages, the number of people with cognitive dysfunctions, such as various types of dementia, has grown at a high rate. However, years before the clinical onset of dementia symptoms, patients exhibit serious deficits in their oral and written communication and visual short-term memory. These signs can be measured and can complement medical evidence to distinguish patients' performance from that of healthy (elderly) controls, or even to predict poor cognitive health in late life.

Dimitrios Kokkinakis
Kristina Lundholm Fors
Malin Antonsson
Marie Eckerström
Charalambos Themistocleous
Kathleen C Fraser
Arto Nordlund

language disorders

Language Technology Linked Open Data at Språkbanken

This project aims to make lexical resources for language technology available in the form of linked open data.

Lars Borin
Dana Dannélls
Markus Forsberg

LOD
Semantic Web
linked data

SuperLim 2.0

The goal of this project is to finalize the evaluation framework SuperLim by contributing training data for the current collection of test sets, a reference implementation (baseline), and a standardized web-based test environment for comparison between models and publication of results (leaderboard).

Markus Forsberg
Aleksandrs (Sasha) Berdicevskis
Gerlof Bouma
Felix Morger
Anna Lindahl
Dana Dannélls
Magnus Sahlgren
Love Börjeson
Francisca Hoyer
Elena Volodina

evaluation
bias
language models

SwedishGlue: a benchmark suite for language models

The purpose of this project is to create high-quality test sets to enable all actors within Swedish NLP to evaluate and compare language models.

Markus Forsberg
Yvonne Adesam
Aleksandrs (Sasha) Berdicevskis
Dana Dannélls
Felix Morger
Gerlof Bouma
Magnus Sahlgren
Love Börjeson
Johanna Bergman

evaluation
language models
bias

Swedish FrameNet++ (SweFN++)

The goal of the SweFN++ project is to build an open-content (i.e., freely available and modifiable) integrated lexical resource for Swedish, which has so far been lacking, to be used as a basic infrastructural component in Swedish language technology (LT) research and in the development of LT applications for Swedish.

Lars Borin
Dana Dannélls
Dimitrios Kokkinakis
Markus Forsberg
Jonatan Uppström
Leif-Jöran Olsson
Malin Ahlberg
Maria Toporowska Gronostaj
Karin Friberg Heppin
Richard Johansson

lexicon
lexical semantics
modern
integrated lexical resource
framenet

Swedish Language Data Lab (Svenskt språkdatalabb)

The goal of Svenskt Språkdatalabb is to create a national knowledge hub for language technology and to produce Swedish reference datasets for NLP, which will then be made openly accessible in the data factory of AI Innovation of Sweden.

Peter Ljunglöf
Aleksandrs (Sasha) Berdicevskis

SweCcn - a Swedish constructicon

The aim of this project is to develop a Swedish so-called constructicon, a database of Swedish constructions.

Lars Borin
Dana Dannélls
Markus Forsberg
Leif-Jöran Olsson
Jonatan Uppström
Benjamin Lyngfelt
Kristian Blensenius
Linnea Bäckström
Anna Ehrlemark
Per Malm
Joel Olofsson
Julia Prentice
Rudolf Rydstedt
Emma Sköldberg
Sofia Tingsell

Lexicography
integrated lexical resource
constructicon

SweLL - Infrastructure for L2 Swedish

The main focus of this project is on producing data, tools and workflows for research on and evaluation of L2 Swedish.

Elena Volodina
Yousuf Ali Mohammed
Arild Matsson
Mats Wirén
Beáta Megyesi
Julia Prentice
Gunlög Sundberg
Lena Granstedt
Monica Reichenberg
Lisa Rudebeck

Second language infrastructure
Swedish as a second language
essay annotation
correction annotation
pseudonymization

Text classification of medical publications about person-centred care

Computerised text classification is used to help identify documents on the topic of patient-centred care.

Niklas Zechner

The rise of complex verb constructions in Germanic

The project studies the rise of complex verb constructions in Germanic.

Evie Coussé
Gerlof Bouma
Nicoline van der Sijs
Dirk-Jan de Kooter
Trude Dijkstra

Towards a Knowledge-Based Culturomics

The main aim of this research program is to advance the state of the art in language technology resources and methods for semantic processing of Swedish text, in order to provide researchers and others with more sophisticated tools for working with the information contained in large volumes of digitized text, e.g., by being able to correlate and compare the content of texts and text passages on a large scale.

Jacobo Rouces
Lars Borin
Nina Tahmasebi
Dimitrios Kokkinakis
Pierre Nugues
Richard Johansson
Dubhashi Devdatt

culturomics

Towards Computational Lexical Semantic Change Detection

In this project, we aim to find automatic, corpus-based methods for detecting semantic change and lexical replacement for Swedish and English.

Nina Tahmasebi
Simon Hengchen
Richard Johansson
Maria Koptjevskaja Tamm

Two hundred years of verb constructions in Dutch

How is new grammar created? Historical linguistic research has mainly studied how new grammatical words and constructions come into being. With our project, we want to take a broader view and explore how new grammatical constructions together create a grammatical network. More specifically, we focus on how verb constructions in Dutch have gradually built a larger network over the past two hundred years.

Evie Coussé
Gerlof Bouma

Variation and contact in medieval personal names

This project investigates which strategies are employed when North Germanic personal names are adapted to medieval German, French and Latin in multilingual contexts. It aims at surveying the variation patterns evident in the adaptations and seeks to develop a theoretical model that explains why different strategies were used.

Michelle Waldispühl
Lars Borin
Dana Dannélls
Jonatan Uppström

language
culture
historical material

Xhosa Corpus

Språkbanken Text collaborates with the Department of Philosophy, Linguistics and Theory of Science to create an annotated corpus of Xhosa, an underresourced Bantu language of South Africa (also known as isiXhosa and Xosa).

Anne Schumacher Olsson
Martin Hammarstedt
Aleksandrs (Sasha) Berdicevskis
Markus Forsberg
Eva-Marie Karin Bloom Ström
Aron Einar Zahran
Onelisa Slater

linguistic typology
field linguistics
African languages
Bantu languages
glossing

Proportion of the South African population that speaks isiXhosa as their first language, according to Census 2011 at electoral ward level
