
Research

Språkbanken's research unit develops state-of-the-art language technology and pursues theoretical and practical aims within different research areas. Our research focuses both on language technology itself (creating comprehensive, high-quality resources that are needed to develop tools and algorithms) and on questions from other disciplines.
Umbrella project

Cassandra: Explaining and predicting short-term language change in Contemporary Swedish

Linguists often try to explain language change, but all our explanations are necessarily post hoc and thus difficult to evaluate. What happens if we turn to the future instead of the past and try to predict language change?
  • Aleksandrs (Sasha) Berdicevskis
  • Yvonne Adesam
  • Evie Coussé
  • Nina Tahmasebi
  • linguistics
  • computational linguistics
  • language change
  • language evolution
  • sociolinguistics

HUMINFRA

HUMINFRA is a new distributed national infrastructure for research in the humanities, arts and social sciences.
  • Gerlof Bouma
  • Dana Dannélls
  • Markus Forsberg
  • Dimitrios Kokkinakis
  • Elena Volodina

Linguistic networks: Connecting constructions within and between languages

This project uses Construction Grammar to develop a linguistic network that (a) accounts for Swedish grammatical constructions and (b) connects them to constructions in other languages.
  • Benjamin Lyngfelt
  • Maia Andreasson
  • Kristian Blensenius
  • Linnea Bäckström
  • Steffen Höder
  • Peter Ljunglöf
  • Jonatan Uppström
  • linguistic typology

Grandma Karl

Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because it contains personal and sensitive information, e.g. names and political opinions. GDPR suggests pseudonymization as a solution, but we need to learn more about it before adopting it for manipulation of research data.
  • Elena Volodina
  • Simon Dobnik
  • Xuan-Son Vu
  • Therese Lindström Tiedemann
  • pseudonymization
  • research data
  • language technology
  • general linguistics
  • Swedish as a second language
  • data privacy

Svenska Akademiens samtidsordböcker (The Swedish Academy's contemporary dictionaries)

The project maintains and further develops the Swedish Academy's lexical database (Salex). It also covers work on the Swedish Academy's two contemporary dictionaries, Svenska Akademiens ordlista (SAOL) and Svensk ordbok utgiven av Svenska Akademien (SO). The work is carried out on behalf of, and in collaboration with, the Swedish Academy.
  • Kristian Blensenius
  • Markus Forsberg
  • Louise Holmer
  • Hans Landqvist
  • Stellan Petersson
  • Emma Sköldberg
  • Jonatan Uppström
  • Ann Lillieström

A System Architecture for ICALL

ICALL stands for Intelligent Computer-Assisted Language Learning. The aim of the project is to develop an open-source system architecture for supporting ICALL, i.e. CALL that reuses NLP tools and NL resources, with emphasis on the Nordic languages.
  • Lars Borin
  • Elena Volodina
  • Hrafn Loftsson
  • Birna Arnbjörnsdóttir
  • ICALL
  • NLP4CALL
  • Swedish as a second language
  • Second language infrastructure
  • second language learning

Akademiska ordlistor (Academic word lists)

The Swedish academic word list was developed by researchers affiliated with the research areas of language technology, lexicology/lexicography and Swedish as a second language.
  • Lexicography
  • second language learning
  • NLP4CALL

Argumentation analysis and technology

A joint project between Språkbanken Text, FLoV and CLASP, with the purpose of creating and exploring methods for argumentation technology.
  • Anna Lindahl
  • Stian Rødven-Eide
  • Axel Almquist
  • Bill Noble
  • Christine Howes
  • Ellen Breitholtz
  • Vladislav Maraev
  • Martin Kaså
  • linguistics
  • computational linguistics
  • argumentation
  • text
  • dialogue
  • pragmatics
  • semantics
  • politics
  • forum
  • online discussion
  • argumentation technology
  • argument mining

Catta

Developing tools for systematic studies of text classification
  • Niklas Zechner

Change is Key!

This program has two main aims. The first is to develop corpus-based methods for detecting semantic change (over time) and variation (across social groups and media). This will create general tools for the large-scale study and detection of language change and will directly benefit historical linguistics and lexicography. The second is to collaborate with researchers from the social sciences, gender studies, and literature to answer their research questions. We will develop tools, evaluation data, and research methodology for their specific needs.
  • Nina Tahmasebi
  • Simon Hengchen
  • Haim Dubossarsky
  • Dominik Schlechtweg
  • Shafqat Virk
  • Emma Sköldberg
  • Mats Malm
  • Mia Liinason
  • Sarah Valdez
  • Dirk Geeraerts
  • Stefano de Pascale
  • lexical-semantic-change

CONPLISIT

Consumption patterns and life-style in Swedish literature – novels 1830-1860
  • Lars Borin
  • Markus Forsberg
  • Christer Ahlberger

Corpus-driven induction of linguistic knowledge

We will apply corpus-driven methods as a way to expand and correct existing hand-crafted linguistic resources, and conversely we will use hand-crafted resources as additional sources of supervision when learning meaning representations automatically.
  • Richard Johansson
  • Luis Nieto Piña

Digital areal linguistics

The goal of this project is to create a database of comparable lexical items in a number of representative languages spoken in the Himalayan region in India and to use this database for investigating the Himalayas as a linguistic area.
  • Lars Borin
  • Taraka Rama
  • Anju Saxena
  • Bernard Comrie
  • language technology
  • areal linguistics
  • linguistic typology
  • computational linguistics
  • Lexicography

Digital LSI

Digitization of Grierson’s Linguistic Survey of India (LSI; 1903-1927)
  • Lars Borin
  • Shafqat Virk
  • Anju Saxena
  • Bernard Comrie

DReaM: The Dictionary/Grammar Reading Machine

A Multilingual Annotated Corpus of Grammars for the World's Languages
  • Shafqat Virk
  • Markus Forsberg
  • Harald Hammarström

A free cloud service for OCR

The project aims to create a prototype Optical Character Recognition (OCR) web service for processing old Swedish texts printed in a blackletter (fraktur) or roman typeface, using one of two open-source OCR engines. Our ultimate goal is to provide a service that lets libraries, museums and archives upload any digitized document and retrieve high-quality OCRed text, regardless of the quality of the print.
  • Dana Dannélls
  • Lars Borin
  • Gerlof Bouma
  • OCR
  • historical material

European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect)

EnetCollect (the European Network for Combining Language Learning with Crowdsourcing Techniques) was a large network project funded as a COST Action that ran from March 2017 until September 2021. It involved stakeholders from more than 40 countries and was the catalyst for numerous collaborative research efforts, achievements and publications.
  • Elena Volodina

Evaluation and refinement of an enhanced OCR-process for mass digitisation

The purpose of this project is to fine-tune and evaluate a test platform for OCR production that was developed in 2017 by Kungliga biblioteket (KB) in cooperation with the Norwegian software company Zissor.
  • Dana Dannélls
  • Lars Björk
  • Torsten Johansson
  • OCR
  • digital humanities
  • historical material
  • cultural heritage
  • language technology

Funktionella somatiska symtom (Functional somatic symptoms)

Interpretation and understanding of functional symptoms in primary care
  • Dimitrios Kokkinakis
  • Eva Lidén
  • Elisabeth Björk Brämberg
  • Sylvia Määttä
  • Staffan Svensson

Hot och hat mot journalister (Threats and hatred against journalists)

Under what conditions are freedom of expression and democracy weakened by hatred and threats against journalists online?
  • Peter Ljunglöf
  • Oscar Björkenfeldt
  • Måns Svensson

Jubileumsarkivet: Initial inventory, digitization and analysis of Gothenburg University Library's collection on the city's 300th anniversary in 1923

This one-year pilot project inventories and computationally analyzes a unique collection of press material from the 1923 Jubilee Exhibition in Gothenburg. The project is a collaboration between GPS400, Språkbanken Text, the University Library and Riksarkivet.
  • Lars Borin
  • Dana Dannélls
  • Markus Forsberg

Kelly - KEywords for Language Learning for Young and adults alike

EU project Kelly - lists of useful vocabulary for language learners in 9 languages
  • Elena Volodina
  • Sofie Johansson Kokkinakis
  • ICALL
  • NLP4CALL
  • second language learning
  • CEFR profiles
  • Swedish as a second language

Koala – Korp's linguistic annotations

Improved annotations for the Korp corpus infrastructure.
  • Yvonne Adesam
  • Lars Borin
  • Gerlof Bouma
  • Markus Forsberg
  • Richard Johansson

L2 profiles for Swedish

The L2 P project studies lexical and grammatical competence in second language learner Swedish through two corpora: coursebooks and essays.
  • Elena Volodina
  • Therese Lindström Tiedemann
  • Yousuf Ali Mohammed
  • David Alfter
  • ICALL
  • NLP4CALL
  • linguistic complexity
  • SLA
  • second language learning
  • CEFR profiles

Market Language

Market Language is a project funded primarily by MAW in which we look at the changing concepts around “the market”. These concepts have shifted from denoting a concrete physical market to increasingly abstract markets, such as Europe-wide iron markets or marriage and dating markets. Markets have also increasingly become actors in our lives, as in “the market reacted badly to the new corona restrictions”. We will complement the conceptual historians' in-depth analyses with computational models of change. The project runs from 2022 to 2025.
  • Henrik Björck
  • Shafqat Virk
  • Claes Ohlsson

MAÞiR

Developing automatic annotation tools for Old Swedish texts.
  • Gerlof Bouma
  • Yvonne Adesam

MedEval

A Swedish medical test collection
  • Karin Friberg Heppin
  • Anni Järvelin

META-NORD

The META-NORD project aims to establish an open linguistic infrastructure in the Baltic and Nordic countries.
  • Lars Borin
  • Markus Forsberg

Milage: Multilingual Automated Grammar Extraction

In this project we want to use a collection of 9000 digitized grammatical descriptions, covering over a thousand languages, to significantly expand the ability to make large-scale comparisons between languages.
  • Shafqat Virk
  • Markus Forsberg
  • Harald Hammarström

MOLTO - Multilingual Online Translation

MOLTO's goal is to develop a set of tools for translating texts between multiple languages in real time with high quality. Languages are separate modules in the tool and can be varied; prototypes covering a majority of the EU's 23 official languages will be built.
  • Dana Dannélls
  • Generation
  • translation
  • multilingual
  • cultural heritage
  • GF

PINCORE

Person-Centred Information and Communication for patients undergoing Colo-Rectal Cancer Surgery
  • Dimitrios Kokkinakis

Rumour mining

The aim of the project is to investigate the role and importance of rumouring for the vaccination skepticism growing on the internet, and how it can be understood as an expression of civic engagement in the present digital times entailing crucial transformations for everyday civic culture.
  • Dimitrios Kokkinakis
  • Lars Borin
  • Mia-Marie Hammarlin
  • Fredrik Miegel
  • digital humanities

Linguistic and extra-linguistic parameters for early detection of cognitive impairment

  • Dimitrios Kokkinakis
  • Kristina Lundholm Fors
  • Malin Antonsson
  • Marie Eckerström
  • Charalambos Themistocleous
  • language disorders

Language Technology Linked Open Data at Språkbanken

This project aims to make lexical resources for language technology available in the form of linked open data.
  • Lars Borin
  • Dana Dannélls
  • Markus Forsberg
  • LOD
  • Semantic Web
  • linked data

SuperLim 2.0

The goal of this project is to finalize the evaluation framework SuperLim by contributing training data for the current collection of test sets, a reference implementation (baseline), and a standardized web-based test environment for comparison between models and publication of results (leaderboard).
  • Markus Forsberg
  • Aleksandrs (Sasha) Berdicevskis
  • Gerlof Bouma
  • Felix Morger
  • Anna Lindahl
  • Dana Dannélls
  • Magnus Sahlgren
  • Love Börjeson
  • Francisca Hoyer
  • Elena Volodina
  • evaluation
  • bias
  • language models

SwedishGlue: a benchmark suite for language models

The purpose of this project is to create high-quality test sets that enable all actors within Swedish NLP to evaluate and compare language models.
  • Markus Forsberg
  • Yvonne Adesam
  • Aleksandrs (Sasha) Berdicevskis
  • Dana Dannélls
  • Felix Morger
  • Gerlof Bouma
  • Magnus Sahlgren
  • Love Börjeson
  • Johanna Bergman
  • evaluation
  • language models
  • bias

Swedish FrameNet++ (SweFN++)

The goal of the SweFN++ project is to build an open-content (i.e., freely available and modifiable) integrated lexical resource for Swedish, which has so far been lacking, to be used as a basic infrastructural component in Swedish language technology (LT) research and in the development of LT applications for Swedish.
  • Lars Borin
  • Dana Dannélls
  • Dimitrios Kokkinakis
  • Markus Forsberg
  • Jonatan Uppström
  • Leif-Jöran Olsson
  • Malin Ahlberg
  • Maria Toporowska Gronostaj
  • Karin Friberg Heppin
  • Richard Johansson
  • lexicon
  • lexical semantics
  • modern
  • integrated lexical resource
  • framenet

Svenskt språkdatalabb (Swedish Language Data Lab)

The goal of Svenskt Språkdatalabb is to create a national knowledge hub for language technology and to produce Swedish reference datasets for NLP, which are then made openly accessible in AI Innovation of Sweden's data factory.
  • Peter Ljunglöf
  • Aleksandrs (Sasha) Berdicevskis

SweCcn -- a Swedish constructicon

The aim of this project is to develop a Swedish constructicon, that is, a database of Swedish constructions.
  • Lars Borin
  • Dana Dannélls
  • Markus Forsberg
  • Leif-Jöran Olsson
  • Jonatan Uppström
  • Benjamin Lyngfelt
  • Kristian Blensenius
  • Linnea Bäckström
  • Anna Ehrlemark
  • Per Malm
  • Joel Olofsson
  • Julia Prentice
  • Rudolf Rydstedt
  • Emma Sköldberg
  • Sofia Tingsell
  • Lexicography
  • integrated lexical resource
  • constructicon

SweLL - Infrastructure for L2 Swedish

The main focus of this project is on producing data, tools and workflows for research on and evaluation of L2 Swedish.
  • Elena Volodina
  • Yousuf Ali Mohammed
  • Arild Matsson
  • Mats Wirén
  • Beáta Megyesi
  • Julia Prentice
  • Gunlög Sundberg
  • Lena Granstedt
  • Monica Reichenberg
  • Lisa Rudebeck
  • Second language infrastructure
  • Swedish as a second language
  • essay annotation
  • correction annotation
  • pseudonymization

Text classification of medical publications about person-centred care

Computerised text classification is used to help identify documents on the topic of patient-centred care.
  • Niklas Zechner

The rise of complex verb constructions in Germanic

The project studies the rise of complex verb constructions in Germanic.
  • Evie Coussé
  • Gerlof Bouma
  • Nicoline van der Sijs
  • Dirk-Jan de Kooter
  • Trude Dijkstra

Towards a Knowledge-Based Culturomics

The main aim of this research program is to advance the state of the art in language technology resources and methods for semantic processing of Swedish text, in order to provide researchers and others with more sophisticated tools for working with the information contained in large volumes of digitized text, e.g., by being able to correlate and compare the content of texts and text passages on a large scale.
  • Jacobo Rouces
  • Lars Borin
  • Nina Tahmasebi
  • Dimitrios Kokkinakis
  • Pierre Nugues
  • Richard Johansson
  • Dubhashi Devdatt
  • culturomics

Towards Computational Lexical Semantic Change Detection

In this project, we aim to find automatic, corpus-based methods for detecting semantic change and lexical replacement for Swedish and English.
  • Nina Tahmasebi
  • Simon Hengchen
  • Richard Johansson
  • Maria Koptjevskaja Tamm

Variation and contact in medieval personal names

This project investigates which strategies are employed when North Germanic personal names are adapted to medieval German, French and Latin in multilingual contexts. It aims at surveying the variation patterns evident in the adaptations and seeks to develop a theoretical model that explains why different strategies were used.
  • Michelle Waldispühl
  • Lars Borin
  • Dana Dannélls
  • Jonatan Uppström
  • language
  • culture
  • historical material

Xhosa Corpus

Språkbanken Text collaborates with the Department of Philosophy, Linguistics and Theory of Science to create an annotated corpus of Xhosa, an underresourced Bantu language of South Africa (also known as isiXhosa and Xosa).
  • Anne Schumacher
  • Martin Hammarstedt
  • Aleksandrs (Sasha) Berdicevskis
  • Markus Forsberg
  • Eva-Marie Karin Bloom Ström
  • Aron Einar Zahran
  • Onelisa Slater
  • linguistic typology
  • field linguistics
  • African languages
  • Bantu languages
  • glossing
Figure: Proportion of the South African population that speaks isiXhosa as a first language, at electoral ward level, according to Census 2011.