Language resources

On this page you can browse and search our corpora and lexicons. Click on a resource name to see what files are available for download. You can go directly to the search interface by clicking on the Korp or Karp logo.
Resource Type Language Access
Somali: Caafimaad 1972–79
Corpus Somali
Somali: Cilmi-Afeed
Corpus Somali
Somali: Cilmiga Bulshada 1971–1980
Corpus Somali
Somali: Cilmiga Bulshada 2001-03 Soomaaliya
Corpus Somali
Somali: Cilmiga Bulshada 2016 Somaliland
Corpus Somali
Somali: Kitaabka Quduuska Ah
Corpus Somali
Somali: Maaddooyinka Kale 1972–79
Corpus Somali
Somali: Raadiyaha Denmark 2014
Corpus Somali
Somali: Raadiyaha Iswiidhan 2014
Corpus Somali
Somali: Saynis 1980–89
Corpus Somali
Somali: Sheekooyin Carruureed
Corpus Somali
Somali: Sheekooyin Carruureed (Turjuman)
Corpus Somali
Somali: Sheekooyin Gaagaaban
Corpus Somali
Somali: Suugaan
Corpus Somali
Somali: Suugaan (Turjuman)
Corpus Somali
Somali: Suugaan 2
Corpus Somali
Somali: Taariikh iyo Dhaqan (Turjuman)
Corpus Somali
Somali: Xisaab 2001 Soomaaliya
Corpus Somali
Somali: Xisaab 2016 Somaliland
Corpus Somali
SpIn v1
256 essays collected from the Language Introduction course (mid-term exams) for newly arrived refugees. Some students appear more than once.
Corpus Swedish
Sports anglicisms
English loan-words in the Swedish sports press
Lexicon Swedish
Språkprov SO 2009
The just over 94,000 language examples are taken from Svensk ordbok, published by Svenska Akademien (2009). The examples are there to support the dictionary definitions and to give information about the phraseology of the headwords. For access, contact Emma Sköldberg (emma.skoldberg@svenska.gu.se).
Corpus Swedish
SUC 2.0
Stockholm-Umeå corpus 2.0
Corpus Swedish
SUC 3.0
Stockholm-Umeå corpus 3.0
Corpus Swedish
SUC Novels (StorSUC)
Stockholm-Umeå corpus
Corpus Swedish
SUCX 2.0
Stockholm-Umeå corpus 2.0 scrambled
Corpus Swedish
SUCX 3.0
Stockholm-Umeå corpus 3.0 scrambled
Corpus Swedish
Collection
SuperLim 2
A standardized suite for evaluation and analysis of Swedish natural language understanding systems.
Corpus Swedish
SuperSim (repackaged for Superlim) 2.0
A dataset for word similarity and relatedness in Swedish
Corpus Swedish
sv-COVID-19
A compilation of various articles related to the COVID-19 pandemic
Corpus Swedish
Svensk Tidskrift
27 annual volumes of the conservative journal Svensk Tidskrift, from 1891 to 1940
Corpus Swedish
Collection
SVT news
News texts from svt.se
Corpus Swedish
SVT news 2004
News texts from svt.se
Corpus Swedish
SVT news 2005
News texts from svt.se
Corpus Swedish
SVT news 2006
News texts from svt.se
Corpus Swedish
SVT news 2007
News texts from svt.se
Corpus Swedish
SVT news 2008
News texts from svt.se
Corpus Swedish
SVT news 2009
News texts from svt.se
Corpus Swedish
SVT news 2010
News texts from svt.se
Corpus Swedish
SVT news 2011
News texts from svt.se
Corpus Swedish
SVT news 2012
News texts from svt.se
Corpus Swedish
SVT news 2013
News texts from svt.se
Corpus Swedish
SVT news 2014
News texts from svt.se
Corpus Swedish
SVT news 2015
News texts from svt.se
Corpus Swedish
SVT news 2016
News texts from svt.se
Corpus Swedish
SVT news 2017
News texts from svt.se
Corpus Swedish
SVT news 2018
News texts from svt.se
Corpus Swedish
SVT news 2019
News texts from svt.se
Corpus Swedish
SVT news 2020
News texts from svt.se
Corpus Swedish
SVT news 2021
News texts from svt.se
Corpus Swedish
SVT news 2022
News texts from svt.se
Corpus Swedish
SVT news 2023
News texts from svt.se
Corpus Swedish
SVT news unknown date
News texts from svt.se
Corpus Swedish
SW1203-essays
Essays written by L2 Swedish language learners, university courses
Corpus Swedish
Swe-NERC
A resource for training and evaluation of Named Entity Recognition for Swedish.
Corpus Swedish
Swedberg's Swensk Ordabok
Swedberg's Swensk Ordabok
Lexicon Swedish, Latin
Swedberg's Swensk Ordabok (morphology, rudimentary)
Swedberg's Swensk Ordabok (morphology, rudimentary)
Lexicon Swedish
SweDiagnostics
Swedish version of (Super)GLUE Diagnostic
Corpus Swedish
Swedish ABSAbank
An annotated Swedish corpus for aspect-based sentiment analysis
Corpus Swedish
Swedish ABSAbank-Imm 1.1
An annotated Swedish corpus for aspect-based sentiment analysis (a version of ABSAbank)
Corpus Swedish
Swedish analogy 2.0
Swedish semantic and syntactic similarity
Corpus Swedish
Swedish Bible 1873
Swedish translation of the Bible from 1873
Corpus Swedish
Swedish Bible 1917
Official Swedish translation of the Bible from 1917
Corpus Swedish
Swedish Diachronic Word Embeddings
Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data
Model Swedish
Swedish EAT: question classification
A translated version of the QAQC dataset for expected-answer-type classification.
Corpus Swedish
Swedish fraktur 1626-1816
A selection of fraktur texts printed between 1626 and 1816 from the collections of the University Library of the University of Gothenburg (UB). For OCR analysis.
Corpus Swedish
Swedish framenet (SweFN)
A lexical semantic resource based on the same principles as the English Berkeley FrameNet. This part of the resource contains the corpus examples, automatically enriched with linguistic information.
Corpus Swedish
Swedish FrameNet (SweFN)
A lexical semantic resource based on the same principles as the English Berkeley FrameNet. This part of the resource contains the frames and the manually annotated semantic content.
Lexicon Swedish
Swedish newspapers 1818-1870
A selection of Swedish newspapers printed between 1818 and 1870 from the collections of Kungliga biblioteket (KB). For OCR analysis.
Corpus Swedish
Swedish newspapers 1871-1906
A selection of Swedish newspapers printed between 1871 and 1906 from the collections of Kungliga biblioteket (KB). For OCR analysis.
Corpus Swedish
Swedish party programs and election manifestos
Swedish political party programs and election manifestos 1887–2022
Corpus Swedish
Swedish Prose Fiction 1800–1900
All Swedish fiction published for the first time during the years 1800, 1820, 1840, 1860, 1880 and 1900
Corpus Swedish
Swedish statute book
Swedish statute book 1880-01-01 – 2012-08-16
Corpus Swedish
Swedish treebank
A Swedish treebank built from recycled language resources
Corpus Swedish
Swedish Twitter 2015
Material collected from a selection of Swedish-speaking Twitter users from 2015
Corpus Swedish
Swedish Twitter 2016
Material collected from a selection of Swedish-speaking Twitter users from 2016
Corpus Swedish
Swedish Twitter 2017
Material collected from a selection of Swedish-speaking Twitter users from 2017
Corpus Swedish
Swedish Wikipedia
Corpus of Swedish Wikipedia
Corpus Swedish
Swedish words, LEXIN
Lexicon for immigrants. Second edition
Lexicon Swedish, Albanian, Bosnian, English, Finnish, Modern Greek (1453-), Croatian, Kurdish, Iranian Persian, Russian, Serbian, Somali, Spanish, Turkish
Swedish-Finnish word lists
Swedish-Finnish word lists within various domains
Lexicon Swedish
SweDN 1.0
A Swedish text summarization corpus
Corpus Swedish
SweFAQ 2.0
Frequently asked questions from Swedish authorities' websites with shuffled answers
Corpus Swedish
SweLL-gold
Essays written by adult learners of Swedish, manually pseudonymized and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2017-2020.
Corpus Swedish
Collection
SweLL-pilot
Essays written by adult learners of Swedish, manually anonymised and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2006-2015.
Corpus Swedish
SweNLI 1.0
A Swedish NLI dataset
Corpus Swedish
SweParaphrase 2.0
Semantic Textual Similarity reference data (STS Benchmark).
Corpus Swedish
SweSAT Swedish Scholastic Aptitude Test Synonyms 1.1
Swedish Scholastic Aptitude Test Synonyms
Lexicon Swedish
Swesaurus
A Swedish WordNet
Lexicon Swedish
SweWiC 2.0
A Swedish Word-in-Context dataset
Corpus Swedish
SweWinogender 2.0
A Swedish dataset for coreference and gender bias
Corpus Swedish
SweWinograd 2.0
A Swedish dataset for pronoun resolution
Corpus Swedish
Syntag treebank
A Swedish treebank with syntactic analysis of 158 articles from Press-65.
Corpus Swedish
Sæmundaredda
Ancient Icelandic poetry collection also known as The King's Book
Corpus Old Norse
TalbankenSBX
Talbanken is a Swedish treebank. This is the Språkbanken Text version of Talbanken.
Corpus Swedish
TalbankenSTB
Talbanken is a Swedish treebank.
Corpus Swedish
The English-Swedish Parallel Corpus (ESPC)
ESPC is a combined comparable and parallel corpus suitable for different types of cross-language research.
Corpus Swedish, English
The Riksdag's open data - Debates
Debates from the Swedish parliament in the period 1993/94-2017/18
Corpus Swedish
The Swedish Culturomics Gigaword Corpus
One billion Swedish words from 1950 and onwards. Code to extract data from the corpus, as well as usage instructions, can be downloaded from https://svn.spraakdata.gu.se/sb-arkiv/tools/gigaword/
Corpus Swedish
The Swedish Literature Bank: Free Works
E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)
Corpus Swedish
The Swedish Literature Bank: Restricted Works
E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)
Corpus Swedish