
Language resources

On this page you can browse and search our corpora and lexicons. Click on a resource name to see what files are available for download. You can go directly to the search interface by clicking on the Korp or Karp logo.
Resource Tokens Language Access
Collection
Web News
News from Swedish newspapers' websites
Swedish
Collection
Finland Swedish
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
Swedish
Swedish ABSAbank-Imm 1.1
An annotated Swedish corpus for aspect-based sentiment analysis (a version of Absabank)
Swedish
Swedish analogy 2.0
Swedish semantic and syntactic similarity
Swedish
Collection
Fornsvenska textbankens material
A collection of Old Swedish texts from Fornsvenska textbanken
Swedish
Collection
Old Finland Swedish
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
Swedish
Collection
SuperLim 2
A standardized suite for evaluation and analysis of Swedish natural language understanding systems.
Swedish
SuperSim (repackaged for Superlim) 2.0
A dataset for word similarity and relatedness in Swedish
Swedish
SweDiagnostics
Swedish version of (Super)GLUE Diagnostic
Swedish
SweDN 1.0
A Swedish text summarization corpus
Swedish
Argumentation sentences 1.0
A translated corpus for classifying sentence stance in relation to a topic.
Swedish
Collection
Familjeliv
Material from the Familjeliv internet forum
Swedish
SweFAQ 2.0
Frequently asked questions from Swedish authorities' websites with shuffled answers
Swedish
Swedish EAT: question classification
A translated version of the QAQC dataset for expected-answer-type classification.
Swedish
Collection
Blog mix
Material from a selection of Swedish blogs. Regularly updated.
Swedish
Swedish treebank
A Swedish treebank built from recycled language resources
Swedish
SweLL-gold
Essays written by adult learners of Swedish, manually pseudonymized and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2017-2020.
Swedish
SweNLI 1.0
A Swedish NLI dataset
Swedish
Detective Department
Data from the Detective Department of the Gothenburg police, from the late 1800s to the early 1900s.
Swedish
Collection
Flashback
Material from the Flashback internet forum
Swedish
Collection
SweLL-pilot
Essays written by adult learners of Swedish, manually anonymised and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2006-2015.
Swedish
Collection
ASPAC
The Amsterdam Slavic Parallel Aligned Corpus
Swedish, Belarusian, Bulgarian, Czech, German, Lower Sorbian, Modern Greek (1453-), English, Spanish, French, Croatian, Upper Sorbian, Latin, Macedonian, Dutch, Polish, Portuguese, Romanian, Russian, Kele (Papua New Guinea), Slovak, Slovenian, Serbian, Slavomolisano, Turkmen, Ukrainian
Collection
Göteborgsposten
A corpus with texts from the newspaper Göteborgs-Posten
Swedish
SweParaphrase 2.0
Semantic Textual Similarity reference data (STS Benchmark).
Swedish
Collection
Europarl
European Parliament Proceedings Parallel Corpus
Swedish, Danish, German, Modern Greek (1453-), English, Spanish, Finnish, French, Italian, Dutch, Portuguese
Collection
Kubord 1
Word frequencies from modern newspaper texts from the National Library of Sweden
Swedish
Collection
Kvinnotidningar
Material from historical women's periodicals
Swedish
SweWiC 2.0
A Swedish Word-in-Context dataset
Swedish
Collection
Läkartidningen medical journal
Corpus of technical language in health care
Swedish
Collection
Bicameral Riksdag
Collection of textual documents from the Swedish bicameral parliament
Swedish
Collection
Kubhist 2
Diachronic collection of Swedish historical newspaper texts from the period 1645–1926. Kubhist 2 is an updated version of Kubhist with improved OCR and more material.
Swedish
Corpus word statistics
Accumulated word statistics from many of our modern Swedish corpora
Collection
Kubord 2
Word relations from modern newspaper texts from the National Library of Sweden
Swedish
SweWinogender 2.0
A Swedish dataset for coreference and gender bias
Swedish
Collection
Press
Swedish press
Swedish
Collection
Kubhist
Diachronic collection of Swedish historical newspaper texts from the period 1749–1926
Swedish
DaLAJ-GED-SuperLim 2.0
Dataset for Linguistic Acceptability Judgments (and more), v.2.0
Swedish
SweWinograd 2.0
A Swedish dataset for pronoun resolution
Swedish
Collection
Medieval letters
Swedish medieval charters from Diplomatarium Suecanum (Svenskt Diplomatariums huvudkartotek, SDHK)
Latin, German, Norwegian, Swedish
Collection
Riksdagens öppna data
Data from the Swedish parliament collected from data.riksdagen.se
Swedish
ScandiSent
Sentiment corpus for Swedish, Norwegian, Danish, Finnish and English, crawled from Trustpilot.
Swedish, Norwegian Bokmål, Danish, English, Finnish
Collection
Somali corpora
A collection of Somali corpora
Somali
Collection
SVT news
News texts from svt.se
Swedish
IVIP demo
Interaction and Variation in Pluricentric Languages – Communicative Patterns in Sweden Swedish and Finland Swedish.
572 Swedish
Caafimaad 1983
1,521 Somali
Kubhist: Götheborgs weckolista 1740's
Part of the collection Kubhist
1,778 Swedish
Kubhist 2: Götheborgs Weckolista 1740's
Part of the collection Kubhist 2
2,272 Swedish
Old Finland Swedish: Borgåbladet 1885
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
3,020 Swedish
Kubhist 2: Posttidningar 1670's
Part of the collection Kubhist 2
5,575 Swedish
Old Finland Swedish: Tidningar Utgifne af et Sällskap i Åbo 1771–1783
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
6,532 Swedish
Somali: Suugaan (Turjuman)
8,796 Somali
Af Soomaali 1993-94
9,247 Somali
Kubhist 2: Posttidningar 1650's
Part of the collection Kubhist 2
9,994 Swedish
Old Finland Swedish: Typografiskt minnesblad 1891
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
10,234 Swedish
Old Finland Swedish: Uleåborgs Tidning 1877–1887
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
13,474 Swedish
Somali: Caafimaad 1972–79
13,550 Somali
SIC2 - Stockholm Internet Corpus
The Stockholm Internet Corpus (SIC2) contains Swedish blog posts, annotated with part of speech, morphological features, and named entities.
13,562 Swedish
Somali: Sheekooyin Carruureed (Turjuman)
13,865 Somali
Somali: Maaddooyinka Kale 1972–79
14,908 Somali
Kubhist 2: Posttidningar 1660's
Part of the collection Kubhist 2
15,912 Swedish
Sibirientyska kvinnor
Dialogs between four women born between 1927 and 1937 in the Soviet Volga Republic
16,208 Swedish
Old Finland Swedish: Wiborgs Tidning 1867–1877
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
19,086 Swedish
Old Finland Swedish: Fredrikshamns Tidning 1888–1908
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
20,484 Swedish
Folke corpus
Records from the Isof archives
20,699 Swedish
Somali: Afka Hooyo 2010–19 Iswiidhan
21,542 Somali
Fornsvenska textbankens material: Nysvenska lagar
22,701 Swedish
Somali: Sheekooyin Carruureed
26,003 Somali
Applications
Anonymised job applications. The corpus is protected; contact Lena Rogström (lena.rogstroem@svenska.gu.se) for more information and access.
26,228 Swedish
Medieval letters: Norwegian
Charters written in Norwegian, from the Diplomatarium Suecanum (Svenskt Diplomatariums huvudkartotek, SDHK)
27,718 Norwegian
Blog mix 1998
Material from a selection of Swedish blogs. Regularly updated.
30,939 Swedish
ASPAC: Swedish-Turkmen
Part of The Amsterdam Slavic Parallel Aligned Corpus
31,397 Swedish, Turkmen
Finland Swedish: Österbottens tidning 2011
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
32,950 Finland Swedish
Somali: Saynis 1980–89
33,034 Somali
MAÞiR Trees
An Old Swedish treebank, with lemma, parts-of-speech, and PROIEL-style dependency syntax.
33,721 Swedish
Old Finland Swedish: Björneborgs Tidning 1897–1907
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
34,057 Swedish
Sibirian-German
Transcribed Siberian German, a variety of German spoken by about 36,000 people in the Krasnoyarsk region of Siberia (Russia).
34,205 Swedish
Somali: Af-Soomaali 2001 Somaliland
35,043 Somali
ASPAC: Swedish-Molise Slavic
Part of The Amsterdam Slavic Parallel Aligned Corpus
35,279 Slavomolisano, Swedish
Somali: Taariikh iyo Dhaqan (Turjuman)
35,479 Somali
ASPAC: Swedish-Lower Sorbian
Part of The Amsterdam Slavic Parallel Aligned Corpus
36,551 Swedish, Lower Sorbian
SVT news unknown date
News texts from svt.se
36,783 Swedish
Old Finland Swedish: Åland 1891–1911
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
38,288 Swedish
Medieval letters: Other languages
Charters written in other languages, from the Diplomatarium Suecanum (Svenskt Diplomatariums huvudkartotek, SDHK)
39,430 Swedish
Finland Swedish: Syd-Österbotten 2013
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
40,030 Finland Swedish
Betänkande ang. läroböcker (1882)
A report from 1882, digitized by the Gothenburg University Library
41,521 Swedish
Somali: Xisaab 2016 Somaliland
41,922 Somali
Old Finland Swedish: Spanska Flugan 1839–1841
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
41,935 Swedish
Fornsvenska textbankens material: Nysvenska bibelböcker
44,990 Swedish
Sæmundaredda
Ancient Icelandic poetry collection also known as The King's Book
46,726 Old Norse
SpIn v1
256 essays (mid-term exams) collected from the Language Introduction course for newly arrived refugees. Some students contributed more than one essay.
46,911 Swedish
Old Finland Swedish: Frågebrevsvar 1900–1949
Part of the Finland Swedish language bank, covering Swedish in Finland today and yesterday
47,922 Swedish
Swedish fraktur 1626-1816
A selection of fraktur texts printed between 1626 and 1816, from the collections of the University of Gothenburg Library (UB). Intended for OCR analysis.
47,924 Swedish
Somali: Cilmiga Bulshada 2001-03 Soomaaliya
48,234 Somali
Interrogations
The corpus is protected; contact Ylva Byrman (ylva.byrman@svenska.gu.se) for more information and access.
49,299 Swedish
Somali: Xisaab 2001 Soomaaliya
50,361 Somali
Somali: Af Soomaali 1971-79
50,794 Somali
InterFra Swedish
Created to promote research on the acquisition of French as a second language (L2) from developmental, interactional and variationist perspectives. The HLP (High Level Proficiency in Second Language Use) project also investigates learners of other L2s, such as Swedish, Spanish, English and Italian.
50,993 Swedish
Af-Soomaali 2016 Somaliland
51,236 Somali
Förvaltningsmyndigheters texter
51,366 Swedish
SW1203-essays
Essays written by L2 learners of Swedish in university courses
51,972 Swedish