Language resources

On this page you can browse and search our corpora and lexicons. Click on a resource name to see what files are available for download. You can go directly to the search interface by clicking on the Korp or Karp logo.
Resource Tokens Language Access
Somali: Sheekooyin Carruureed
26,003 Somali
Fornsvenska textbankens material: Nysvenska lagar
22,701 Swedish
Somali: Afka Hooyo 2010–19 Iswiidhan
21,542 Somali
Folke corpus
Records from Isof archives
20,699 Swedish
Old Finland Swedish: Fredrikshamns Tidning 1888–1908
Part of the Finland Swedish language bank over Swedish in Finland today and yesterday
20,484 Swedish
Old Finland Swedish: Wiborgs Tidning 1867–1877
Part of the Finland Swedish language bank over Swedish in Finland today and yesterday
19,086 Swedish
Sibirientyska kvinnor
Dialogs between four women born between 1927 and 1937 in the Soviet Volga Republic
16,208 Swedish
Kubhist 2: Posttidningar 1660's
Part of the collection Kubhist 2
15,912 Swedish
Somali: Maaddooyinka Kale 1972–79
14,908 Somali
Somali: Sheekooyin Carruureed (Turjuman)
13,865 Somali
SIC2 - Stockholm Internet Corpus
The Stockholm Internet Corpus (SIC2) contains Swedish blog posts, annotated with part of speech, morphological features, and named entities.
13,562 Swedish
Somali: Caafimaad 1972–79
13,550 Somali
Old Finland Swedish: Uleåborgs Tidning 1877–1887
Part of the Finland Swedish language bank over Swedish in Finland today and yesterday
13,474 Swedish
Old Finland Swedish: Typografiskt minnesblad 1891
Part of the Finland Swedish language bank over Swedish in Finland today and yesterday
10,234 Swedish
Kubhist 2: Posttidningar 1650's
Part of the collection Kubhist 2
9,994 Swedish
Af Soomaali 1993-94
9,247 Somali
Somali: Suugaan (Turjuman)
8,796 Somali
Old Finland Swedish: Tidningar Utgifne af et Sällskap i Åbo 1771–1783
Part of the Finland Swedish language bank over Swedish in Finland today and yesterday
6,532 Swedish
Kubhist 2: Posttidningar 1670's
Part of the collection Kubhist 2
5,575 Swedish
Old Finland Swedish: Borgåbladet 1885
Part of the Finland Swedish language bank over Swedish in Finland today and yesterday
3,020 Swedish
Kubhist 2: Götheborgs Weckolista 1740's
Part of the collection Kubhist 2
2,272 Swedish
Kubhist: Götheborgs weckolista 1740's
Part of the collection Kubhist
1,778 Swedish
Caafimaad 1983
1,521 Somali
IVIP demo
Interaction and Variation in Pluricentric Languages – Communicative Patterns in Sweden Swedish and Finland Swedish.
572 Swedish
Collection
SVT news
News texts from svt.se
Swedish
Collection
Web News
News from Swedish newspapers' websites
Swedish
Collection
Finland Swedish
Part of the Finland Swedish language bank over Swedish in Finland today and yesterday
Swedish
Swedish ABSAbank-Imm 1.1
An annotated Swedish corpus for aspect-based sentiment analysis (a version of Absabank)
Swedish
Swedish analogy 2.0
Swedish semantic and syntactic similarity
Swedish
Collection
Fornsvenska textbankens material
A collection of Old Swedish texts from Fornsvenska textbanken
Swedish
Collection
Old Finland Swedish
Part of the Finland Swedish language bank over Swedish in Finland today and yesterday
Swedish
Collection
SuperLim 2
A standardized suite for evaluation and analysis of Swedish natural language understanding systems.
Swedish
SuperSim (repackaged for Superlim) 2.0
A dataset for word similarity and relatedness in Swedish
Swedish
SweDiagnostics
Swedish version of (Super)GLUE Diagnostic
Swedish
SweDN 1.0
A Swedish text summarization corpus
Swedish
Argumentation sentences 1.0
A translated corpus for classifying sentence stance in relation to a topic.
Swedish
Collection
Familjeliv
Material from the Familjeliv internet forum
Swedish
SweFAQ 2.0
Frequently asked questions from Swedish authorities' websites with shuffled answers
Swedish
Swedish EAT: question classification
A translated version of the QAQC dataset for expected-answer-type classification.
Swedish
Collection
Blog mix
Material from a selection of Swedish blogs. Regularly updated.
Swedish
Swedish treebank
A Swedish treebank built from recycled language resources
Swedish
SweLL-gold
Essays written by adult learners of Swedish, manually pseudonymized and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2017-2020.
Swedish
SweNLI 1.0
A Swedish NLI dataset
Swedish
Detective Department
Data from the Detective Department at the Gothenburg police, from late 1800s to early 1900s.
Swedish
Collection
Flashback
Material from the Flashback internet forum
Swedish
Collection
SweLL-pilot
Essays written by adult learners of Swedish, manually anonymised and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2006-2015.
Swedish
Collection
ASPAC
The Amsterdam Slavic Parallel Aligned Corpus
Swedish, Belarusian, Bulgarian, Czech, German, Lower Sorbian, Modern Greek (1453-), English, Spanish, French, Croatian, Upper Sorbian, Latin, Macedonian, Dutch, Polish, Portuguese, Romanian, Russian, Kele (Papua New Guinea), Slovak, Slovenian, Serbian, Slavomolisano, Turkmen, Ukrainian
Collection
Göteborgsposten
A corpus with texts from the newspaper Göteborgs-Posten
Swedish
SweParaphrase 2.0
Semantic Textual Similarity reference data (STS Benchmark).
Swedish
Collection
Europarl
European Parliament Proceedings Parallel Corpus
Swedish, Danish, German, Modern Greek (1453-), English, Spanish, Finnish, French, Italian, Dutch, Portuguese
Collection
Kubord 1
Word frequencies from modern newspaper texts from the National Library of Sweden
Swedish
Collection
Kvinnotidningar
Material from historical women's periodicals
Swedish
SweWiC 2.0
A Swedish Word-in-Context dataset
Swedish
Collection
Läkartidningen medical journal
Corpus for health care technical language
Swedish
Collection
Bicameral Riksdag
Collection of textual documents from the Swedish bicameral parliament
Swedish
Collection
Kubhist 2
Diachronic collection of Swedish historical newspaper texts from the period of 1645–1926. Kubhist 2 is an updated version of Kubhist with improved OCR and more material.
Swedish
Corpus word statistics
Accumulated word statistics from many of our modern Swedish corpora
Collection
Kubord 2
Word relations from modern newspaper texts from the National Library of Sweden
Swedish
SweWinogender 2.0
A Swedish dataset for coreference and gender bias
Swedish
Collection
Press
Swedish press
Swedish
Collection
Kubhist
Diachronic collection of Swedish historical newspaper texts from the period of 1749–1926
Swedish
DaLAJ-GED-SuperLim 2.0
Dataset for Linguistic Acceptability Judgments (and more), v.2.0
Swedish
SweWinograd 2.0
A Swedish dataset for pronoun resolution
Swedish
Collection
Medieval letters
Swedish medieval charters from Diplomatarium Suecanum (Svenskt Diplomatariums huvudkartotek, SDHK)
Latin, German, Norwegian, Swedish
Collection
Riksdagens öppna data
Data from the Swedish parliament collected from data.riksdagen.se
Swedish
ScandiSent
Sentiment corpus for Swedish, Norwegian, Danish, Finnish and English, crawled from Trustpilot.
Swedish, Norwegian Bokmål, Danish, English, Finnish
Collection
Somali corpora
A collection of Somali corpora
Somali