Language resources

On this page you can browse and search our corpora and lexicons. Click on a resource name to see what files are available for download. You can go directly to the search interface by clicking on the Korp or Karp logo.
Resource Tokens Language Access
SVT news 2020
News texts from svt.se
16,025,766 Swedish
SVT news 2021
News texts from svt.se
14,978,995 Swedish
SVT news 2022
News texts from svt.se
13,996,419 Swedish
SVT news 2023
News texts from svt.se
7,501,502 Swedish
SVT news unknown date
News texts from svt.se
36,783 Swedish
SW1203-essays
Essays written by L2 Swedish language learners, university courses
51,972 Swedish
Swe-NERC
A resource for training and evaluation of Named Entity Recognition for Swedish.
140,914 Swedish
SweDiagnostics
Swedish version of (Super)GLUE Diagnostic
Swedish
Swedish ABSAbank
An annotated Swedish corpus for aspect-based sentiment analysis
1,574,226 Swedish
Swedish ABSAbank-Imm 1.1
An annotated Swedish corpus for aspect-based sentiment analysis (a version of Absabank)
Swedish
Swedish analogy 2.0
Swedish semantic and syntactic similarity
Swedish
Swedish Bible 1873
Swedish translation of the Bible from 1873
811,321 Swedish
Swedish Bible 1917
Official Swedish translation of the Bible from 1917
894,720 Swedish
Swedish EAT: question classification
A translated version of the QAQC dataset for expected-answer-type classification.
Swedish
Swedish fraktur 1626-1816
A selection of fraktur texts printed between 1626 and 1816 from the collections of the University Library of University of Gothenburg (UB). For OCR analysis.
47,924 Swedish
Swedish framenet (SweFN)
A lexical semantic resource based on the same principles as the English Berkeley FrameNet. This part of the resource contains the corpus examples, automatically enriched with linguistic information.
137,770 Swedish
Swedish newspapers 1818-1870
A selection of Swedish newspapers printed between 1818 and 1870 from the collections of Kungliga biblioteket (KB). For OCR analysis.
186,013 Swedish
Swedish newspapers 1871-1906
A selection of Swedish newspapers printed between 1871 and 1906 from the collections of Kungliga biblioteket (KB). For OCR analysis.
337,635 Swedish
Swedish party programs and election manifestos
Swedish political party programs and election manifestos 1887–2022
2,099,602 Swedish
Swedish Prose Fiction 1800–1900
All Swedish fiction published for the first time during the years 1800, 1820, 1840, 1860, 1880 and 1900
16,275,130 Swedish
Swedish statute book
Swedish statute book 1880-01-01 – 2012-08-16
8,058,400 Swedish
Swedish treebank
A Swedish treebank built from recycled language resources
Swedish
Swedish Twitter 2015
Material collected from a selection of Swedish-speaking Twitter users from 2015
412,663,140 Swedish
Swedish Twitter 2016
Material collected from a selection of Swedish-speaking Twitter users from 2016
694,515,420 Swedish
Swedish Twitter 2017
Material collected from a selection of Swedish-speaking Twitter users from 2017
505,017,012 Swedish
Swedish Wikipedia
Corpus of Swedish Wikipedia
190,149,497 Swedish
SweDN 1.0
A Swedish text summarization corpus
Swedish
SweFAQ 2.0
Frequently asked questions from Swedish authorities' websites with shuffled answers
Swedish
SweLL-gold
Essays written by adult learners of Swedish, manually pseudonymized and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2017-2020.
Swedish
Collection
SweLL-pilot
Essays written by adult learners of Swedish, manually anonymised and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2006-2015.
Swedish
SweNLI 1.0
A Swedish NLI dataset
Swedish
SweParaphrase 2.0
Semantic Textual Similarity reference data (STS Benchmark).
Swedish
SweWiC 2.0
A Swedish Word-in-Context dataset
Swedish
SweWinogender 2.0
A Swedish dataset for coreference and gender bias
Swedish
SweWinograd 2.0
A Swedish dataset for pronoun resolution
Swedish
Syntag treebank
A Swedish treebank with syntactic analysis of 158 articles from Press-65.
101,329 Swedish
Sæmundaredda
Ancient Icelandic poetry collection also known as The King's Book
46,726 Old Norse
TalbankenSBX
Talbanken is a Swedish treebank. This is the Språkbanken Text version of Talbanken.
96,346 Swedish
TalbankenSTB
Talbanken is a Swedish treebank.
96,346 Swedish
The English-Swedish Parallel Corpus (ESPC)
ESPC is a combined comparable and parallel corpus suitable for different types of cross-language research.
1,518,759 Swedish, English
The Riksdag's open data - Debates
Debates from the Swedish parliament in the period 1993/94-2017/18
121,987,537 Swedish
The Swedish Culturomics Gigaword Corpus
One billion Swedish words from 1950 and onwards. Code to extract data from the corpus, as well as usage instructions, can be downloaded from https://svn.spraakdata.gu.se/sb-arkiv/tools/gigaword/
1,015,635,151 Swedish
The Swedish Literature Bank: Free Works
E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)
344,688,445 Swedish
The Swedish Literature Bank: Restricted Works
E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)
128,261,903 Swedish
Tiden
30 annual volumes of the socialist journal Tiden, 1909–1940
7,106,662 Swedish
TISUS texts
Essays written by L2 Swedish learners as part of a TISUS exam
59,639 Swedish
Twitter Mix
Material from a selection of Swedish Twitter users, updated regularly.
499,986,353 Swedish
Twitter: Party Leader Debate June 2013
Material from Twitter, collected during the party leader debate on June 12th 2013 and a few days before and after
38,959,102 Swedish
Twitter: Party Leader Debate May 2014
Material from Twitter, collected during the party leader debate on May 4th 2014 and a few days before and after
34,228,521 Swedish
Twitter: Party Leader Debate October 2013
Material from Twitter, collected during the party leader debate on October 6th 2013 and a few days before and after
25,736,586 Swedish
Ur Dagens Krönika
Eight annual volumes of the cultural journal Ur Dagens Krönika, 1881–1890
1,995,149 Swedish
Collection
Web News
News from Swedish newspapers' websites
Swedish
Web News 2001
News from Swedish newspapers' websites
614,151 Swedish
Web News 2002
News from Swedish newspapers' websites
17,426,173 Swedish
Web News 2003
News from Swedish newspapers' websites
12,217,288 Swedish
Web News 2004
News from Swedish newspapers' websites
13,806,323 Swedish
Web News 2005
News from Swedish newspapers' websites
29,503,647 Swedish
Web News 2006
News from Swedish newspapers' websites
22,563,792 Swedish
Web News 2007
News from Swedish newspapers' websites
24,630,443 Swedish
Web News 2008
News from Swedish newspapers' websites
27,561,804 Swedish
Web News 2009
News from Swedish newspapers' websites
25,888,779 Swedish
Web News 2010
News from Swedish newspapers' websites
23,803,577 Swedish
Web News 2011
News from Swedish newspapers' websites
26,268,603 Swedish
Web News 2012
News from Swedish newspapers' websites
25,132,041 Swedish
Web News 2013
News from Swedish newspapers' websites
22,648,638 Swedish
Wexjöbladet 1820's
Part of the collection Kubhist2
1,338,559 Swedish
WordReference
A large corpus of native and non-native written speech in four languages.
170,000,000 English, Spanish, French, Italian