
Language resources

On this page you can browse and search our corpora and lexicons. Click on a resource name to see what files are available for download. You can go directly to the search interface by clicking on the Korp or Karp logo.
Resource Tokens Language Access
The Swedish Culturomics Gigaword Corpus
One billion Swedish words from 1950 onwards. Code to extract data from the corpus, as well as usage instructions, can be downloaded from https://svn.spraakdata.gu.se/sb-arkiv/tools/gigaword/
1,015,635,151 Swedish
Familjeliv: Delicate Room
Material from the Familjeliv internet forum.
1,000,091,318 Swedish
Flashback: Politics
Material from the Flashback internet forum.
820,692,247 Swedish
Flashback: Society
Material from the Flashback internet forum.
810,078,225 Swedish
Swedish Twitter 2016
Material collected from a selection of Swedish-speaking Twitter users in 2016
694,515,420 Swedish
Familjeliv: Parent
Material from the Familjeliv internet forum.
618,316,542 Swedish
Swedish Twitter 2017
Material collected from a selection of Swedish-speaking Twitter users in 2017
505,017,012 Swedish
Twitter Mix
Material from a selection of Swedish Twitter users. Updated regularly.
499,986,353 Swedish
Flashback: Science & Humanities
Material from the Flashback internet forum.
492,612,129 Swedish
Familjeliv: Pregnant
Material from the Familjeliv internet forum.
467,967,257 Swedish
Flashback: Culture & Media
Material from the Flashback internet forum.
466,942,449 Swedish
KBs digitaliserade SOU:er (1922–1996)
Swedish Government Official Reports (Statens offentliga utredningar, SOU) in digitized form from Kungliga biblioteket, the National Library of Sweden (https://sou.kb.se). The collection is not complete.
428,188,012 Swedish
Swedish Twitter 2015
Material collected from a selection of Swedish-speaking Twitter users in 2015
412,663,140 Swedish
Riksdagens öppna data: Proposition
Government bills (propositioner) and written communications (skrivelser) from the Government
379,103,550 Swedish
Flashback: Home, Housing & Family
Material from the Flashback internet forum.
372,037,073 Swedish
The Swedish Literature Bank: Free Works
E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)
344,688,445 Swedish
Bicameral riksdag: Protocols
Part of the data set "Bicameral Riksdag"
327,554,657 Swedish
Bicameral riksdag: Propositions and letters
Part of the data set "Bicameral Riksdag"
319,201,218 Swedish
Flashback: Computers & IT
Material from the Flashback internet forum.
315,751,284 Swedish
Familjeliv: General Threads – Society
Material from the Familjeliv internet forum.
307,777,980 Swedish
Familjeliv: Member Threads – Expecting Children
Material from the Familjeliv internet forum.
304,128,161 Swedish
Familjeliv: Member Threads – Parents
Material from the Familjeliv internet forum.
286,038,223 Swedish
Familjeliv: Member Threads – General
Material from the Familjeliv internet forum.
280,503,150 Swedish
Riksdagens öppna data: Statens offentliga utredningar
Proposals to the Government from various commissions of inquiry
273,083,646 Swedish
Flashback: Sports & Fitness
Material from the Flashback internet forum.
270,061,043 Swedish
Flashback: Drugs
Material from the Flashback internet forum.
248,419,114 Swedish
Riksdagens öppna data: Protokoll
Minutes of the sittings of the Chamber
247,384,265 Swedish
DReaM-Copyright-Protected
A multilingual corpus of linguistic descriptions of the world's natural languages.
225,617,801 English
Riksdagens öppna data: Betänkande
Committee reports and statements, including the Riksdag's decisions, a summary of the voting results, and Beslut i korthet (decisions in brief)
203,229,298 Swedish
Familjeliv: Family Planning
Material from the Familjeliv internet forum.
196,628,553 Swedish
Bicameral riksdag: Reports, memorandums and opinions
Part of the data set "Bicameral Riksdag"
195,467,124 Swedish
Swedish Wikipedia
Corpus of Swedish Wikipedia
190,149,497 Swedish
SemEval2020 Task 1
Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection (extracts from Kubhist v2)
182,000,000 Swedish
Familjeliv: Difficult to get children
Material from the Familjeliv internet forum.
173,449,358 Swedish
Kubhist 2: Stockholms Dagblad 1880's
Part of the collection Kubhist 2
172,566,864 Swedish
WordReference
A large corpus of native and non-native written speech in four languages.
170,000,000 English, Spanish, French, Italian
Familjeliv: Sex & Relationships
Material from the Familjeliv internet forum.
169,521,163 Swedish
Riksdagens öppna data: Motion
Motions from members of the Riksdag
162,923,798 Swedish
Flashback: Other
Material from the Flashback internet forum.
161,004,923 Swedish
Familjeliv: General Threads – Body & Soul
Material from the Familjeliv internet forum.
140,028,849 Swedish
Flashback: Economy
Material from the Flashback internet forum.
137,327,439 Swedish
Familjeliv: Member Threads – Family Planning
Material from the Familjeliv internet forum.
134,245,556 Swedish
Kubhist 2: Stockholms Dagblad 1870's
Part of the collection Kubhist 2
132,779,856 Swedish
The Swedish Literature Bank: Restricted Works
E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)
128,261,903 Swedish
Kubhist 2: Nya Dagligt Allehanda 1880's
Part of the collection Kubhist 2
127,689,132 Swedish
Kubhist 2: Göteborgs Handels- och Sjöfartstidning 1880's
Part of the collection Kubhist 2
126,449,799 Swedish
The Riksdag's open data - Debates
Debates from the Swedish parliament in the period 1993/94-2017/18
121,987,537 Swedish
Flashback: Lifestyle
Material from the Flashback internet forum.
110,639,813 Swedish
Kubhist 2: Stockholms Dagblad 1860's
Part of the collection Kubhist 2
110,424,960 Swedish
Kubhist 2: Nya Dagligt Allehanda 1870's
Part of the collection Kubhist 2
108,932,888 Swedish
Kubhist 2: Göteborgsposten 1880's
Part of the collection Kubhist 2
106,175,803 Swedish
Kubhist 2: Göteborgs Handels- och Sjöfartstidning 1870's
Part of the collection Kubhist 2
105,868,348 Swedish
Kubhist 2: Stockholms Dagblad 1890's
Part of the collection Kubhist 2
104,229,980 Swedish
Kubhist 2: Nya Dagligt Allehanda 1860's
Part of the collection Kubhist 2
102,427,400 Swedish
Kubhist 2: Aftonbladet 1890's
Part of the collection Kubhist 2
101,943,656 Swedish
Blog mix 2011
Material from a selection of Swedish blogs. Updated regularly.
100,591,617 Swedish
Kubhist 2: Aftonbladet 1880's
Part of the collection Kubhist 2
100,382,714 Swedish
Blog mix 2010
Material from a selection of Swedish blogs. Updated regularly.
97,435,693 Swedish
Kubhist 2: Aftonbladet 1870's
Part of the collection Kubhist 2
96,758,392 Swedish
Kubhist 2: Aftonbladet 1860's
Part of the collection Kubhist 2
94,574,658 Swedish
Flashback: Sex
Material from the Flashback internet forum.
93,420,880 Swedish
Familjeliv: General Threads – Entertainment
Material from the Familjeliv internet forum.
90,124,289 Swedish
Kubhist 2: Stockholms Dagblad 1850's
Part of the collection Kubhist 2
89,487,015 Swedish
Kubhist 2: Göteborgs Handels- och Sjöfartstidning 1890's
Part of the collection Kubhist 2
85,890,778 Swedish
Kubhist 2: Göteborgs Handels- och Sjöfartstidning 1860's
Part of the collection Kubhist 2
85,578,468 Swedish
Kubhist 2: Post- och Inrikes Tidningar 1880's
Part of the collection Kubhist 2
84,999,696 Swedish
Flashback: Vehicles & Traffic
Material from the Flashback internet forum.
84,187,446 Swedish
Kubhist 2: Nya Dagligt Allehanda 1890's
Part of the collection Kubhist 2
82,878,309 Swedish
Kubhist 2: Aftonbladet 1850's
Part of the collection Kubhist 2
82,212,220 Swedish
Blog mix 2012
Material from a selection of Swedish blogs. Updated regularly.
80,041,223 Swedish
Kubhist 2: Stockholms Dagblad 1840's
Part of the collection Kubhist 2
79,190,467 Swedish
Flashback: Food, Beverages & Tobacco
Material from the Flashback internet forum.
77,673,422 Swedish
Kubhist 2: Göteborgsposten 1870's
Part of the collection Kubhist 2
76,702,361 Swedish
Blog mix 2009
Material from a selection of Swedish blogs. Updated regularly.
75,113,677 Swedish
DReaM
A multilingual corpus of linguistic descriptions of the world's natural languages.
75,027,790 English
Kubord 1 - Word frequencies Dagens Nyheter 2000
Part of the collection Kubord 1
74,199,162 Swedish
Europarl: Swedish-French
Part of European Parliament Proceedings Parallel Corpus
73,983,862 Swedish, French
Bicameral riksdag: Motions
Part of the data set "Bicameral Riksdag"
73,189,180 Swedish
Kubhist 2: Post- och Inrikes Tidningar 1860's
Part of the collection Kubhist 2
71,456,997 Swedish
Europarl: Swedish-Portuguese
Part of European Parliament Proceedings Parallel Corpus
71,426,540 Swedish, Portuguese
Europarl: Swedish-Spanish
Part of European Parliament Proceedings Parallel Corpus
70,962,605 Swedish, Spanish
Familjeliv: General Threads – House & Home
Material from the Familjeliv internet forum.
70,766,196 Swedish
Kubord 1 - Word frequencies Dagens Nyheter 2001
Part of the collection Kubord 1
70,746,359 Swedish
Kubhist 2: Stockholms Dagblad 1830's
Part of the collection Kubhist 2
70,171,078 Swedish
Europarl: Swedish-Dutch
Part of European Parliament Proceedings Parallel Corpus
70,164,912 Swedish, Dutch
Europarl: Swedish-English
Part of European Parliament Proceedings Parallel Corpus
70,026,783 Swedish, English
Europarl: Swedish-Italian
Part of European Parliament Proceedings Parallel Corpus
69,549,513 Swedish, Italian
Kubhist 2: Post- och Inrikes Tidningar 1870's
Part of the collection Kubhist 2
69,262,442 Swedish
Kubord 1 - Word frequencies Dagens Nyheter 2002
Part of the collection Kubord 1
68,920,538 Swedish
Europarl: Swedish-Danish
Part of European Parliament Proceedings Parallel Corpus
68,186,074 Swedish, Danish
Europarl: Swedish-German
Part of European Parliament Proceedings Parallel Corpus
68,134,508 Swedish, German
Kubhist 2: Norrköpings Tidningar 1880's
Part of the collection Kubhist 2
67,358,103 Swedish
Kubhist 2: Göteborgsposten 1890's
Part of the collection Kubhist 2
65,852,833 Swedish
Familjeliv: General Threads – Familjeliv.se
Material from the Familjeliv internet forum.
64,302,153 Swedish
Kubhist 2: Aftonbladet 1840's
Part of the collection Kubhist 2
62,967,761 Swedish
Kubord 1 - Word frequencies Dagens Nyheter 2003
Part of the collection Kubord 1
62,900,920 Swedish
Blog mix 2013
Material from a selection of Swedish blogs. Updated regularly.
62,098,899 Swedish
Kubord 1 - Word frequencies Dagens Nyheter 2005
Part of the collection Kubord 1
61,800,135 Swedish
Bicameral riksdag: Narratives and accounts
Part of the data set "Bicameral Riksdag"
61,348,401 Swedish
Kubord 1 - Word frequencies Dagens Nyheter 2004
Part of the collection Kubord 1
60,004,993 Swedish