Språkbanken Text is a department within Språkbanken.

Language resources

On this page you can browse and search our datasets. Click on a row name to see what files are available for download. You can go directly to the search interface by clicking on the tool logo.
Resource Number of tokens Language Access
Af Soomaali 1993-94
9,247 Somali
Af-Soomaali 2016 Somaliland
51,236 Somali
Applications
Anonymised job applications. The corpus is protected, contact Lena Rogström (lena.rogstroem@svenska.gu.se) for more information and access.
26,228 Swedish
Collection
ASPAC
The Amsterdam Slavic Parallel Aligned Corpus
Swedish, Belarusian, Bulgarian, Czech, German, Lower Sorbian, Modern Greek (1453-), English, Spanish, French, Croatian, Upper Sorbian, Latin, Macedonian, Dutch, Polish, Portuguese, Romanian, Russian, Kele (Papua New Guinea), Slovak, Slovenian, Serbian, Slavomolisano, Turkmen, Ukrainian
Collection
Bicameral Riksdag
Collection of textual documents from the Swedish bicameral parliament
Swedish
Collection
Blog mix
Material from a selection of Swedish blogs. Regularly updated.
Swedish
Caafimaad 1983
1,521 Somali
Corpus of spoken isiXhosa
A corpus of transcribed and annotated recordings of spoken Xhosa.
7,039
Dalin: Then Swänska Argus 1732-1734
Manual transcription of Then Swänska Argus by Olof von Dalin, Stockholm, 1732–1734. For OCR analysis.
213,399 Swedish
Databank of 1977 Spanish Press
Text from two Spanish newspapers from 1977. Part of SOL - Spanish Online.
2,166,383 Spanish
Databank of Eleven Spanish Novels 1951–1971
Corpus consisting of 11 Spanish novels. Part of SOL - Spanish Online.
1,248,184 Spanish
DiabetologNytt (1996–1999)
The paper DiabetologNytt (Diabetology News), 1996-1999
228,313 Swedish
DReaM
A multilingual corpus of linguistic descriptions of the world's natural languages.
75,027,790 English
DReaM-Copyright-Protected
A multilingual corpus of linguistic descriptions of the world's natural languages.
225,617,801 English
Eukalyptus Treebank of Written Swedish
A treebank with written Swedish data, with parts-of-speech, TIGER-style syntax, multiword expressions and sense annotation
99,913 Swedish
Collection
Europarl
European Parliament Proceedings Parallel Corpus
Swedish, Danish, German, Modern Greek (1453-), English, Spanish, Finnish, French, Italian, Dutch, Portuguese
Collection
Familjeliv
Material from the Familjeliv internet forum
Swedish
Collection
Finland Swedish
Part of the Finland Swedish language bank, covering Swedish in Finland past and present
Swedish
Collection
Flashback
Material from the Flashback internet forum
Swedish
Collection
Fornsvenska textbankens material
A collection of Old Swedish texts from Fornsvenska textbanken
Swedish
Fornsvenska textbankens material: Äldre lagar
531,192 Swedish
Fornsvenska textbankens material: Äldre religiös prosa
491,605 Swedish
Fornsvenska textbankens material: Dalin's Then Swänska Argus 1732-1734
Then Swänska Argus (1732–1734) by Olof von Dalin, digitized by Fornsvenska textbanken
260,093 Swedish
Fornsvenska textbankens material: Nysvenska bibelböcker
44,990 Swedish
Fornsvenska textbankens material: Nysvenska krönikor
333,670 Swedish
Fornsvenska textbankens material: Nysvenska lagar
22,701 Swedish
Fornsvenska textbankens material: Nysvenska, övrigt
344,590 Swedish
Fornsvenska textbankens material: Profan prosa
239,805 Swedish
Fornsvenska textbankens material: Verser
184,059 Swedish
Fornsvenska textbankens material: Yngre lagar
117,773 Swedish
Fornsvenska textbankens material: Yngre religiös prosa
1,365,544 Swedish
Fornsvenska textbankens material: Yngre tänkeböcker
160,831 Swedish
FTS - Faroese text collection
Faroese text collection, in cooperation with University of Faroe Islands, Fróðskaparsetur Føroya, http://www.setur.fo/
1,142,289 Faroese
Collection
Göteborgsposten
A corpus with texts from the newspaper Göteborgs-Posten
Swedish
Interrogations
The corpus is protected, contact Ylva Byrman (ylva.byrman@svenska.gu.se) for more information and access.
49,299 Swedish
Collection
Kubhist
Diachronic collection of Swedish historical newspaper texts from the period of 1749–1926
Swedish
Collection
Kubhist 2
Diachronic collection of Swedish historical newspaper texts from the period of 1645–1926. Kubhist 2 is an updated version of Kubhist with improved OCR and more material.
Swedish
Collection
Kubord 1
Word frequencies from modern newspaper texts from the National Library of Sweden
Swedish
Collection
Kubord 2
Word relations from modern newspaper texts from the National Library of Sweden
Swedish
Collection
Kvinnotidningar
Material from historical women's periodicals
Swedish
Collection
Läkartidningen medical journal
Corpus for health care technical language
Swedish
Linguistic Survey of India (LSI)
1,193,437 English
MAÞiR Trees
An Old Swedish treebank, with lemma, parts-of-speech, and PROIEL-style dependency syntax.
33,721 Swedish
Collection
Medieval letters
Swedish medieval charters from Diplomatarium Suecanum (Svenskt Diplomatariums huvudkartotek, SDHK)
Latin, German, Norwegian, Swedish
Collection
Old Finland Swedish
Part of the Finland Swedish language bank, covering Swedish in Finland past and present
Swedish
OpenEDGeS
The public license subset of the EDGeS Diachronic Bible Corpus, a diachronically and synchronically parallel corpus of Bible translations in Dutch, English, German and Swedish, with texts from the 14th century until today.
19,399,149 Swedish, English, German, Dutch
Oral Corpus for Reference of Contemporary Spanish
Corpus with transcriptions from recorded audio tapes from 1991 to 1992. Part of SOL - Spanish Online
1,200,830 Spanish
Collection
Press
Swedish press
Swedish
Questions and answers about the Swedish language
Counselling mails of the Language Council of Sweden
20,083,415 Swedish
Collection
Riksdag of the Estates
Collection of textual documents from the Swedish Riksdag of the Estates
Swedish
Collection
Riksdagens öppna data
Data from the Swedish parliament collected from data.riksdagen.se
Swedish
ScandiSent
Sentiment corpus for Swedish, Norwegian, Danish, Finnish and English, crawled from Trustpilot.
Swedish, Norwegian Bokmål, Danish, English, Finnish
SemEval2020 Task 1
Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection (extracts from Kubhist v2)
182,000,000 Swedish
Collection
Somali corpora
A collection of Somali corpora
Somali
Somali: Af Soomaali 1971-79
50,794 Somali
Somali: Af-Soomaali 2001 Somaliland
35,043 Somali
Somali: Af-Soomaali 2001 Soomaaliya
129,947 Somali
Somali: Af-Soomaali 2006 Itoobiya
64,351 Somali
Somali: Af-Soomaali 2010 Somaliland
51,513 Somali
Somali: Af-Soomaali 2013 Somaliland
25,247 Somali
Somali: Af-Soomaali 2018 Soomaaliya
15,677 Somali
Somali: Afka Hooyo 1992-02 Kanada
706 Somali
Somali: Afka Hooyo 2010–19 Iswiidhan
21,542 Somali
Somali: BBC
82,437 Somali
Somali: Caafimaad 1972–79
13,550 Somali
Somali: Caafimaad 1994
8,977 Somali
Somali: Cilmi-Afeed
190,429 Somali
Somali: Cilmiga Bulshada 1980-89
4,951 Somali
Somali: Cilmiga Bulshada 2001 Somaliland
30,258 Somali
Somali: Cilmiga Bulshada 2001-03 Soomaaliya
48,234 Somali
Somali: Cilmiga Bulshada 2010 Somaliland
11,713 Somali
Somali: Cilmiga Bulshada 2011 Itoobiya
30,124 Somali
Somali: Cilmiga Bulshada 2016 Somaliland
54,498 Somali
Somali: Cilmiga Bulshada 2018 Soomaaliya
42,557 Somali
Somali: Cilmiga Deegaanka 2012 Itoobiya
56,874 Somali
Somali: Golaha Wakiillada Somaliland
539,206 Somali
Somali: Haatuf News 2002
1,495,343 Somali
Somali: Haatuf News 2003
2,359,710 Somali
Somali: Haatuf News 2004
1,813,484 Somali
Somali: Haatuf News 2005
2,003,060 Somali
Somali: Haatuf News 2006
2,125,632 Somali
Somali: Haatuf News 2007
1,758,810 Somali
Somali: Haatuf News 2008
1,286,309 Somali
Somali: Haatuf News 2009
393,199 Somali
Somali: Maaddooyinka Kale 1972–79
14,908 Somali
Somali: Ogaden Online
98,454 Somali
Somali: Qoraallo 1956-1970
14,153 Somali
Somali: Qur’aan
141,555 Somali
Somali: Radio Muqdisho
22,801 Somali
Somali: Saynis 1972–77
112,845 Somali
Somali: Saynis 1980–89
33,034 Somali
Somali: Saynis 1994–96
60,787 Somali
Somali: Saynis 2001 Somaliland
29,988 Somali
Somali: Saynis 2001 Soomaaliya
4,659 Somali
Somali: Saynis 2010 Somaliland
30,471 Somali
Somali: Saynis 2011 Soomaaliya
45,689 Somali
Somali: Saynis 2016 Somaliland
31,196 Somali
Somali: Saynis 2018 Soomaaliya
30,786 Somali
Somali: Sheekooyin Carruureed
26,003 Somali
Somali: Sheekooyin Carruureed (Turjuman)
13,865 Somali