Språkbanken Text is a department within Språkbanken.

Language resources

On this page you can browse and search our datasets. Click on a row name to see what files are available for download. You can go directly to the search interface by clicking on the tool logo.
Resource Type Language Access
Af Soomaali 1993-94
Corpus Somali
Af-Soomaali 2016 Somaliland
Corpus Somali
Applications
Anonymised job applications. The corpus is protected, contact Lena Rogström (lena.rogstroem@svenska.gu.se) for more information and access.
Corpus Swedish
Collection
ASPAC
The Amsterdam Slavic Parallel Aligned Corpus
Corpus Swedish, Belarusian, Bulgarian, Czech, German, Lower Sorbian, Modern Greek (1453-), English, Spanish, French, Croatian, Upper Sorbian, Latin, Macedonian, Dutch, Polish, Portuguese, Romanian, Russian, Kele (Papua New Guinea), Slovak, Slovenian, Serbian, Slavomolisano, Turkmen, Ukrainian
Collection
Bicameral Riksdag
Collection of text documents from the Swedish bicameral parliament
Corpus Swedish
Collection
Blog mix
Material from a selection of Swedish blogs. Regularly updated.
Corpus Swedish
Caafimaad 1983
Corpus Somali
Corpus of spoken isiXhosa
A corpus of transcribed and annotated recordings of spoken Xhosa.
Corpus
Dalin: Then Swänska Argus 1732-1734
Manual transcription of Then Swänska Argus by Olof von Dalin, Stockholm, 1732–1734. For OCR analysis.
Corpus Swedish
Databank of 1977 Spanish Press
Text from two Spanish newspapers from 1977. Part of SOL - Spanish Online.
Corpus Spanish
Databank of Eleven Spanish Novels 1951–1971
Corpus consisting of 11 Spanish novels. Part of SOL - Spanish Online.
Corpus Spanish
DiabetologNytt (1996–1999)
The journal DiabetologNytt ("Diabetology News"), 1996–1999
Corpus Swedish
DReaM
A multilingual corpus of linguistic descriptions of the world's natural languages.
Corpus English
DReaM-Copyright-Protected
A multilingual corpus of linguistic descriptions of the world's natural languages.
Corpus English
Eukalyptus Treebank of Written Swedish
A treebank with written Swedish data, with parts-of-speech, TIGER-style syntax, multiword expressions and sense annotation
Corpus Swedish
Collection
Europarl
European Parliament Proceedings Parallel Corpus
Corpus Swedish, Danish, German, Modern Greek (1453-), English, Spanish, Finnish, French, Italian, Dutch, Portuguese
Collection
Familjeliv
Material from the Familjeliv internet forum
Corpus Swedish
Collection
Finland Swedish
Part of the Finland Swedish language bank, covering Swedish in Finland past and present
Corpus Swedish
Collection
Flashback
Material from the Flashback internet forum
Corpus Swedish
Collection
Fornsvenska textbankens material
A collection of Old Swedish texts from Fornsvenska textbanken
Corpus Swedish
Fornsvenska textbankens material: Äldre lagar
Corpus Swedish
Fornsvenska textbankens material: Äldre religiös prosa
Corpus Swedish
Fornsvenska textbankens material: Dalin's Then Swänska Argus 1732-1734
Then Swänska Argus (1732–1734) by Olof von Dalin, digitized by Fornsvenska textbanken
Corpus Swedish
Fornsvenska textbankens material: Nysvenska bibelböcker
Corpus Swedish
Fornsvenska textbankens material: Nysvenska krönikor
Corpus Swedish
Fornsvenska textbankens material: Nysvenska lagar
Corpus Swedish
Fornsvenska textbankens material: Nysvenska, övrigt
Corpus Swedish
Fornsvenska textbankens material: Profan prosa
Corpus Swedish
Fornsvenska textbankens material: Verser
Corpus Swedish
Fornsvenska textbankens material: Yngre lagar
Corpus Swedish
Fornsvenska textbankens material: Yngre religiös prosa
Corpus Swedish
Fornsvenska textbankens material: Yngre tänkeböcker
Corpus Swedish
FTS - Faroese text collection
Faroese text collection, in cooperation with University of Faroe Islands, Fróðskaparsetur Føroya, http://www.setur.fo/
Corpus Faroese
Collection
Göteborgsposten
A corpus with texts from the newspaper Göteborgs-Posten
Corpus Swedish
Interrogations
The corpus is protected; contact Ylva Byrman (ylva.byrman@svenska.gu.se) for more information and access.
Corpus Swedish
Collection
Kubhist
Diachronic collection of Swedish historical newspaper texts from the period of 1749–1926
Corpus Swedish
Collection
Kubhist 2
Diachronic collection of Swedish historical newspaper texts from the period 1645–1926. Kubhist 2 is an updated version of Kubhist with improved OCR and more material.
Corpus Swedish
Collection
Kubord 1
Word frequencies from modern newspaper texts from the National Library of Sweden
Corpus Swedish
Collection
Kubord 2
Word relations from modern newspaper texts from the National Library of Sweden
Corpus Swedish
Collection
Kubord-fasttext
A collection of fasttext models trained on modern newspaper texts from the National Library of Sweden
Model Swedish
Kubord-fasttext - Aftonbladet 2010–2022 - lemma
Fasttext model trained on Aftonbladet 2010–2022
Model Swedish
Kubord-fasttext - Aftonbladet 2010–2022 - token
Fasttext model trained on Aftonbladet 2010–2022
Model Swedish
Kubord-fasttext - Dagens Nyheter 2010–2022 - lemma
Fasttext model trained on Dagens Nyheter 2010–2022
Model Swedish
Kubord-fasttext - Dagens Nyheter 2010–2022 - token
Fasttext model trained on Dagens Nyheter 2010–2022
Model Swedish
Kubord-fasttext - Göteborgsposten 2013–2022 - lemma
Fasttext model trained on Göteborgsposten 2013–2022
Model Swedish
Kubord-fasttext - Göteborgsposten 2013–2022 - token
Fasttext model trained on Göteborgsposten 2013–2022
Model Swedish
Collection
Kvinnotidningar
Material from historical women's periodicals
Corpus Swedish
Collection
Läkartidningen medical journal
Corpus of specialized health care language
Corpus Swedish
LingFN
A Framenet for the linguistic domain
Lexicon Swedish
LingFN-thesis
A Framenet for the linguistic domain
Lexicon Swedish
LingFN-V2
A Framenet for the linguistic domain
Lexicon Swedish
Linguistic Survey of India (LSI)
Corpus English
lsilex
A lexicon developed within the LSI project
Lexicon Swedish
MAÞiR Trees
An Old Swedish treebank, with lemma, parts-of-speech, and PROIEL-style dependency syntax.
Corpus Swedish
MAÞiR Words
Old Swedish lexical resource based on Söderwall's dictionary, suitable for creating lemmatizers, among other things.
Lexicon Swedish
Collection
Medieval letters
Swedish medieval charters from Diplomatarium Suecanum (Svenskt Diplomatariums huvudkartotek, SDHK)
Corpus Latin, German, Norwegian, Swedish
NordiCon
NordiCon is a database that collects medieval North Germanic personal names in sources outside Scandinavia.
Lexicon English
Collection
NPEGL
The Noun Phrases in Early Germanic Languages database.
Lexicon Old English (ca. 450-1100), Old High German (ca. 750-1050), Old Norse, Old Saxon
Collection
Old Finland Swedish
Part of the Finland Swedish language bank, covering Swedish in Finland past and present
Corpus Swedish
OpenEDGeS
The public license subset of the EDGeS Diachronic Bible Corpus, a diachronically and synchronically parallel corpus of Bible translations in Dutch, English, German and Swedish, with texts from the 14th century until today.
Corpus Swedish, English, German, Dutch
Oral Corpus for Reference of Contemporary Spanish
Corpus of transcriptions of audio tapes recorded from 1991 to 1992. Part of SOL - Spanish Online.
Corpus Spanish
OSA (SAOB)
The Swedish Academy Dictionary online
Lexicon Swedish
Collection
Press
Swedish press
Corpus Swedish
Pretrained embeddings
A list of pretrained embeddings for Swedish
Model Swedish
Questions and answers about the Swedish language
Advisory emails from the Language Council of Sweden
Corpus Swedish
Collection
Riksdag of the Estates
Collection of textual documents from the Swedish Riksdag of the Estates
Corpus Swedish
Collection
Riksdagens öppna data
Data from the Swedish parliament collected from data.riksdagen.se
Corpus Swedish
ScandiSent
Sentiment corpus for Swedish, Norwegian, Danish, Finnish and English, crawled from Trustpilot.
Corpus Swedish, Norwegian Bokmål, Danish, English, Finnish
SemEval2020 Task 1
Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection (extracts from Kubhist v2)
Corpus Swedish
Collection
Somali corpora
A collection of Somali corpora
Corpus Somali
Somali: Af Soomaali 1971-79
Corpus Somali
Somali: Af-Soomaali 2001 Somaliland
Corpus Somali
Somali: Af-Soomaali 2001 Soomaaliya
Corpus Somali
Somali: Af-Soomaali 2006 Itoobiya
Corpus Somali
Somali: Af-Soomaali 2010 Somaliland
Corpus Somali
Somali: Af-Soomaali 2013 Somaliland
Corpus Somali
Somali: Af-Soomaali 2018 Soomaaliya
Corpus Somali
Somali: Afka Hooyo 1992-02 Kanada
Corpus Somali
Somali: Afka Hooyo 2010–19 Iswiidhan
Corpus Somali
Somali: BBC
Corpus Somali
Somali: Caafimaad 1972–79
Corpus Somali
Somali: Caafimaad 1994
Corpus Somali
Somali: Cilmi-Afeed
Corpus Somali
Somali: Cilmiga Bulshada 1980-89
Corpus Somali
Somali: Cilmiga Bulshada 2001 Somaliland
Corpus Somali
Somali: Cilmiga Bulshada 2001-03 Soomaaliya
Corpus Somali
Somali: Cilmiga Bulshada 2010 Somaliland
Corpus Somali
Somali: Cilmiga Bulshada 2011 Itoobiya
Corpus Somali
Somali: Cilmiga Bulshada 2016 Somaliland
Corpus Somali
Somali: Cilmiga Bulshada 2018 Soomaaliya
Corpus Somali
Somali: Cilmiga Deegaanka 2012 Itoobiya
Corpus Somali
Somali: Golaha Wakiillada Somaliland
Corpus Somali
Somali: Haatuf News 2002
Corpus Somali
Somali: Haatuf News 2003
Corpus Somali
Somali: Haatuf News 2004
Corpus Somali
Somali: Haatuf News 2005
Corpus Somali
Somali: Haatuf News 2006
Corpus Somali
Somali: Haatuf News 2007
Corpus Somali
Somali: Haatuf News 2008
Corpus Somali
Somali: Haatuf News 2009
Corpus Somali