Språkbanken Text is a part of Språkbanken.
Language resources
On this page you can browse and search our datasets. Click on a row name to see what files are available for download. You can go directly to the search interface by clicking on the tool logo.
All (1323)
Collections (30)
Corpora (1198)
Lexicons (62)
Training and evaluation data (15)
Models (48)
Languages available as filters: Swedish, Albanian, Belarusian, Blissymbols, Bosnian, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finland Swedish, Finnish, French, German, Icelandic, Iranian Persian, Italian, Kele (Papua New Guinea), Kurdish, Latin, Latvian, Lower Sorbian, Macedonian, Modern Greek (1453-), Multiple languages, Norwegian, Norwegian Bokmål, Old English (ca. 450-1100), Old High German (ca. 750-1050), Old Norse, Old Saxon, Polish, Portuguese, Romanian, Russian, Serbian, Slavomolisano, Slovak, Slovenian, Somali, Spanish, Turkish, Turkmen, Ukrainian, Upper Sorbian, Xhosa
Resource
Type
Language
Access
Collection
SweLL
SweLL (Swedish Learner Language) is a collection of SweLL corpora and derivative resources built from these corpora. The SweLL corpora consist of texts written by learners whose mother tongue is not Swedish. All texts were collected in test situations; none come from tasks written at home.
Corpus
Swedish, Multiple languages
See 9 collected resources
SweLL v1 original
Corpus
Swedish
Word statistics:
stats_SWELLV1-ORIGINAL.txt
2021-08-15 – 760.53 KB – CC BY 4.0
SweLL v1 target
Corpus
Swedish
Word statistics:
stats_SWELLV1-TARGET.txt
2021-08-15 – 675.14 KB – CC BY 4.0
SweLL-gold
Essays written by adult learners of Swedish, manually pseudonymized and correction-annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2017-2020.
Corpus
Swedish
SweLL-gold original
Corpus
Swedish
Word statistics:
stats_SWELL-ORIGINAL.txt
2020-06-07 – 172.29 KB – CC BY 4.0
SweLL-gold target
Corpus
Swedish
Word statistics:
stats_SWELL-TARGET.txt
2020-06-07 – 156.58 KB – CC BY 4.0
Collection
SweLL-pilot
Essays written by adult learners of Swedish, manually labeled with CEFR levels (the Common European Framework of Reference for Languages, a European scale of language proficiency). Collection period 2006-2015.
Corpus
Swedish
See 3 collected resources
SweLLex
SweLLex is a lexicon of productive vocabulary for Swedish as a second language
Lexicon
Swedish
Dataset:
SweLLex_v1_xlsx.tar.bz2
2025-01-24 – 3.21 MB – CC BY 4.0
Dataset:
SweLLex_v1_tsv.tar.bz2
2025-01-24 – 213.59 KB – CC BY 4.0
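SweLLex is offered as a bzip2-compressed tar archive (SweLLex_v1_tsv.tar.bz2). A minimal sketch for reading tab-separated members straight from the archive with the Python standard library follows. It assumes the archive holds `.tsv` files with a header row; the actual member names and column names are not listed on this page, and the function name `read_tsv_members` is hypothetical.

```python
import csv
import io
import tarfile


def read_tsv_members(path):
    """Return {member_name: rows} for every .tsv file in a .tar.bz2 archive.

    Assumption: each .tsv member is UTF-8 with a header row, so each row
    becomes a dict keyed by column name.
    """
    tables = {}
    with tarfile.open(path, "r:bz2") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".tsv"):
                raw = tar.extractfile(member)
                # newline="" is what the csv module expects from text streams.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[member.name] = list(csv.DictReader(text, delimiter="\t"))
    return tables
```

For example, `read_tsv_members("SweLLex_v1_tsv.tar.bz2")` would return one list of rows per TSV member. Reading members in memory avoids unpacking the archive to disk, which also sidesteps the path-handling concerns of `extractall` on untrusted archives.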
SweNLI 1.0
A Swedish natural language inference (NLI) dataset
Corpus
Swedish
Dataset:
swenli.zip
2023-03-30 – 55.13 MB – CC BY 4.0
SweParaphrase 2.0
Semantic Textual Similarity reference data (STS Benchmark).
Corpus
Swedish
Dataset:
sweparaphrase.zip
2023-03-30 – 750.9 KB – CC BY 4.0
SweSAT Swedish Scholastic Aptitude Test Synonyms 1.1
Swedish Scholastic Aptitude Test Synonyms
Lexicon
Swedish
Dataset:
swesat-synonyms.zip
2023-03-30 – 37.73 KB – CC BY 4.0
Swesaurus
A Swedish WordNet
Lexicon
Swedish
Dataset:
swesaurus.xml
2017-09-19 – 12.16 MB – CC BY 4.0
SweWiC 2.0
A Swedish Word-in-Context dataset
Corpus
Swedish
Dataset:
swewic.zip
2023-03-30 – 587.65 KB – CC BY 4.0
SweWinogender 2.0
A Swedish dataset for coreference and gender bias
Corpus
Swedish
Dataset:
swewinogender.zip
2023-03-30 – 28.3 KB – CC BY 4.0
SweWinograd 2.0
A Swedish dataset for pronoun resolution
Corpus
Swedish
Dataset:
swewinograd.zip
2023-03-30 – 33.41 KB – CC BY 4.0
Syntag treebank
A Swedish treebank with syntactic analysis of 158 articles from Press-65.
Corpus
Swedish
Dataset:
syntag.txt
2010-02-08 – 4.45 MB – CC BY 4.0
Dataset:
syntag.html
2010-05-24 – 10.15 MB – CC BY 4.0
Sæmundaredda
Ancient Icelandic poetry collection also known as The King's Book
Corpus
Old Norse
Dataset:
eddan.xml.bz2
2015-01-21 – 87.55 KB – CC BY 4.0
Word statistics:
stats_EDDAN.txt
2015-01-25 – 172.88 KB – CC BY 4.0
TalbankenSBX
Talbanken is a Swedish treebank. This is the Språkbanken Text version of Talbanken.
Corpus
Swedish
Dataset:
talbanken.xml.bz2
2017-06-07 – 1.54 MB – CC BY 4.0
Word statistics:
stats_TALBANKEN.txt
2016-03-13 – 1.06 MB – CC BY 4.0
Dataset:
changelog.txt
2020-06-11 – 316 bytes – CC BY 4.0
Dataset:
TalbankenSBX_morphsplit20200610.zip
2020-06-11 – 3.64 MB – CC BY 4.0
Dataset:
TalbankenSBX_syntsplit20200610.zip
2020-06-11 – 807.09 KB – CC BY 4.0
TalbankenSTB
Talbanken is a Swedish treebank.
Corpus
Swedish
Dataset:
TalbankenSTB.zip
2020-08-11 – 2.6 MB – CC BY 4.0
Dataset:
TalbankenSTB_README.txt
2020-08-11 – 1.05 KB – CC BY 4.0
Dataset:
TalbankenSTB_documentation.zip
2020-08-11 – 62.23 KB – CC BY 4.0
Dataset:
TalbankenSTB_datasplit.zip
2020-08-11 – 2.6 MB – CC BY 4.0
Dataset:
TalbankenSTB_original_parts.zip
2020-08-11 – 2.95 MB – CC BY 4.0
The English-Swedish Parallel Corpus (ESPC)
ESPC is a combined comparable and parallel corpus suitable for different types of cross-language research.
Corpus
Swedish, English
The Riksdag's open data - Debates
Debates from the Swedish parliament in the period 1993/94-2017/18
Corpus
Swedish
Dataset:
rd-anf-1993-2018.xml.bz2
2020-03-30 – 2.22 GB – CC BY 4.0
Word statistics:
stats_RD-ANF-1993-2018.txt
2022-09-13 – 44.83 MB – CC BY 4.0
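Several resources on this page are distributed as bzip2-compressed XML, and some are large (rd-anf-1993-2018.xml.bz2 is 2.22 GB). A minimal sketch for streaming such a file with the Python standard library, without decompressing it to disk, is shown below. It assumes tokens are stored as `<w>` elements inside `<sentence>` elements; that structure is an assumption and should be checked against the actual file, and `count_tokens` is a hypothetical helper name.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_tokens(path, token_tag="w", sentence_tag="sentence"):
    """Stream a .xml.bz2 corpus file, counting sentences and word forms.

    Assumption: tokens are <w> elements nested in <sentence> elements;
    pass other tag names if the file is structured differently.
    """
    n_sentences = 0
    freqs = Counter()
    with bz2.open(path, "rb") as f:
        # iterparse reads incrementally, so the file is never fully loaded.
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == token_tag:
                word = (elem.text or "").strip()
                if word:
                    freqs[word] += 1
            elif elem.tag == sentence_tag:
                n_sentences += 1
                # Drop the finished sentence's children to limit memory use.
                elem.clear()
    return n_sentences, freqs
```

For example, `n, freqs = count_tokens("rd-anf-1993-2018.xml.bz2")` followed by `freqs.most_common(10)` would list the ten most frequent word forms, provided the file matches the assumed structure.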
The Swedish Culturomics Gigaword Corpus
One billion Swedish words from 1950 onwards. Code to extract data from the corpus, as well as usage instructions, can be downloaded from https://svn.spraakbanken.gu.se/sb-arkiv/tools/gigaword/
Corpus
Swedish
Dataset:
gigaword-1950-59.tar
2016-06-07 – 92.69 MB – CC BY 4.0
Dataset:
gigaword-1960-69.tar
2016-06-07 – 107.78 MB – CC BY 4.0
Dataset:
gigaword-1970-79.tar
2016-06-07 – 175.03 MB – CC BY 4.0
Dataset:
gigaword-1980-89.tar
2016-06-07 – 217.9 MB – CC BY 4.0
Dataset:
gigaword-1990-99.tar
2016-06-07 – 1.05 GB – CC BY 4.0
Dataset:
gigaword-2000-09.tar
2016-06-07 – 5.48 GB – CC BY 4.0
Dataset:
gigaword-2010-15.tar
2016-06-07 – 4.32 GB – CC BY 4.0
The Swedish Literature Bank: Free Works
E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)
Corpus
Swedish
Dataset:
lb-open.xml.bz2
2023-11-13 – 5.75 GB – CC BY 4.0
Word statistics:
stats_lb-open.csv
2023-11-13 – 253.52 MB – CC BY 4.0
The Swedish Literature Bank: Restricted Works
E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)
Corpus
Swedish
Dataset:
lb-restricted.xml.bz2
2023-10-28 – 2.25 GB – CC BY 4.0
Word statistics:
stats_lb-restricted.csv
2023-10-28 – 148.32 MB – CC BY 4.0
The Swedish PoliGraph
An extensible knowledge graph with information on members of the Swedish parliament
Lexicon
Swedish
Dataset:
poligraph.tar.bz2
2020-01-14 – 2.29 MB – GNU GPLv3 or later
Tiden
30 annual volumes of the socialist journal Tiden, 1909–1940
Corpus
Swedish
Dataset:
runeberg-tiden.xml.bz2
2014-12-08 – 89.33 MB – CC BY 4.0
Word statistics:
stats_RUNEBERG-TIDEN.txt
2015-06-25 – 21.59 MB – CC BY 4.0
TISUS texts
Essays written by L2 Swedish learners as part of a TISUS exam
Corpus
Swedish
TISUS v1
Corpus
Swedish
Word statistics:
stats_TISUSV1.txt
2021-07-04 – 407.41 KB – CC BY 4.0
TISUS-texter v2
Corpus
Swedish
Word statistics:
stats_TISUSV2.txt
2020-11-29 – 403.74 KB – CC BY 4.0
Twitter Mix
Material from a selection of Swedish Twitter users. Regularly updated.
Corpus
Swedish
Word statistics:
stats_twitter.csv
2022-11-09 – 799.05 MB – CC BY 4.0
Twitter: Party Leader Debate June 2013
Material from Twitter, collected during the party leader debate on June 12th 2013 and a few days before and after
Corpus
Swedish
Word statistics:
stats_TWITTER-PLDEBATT-130612.txt
2017-05-21 – 77.03 MB – CC BY 4.0
Twitter: Party Leader Debate May 2014
Material from Twitter, collected during the party leader debate on May 4th 2014 and a few days before and after
Corpus
Swedish
Word statistics:
stats_TWITTER-PLDEBATT-140504.txt
2017-05-21 – 60.7 MB – CC BY 4.0
Twitter: Party Leader Debate October 2013
Material from Twitter, collected during the party leader debate on October 6th 2013 and a few days before and after
Corpus
Swedish
Word statistics:
stats_TWITTER-PLDEBATT-131006.txt
2017-05-21 – 51.9 MB – CC BY 4.0
UNSC-Graph
An extensible knowledge graph for the UNSC corpus, detailing participants and debates from the UN Security Council 1995-2020
Lexicon
English
Dataset:
unsc-graph-1.0.tar.gz
2023-08-31 – 4.8 MB – GNU GPLv3 or later
Ur Dagens Krönika
Eight annual volumes of the cultural journal Ur Dagens Krönika, 1881–1890
Corpus
Swedish
Dataset:
runeberg-urdagkron.xml.bz2
2014-12-08 – 24.43 MB – CC BY 4.0
Word statistics:
stats_RUNEBERG-URDAGKRON.txt
2015-06-25 – 10.48 MB – CC BY 4.0
Vocation list
A list of vocations in Swedish
Lexicon
Swedish
Dataset:
vocationTerms150120.utf.txt.gz
2024-01-25 – 67.12 KB – CC BY 4.0
Collection
Web News
News from Swedish newspapers' websites
Corpus
Swedish
See 13 collected resources
Web News 2001
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2001.xml.bz2
2024-01-04 – 17.13 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2001.csv
2024-01-05 – 4.82 MB – CC BY 4.0
Web News 2002
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2002.xml.bz2
2022-11-30 – 506.49 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2002.csv
2022-12-01 – 33.03 MB – CC BY 4.0
Web News 2003
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2003.xml.bz2
2022-11-30 – 357.9 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2003.csv
2022-12-01 – 25.69 MB – CC BY 4.0
Web News 2004
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2004.xml.bz2
2022-11-30 – 403.31 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2004.csv
2022-12-01 – 26.62 MB – CC BY 4.0
Web News 2005
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2005.xml.bz2
2024-01-05 – 849.81 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2005.csv
2024-01-06 – 40.67 MB – CC BY 4.0
Web News 2006
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2006.xml.bz2
2022-12-01 – 654.61 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2006.csv
2022-12-02 – 39.04 MB – CC BY 4.0
Web News 2007
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2007.xml.bz2
2022-12-01 – 715.52 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2007.csv
2022-12-02 – 40.8 MB – CC BY 4.0
Web News 2008
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2008.xml.bz2
2022-12-01 – 796.9 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2008.csv
2022-12-02 – 44.26 MB – CC BY 4.0
Web News 2009
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2009.xml.bz2
2024-01-05 – 747.74 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2009.csv
2024-01-06 – 40.43 MB – CC BY 4.0
Web News 2010
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2010.xml.bz2
2022-12-02 – 691.09 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2010.csv
2022-12-02 – 36.94 MB – CC BY 4.0
Web News 2011
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2011.xml.bz2
2022-12-02 – 764.57 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2011.csv
2022-12-03 – 40.69 MB – CC BY 4.0
Web News 2012
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2012.xml.bz2
2022-12-02 – 729.32 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2012.csv
2022-12-03 – 39.72 MB – CC BY 4.0
Web News 2013
News from Swedish newspapers' websites
Corpus
Swedish
Dataset:
webbnyheter2013.xml.bz2
2024-01-05 – 652.09 MB – CC BY 4.0
Word statistics:
stats_webbnyheter2013.csv
2024-01-06 – 36.02 MB – CC BY 4.0
Wexjöbladet 1820's
Part of the collection Kubhist2
Corpus
Swedish
Dataset:
kubhist2-wexjobladet-1820.xml.bz2
2024-01-16 – 36.76 MB – CC BY 4.0
Word statistics:
stats_kubhist2-wexjobladet-1820.csv
2024-01-17 – 8.97 MB – CC BY 4.0
Word Embeddings trained on English Wikipedia
Model
English
Dataset:
wiki_300_5_word2vec.model
2024-01-25 – 112.01 MB – CC BY 4.0
Dataset:
wiki_300_5_word2vec.model.syn1neg.npy
2024-01-25 – 3.75 GB – CC BY 4.0
Dataset:
wiki_300_5_word2vec.model.wv.vectors.npy
2024-01-25 – 3.75 GB – CC BY 4.0
Dataset:
wiki_300_50_word2vec.model
2024-01-25 – 28.04 MB – CC BY 4.0
Dataset:
wiki_300_50_word2vec.model.syn1neg.npy
2024-01-25 – 949.26 MB – CC BY 4.0
Dataset:
wiki_300_50_word2vec.model.wv.vectors.npy
2024-01-25 – 949.26 MB – CC BY 4.0
WordNet-SALDO
A linking between SALDO senses and Core WordNet
Lexicon
Swedish, English
Dataset:
wordnet-saldo.xml
2017-09-19 – 5.71 MB – CC BY 4.0
WordReference
A large corpus of texts written by native and non-native speakers in four languages.
Corpus
English, Spanish, French, Italian
Dataset:
wordreference.zip
2020-11-10 – 365.51 MB – CC BY 4.0
Written production in learner French
This corpus contains student texts written by Swedish learners of French
Corpus
French