Language resources

On this page you can browse and search our datasets. Click on a row name to see what files are available for download. You can go directly to the search interface by clicking on the tool logo.

All (1329) Collections (33) Corpora (1221) Lexicons (80) Training and evaluation data (39) Models (28)

Number of hits: 1329

Resource	Type	Language	Access
Språkprov SO 2009 The just over 94,000 language examples are taken from Svensk ordbok, published by Svenska Akademien (2009). The examples are meant to support the dictionary definitions and to provide information about the phraseology of the headwords.	Corpus	Swedish	Explore in:
SUC 2.0 Stockholm-Umeå corpus 2.0	Corpus	Swedish	Word statistics: stats_SUC2.txt.zip 2025-04-22 – 1.34 MB – CC-BY-4.0
SUC 3.0 Stockholm-Umeå corpus 3.0	Corpus	Swedish	Dataset: suc3.xml.bz2 2024-06-03 – 84.44 MB – CC-BY-4.0 Word statistics: stats_suc3.csv.zip 2025-04-22 – 1.43 MB – CC-BY-4.0 Explore in:
SUC Novels (StorSUC) Stockholm-Umeå corpus	Corpus	Swedish	Dataset: storsuc.xml.bz2 2017-04-26 – 68.28 MB – CC-BY-4.0 Word statistics: stats_STORSUC.txt.zip 2025-04-22 – 2.25 MB – CC-BY-4.0 Explore in:
SUCX 2.0 Stockholm-Umeå corpus 2.0 scrambled	Corpus	Swedish	Dataset: suc2.xml.bz2 2017-05-19 – 17.68 MB – CC-BY-SA-4.0 Word statistics: stats_SUC2.txt.zip 2025-04-22 – 1.34 MB – CC-BY-SA-4.0 Explore in:
SUCX 3.0 Stockholm-Umeå corpus 3.0 scrambled	Corpus	Swedish	Dataset: suc3.xml.bz2 2024-06-03 – 84.44 MB – CC-BY-SA-4.0 Word statistics: stats_suc3.csv.zip 2025-04-22 – 1.43 MB – CC-BY-4.0 Explore in:
Collection SuperLim 2 A standardized suite for evaluation and analysis of Swedish natural language understanding systems.	Corpus	Swedish	Dataset: SuperLim-2.0.5.zip 2025-12-12 – 62.75 MB – CC-BY-4.0 Dataset: SuperLim_maintenance.odt 2025-12-12 – 8.01 KB
SuperSim (repackaged for Superlim) 2.0 A dataset for word similarity and relatedness in Swedish	Corpus	Swedish	Dataset: supersim-superlim.zip 2023-03-30 – 70.45 KB – CC-BY-4.0
sv-COVID-19 A compilation of various articles related to the COVID-19 pandemic	Corpus	Swedish	Dataset: sv-covid-19.xml.bz2 2025-02-20 – 216.31 MB – CC-BY-4.0 Word statistics: stats_sv-covid-19.csv.zip 2025-04-22 – 2.45 MB – CC-BY-4.0 Explore in:
SVALex SVALex is a lexicon of receptive vocabulary for Swedish as a second language	Lexicon	Swedish	Dataset: svalex_xlsx.tar.bz2 2025-01-24 – 2.16 MB – CC-BY-NC-SA-4.0 Dataset: svalex_tsv.tar.bz2 2025-01-24 – 203.25 KB – CC-BY-NC-SA-4.0 Explore in:
Svensk Tidskrift 27 annual volumes of the conservative journal Svensk Tidskrift, from 1891 to 1940	Corpus	Swedish	Dataset: runeberg-svtidskr.xml.bz2 2014-12-08 – 93.06 MB – CC-BY-4.0 Word statistics: stats_RUNEBERG-SVTIDSKR.txt.zip 2025-04-22 – 4.24 MB – CC-BY-4.0 Explore in:
Svenska MWELex Swe-MWELex is a sense-based word list of multi-word expressions (MWEs) that learners of Swedish as a second language can handle at different levels of proficiency (according to the CEFR scale). The word list features MWE items and their frequencies from essays (productive vocabulary, based on SweLL-pilot) and from course books (receptive vocabulary, based on COCTAILL). In addition, each MWE has been classified by type (based on its syntactic and lexical characteristics) and, for verbal MWEs, by subgroup.	Lexicon	Swedish	Dataset: swe-mwelex.xlsx 2025-03-12 – 184.75 KB – CC-BY-4.0 Dataset: swe-mwelex.csv 2025-02-20 – 414.88 KB – CC-BY-NC-SA-4.0 Explore in:
Collection SVT news News texts from svt.se	Corpus	Swedish	See 21 collected resources Explore in:
SVT news 2004 News texts from svt.se	Corpus	Swedish	Dataset: svt-2004.xml.bz2 2022-12-06 – 12.54 MB – CC-BY-4.0 Word statistics: stats_svt-2004.csv.zip 2025-04-22 – 2.17 MB – CC-BY-4.0 Explore in:
SVT news 2005 News texts from svt.se	Corpus	Swedish	Dataset: svt-2005.xml.bz2 2022-12-06 – 94.29 MB – CC-BY-4.0 Word statistics: stats_svt-2005.csv.zip 2025-04-22 – 15.38 MB – CC-BY-4.0 Explore in:
SVT news 2006 News texts from svt.se	Corpus	Swedish	Dataset: svt-2006.xml.bz2 2022-12-06 – 120.32 MB – CC-BY-4.0 Word statistics: stats_svt-2006.csv.zip 2025-04-22 – 18.43 MB – CC-BY-4.0 Explore in:
SVT news 2007 News texts from svt.se	Corpus	Swedish	Dataset: svt-2007.xml.bz2 2022-12-06 – 159.96 MB – CC-BY-4.0 Word statistics: stats_svt-2007.csv.zip 2025-04-22 – 22.97 MB – CC-BY-4.0 Explore in:
SVT news 2008 News texts from svt.se	Corpus	Swedish	Dataset: svt-2008.xml.bz2 2022-12-06 – 221.24 MB – CC-BY-4.0 Word statistics: stats_svt-2008.csv.zip 2025-04-22 – 29.19 MB – CC-BY-4.0 Explore in:
SVT news 2009 News texts from svt.se	Corpus	Swedish	Dataset: svt-2009.xml.bz2 2022-12-06 – 254.45 MB – CC-BY-4.0 Word statistics: stats_svt-2009.csv.zip 2025-04-22 – 32.25 MB – CC-BY-4.0 Explore in:
SVT news 2010 News texts from svt.se	Corpus	Swedish	Dataset: svt-2010.xml.bz2 2022-12-06 – 284.46 MB – CC-BY-4.0 Word statistics: stats_svt-2010.csv.zip 2025-04-22 – 34.85 MB – CC-BY-4.0 Explore in:
SVT news 2011 News texts from svt.se	Corpus	Swedish	Dataset: svt-2011.xml.bz2 2022-12-06 – 268.69 MB – CC-BY-4.0 Word statistics: stats_svt-2011.csv.zip 2025-04-22 – 33.23 MB – CC-BY-4.0 Explore in:
SVT news 2012 News texts from svt.se	Corpus	Swedish	Dataset: svt-2012.xml.bz2 2022-12-06 – 273.87 MB – CC-BY-4.0 Word statistics: stats_svt-2012.csv.zip 2025-04-22 – 32.61 MB – CC-BY-4.0 Explore in:
SVT news 2013 News texts from svt.se	Corpus	Swedish	Dataset: svt-2013.xml.bz2 2022-12-06 – 397.91 MB – CC-BY-4.0 Word statistics: stats_svt-2013.csv.zip 2025-04-22 – 43.4 MB – CC-BY-4.0 Explore in:
SVT news 2014 News texts from svt.se	Corpus	Swedish	Dataset: svt-2014.xml.bz2 2022-12-07 – 454.63 MB – CC-BY-4.0 Word statistics: stats_svt-2014.csv.zip 2025-04-22 – 47.99 MB – CC-BY-4.0 Explore in:
SVT news 2015 News texts from svt.se	Corpus	Swedish	Dataset: svt-2015.xml.bz2 2022-12-07 – 539.73 MB – CC-BY-4.0 Word statistics: stats_svt-2015.csv.zip 2025-04-22 – 54.12 MB – CC-BY-4.0 Explore in:
SVT news 2016 News texts from svt.se	Corpus	Swedish	Dataset: svt-2016.xml.bz2 2022-12-07 – 613.63 MB – CC-BY-4.0 Word statistics: stats_svt-2016.csv.zip 2025-04-22 – 58.8 MB – CC-BY-4.0 Explore in:
SVT news 2017 News texts from svt.se	Corpus	Swedish	Dataset: svt-2017.xml.bz2 2022-12-07 – 601.37 MB – CC-BY-4.0 Word statistics: stats_svt-2017.csv.zip 2025-04-22 – 56.78 MB – CC-BY-4.0 Explore in:
SVT news 2018 News texts from svt.se	Corpus	Swedish	Dataset: svt-2018.xml.bz2 2022-12-07 – 533.68 MB – CC-BY-4.0 Word statistics: stats_svt-2018.csv.zip 2025-04-22 – 52.65 MB – CC-BY-4.0 Explore in:
SVT news 2019 News texts from svt.se	Corpus	Swedish	Dataset: svt-2019.xml.bz2 2022-12-07 – 515.99 MB – CC-BY-4.0 Word statistics: stats_svt-2019.csv.zip 2025-04-22 – 51.21 MB – CC-BY-4.0 Explore in:
SVT news 2020 News texts from svt.se	Corpus	Swedish	Dataset: svt-2020.xml.bz2 2022-12-07 – 453.02 MB – CC-BY-4.0 Word statistics: stats_svt-2020.csv.zip 2025-04-22 – 45.42 MB – CC-BY-4.0 Explore in:
SVT news 2021 News texts from svt.se	Corpus	Swedish	Dataset: svt-2021.xml.bz2 2022-12-07 – 424.19 MB – CC-BY-4.0 Word statistics: stats_svt-2021.csv.zip 2025-04-22 – 43.7 MB – CC-BY-4.0 Explore in:
SVT news 2022 News texts from svt.se	Corpus	Swedish	Dataset: svt-2022.xml.bz2 2023-08-30 – 395.67 MB – CC-BY-4.0 Word statistics: stats_svt-2022.csv.zip 2025-04-22 – 16.01 MB – CC-BY-4.0 Explore in:
SVT news 2023 News texts from svt.se	Corpus	Swedish	Dataset: svt-2023.xml.bz2 2023-08-29 – 211.47 MB – CC-BY-4.0 Word statistics: stats_svt-2023.csv.zip 2025-04-22 – 9.7 MB – CC-BY-4.0 Explore in:
SVT news unknown date News texts from svt.se	Corpus	Swedish	Dataset: svt-nodate.xml.bz2 2023-02-08 – 862.74 KB – CC-BY-4.0 Word statistics: stats_svt-nodate.csv.zip 2025-04-22 – 105.07 KB – CC-BY-4.0 Explore in:
SW1203 v1	Corpus	Swedish	Word statistics: stats_SW1203V1.txt.zip 2025-04-22 – 69.06 KB – CC-BY-4.0 Explore in:
SW1203-essays Essays written by L2 Swedish language learners, university courses	Corpus	Swedish	Word statistics: stats_SW1203.txt.zip 2025-04-22 – 71.13 KB – CC-BY-4.0 Explore in:
SW1203-uppsatser version 2	Corpus	Swedish	Word statistics: stats_SW1203V2.txt.zip 2025-04-22 – 70.02 KB – CC-BY-4.0 Explore in:
Swe-NERC A resource for training and evaluation of Named Entity Recognition for Swedish.	Corpus	Swedish
Swedberg's Swensk Ordabok Swedberg's Swensk Ordabok	Lexicon	Swedish, Latin	Dataset: swedberg.xml 2017-09-19 – 8.89 MB – CC-BY-4.0 Explore in:
Swedberg's Swensk Ordabok (morphology, rudimentary) Swedberg's Swensk Ordabok (morphology, rudimentary)	Lexicon	Swedish	Dataset: swedbergm.xml 2017-09-19 – 5.76 MB – CC-BY-4.0 Explore in:
SweDiagnostics Swedish version of (Super)GLUE Diagnostic	Corpus	Swedish	Dataset: swediagnostics.zip 2023-04-04 – 72.89 KB – CC-BY-4.0
Swedish ABSAbank An annotated Swedish corpus for aspect-based sentiment analysis	Corpus	Swedish	Dataset: swe-absa-bank.zip 2020-03-04 – 128.55 MB – CC-BY-4.0 Dataset: absabankimm-combined.zip 2023-02-20 – 15.87 MB – CC-BY-4.0
Swedish ABSAbank-Imm 1.1 An annotated Swedish corpus for aspect-based sentiment analysis (a version of Absabank)	Corpus	Swedish	Dataset: absabank-imm.zip 2023-03-30 – 1.03 MB – CC-BY-4.0
Swedish analogy 2.0 Swedish semantic and syntactic similarity	Corpus	Swedish	Dataset: sweanalogy.zip 2023-03-30 – 178.63 KB – CC-BY-4.0
Swedish Bible 1873 Swedish translation of the Bible from 1873	Corpus	Swedish	Dataset: bibel1873dalin.xml.bz2 2015-05-20 – 5.84 MB – CC-BY-4.0 Word statistics: stats_BIBEL1873DALIN.txt.zip 2025-04-22 – 306.18 KB – CC-BY-4.0 Explore in:
Swedish Bible 1917 Official Swedish translation of the Bible from 1917	Corpus	Swedish	Dataset: bibel1917.xml.bz2 2015-05-19 – 7.5 MB – CC-BY-4.0 Word statistics: stats_BIBEL1917.txt.zip 2025-04-22 – 359.83 KB – CC-BY-4.0 Explore in:
Swedish book reviews Texts from newspapers and magazines, manually annotated with book reviews.	Corpus	Swedish	Dataset: kno-dagny.zip 2025-12-15 – 13.74 MB – CC-BY-4.0 Dataset: kno-oob.zip 2025-12-15 – 70.13 MB – CC-BY-4.0
Swedish Code of Statutes Swedish Code of Statutes 1880-01-01 – 2023-12-15	Corpus	Swedish	Dataset: sfs.xml.bz2 2024-05-13 – 325.85 MB – CC-BY-4.0 Word statistics: stats_sfs.csv.zip 2025-04-22 – 3.04 MB – CC-BY-4.0 Explore in:
Swedish cognitive tests (synthetic data) Swedish cognitive tests refer to a collection of neuropsychological assessments used to evaluate cognitive functions, particularly language and executive functions.	Corpus	Swedish	Dataset: sweBNT-syntheticData_v3.xlsx 2026-03-27 – 29.4 KB – CC-BY-4.0 Dataset: sweSVF-syntheticData_v3.xlsx 2026-03-27 – 20.14 KB – CC-BY-4.0 Dataset: sweFAS-syntheticData_v3.xlsx 2026-03-27 – 23.35 KB – CC-BY-4.0 Dataset: sweTripToStockholm-1-syntheticData_v1.txt 2026-03-26 – 1.89 KB – CC-BY-4.0 Dataset: sweTripToStockholm-2-syntheticData_v1.txt 2026-03-26 – 2.36 KB – CC-BY-4.0 Dataset: sweCookieTheft-syntheticData_v1.txt 2026-03-26 – 3 KB – CC-BY-4.0
Swedish Diachronic Word Embeddings Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data	Model	Swedish	Dataset: HENGCHEN-TAHMASEBI_-_2020_-_Kubhist2_diachronic_embeddings.zip 2024-01-25 – 15.13 GB – CC-BY-4.0
Swedish Drama Dialogue Swedish Drama Dialogue is a corpus of Swedish drama dialogue from the period 1730-1950, consisting of 34 complete dramas.	Corpus	Swedish	Dataset: dramadialog.xml.bz2 2026-04-20 – 9.62 MB – CC-BY-4.0 Word statistics: stats_dramadialog.csv.zip 2026-04-23 – 694.41 KB – CC-BY-4.0 Explore in:
Swedish EAT: question classification A translated version of the QAQC dataset for expected-answer-type classification.	Corpus	Swedish	Dataset: swe_qaqc_train.csv 2023-06-08 – 361.34 KB – CC-BY-4.0 Dataset: Swedish_EAT_v1.0.tsv 2023-06-08 – 2.05 KB – CC-BY-4.0
Swedish fraktur 1626-1816 A selection of fraktur texts printed between 1626 and 1816 from the collections of the University Library of the University of Gothenburg (UB). For OCR analysis.	Corpus	Swedish	Dataset: svensk-fraktur-1626-1816.tar.gz 2021-11-26 – 757.73 MB – CC-BY-4.0
Swedish framenet (SweFN) A lexical semantic resource based on the same principles as the English Berkeley FrameNet. This part of the resource contains the corpus examples, automatically enriched with linguistic information.	Corpus	Swedish	Dataset: swefn-ex.xml.bz2 2021-11-25 – 3.62 MB – CC-BY-4.0 Word statistics: stats_swefn-ex.csv.zip 2025-04-22 – 465.83 KB – CC-BY-4.0 Explore in:
Swedish FrameNet (SweFN) A lexical semantic resource based on the same principles as the English Berkeley FrameNet. This part of the resource contains the frames and the manually annotated semantic content.	Lexicon	Swedish	Dataset: swefn.xml 2021-11-09 – 7 MB – CC-BY-4.0 Dataset: swefn-full.zip 2021-12-21 – 7.53 MB – CC-BY-4.0 Explore in:
Swedish FrameNet 2.0 (SweFN) A lexical semantic resource based on the same principles as the English Berkeley FrameNet. This version is updated to correspond to BFN 1.7.	Lexicon	Swedish	Dataset: swefn-2-0.json.zip 2024-10-16 – 1006.51 KB – CC-BY-4.0 Dataset: swefn-2-0.tsv.zip 2024-10-16 – 969.61 KB – CC-BY-4.0
Swedish names 2023 Lists of given names in Sweden 2023, intended to help with combining spelling variations.	Lexicon	Swedish	Dataset: swename2023.zip 2025-07-17 – 1.28 MB – CC-BY-4.0 Explore in:
Swedish newspapers 1818-1870 A selection of Swedish newspapers printed between 1818 and 1870 from the collections of Kungliga biblioteket (KB). For OCR analysis.	Corpus	Swedish	Dataset: svenska-tidningar-1818-1870.tar.gz 2020-05-26 – 458.22 MB – CC-BY-4.0
Swedish newspapers 1871-1906 A selection of Swedish newspapers printed between 1871 and 1906 from the collections of Kungliga biblioteket (KB). For OCR analysis.	Corpus	Swedish	Dataset: svenska-tidningar-1871-1906.tar.gz 2022-05-03 – 831.74 MB – CC-BY-4.0
Swedish party programs and election manifestos Swedish political party programs and election manifestos 1887–2024	Corpus	Swedish	Dataset: vivill.xml.bz2 2024-06-10 – 165.57 MB – CC-BY-4.0 Word statistics: stats_vivill.csv.zip 2025-04-22 – 1.16 MB – CC-BY-4.0 Explore in:
Swedish Prose Fiction 1800–1900 All Swedish fiction published for the first time during the years 1800, 1820, 1840, 1860, 1880 and 1900	Corpus	Swedish	Dataset: spf.xml.bz2 2017-05-19 – 231.69 MB – CC-BY-4.0 Word statistics: stats_SPF.txt.zip 2025-04-22 – 3.68 MB – CC-BY-4.0 Explore in:
Swedish treebank A Swedish treebank built from recycled language resources	Corpus	Swedish
Swedish Twitter 2015 Material collected from a selection of Swedish speaking twitter users from 2015	Corpus	Swedish	Word statistics: stats_TWITTER-2015.txt.zip 2025-04-22 – 145.29 MB – CC-BY-4.0 Explore in:
Swedish Twitter 2016 Material collected from a selection of Swedish speaking twitter users from 2016	Corpus	Swedish	Word statistics: stats_TWITTER-2016.txt.zip 2025-04-22 – 185.83 MB – CC-BY-4.0 Explore in:
Swedish Twitter 2017 Material collected from a selection of Swedish speaking twitter users from 2017	Corpus	Swedish	Word statistics: stats_TWITTER-2017.txt.zip 2025-04-22 – 149.19 MB – CC-BY-4.0 Explore in:
Swedish Wikipedia Corpus of Swedish Wikipedia	Corpus	Swedish	Dataset: wikipedia-sv.xml.bz2 2023-05-12 – 3.59 GB – CC-BY-4.0 Word statistics: stats_wikipedia-sv.csv.zip 2025-04-22 – 196.44 MB – CC-BY-4.0 Explore in:
Swedish words, LEXIN Lexicon for immigrants. Second edition	Lexicon	Swedish, Albanian, Bosnian, English, Finnish, Modern Greek (1453-), Croatian, Kurdish, Iranian Persian, Russian, Serbian, Somali, Spanish, Turkish	Dataset: LEXIN.zip 2024-01-25 – 1.05 MB – CC-BY-4.0 Explore in:
Swedish-Finnish word lists Swedish-Finnish word lists within various domains	Lexicon	Swedish
SweDN 1.0 A Swedish text summarization corpus	Corpus	Swedish
SweFAQ 2.0 Frequently asked questions from Swedish authorities' websites with shuffled answers	Corpus	Swedish	Dataset: swefaq.zip 2023-03-30 – 89.81 MB – CC-BY-4.0
SweFraCas 1.0 Textual inference/entailment problem set	Corpus	Swedish	Dataset: swefracas.tsv 2021-06-10 – 100.92 KB – CC-BY-4.0 Dataset: swefracas_documentation_sheet.tsv 2021-06-15 – 4.23 KB – CC-BY-4.0
Collection SweLL SweLL -- Swedish Learner Language -- is a collection of SweLL corpora and derivative resources coming from these corpora. SweLL corpora consist of texts written by learners whose mother tongue is not Swedish. All texts have been collected in test situations (none of them come from home-written tasks).	Corpus	Swedish, Multiple languages	See 12 collected resources
SweLL-gold Essays written by adult learners of Swedish, manually pseudonymized and correction annotated. The corpus contains both the original learner text and a corrected version of each essay. Collection period 2017-2020.	Corpus	Swedish	Word statistics: stats_SWELLV1-ORIGINAL.txt.zip 2025-04-22 – 147.52 KB – CC-BY-4.0 Word statistics: stats_SWELLV1-TARGET.txt.zip 2025-04-22 – 132.13 KB – CC-BY-4.0 Explore in:
Collection SweLL-pilot Essays written by adult learners of Swedish, manually labeled with the CEFR levels (a European scale of language proficiency levels within language learning). Collection period 2006-2015.	Corpus	Swedish	See 3 collected resources Explore in:
SweLLex SweLLex is a lexicon of productive vocabulary for Swedish as a second language	Lexicon	Swedish	Dataset: SweLLex_v1_xlsx.tar.bz2 2025-01-24 – 3.21 MB – CC-BY-4.0 Dataset: SweLLex_v1_tsv.tar.bz2 2025-01-24 – 213.59 KB – CC-BY-4.0 Explore in:
SweNLI 1.0 A Swedish NLI dataset	Corpus	Swedish	Dataset: swenli.zip 2023-03-30 – 55.13 MB – CC-BY-4.0
SweParaphrase 2.0 Semantic Textual Similarity reference data (STS Benchmark).	Corpus	Swedish	Dataset: sweparaphrase.zip 2023-03-30 – 750.9 KB – CC-BY-4.0
SweSAT Swedish Scholastic Aptitude Test Synonyms 1.1 Swedish Scholastic Aptitude Test Synonyms	Lexicon	Swedish	Dataset: swesat-synonyms.zip 2023-03-30 – 37.73 KB – CC-BY-4.0
Swesaurus A Swedish WordNet	Lexicon	Swedish	Dataset: swesaurus.xml 2017-09-19 – 12.16 MB – CC-BY-4.0 Explore in:
SweWiC 2.0 A Swedish Word-in-Context dataset	Corpus	Swedish	Dataset: swewic.zip 2023-03-30 – 587.65 KB – CC-BY-4.0
SweWinogender 2.0 A Swedish dataset for coreference and gender bias	Corpus	Swedish	Dataset: swewinogender.zip 2023-03-30 – 28.3 KB – CC-BY-4.0
SweWinograd 2.0 A Swedish dataset for pronoun resolution	Corpus	Swedish	Dataset: swewinograd.zip 2023-03-30 – 33.41 KB – CC-BY-4.0
Syntag treebank A Swedish treebank with syntactic analysis of 158 articles from Press-65.	Corpus	Swedish	Dataset: syntag.txt 2010-02-08 – 4.45 MB – CC-BY-4.0 Dataset: syntag.html 2010-05-24 – 10.15 MB – CC-BY-4.0
Sæmundaredda Ancient Icelandic poetry collection also known as The King's Book	Corpus	Old Norse	Dataset: eddan.xml.bz2 2015-01-21 – 87.55 KB – CC-BY-4.0 Word statistics: stats_EDDAN.txt.zip 2025-04-22 – 40.11 KB – CC-BY-4.0 Explore in:
Söderwall Dictionary of Old Swedish	Lexicon	Swedish	Dataset: soederwall.xml 2017-09-19 – 23.42 MB – CC-BY-4.0 Explore in:
Söderwall Supplement Dictionary of Old Swedish	Lexicon	Swedish	Dataset: soederwall-supp.xml 2017-09-19 – 15.45 MB – CC-BY-4.0 Explore in:
TalbankenSBX Talbanken is a Swedish treebank. This is the Språkbanken Text version of Talbanken.	Corpus	Swedish	Dataset: talbanken.xml.bz2 2017-06-07 – 1.54 MB – CC-BY-4.0 Word statistics: stats_TALBANKEN.txt.zip 2025-04-22 – 206.82 KB – CC-BY-4.0 Dataset: changelog.txt 2020-06-11 – 316 bytes – CC-BY-4.0 Dataset: TalbankenSBX_morphsplit20200610.zip 2020-06-11 – 3.64 MB – CC-BY-4.0 Dataset: TalbankenSBX_syntsplit20200610.zip 2020-06-11 – 807.09 KB – CC-BY-4.0 Explore in:
TalbankenSTB Talbanken is a Swedish treebank.	Corpus	Swedish	Dataset: TalbankenSTB.zip 2020-08-11 – 2.6 MB – CC-BY-4.0 Dataset: TalbankenSTB_README.txt 2020-08-11 – 1.05 KB – CC-BY-4.0 Dataset: TalbankenSTB_documentation.zip 2020-08-11 – 62.23 KB – CC-BY-4.0 Dataset: TalbankenSTB_datasplit.zip 2020-08-11 – 2.6 MB – CC-BY-4.0 Dataset: TalbankenSTB_original_parts.zip 2020-08-11 – 2.95 MB – CC-BY-4.0
The Arabic E-Book Corpus A collection of 1,745 books in Arabic.	Corpus	Arabic	Dataset: arabic-ebooks.xml.bz2 2025-09-12 – 142.88 MB – CC-BY-4.0 Explore in:
The English-Swedish Parallel Corpus (ESPC) ESPC is a combined comparable and parallel corpus suitable for cross-language research of different types.	Corpus	Swedish, English	Explore in:
The noise of the miners in Falun 1743 – Court records Transcribed and annotated protocols from trials against 79 miners in Falun in 1743. The transcribed texts are partly annotated with place and person names, titles and definitions of archaic words mainly belonging to the mining trade.	Corpus	Swedish	Dataset: gruvdrangarna.xml.bz2 2025-10-17 – 1.96 MB – CC-BY-4.0 Word statistics: stats_gruvdrangarna.csv.zip 2025-10-17 – 175.22 KB – CC-BY-4.0 Explore in:
The Riksdag's open data - Debates Debates from the Swedish parliament in the period 1993/94-2017/18	Corpus	Swedish	Dataset: rd-anf-1993-2018.xml.bz2 2020-03-30 – 2.22 GB – CC-BY-4.0 Word statistics: stats_RD-ANF-1993-2018.txt.zip 2025-04-22 – 8.63 MB – CC-BY-4.0
The Swedish Culturomics Gigaword Corpus One billion Swedish words from 1950 and onwards. Code to extract data from the corpus, as well as usage instructions, can be downloaded from https://svn.spraakbanken.gu.se/sb-arkiv/tools/gigaword/	Corpus	Swedish	Dataset: gigaword-1950-59.tar 2016-06-07 – 92.69 MB – CC-BY-4.0 Dataset: gigaword-1960-69.tar 2016-06-07 – 107.78 MB – CC-BY-4.0 Dataset: gigaword-1970-79.tar 2016-06-07 – 175.03 MB – CC-BY-4.0 Dataset: gigaword-1980-89.tar 2016-06-07 – 217.9 MB – CC-BY-4.0 Dataset: gigaword-1990-99.tar 2016-06-07 – 1.05 GB – CC-BY-4.0 Dataset: gigaword-2000-09.tar 2016-06-07 – 5.48 GB – CC-BY-4.0 Dataset: gigaword-2010-15.tar 2016-06-07 – 4.32 GB – CC-BY-4.0
The Swedish Literature Bank: Free Works E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)	Corpus	Swedish	Dataset: lb-open.xml.bz2 2023-11-13 – 5.75 GB – CC-BY-4.0 Word statistics: stats_lb-open.csv.zip 2025-04-22 – 43.69 MB – CC-BY-4.0 Explore in:
The Swedish Literature Bank: Restricted Works E-texts and searchable facsimiles from the Swedish Literature Bank (litteraturbanken.se)	Corpus	Swedish	Dataset: lb-restricted.xml.bz2 2023-10-28 – 2.25 GB – CC-BY-4.0 Word statistics: stats_lb-restricted.csv.zip 2025-04-22 – 26.24 MB – CC-BY-4.0 Explore in:
The Swedish PoliGraph An extensible knowledge graph with information on members of the Swedish parliament	Lexicon	Swedish	Dataset: poligraph.tar.bz2 2020-01-14 – 2.29 MB – GPL-3.0-or-later Explore in:
Tiden 30 annual volumes of the socialist journal Tiden, 1909–1940	Corpus	Swedish	Dataset: runeberg-tiden.xml.bz2 2014-12-08 – 89.33 MB – CC-BY-4.0 Word statistics: stats_RUNEBERG-TIDEN.txt.zip 2025-04-22 – 4.11 MB – CC-BY-4.0 Explore in:
TISUS texts Essays written by L2 Swedish learners as part of a TISUS exam	Corpus	Swedish	Explore in:
TISUS v1	Corpus	Swedish	Word statistics: stats_TISUSV1.txt.zip 2025-04-22 – 77.62 KB – CC-BY-4.0 Explore in:
TISUS-texter v2	Corpus	Swedish	Word statistics: stats_TISUSV2.txt.zip 2025-04-22 – 77.18 KB – CC-BY-4.0 Explore in:

Page manager: sb-webb