Language resources

On this page you can browse and search our corpora and lexicons. Click on a resource name to see what files are available for download. You can go directly to the search interface by clicking on the Korp or Karp logo.
Resource Explore Type Language Download
Collection
SuperLim
A standardized suite for evaluation and analysis of Swedish natural language understanding systems.
Corpus Swedish
CoDeRooMor, v.01
Morphological dataset (word-building morphology), Swedish L2 profiles project
Lärka
Lexicon Swedish
DaLAJ v.1.0
Dataset for Linguistic Acceptability Judgments (and more), v.1.0, is a collection of sentences from SweLL (Swedish Learner Language) essays. Each DaLAJ sentence contains exactly one error.
Corpus Swedish
Dalin: Then Swänska Argus 1732-1734
Manual transcription of Then Swänska Argus by Olof von Dalin, Stockholm, 1732–1734. For OCR analysis.
Corpus Swedish
Eukalyptus Treebank of Written Swedish
A treebank with written Swedish data, with parts-of-speech, TIGER-style syntax, multiword expressions and sense annotation
Corpus Swedish
SemEval2020 Task 1
Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection (extracts from Kubhist v2)
Corpus Swedish
SIC2 - Stockholm Internet Corpus
The Stockholm Internet Corpus (SIC2) contains Swedish blog posts, annotated with part of speech, morphological features, and named entities.
Korp
Corpus Swedish
SUC 2.0
Stockholm-Umeå corpus 2.0
Corpus Swedish
SUC 3.0
Stockholm-Umeå corpus 3.0
Korp
Corpus Swedish
SuperSim (repackaged for Superlim)
A test set for word similarity and relatedness in Swedish
Corpus Swedish
SweDiagnostics
Swedish version of the (Super)GLUE Diagnostic
Corpus Swedish, English
Swedish ABSAbank
An annotated Swedish corpus for aspect-based sentiment analysis
Corpus Swedish
Swedish ABSAbank-Imm 1.0
An annotated Swedish corpus for aspect-based sentiment analysis (a version of ABSAbank)
Corpus Swedish
Swedish analogy test set v1.0
Swedish semantic and syntactic similarity: test set
Corpus Swedish
Swedish FAQ (mismatched) 1.0
Frequently asked questions from Swedish authorities' websites with shuffled answers
Corpus Swedish
Swedish fraktur 1626-1816
A selection of fraktur texts printed between 1626 and 1816 from the collections of the University Library of the University of Gothenburg (UB). For OCR analysis.
Corpus Swedish
Swedish newspapers 1818-1870
A selection of Swedish newspapers printed between 1818 and 1870 from the collections of Kungliga biblioteket (KB). For OCR analysis.
Corpus Swedish
Swedish newspapers 1871-1906
A selection of Swedish newspapers printed between 1871 and 1906 from the collections of Kungliga biblioteket (KB). For OCR analysis.
Corpus Swedish
Swedish treebank
A Swedish treebank built from recycled language resources
Corpus Swedish
SweFraCas 1.0
Textual inference/entailment problem set
Corpus Swedish
SweParaphrase
A subset of the Semantic Textual Similarity reference data (STS Benchmark).
Corpus Swedish
SweSAT Swedish Scholastic Aptitude Test Synonyms
Swedish Scholastic Aptitude Test Synonyms
Lexicon Swedish
SweWiC
A Swedish Word-in-Context test set.
Corpus Swedish
SweWinogender
A Swedish test set for coreference and gender bias.
Corpus Swedish
SweWinograd
A Swedish test set for pronoun resolution.
Corpus Swedish
Syntag treebank
A Swedish treebank with syntactic analysis of 158 articles from Press-65.
Corpus Swedish
TalbankenSBX
Talbanken is a Swedish treebank. This is the Språkbanken Text version of Talbanken.
Korp
Corpus Swedish
TalbankenSTB
Talbanken is a Swedish treebank.
Corpus Swedish