Skip to main content

Language resources

On this page you can browse and search our corpora and lexicons. Click on a resource name to see what files are available for download. You can go directly to the search interface by clicking on the Korp or Karp logo.
Resource Type Language Access
Argumentation sentences 1.0
A translated corpus for classifying sentence stance in relation to a topic.
Corpus Swedish
CoDeRooMor, v.01
Morphological dataset (word-building morphology), Swedish L2 profiles project
Lexicon Swedish
DaLAJ-GED-SuperLim 2.0
Dataset for Linguistic Acceptability Judgments (and more), v.2.0
Corpus Swedish
Dalin: Then Swänska Argus 1732-1734
Manual transcription of Then Swänska Argus by Olof von Dalin, Stockholm, 1732–1734. For OCR analysis.
Corpus Swedish
Eukalyptus Treebank of Written Swedish
A treebank with written Swedish data, with parts-of-speech, TIGER-style syntax, multiword expressions and sense annotation
Corpus Swedish
SemEval2020 Task 1
Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection (extracts from Kubhist v2)
Corpus Swedish
SIC2 - Stockholm Internet Corpus
The Stockholm Internet Corpus (SIC2) contains Swedish blog posts, annotated with part of speech, morphological features, and named entities.
Corpus Swedish
SUC 2.0
Stockholm-Umeå corpus 2.0
Corpus Swedish
SUC 3.0
Stockholm-Umeå corpus 3.0
Corpus Swedish
Collection
SuperLim 2
A standardized suite for evaluation and analysis of Swedish natural language understanding systems.
Corpus Swedish
SuperSim (repackaged for Superlim) 2.0
A dataset for word similarity and relatedness in Swedish
Corpus Swedish
Swe-NERC
A resource for training and evaluation of Named Entity Recognition for Swedish.
Corpus Swedish
SweDiagnostics
Swedish version of (Super)GLUE Diagnostic
Corpus Swedish
Swedish ABSAbank
An annotated Swedish corpus for aspect-based sentiment analysis
Corpus Swedish
Swedish ABSAbank-Imm 1.1
An annotated Swedish corpus for aspect-based sentiment analysis (a version of Absabank)
Corpus Swedish
Swedish analogy 2.0
Swedish semantic and syntactic similarity
Corpus Swedish
Swedish EAT: question classification
A translated version of the QAQC dataset for expected-answer-type classification.
Corpus Swedish
Swedish fraktur 1626-1816
A selection of fraktur texts printed between 1626 and 1816 from the collections of the University Library of University of Gothenburg (UB). For OCR analysis.
Corpus Swedish
Swedish newspapers 1818-1870
A selection of Swedish newspapers printed between 1818 and 1870 from the collections of Kungliga biblioteket (KB). For OCR analysis.
Corpus Swedish
Swedish newspapers 1871-1906
A selection of Swedish newspapers printed between 1871 and 1906 from the collections of Kungliga biblioteket (KB). For OCR analysis.
Corpus Swedish
Swedish treebank
A Swedish treebank built from recycled language resources
Corpus Swedish
SweDN 1.0
A Swedish text summarization corpus
Corpus Swedish
SweFAQ 2.0
Frequently asked questions from Swedish authorities' websites with shuffled answers
Corpus Swedish
SweNLI 1.0
A Swedish NLI dataset
Corpus Swedish
SweParaphrase 2.0
Semantic Textual Similarity reference data (STS Benchmark).
Corpus Swedish
SweSAT Swedish Scholastic Aptitude Test Synonyms 1.1
Swedish Scholastic Aptitude Test Synonyms
Lexicon Swedish
SweWiC 2.0
A Swedish Word-in-Context dataset
Corpus Swedish
SweWinogender 2.0
A Swedish dataset for coreference and gender bias
Corpus Swedish
SweWinograd 2.0
A Swedish dataset for pronoun resolution
Corpus Swedish
Syntag treebank
A Swedish treebank with syntactic analysis of 158 articles from Press-65.
Corpus Swedish
TalbankenSBX
Talbanken is a Swedish treebank. This is the Språkbanken Text version of Talbanken.
Corpus Swedish
TalbankenSTB
Talbanken is a Swedish treebank.
Corpus Swedish