A standardized suite for evaluation and analysis of Swedish natural language understanding systems.

NB: Many resources in the suite were adapted from other datasets. The "Comment" column indicates which dataset was used as the source. Any changes from the source are documented on the resource page.

| Resource | Identifier | Task | Size | Measure | Status and date | Comment | Citation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Aspect-Based Sentiment Analysis (Immigration) | Absabank-Imm | Label the sentiment that the author of a text expressed towards immigration on the 1-5 scale | 852 documents, 241K tokens | Spearman correlation coefficient | For internal review, 2021-03-04 | reformatted subset of the original dataset | Original Absabank |
| | | | 4872 paragraphs, 199K tokens | | | | |
| Swedish FAQ (mismatched) | FAQ | Match the question with the answer within a category | 292 QA pairs, 31 categories | Accuracy | For internal review, 2021-03-15 | new dataset | |
| Högskoleprovet ordförståelse | HSP-Ord | Select the correct synonym or description of a word or expression | 782 expressions | Accuracy | For internal review, 2020-12-16 | new dataset | |
| Swedish Test Set for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection | LexSemChange | Determine whether a given word has changed its meaning during a hundred year period | 31 lemmas | Accuracy | For internal review, 2020-12-03 | = original dataset | SemEval 2020-1 |
| | | Determine to what extent a given word has changed its meaning during a hundred year period | | Spearman correlation coefficient | | | |
| SweWinogender | SweWinogender | Coreference resolution and bias detection | 624 pronouns | | For internal review, 2021-03-08 | Partial translation of the English Winogender data | |
| SuperLim diagnostic dataset | SuperLim Diagnostic | Natural language inference of isolated linguistic phenomena | 1106 sentence pairs | R3 | Preliminary version, 40% translated, 2021-03-24 | Swedish translation of original SuperGLUE Diagnostic Dataset | |
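
Two of the measures named in the table, accuracy and the Spearman correlation coefficient, can be sketched in plain Python as below. This is only an illustration of the standard definitions, not the suite's official scoring code, which is not described on this page. The function names (`accuracy`, `spearman`, `average_ranks`) and the toy values in the usage note are assumptions made for this example.

```python
from typing import List, Sequence


def accuracy(gold: Sequence, predicted: Sequence) -> float:
    """Share of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted) or len(gold) == 0:
        raise ValueError("gold and predicted must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(gold: Sequence[float], predicted: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rg, rp = average_ranks(gold), average_ranks(predicted)
    mean_g = sum(rg) / len(rg)
    mean_p = sum(rp) / len(rp)
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(rg, rp))
    var_g = sum((a - mean_g) ** 2 for a in rg)
    var_p = sum((b - mean_p) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return cov / (var_g * var_p) ** 0.5
```

With invented toy labels, `accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"])` scores three of four items as correct, and `spearman` applied to gold and predicted sentiment scores on the 1-5 scale rewards systems that order the texts the same way as the annotators, even if the absolute values differ.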

How do I cite?

If you are using the suite as a whole, use the standard reference. If you are using an individual resource, use the citation(s) listed in the table above. If you are unsure, choose the citation that best credits the people whose work you are using.

Current standard reference: Loading publication...

Original Absabank: Loading publication...

Swedish Test Set for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection: Loading publication...

Most resources do not have any training data!

Yes, in its current version SwedishGLUE is mostly a suite of test sets, although train, dev and test splits will be provided for some of the larger resources. We aim to develop the suite further, which will hopefully add training data here as well.

I trained a system and want to submit its results. How do I do that?

Instructions will appear here later.

I have a dataset that I think can become part of SwedishGLUE.

Please contact us at