A standardized suite for evaluation and analysis of Swedish natural language understanding systems.


| Resource | Task | Data split | Reference |
|---|---|---|---|
| Aspect-Based Sentiment Analysis (Immigration) | Label the sentiment that the author of a text expressed towards immigration on a 1-5 scale | 10-fold cross-validation, consecutive | [1] |
| DaLAJ | Determine whether a sentence is correct Swedish or not | Hold-out | [2] |
| Swedish FAQ (mismatched) | Match the question with the answer within a category | Test only | |
| SweSAT synonyms | Select the correct synonym or description of a word or expression | Test only | |
| Swedish Analogy test set | Given two word pairs A:B and C:D, recognize that the relation between A and B is the same as between C and D | Test only | [3] |
| Swedish Test Set for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection | (1) Determine whether a given word has changed its meaning during a hundred-year period; (2) determine to what extent it has changed its meaning during that period | Test only | [4] |
| SweFraCas | Given the question and the premises, choose the suitable answer | Test only | |
| SweWinograd | Resolve pronouns to their antecedents in items constructed to require reasoning (Winograd Schemata) | Test only | |
| SweWinogender | Find the correct antecedent of a personal pronoun, avoiding gender bias | Test only | [5] |
| SweDiagnostics | Determine the logical relation between the two sentences | Test only | |
| SweParaphrase | Determine how similar two sentences are | Test only | |
| SuperSim | Predict semantic word similarity and/or relatedness between words out of context | Test only | [6] |
| SweWiC | Decide whether instances of a word in two contexts represent the same word sense | Test only | |
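The Aspect-Based Sentiment Analysis (Immigration) resource is evaluated with 10-fold cross-validation using consecutive folds. Assuming "consecutive" means that each fold is a contiguous block of items in their original order (no shuffling), the split can be sketched as below. The function names `consecutive_folds` and `cv_splits` are hypothetical and not part of any official SuperLim tooling.

```python
from __future__ import annotations


def consecutive_folds(n_items: int, k: int = 10) -> list[range]:
    """Split indices 0..n_items-1 into k contiguous, non-shuffled folds.

    The first n_items % k folds get one extra item, so fold sizes differ by at most one.
    """
    if not 1 <= k <= n_items:
        raise ValueError("k must be between 1 and n_items")
    base, extra = divmod(n_items, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(range(start, start + size))
        start += size
    return folds


def cv_splits(n_items: int, k: int = 10):
    """Yield (train_indices, test_indices) pairs; each fold is the test set exactly once."""
    for test in consecutive_folds(n_items, k):
        train = [i for i in range(n_items) if i not in test]
        yield train, list(test)


print([len(f) for f in consecutive_folds(25, 10)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Unlike shuffled k-fold, this keeps neighbouring items in the same fold. If you use scikit-learn, `KFold(n_splits=10, shuffle=False)` also produces consecutive folds.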
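For the Analogy test set, one common way to apply static word embeddings (an assumption here; SuperLim does not prescribe a method) is the vector-offset method, often called 3CosAdd: given A:B and C, predict D as the vocabulary word whose vector is most cosine-similar to B - A + C, excluding A, B and C themselves. The sketch assumes a plain dictionary from words to vectors; the toy 2-dimensional vectors are invented for illustration and are not SuperLim data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(emb, a, b, c):
    """Return the word d maximizing cos(d, b - a + c), excluding a, b and c (3CosAdd)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Toy vectors, invented for illustration only.
toy = {
    "kung": [1.0, 1.0],        # king
    "drottning": [1.0, -1.0],  # queen
    "man": [0.5, 1.0],
    "kvinna": [0.5, -1.0],     # woman
    "bil": [-1.0, 0.0],        # car
}
print(solve_analogy(toy, "man", "kvinna", "kung"))  # drottning
```

Implementations often length-normalize all vectors before computing the offset; that step is left out here to keep the sketch short.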

Frequently asked questions

How do I cite SuperLim?

  • To cite the suite as a whole, use the standard reference given below.
  • If you discuss individual resources, for instance when reporting results on different SuperLim tasks, also cite the references for these resources – even if you discuss all of them! The references are given in the table above and in the documentation sheet for each resource.

Standard reference SuperLim:
[1] Original Absabank:
[2] DaLAJ:
Elena Volodina, Yousuf Ali Mohammed, and Julia Klezl (2021). DaLAJ - a dataset for linguistic acceptability judgments for Swedish. In Proceedings of the 10th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2021). Linköping Electronic Conference Proceedings 177:3, pp. 28-37.
[3] Analogy:
Tosin Adewumi, Foteini Liwicki, and Markus Liwicki (2020). Corpora compared: The case of the Swedish Gigaword & Wikipedia corpora. In Proceedings of the 8th Swedish Language Technology Conference (SLTC), Gothenburg. arXiv preprint arXiv:2011.03281.
[4] Swedish Test Set for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection:
[5] Winogender:
Saga Hansson, Konstantinos Mavromatakis, Yvonne Adesam, Gerlof Bouma and Dana Dannélls (2021). The Swedish Winogender Dataset. In The 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), Reykjavik.
[6] SuperSim:
Simon Hengchen and Nina Tahmasebi (2021). SuperSim: a test set for word similarity and relatedness in Swedish. In The 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), Reykjavik. arXiv preprint arXiv:2104.05228.

How do different models perform on the SuperLim tasks?

SuperLim does not currently have a leaderboard (though we do hope to create one). As a temporary solution, we collect the available results here. If you have evaluated your model on some of our data, please let us know and we will add your results!

  • Faton Rekathati's (KBLab) evaluation: SweParaphrase, Swedish FAQ, SweSAT, SuperSim.

Most resources do not have any training data!

Yes, in its current version SuperLim is mostly a suite of test sets (however, splits into train, dev and test are provided for some of the larger resources). We strive to develop it further, which will hopefully result in training data appearing here as well.

I am using contextualized embeddings (aka dynamic embeddings, aka token embeddings). How should I apply my model to those of your tasks where there is no context (e.g. Analogy)?

There is currently no predefined answer, since we do not want to impose any unnecessary restrictions on how the models solve the tasks. We suggest that you devise the necessary method yourself (you can, for example, average across contextualized embeddings in order to generate "classic/static/type" embeddings). It is important, however, that you document what you do very clearly (if you average: how exactly? If you use any additional corpora for that, which ones?), since that might affect comparability.
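A minimal sketch of the averaging option, assuming you already have, for each occurrence of a word in some corpus, the contextualized vector your model produced for it. How occurrences are found and how subword vectors are pooled into one token vector are exactly the choices you should document. The function `type_embeddings` and its input format are hypothetical.

```python
def type_embeddings(occurrences):
    """Average contextualized vectors per word type.

    occurrences: iterable of (word, vector) pairs, one pair per token occurrence
    in a corpus; all vectors must have the same length.
    Returns a dict mapping each word to the element-wise mean of its occurrence vectors.
    """
    sums = {}
    counts = {}
    for word, vec in occurrences:
        if word not in sums:
            sums[word] = [0.0] * len(vec)
            counts[word] = 0
        acc = sums[word]
        if len(vec) != len(acc):
            raise ValueError(f"inconsistent vector length for {word!r}")
        for i, x in enumerate(vec):
            acc[i] += x
        counts[word] += 1
    return {w: [x / counts[w] for x in acc] for w, acc in sums.items()}


# Two occurrences of "bank" in different contexts get different vectors;
# the resulting static vector is their element-wise mean (toy values).
occ = [("bank", [1.0, 0.0]), ("bank", [0.0, 1.0]), ("flod", [0.5, 0.5])]
print(type_embeddings(occ))  # {'bank': [0.5, 0.5], 'flod': [0.5, 0.5]}
```

Lowercasing, lemmatization and the number of occurrences sampled per word are further choices that change the resulting vectors, so they belong in the same documentation.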

I trained a system and want to submit its results. How do I do that?

Instructions will appear here later. But if you have already trained something (even if it's one task and not the whole set), drop us a line about how it went; it will really help us!

I have a dataset that I think can become part of SuperLim.

Please contact us at Do the same if you have any other question not covered here.

SuperLim or SwedishGlue?

The name of the collection is SuperLim. The initial work on it was funded by the project, which is called SuperLim in Swedish and SwedishGlue in English.

Resource type: Corpus