NB: Many resources in the suite were adapted from other datasets. The "Comment" column indicates which dataset was used as the source. Any changes from the source are documented on the resource page.
Resource | Identifier | Task | Size | Measure | Status and date | Comment | Citation |
---|---|---|---|---|---|---|---|
Aspect-Based Sentiment Analysis (Immigration) | Absabank-Imm | Label the sentiment that the author of a text expressed towards immigration on the 1–5 scale | 852 documents, 241K tokens; 4872 paragraphs, 199K tokens | Spearman correlation coefficient | For internal review, 2021-03-04 | reformatted subset of the original dataset | Laddar publikation... |
Swedish FAQ (mismatched) | FAQ | Match the question with the answer within a category | 292 QA pairs, 31 categories | Accuracy | For internal review, 2021-03-15 | new dataset | |
Högskoleprovet ordförståelse | HSP-Ord | Select the correct synonym or description of a word or expression | 782 expressions | Accuracy | For internal review, 2020-12-16 | new dataset | |
Swedish Test Set for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection | LexSemChange | (1) Determine whether a given word has changed its meaning during a hundred-year period; (2) determine to what extent a given word has changed its meaning during a hundred-year period | 31 lemmas | (1) Accuracy; (2) Spearman correlation coefficient | For internal review, 2020-12-03 | = original dataset | Laddar publikation... |
SweWinogender | SweWinogender | Coreference resolution and bias detection | 624 pronouns | | For internal review, 2021-03-08 | Partial translation of the English Winogender data | |
SuperLim diagnostic dataset | SuperLim Diagnostic | Natural language inference of isolated linguistic phenomena | 1106 sentence pairs | R3 | Preliminary version, 40% translated, 2021-03-24 | Swedish translation of the original SuperGLUE Diagnostic Dataset | |
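
The two measures that recur in the table, accuracy and the Spearman correlation coefficient, can be sketched in a few lines of plain Python. This is only an illustration of the standard definitions, not the suite's own evaluation code: the function names `accuracy` and `spearman` are hypothetical, and the sketch assumes that gold labels and system predictions arrive as two parallel lists.

```python
from typing import List, Sequence


def accuracy(gold: Sequence, predicted: Sequence) -> float:
    """Fraction of items whose prediction equals the gold label."""
    if len(gold) != len(predicted) or len(gold) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(gold: Sequence[float], predicted: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rg, rp = _average_ranks(gold), _average_ranks(predicted)
    mean_g, mean_p = sum(rg) / len(rg), sum(rp) / len(rp)
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(rg, rp))
    var_g = sum((a - mean_g) ** 2 for a in rg)
    var_p = sum((b - mean_p) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_g * var_p) ** 0.5
```

For a graded task such as Absabank-Imm, `spearman(gold_scores, system_scores)` would compare the two rankings, while for a multiple-choice task such as HSP-Ord, `accuracy(gold_answers, system_answers)` would give the share of correct choices. Both variable names here are placeholders.
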
How do I cite?
If you are using the suite as a whole, use the standard reference (see below). If you are using an individual resource, use the citation(s) provided in the table above. If you are uncertain, consider how best to give credit to the people whose work you are using.
Current standard reference: Laddar publikation....
Most resources do not have any training data!
Yes. In its current version, SwedishGLUE is mostly a suite of test sets, although splits into train, dev and test will be provided for some of the larger resources. We intend to develop it further, and hope to add training data as well.
I trained a system and want to submit its results. How do I do that?
Instructions will appear here later.
I have a dataset that I think can become part of SwedishGLUE.
Please contact us at sb-guld@svenska.gu.se.