
SweDiagnostics

Citation

Språkbanken Text. (2021-06-16). SweDiagnostics [Data set]. Språkbanken Text.
Swedish version of the (Super)GLUE Diagnostic

Swedish version of the SuperGLUE diagnostic dataset.

Manual translation of the SuperGLUE Diagnostic Dataset. The data includes all annotated original sentence pairs of the SuperGLUE diagnostic dataset together with their Swedish translations.

License: Creative Commons CC-BY 4.0 International (please refer to Språkbanken, University of Gothenburg, Sweden).

I. IDENTIFYING INFORMATION
Title* SweDiagnostics
Subtitle
Created by* Felix Morger, University of Gothenburg (felix.morger@gu.se)
Publisher(s)* Språkbanken Text (sb-info@svenska.gu.se)
Link(s) / permanent identifier(s)* https://spraakbanken.gu.se/en/resources/superlim
License(s)* CC BY 4.0
Abstract* Manual Swedish translation of all 1106 sentence pairs of the SuperGLUE diagnostic dataset.
Funded by* Vinnova (grant no. 2020-02523)
Cite as
Related datasets SuperLim, SuperGLUE diagnostic dataset, FraCaS test suite
II. USAGE
Key applications Fine-grained analysis of system performance on a broad range of linguistic phenomena.
Intended task(s)/usage(s) Natural language inference.
Recommended evaluation measures Matthews correlation coefficient (MCC).
Dataset function(s)
Recommended split(s) No split.
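The recommended measure, the Matthews correlation coefficient, is computed from the confusion matrix of gold and predicted inference labels. The sketch below is a minimal pure-Python version of the multi-class generalisation (Gorodkin's R_K), which reduces to the familiar binary MCC when only two labels occur. It also shows one possible way to break the score down per linguistic phenomenon, in line with the fine-grained analysis this dataset is intended for. The label strings and the per-pair tag lists are hypothetical placeholders, not the actual values or columns of swediagnostics-v1.0.csv.

```python
import math
from collections import defaultdict


def matthews_corrcoef(y_true, y_pred):
    """Multi-class MCC (Gorodkin's R_K); equals the usual binary MCC for two labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    # Confusion matrix: rows are gold labels, columns are predicted labels.
    conf = [[0] * k for _ in range(k)]
    for gold, pred in zip(y_true, y_pred):
        conf[index[gold]][index[pred]] += 1
    s = len(y_true)                                   # number of sentence pairs
    c = sum(conf[i][i] for i in range(k))             # correctly classified pairs
    t = [sum(row) for row in conf]                    # gold count per label
    p = [sum(conf[i][j] for i in range(k)) for j in range(k)]  # predicted count per label
    numerator = c * s - sum(tk * pk for tk, pk in zip(t, p))
    denominator = math.sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    # Convention: return 0.0 when the denominator vanishes (e.g. constant predictions).
    return numerator / denominator if denominator else 0.0


def mcc_per_phenomenon(y_true, y_pred, tags):
    """Compute MCC separately for each phenomenon tag.

    tags[i] is a HYPOTHETICAL list of phenomenon labels for pair i. A pair
    tagged with several phenomena is counted once in each of their groups.
    """
    groups = defaultdict(lambda: ([], []))
    for gold, pred, pair_tags in zip(y_true, y_pred, tags):
        for tag in pair_tags:
            groups[tag][0].append(gold)
            groups[tag][1].append(pred)
    return {tag: matthews_corrcoef(g, p) for tag, (g, p) in sorted(groups.items())}


# Hypothetical labels and tags, for illustration only.
gold = ["entailment", "entailment", "not_entailment", "not_entailment"]
pred = ["entailment", "not_entailment", "not_entailment", "not_entailment"]
tags = [["Negation"], ["Negation", "Quantifiers"], ["Quantifiers"], ["Negation"]]
print(round(matthews_corrcoef(gold, pred), 4))   # 0.5774
print(mcc_per_phenomenon(gold, pred, tags))      # {'Negation': 0.5, 'Quantifiers': 0.0}
```

Returning 0.0 for a zero denominator follows a common convention (a constant predictor gets no credit). If scikit-learn is available, `sklearn.metrics.matthews_corrcoef` should give the same values and can replace the hand-written function.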
III. DATA
Primary data* Text
Language* Swedish
Dataset in numbers* 1106 sentence pairs
Nature of the content* Pairs of sentences annotated with their inference relation and the linguistic phenomena that account for their differences
Format* Comma-separated
Data source(s)* SuperGLUE Diagnostic Dataset: Wang, Alex & Pruksachatkun, Yada & Nangia, Nikita & Singh, Amanpreet & Michael, Julian & Hill, Felix & Levy, Omer & Bowman, Samuel. (2019). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.
Data collection method(s)* See original source.
Data selection and filtering* See original source.
Data preprocessing* See original source.
Data labeling* Some data labels (annotations) were changed to fit the Swedish examples, but in general the aim was to keep such changes to a minimum.
Annotator characteristics
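Since the data is distributed as a single comma-separated file, it can be read with Python's standard `csv` module. The sketch below assumes that the first line is a header row and that the file is UTF-8 encoded (`utf-8-sig` also strips a byte-order mark if one is present). The card does not document the column names, so the column names in the toy input and the `label_column` parameter are hypothetical and should be set after inspecting the real header.

```python
import csv
from collections import Counter


def parse_rows(lines, delimiter=","):
    """Parse CSV lines into dicts keyed by the header row.

    ASSUMPTION: the first line is a header; the card does not list the columns.
    """
    return list(csv.DictReader(lines, delimiter=delimiter))


def load_swediagnostics(path, delimiter=","):
    # ASSUMPTION: UTF-8 text; "utf-8-sig" also drops a byte-order mark if present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return parse_rows(f, delimiter)


def label_distribution(rows, label_column):
    """Count the values of one column; label_column is a hypothetical name."""
    return Counter(row[label_column] for row in rows)


# Toy input with made-up columns; note the quoted field containing commas.
sample = [
    "premise,hypothesis,label",
    "Han går.,Han springer inte.,not_entailment",
    '"Katten, som är svart, sover.",Katten sover.,entailment',
]
rows = parse_rows(sample)
print(rows[1]["premise"])                        # Katten, som är svart, sover.
print(dict(label_distribution(rows, "label")))   # {'not_entailment': 1, 'entailment': 1}
```

With the real file, `rows = load_swediagnostics("swediagnostics-v1.0.csv")` followed by `print(list(rows[0]))` shows the actual column names; if each row holds one sentence pair, `len(rows)` should be 1106.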
IV. ETHICS AND CAVEATS
Ethical considerations See original data source.
Things to watch out for See original data source.
V. ABOUT DOCUMENTATION
Data last updated* 2021-06-04, v1.0
Which changes have been made, compared to the previous version* Full translation coverage.
Access to previous versions
This document created* 2021-06-04, Felix Morger.
This document last updated* 2021-06-04, Felix Morger.
Where to look for further details
Documentation template version* 1
VI. OTHER
Related projects
References
File                           Size       Modified    Licence
swediagnostics-v1.0.csv (csv)  487.85 KB  2021-06-16  CC BY 4.0 (attribution)

Collection

SuperLim

Successors

Type

  • Corpus
  • Training and evaluation data

Language

Swedish
English

Size

Entries: 1,106

Updated

2021-06-16

Contact

Språkbanken
sb-info@svenska.gu.se