SweDiagnostics

Citation

Morger, Felix. (2023-04-04). SweDiagnostics [Data set]. Språkbanken Text. https://doi.org/10.23695/yepn-se26
Swedish version of (Super)GLUE Diagnostic
I. IDENTIFYING INFORMATION
Title* SuperLim Diagnostic Dataset, v1.1
Subtitle
Created by* Felix Morger, Gothenburg University (felix.morger@gu.se)
Publisher(s)* Språkbanken Text (sb-info@svenska.gu.se)
Link(s) / permanent identifier(s)* https://spraakbanken.gu.se/en/resources/superlim
License(s)* CC BY 4.0
Abstract* Manual Swedish translation of all 1106 sentence pairs of the SuperGLUE diagnostic dataset.
Funded by* Vinnova (grants no. 2020-02523, 2021-04165)
Cite as
Related datasets SuperLim, SuperGLUE diagnostic dataset, FraCaS test suite
II. USAGE
Key applications Fine-grained analysis of system performance on a broad range of linguistic phenomena.
Intended task(s)/usage(s) Natural language inference.
Recommended evaluation measures Krippendorff's alpha (the official SuperLim measure), Matthews' correlation coefficient.
Dataset function(s) Diagnostics
Recommended split(s) Test only
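A minimal sketch of the two recommended measures, assuming that the gold and predicted labels are available as two equal-length lists. The function names are illustrative, and treating gold and prediction as two coders with nominal labels is an assumption; the official SuperLim scoring may be implemented differently.

```python
from collections import Counter
from math import sqrt

def matthews_corrcoef(gold, pred):
    """Multi-class Matthews correlation coefficient from label counts."""
    s = len(gold)                                 # number of items
    c = sum(g == p for g, p in zip(gold, pred))   # correctly predicted items
    t = Counter(gold)                             # true occurrences per label
    p = Counter(pred)                             # predicted occurrences per label
    numerator = c * s - sum(p[k] * t[k] for k in set(t) | set(p))
    denominator = sqrt(
        (s * s - sum(v * v for v in p.values()))
        * (s * s - sum(v * v for v in t.values()))
    )
    # Return 0.0 when the coefficient is undefined (e.g. only one label used).
    return numerator / denominator if denominator else 0.0

def krippendorff_alpha_nominal(gold, pred):
    """Krippendorff's alpha for two coders, nominal labels, no missing values."""
    n = 2 * len(gold)                             # total number of assigned labels
    counts = Counter(gold) + Counter(pred)        # label frequencies over both coders
    # Each disagreeing item contributes two off-diagonal coincidences.
    observed = 2 * sum(g != p for g, p in zip(gold, pred))
    # Sum of n_c * n_k over all pairs of distinct labels.
    expected = n * n - sum(v * v for v in counts.values())
    if expected == 0:
        return float("nan")                       # undefined when only one label occurs
    return 1.0 - (n - 1) * observed / expected

# Invented toy labels, for illustration only.
gold = ["a", "a", "b", "b"]
pred = ["a", "b", "a", "b"]
print(matthews_corrcoef(gold, pred))
print(krippendorff_alpha_nominal(gold, pred))
```

Both functions compare label strings only, so they do not depend on which label names the dataset actually uses.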
III. DATA
Primary data* Text
Language* Swedish
Dataset in numbers* 1106
Nature of the content* Pairs of sentences annotated with their inference relation and the linguistic phenomena that account for their differences
Format* JSONL and TSV. Nine columns/objects: id; four columns with information about the relevant linguistic phenomena; domain; label; premise; hypothesis
Data source(s)* SuperGLUE Diagnostic Dataset: Wang, Alex & Pruksachatkun, Yada & Nangia, Nikita & Singh, Amanpreet & Michael, Julian & Hill, Felix & Levy, Omer & Bowman, Samuel. (2019). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.
Data collection method(s)* See original source.
Data selection and filtering* See original source.
Data preprocessing* See original source.
Data labeling* Some data labels (annotations) were changed to fit the Swedish examples, but the aim was to keep such changes to a minimum.
Annotator characteristics
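A minimal sketch of reading the JSONL format described above, assuming one JSON object per line. Only the keys id, domain, label, premise and hypothesis come from this sheet; the names of the four phenomenon columns are not given here, so the sketch treats every other key as a phenomenon annotation. The sample record, including its placeholder keys and values, is hypothetical.

```python
import json

# Field names stated in the documentation sheet.
NAMED_FIELDS = ("id", "domain", "label", "premise", "hypothesis")

def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def split_record(record):
    """Separate the five named fields from the remaining phenomenon fields."""
    named = {k: record[k] for k in NAMED_FIELDS if k in record}
    phenomena = {k: v for k, v in record.items() if k not in NAMED_FIELDS}
    return named, phenomena

# Hypothetical record: the phenomenon keys, all values and both sentences
# are invented for illustration and are not taken from the dataset.
example = {
    "id": 1,
    "phenomenon_1": "",
    "phenomenon_2": "",
    "phenomenon_3": "",
    "phenomenon_4": "",
    "domain": "example",
    "label": "entailment",
    "premise": "Katten sover på soffan.",
    "hypothesis": "Ett djur sover.",
}
sample_text = json.dumps(example, ensure_ascii=False) + "\n"

records = parse_jsonl(sample_text)
named, phenomena = split_record(records[0])
print(named["premise"], "->", named["label"])
print(sorted(phenomena))
```

For the real file, the same functions can be applied to the text of the JSONL file inside the archive, read with UTF-8 encoding.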
IV. ETHICS AND CAVEATS
Ethical considerations See original data source.
Things to watch out for See original data source.
V. ABOUT DOCUMENTATION
Data last updated* 2023-03-01, v1.1
Which changes have been made, compared to the previous version* Minor format changes
Access to previous versions
This document created* 2021-06-04, Felix Morger.
This document last updated* 2023-04-02, Aleksandrs Berdicevskis.
Where to look for further details
Documentation template version* v1.1
VI. OTHER
Related projects
References
File: swediagnostics.zip (an archive with the dataset in JSONL and TSV formats and the documentation sheet)
Size: 72.89 KB
Modified: 2023-04-04
Licence: CC BY 4.0

Collection

SuperLim 2

Type

  • Corpus
  • Training and evaluation data

Language

Swedish

Size

Creators

  • Morger, Felix

Updated

2023-04-04

Contact

Språkbanken
sb-info@svenska.gu.se