Språkbanken Text is a division of Språkbanken.

SweDiagnostics

Standard reference

Felix Morger. 2024. SweDiagnostics: A Diagnostics Natural Language Inference Dataset for Swedish. In Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024, pages 118–124, Torino, Italia. ELRA and ICCL.

Citation

Morger, Felix (2023). SweDiagnostics (updated: 2023-04-04). [Data set]. Språkbanken Text. https://doi.org/10.23695/yepn-se26
The Swedish version of the (Super)GLUE diagnostic dataset
I. IDENTIFYING INFORMATION
Title* SuperLim Diagnostic Dataset, v1.1
Subtitle
Created by* Felix Morger, University of Gothenburg (felix.morger@gu.se)
Publisher(s)* Språkbanken Text (sb-info@svenska.gu.se)
Link(s) / permanent identifier(s)* https://spraakbanken.gu.se/en/resources/superlim
License(s)* CC BY 4.0
Abstract* Manual Swedish translation of all 1106 sentence pairs of the SuperGLUE diagnostic dataset.
Funded by* Vinnova (grants no. 2020-02523, 2021-04165)
Cite as
Related datasets SuperLim, SuperGLUE diagnostic dataset, FraCaS test suite
II. USAGE
Key applications Fine-grained analysis of system performance on a broad range of linguistic phenomena.
Intended task(s)/usage(s) Natural language inference.
Recommended evaluation measures Krippendorff's alpha (the official SuperLim measure), Matthews' correlation coefficient.
Dataset function(s) Diagnostics
Recommended split(s) Test only
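The two recommended evaluation measures can be sketched in plain Python as follows. This is a minimal illustration, not the official SuperLim scoring code: it assumes nominal labels, treats the gold labels and the system predictions as two coders with no missing values, and uses the multi-class generalization of Matthews' correlation coefficient. The function names and the label strings in the usage note are hypothetical.

```python
import math
from collections import Counter


def matthews_corrcoef(gold, pred):
    """Multi-class Matthews correlation coefficient for two label lists."""
    total = len(gold)
    correct = sum(g == p for g, p in zip(gold, pred))
    true_counts = Counter(gold)
    pred_counts = Counter(pred)
    labels = set(true_counts) | set(pred_counts)
    # Sum over labels of (times truly k) * (times predicted k).
    cross = sum(true_counts[k] * pred_counts[k] for k in labels)
    denom = math.sqrt(
        (total ** 2 - sum(v * v for v in pred_counts.values()))
        * (total ** 2 - sum(v * v for v in true_counts.values()))
    )
    # By convention, return 0.0 when the denominator vanishes.
    return (correct * total - cross) / denom if denom else 0.0


def krippendorff_alpha_nominal(gold, pred):
    """Nominal Krippendorff's alpha, with gold and pred as the two coders."""
    n_values = 2 * len(gold)  # every item carries two values
    value_counts = Counter(gold) + Counter(pred)
    # Each disagreeing item adds two off-diagonal coincidences.
    observed = 2 * sum(g != p for g, p in zip(gold, pred))
    # Sum over c != k of n_c * n_k.
    expected = n_values ** 2 - sum(v * v for v in value_counts.values())
    if expected == 0:
        return float("nan")  # undefined when only one label ever occurs
    return 1.0 - (n_values - 1) * observed / expected
```

For example, calling both functions with `gold = ["entailment", "entailment", "contradiction", "neutral"]` and `pred = ["entailment", "contradiction", "contradiction", "neutral"]` scores a system that gets three of four items right. Both measures correct for chance agreement, which matters on a test-only diagnostic set where label frequencies may be skewed.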
III. DATA
Primary data* Text
Language* Swedish
Dataset in numbers* 1106 sentence pairs
Nature of the content* Pairs of sentences annotated according to their inference relation and the linguistic phenomena that account for their differences
Format* JSONL and TSV. Nine columns/objects: id; four columns with information about the relevant linguistic phenomena; domain; label; premise; hypothesis
Data source(s)* SuperGLUE Diagnostic Dataset: Wang, Alex & Pruksachatkun, Yada & Nangia, Nikita & Singh, Amanpreet & Michael, Julian & Hill, Felix & Levy, Omer & Bowman, Samuel. (2019). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.
Data collection method(s)* See original source.
Data selection and filtering* See original source.
Data preprocessing* See original source.
Data labeling* Some data labels (annotations) were changed to fit the Swedish examples, but in general the aim was to keep such changes to a minimum.
Annotator characteristics
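The JSONL format described under III. DATA (one JSON object per line) could be read roughly as shown below. This is a sketch under stated assumptions: the keys id, domain, label, premise and hypothesis follow the Format field above, the four linguistic-phenomenon keys are left out because their names are not given here, and the sample record, its label value and the helper name `read_jsonl` are invented for illustration only.

```python
import io
import json
from collections import Counter


def read_jsonl(stream):
    """Parse one JSON object per non-empty line of a text stream."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical record, NOT taken from the dataset; the real files also
# contain four linguistic-phenomenon fields that are omitted here.
sample = io.StringIO(
    '{"id": 0, "domain": "example", "label": "entailment", '
    '"premise": "Katten sover på mattan.", "hypothesis": "Katten sover."}\n'
)

records = read_jsonl(sample)

# Count how often each label occurs across the loaded records.
label_counts = Counter(record["label"] for record in records)
```

To read an actual file from the archive, the same helper can be passed a file object opened with `encoding="utf-8"`, since the Swedish text contains non-ASCII characters.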
IV. ETHICS AND CAVEATS
Ethical considerations See original data source.
Things to watch out for See original data source.
V. ABOUT DOCUMENTATION
Data last updated* 2023-03-01, v1.1
Which changes have been made, compared to the previous version* Minor format changes
Access to previous versions
This document created* 2021-06-04, Felix Morger.
This document last updated* 2023-04-02, Aleksandrs Berdicevskis.
Where to look for further details
Documentation template version* v1.1
VI. OTHER
Related projects
References
File Size Modified License
swediagnostics.zip 72.89 KB 2023-04-04 CC BY 4.0
an archive with the dataset in JSONL and TSV formats and the documentation sheet (zip)

Part of collection

Type

  • Corpus
  • Training and evaluation data

Language

Swedish

Size

Created by

  • Morger, Felix

Updated

2023-04-04

Contact

Språkbanken
sb-info@svenska.gu.se