SweFraCas 1.0

Textual inference/entailment problem set
I. IDENTIFYING INFORMATION
Title* SweFracas v1.0
Subtitle A Swedish version of the FraCaS inference/entailment dataset
Created by* Lars Borin (lars.borin@gu.se)
Publisher(s)* Språkbanken Text (sb-info@svenska.gu.se)
Link(s) / permanent identifier(s)* https://spraakbanken.gu.se/en/resources/swefracas
License(s)* CC BY 4.0
Abstract* A textual inference/entailment problem set, derived from FraCaS. The original English FraCaS [1] was converted to HTML and edited by Bill MacCartney [2], and then automatically translated to Swedish by Peter Ljunglöf and Magdalena Siverbo [3]. The current tabular form of the set was created by Aleksandrs Berdicevskis by merging the Swedish and English versions and removing some of the problems. Finally, Lars Borin went through all the translations, correcting and Swedifying them manually. As a result, many translations are rather liberal and diverge noticeably from the English original.
Funded by* Vinnova (grant no. 2020-02523)
Cite as
Related datasets Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim). See also Abstract
II. USAGE
Key applications Machine Learning, Inference, Entailment, Evaluation of language models, Diagnostics
Intended task(s)/usage(s) (1) Evaluate models on the following task: given the premises and the question, choose the appropriate answer (Ja 'Yes'; Nej 'No'; Vet ej 'Don't know'; Jo 'Positive answer to a negated question')
Recommended evaluation measures (1) R4 (Matthews correlation coefficient)
Dataset function(s) Testing
Recommended split(s) Test data only
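The recommended measure can be computed without external libraries. The sketch below assumes that the measure meant is the multiclass generalization of the Matthews correlation coefficient (Gorodkin's R_K, here over the four answer classes Ja, Nej, Vet ej and Jo); the function name r_k is illustrative only and not part of the dataset distribution. scikit-learn's matthews_corrcoef computes the same quantity for multiclass input.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def r_k(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K).

    Works for any number of classes, e.g. the four answers
    Ja, Nej, Vet ej and Jo. (Hypothetical helper name.)
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    s = len(gold)                                        # number of problems
    c = sum(1 for g, q in zip(gold, pred) if g == q)     # correctly answered problems
    t = Counter(gold)                                    # true count per class
    p = Counter(pred)                                    # predicted count per class
    classes = set(t) | set(p)
    cov_tp = c * s - sum(p[k] * t[k] for k in classes)
    cov_pp = s * s - sum(p[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t[k] ** 2 for k in classes)
    denom = sqrt(cov_pp * cov_tt)
    # Undefined when all gold or all predicted labels fall in one class;
    # return 0.0 in that case, as scikit-learn does.
    return cov_tp / denom if denom else 0.0
```

For a perfect prediction r_k returns 1.0; if a system answers every problem with the same label, the denominator is zero and the sketch returns 0.0 rather than raising an error.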
III. DATA
Primary data* Text
Language* Swedish
Dataset in numbers* 305 problems
Nature of the content* Inference problems, in which a question has to be answered given a number of premises
Format* Tab-separated, five columns:
"id" -- unique integer id of the problem;
"original_id" -- id of the corresponding problem in the original dataset;
"attribute" -- which attribute the row contains: "premiss" (premise), "fråga" (question), "svar" (answer), "why" or "note". The latter two are taken from MacCartney's conversion, refer only to the English data, and are kept for information only;
"value" -- the Swedish sentence. "why" and "note" are always empty for Swedish;
"original_value" -- the original English sentence, provided for information only. Note that many translations are rather liberal.
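Since one problem spans several rows (one per premise, plus the question, the answer and possibly "why" and "note"), a loader has to group the rows by "id". A minimal sketch, assuming the file is UTF-8 encoded, starts with a header row containing exactly the five column names above, and does not use CSV-style quoting; the Problem class and the load_problems function are hypothetical names, not part of the distribution.

```python
import csv
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Problem:
    id: int
    original_id: str
    premises: List[str] = field(default_factory=list)
    question: str = ""
    answer: str = ""


def load_problems(lines: Iterable[str]) -> Dict[int, Problem]:
    """Group tab-separated rows into problems keyed by the "id" column."""
    # QUOTE_NONE: sentences may contain quotation marks; we assume the
    # file does not rely on CSV quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    problems: Dict[int, Problem] = {}
    for row in reader:
        pid = int(row["id"])
        prob = problems.get(pid)
        if prob is None:
            prob = problems[pid] = Problem(pid, row["original_id"])
        attr, value = row["attribute"], row["value"]
        if attr == "premiss":
            prob.premises.append(value)  # a problem may have several premises
        elif attr == "fråga":
            prob.question = value
        elif attr == "svar":
            prob.answer = value
        # "why" and "note" refer to the English data only and are skipped.
    return problems
```

A possible usage, with a hypothetical file name: with open("swefracas.tsv", encoding="utf-8", newline="") as f: problems = load_problems(f).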
Data source(s)* See Abstract
Data collection method(s)* See Abstract
Data selection and filtering* 41 problems in the original set did not have a definite answer (different answers were possible depending on the interpretation). They were excluded.
Data preprocessing* None
Data labeling* Most of the labels map straightforwardly onto the original English labels (Yes → Ja, Don't know → Vet ej, No → Nej), with three exceptions: 97, 98 (Nej → Jo) and 108 (No → Vet ej).
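The default label mapping can serve as a consistency check: translating each English answer with it and comparing the result to the Swedish answer should flag only the documented exceptions. The sketch assumes the English answers are stored as the strings "Yes", "No" and "Don't know", as written in this documentation; the strings in the actual file may differ, in which case the keys of DEFAULT_MAP need adjusting. DEFAULT_MAP and deviating_labels are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: English answers are spelled as in this documentation;
# adjust the keys if the file uses other strings.
DEFAULT_MAP: Dict[str, str] = {
    "Yes": "Ja",
    "Don't know": "Vet ej",
    "No": "Nej",
}


def deviating_labels(rows: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return ids of problems whose Swedish answer is not the default
    translation of the English answer.

    rows: (problem id, English answer, Swedish answer) triples
    (hypothetical input shape). An English answer missing from
    DEFAULT_MAP is also reported.
    """
    return [pid for pid, en, sv in rows if DEFAULT_MAP.get(en) != sv]
```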
Annotator characteristics PhD in linguistics; native speaker of Swedish
IV. ETHICS AND CAVEATS
Ethical considerations
Things to watch out for In the original dataset, all examples were classified by the linguistic phenomena they represent. The Swedish translations do not necessarily fit exactly the same classification (most of them probably do, but this has not been checked).
V. ABOUT DOCUMENTATION
Data last updated* 2021-06-09, v1.0
Which changes have been made, compared to the previous version* This is the first official version
Access to previous versions
This document created* 2021-06-09, Aleksandrs Berdicevskis
This document last updated* 2021-06-09, Aleksandrs Berdicevskis
Where to look for further details
Documentation template version* v1.0
VI. OTHER
Related projects
References [1] Robin Cooper, Dick Crouch, Jan Van Eijck, Chris Fox, Johan Van Genabith, Jan Jaspars, Hans Kamp, David Milward, Manfred Pinkal, Massimo Poesio, et al. 1996. Using the framework. Technical Report LRE 62-051 D-16, The FraCaS Consortium. ftp://ftp.cogsci.ed.ac.uk/pub/FRACAS/del16.ps.gz
[2] https://nlp.stanford.edu/~wcmac/downloads/fracas.xml
[3] Peter Ljunglöf and Magdalena Siverbo. 2012. A bilingual treebank for the FraCas test suite. In SLTC 2012, page 53. https://gup.ub.gu.se/publication/168965?lang=en