Språkbanken Text is a part of Språkbanken.

Swedish EAT: question classification

Citation Information

Språkbanken Text (2023). Swedish EAT: question classification (updated: 2023-06-08). [Data set]. Språkbanken Text. https://doi.org/10.23695/zgkb-s720
A translated version of the QAQC dataset for expected-answer-type classification.
I. IDENTIFYING INFORMATION
Title* Swedish EAT v1.0
Subtitle
Created by* Jonatan Cerwall (jonatancerwall@gmail.com)
Publisher(s)* Språkbanken Text
Link(s) / permanent identifier(s)*
License(s)*
Abstract* This dataset is a Swedish translation of the QAQC dataset (https://cogcomp.seas.upenn.edu/Data/QA/QC/) for expected-answer-type (EAT) classification. The label taxonomy is the Li and Roth taxonomy, also available from https://cogcomp.seas.upenn.edu/Data/QA/QC/.
Funded by*
Cite as Cerwall, J. (2021). What the BERT? Fine-tuning KB-BERT for Question Classification. Unpublished manuscript, School of Electrical Engineering and Computer Science, KTH.
Related datasets
II. USAGE
Key applications Machine learning, EAT classification
Intended task(s)/usage(s) Evaluating models on standard (single-label) classification
Recommended evaluation measures Accuracy
Dataset function(s) Testing
Recommended split(s) Test only
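Because the recommended measure is accuracy, here is a minimal sketch of scoring predictions at both label levels. It assumes that predictions are stored as COARSE:fine strings, like the verbose label column. The label strings below are hypothetical examples in Li and Roth notation, and it is an assumption, not something confirmed here, that the Swedish files use the same label codes. Comparing full verbose labels matters because some fine labels (for example "other") occur under more than one coarse class in the Li and Roth taxonomy. Matching on the fine column alone could therefore count a question with the wrong coarse class as correct.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def coarse_of(verbose: str) -> str:
    """Return the coarse part of a COARSE:fine verbose label."""
    return verbose.split(":", 1)[0]


# Hypothetical gold and predicted verbose labels for three test questions.
gold = ["HUM:ind", "LOC:other", "NUM:date"]
pred = ["HUM:ind", "NUM:other", "NUM:count"]

# Fine-grained accuracy: compare the full COARSE:fine strings.
fine_acc = accuracy(gold, pred)
# Coarse-grained accuracy: compare only the part before the colon.
coarse_acc = accuracy([coarse_of(g) for g in gold], [coarse_of(p) for p in pred])
print(f"fine: {fine_acc:.3f}, coarse: {coarse_acc:.3f}")  # fine: 0.333, coarse: 0.667
```

In this toy example, the second prediction has the correct fine label "other" but the wrong coarse class. Comparing verbose labels correctly counts it as an error.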
III. DATA
Primary data* Text
Language* Swedish
Dataset in numbers* 5,451 questions in the training set, 500 in the test set.
Nature of the content* Open-ended factoid questions.
Format* Comma-separated, four columns:
text -- the open-ended factoid question
verbose label -- the coarse-grained and fine-grained labels combined, formatted as COARSE:fine
coarse label -- coarse-grained label
fine label -- fine-grained label
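The four-column format above can be read with Python's standard csv module. The following is a minimal sketch that assumes the columns appear in the documented order. Whether the files start with a header row is not stated, so that is a parameter. The sketch also checks that each verbose label equals the coarse and fine labels joined as COARSE:fine. The header names and the two Swedish questions in the sample are hypothetical and are not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class Question:
    text: str
    verbose_label: str
    coarse_label: str
    fine_label: str


def read_questions(stream: TextIO, has_header: bool = True) -> List[Question]:
    """Read rows in the documented order: text, verbose, coarse, fine label.

    Assumption: whether the file has a header row is not documented,
    so the caller decides via has_header.
    """
    reader = csv.reader(stream)
    if has_header:
        next(reader, None)  # skip the header row, if any
    questions = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        text, verbose, coarse, fine = row  # raises ValueError if not 4 columns
        # The verbose label should be the coarse and fine labels as COARSE:fine.
        if verbose != f"{coarse}:{fine}":
            raise ValueError(f"inconsistent labels in row: {row}")
        questions.append(Question(text, verbose, coarse, fine))
    return questions


# Hypothetical sample; the quoted field shows how commas inside a question are handled.
sample = io.StringIO(
    "text,verbose label,coarse label,fine label\n"
    "Vem skrev Röda rummet?,HUM:ind,HUM,ind\n"
    '"Vad betyder förkortningen NASA, egentligen?",ABBR:exp,ABBR,exp\n'
)
qs = read_questions(sample)
print(len(qs), qs[1].fine_label)  # 2 exp
```

To read an actual file, open it with `open(path, newline="", encoding="utf-8")`, as the csv module documentation recommends. The UTF-8 encoding is an assumption about the distributed files.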
Data source(s)* Translated from the QAQC dataset (https://cogcomp.seas.upenn.edu/Data/QA/QC/)
Data collection method(s)* --
Data selection and filtering* --
Data preprocessing* --
Data labeling* --
Annotator characteristics
IV. ETHICS AND CAVEATS
Ethical considerations Some questions reflect outdated attitudes towards women (e.g. "Vilka är de sexigaste kvinnorna i världen?", "Who are the sexiest women in the world?").
Things to watch out for
V. ABOUT DOCUMENTATION
Data last updated* 2021-07-27
Which changes have been made, compared to the previous version* First version
Access to previous versions
This document created* 2021-07-27
This document last updated* 2023-06-08
Where to look for further details
Documentation template version*
VI. OTHER
Related projects
References

Annotation

Classification of factoid questions by expected answer type, with a coarse-grained and a fine-grained label for each question.

References

  • https://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A1607477&dswid=-4107

Download

Size        Modified     Licence
361.34 KB   2023-06-08   CC BY 4.0 (attribution)
2.05 KB     2023-06-08   CC BY 4.0 (attribution)

Type

  • Corpus
  • Training and evaluation data

Language

Swedish

Size

Sentences: 0
Tokens: 0

Keywords

  • gold
  • translated dataset
  • neither-corpus-nor-lexicon

Updated

2023-06-08

Contact

Språkbanken
sb-info@svenska.gu.se