
Argumentation sentences 1.0

A translated corpus for classifying sentence stance in relation to a topic.
I. IDENTIFYING INFORMATION
Title* Argumentation sentences
Subtitle A translated corpus for classifying sentence stance in relation to a topic.
Created by* Anna Lindahl (anna.lindahl@svenska.gu.se)
Publisher(s)* Språkbanken Text (sb-info@svenska.gu.se)
Link(s) / permanent identifier(s)* https://spraakbanken.gu.se/en/resources/superlim
License(s)* CC BY 4.0
Abstract* Argumentation sentences is a translated corpus for the task of identifying stance in relation to a topic. It consists of sentences labeled with pro, con or non in relation to one of six topics. The original dataset [1] is available at https://github.com/trtm/AURC. The test set consists of manually corrected translations; the training set is machine translated.
Funded by* Vinnova (grant no. 2021-04165)
Cite as
Related datasets Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim)
II. USAGE
Key applications Machine learning, argumentation mining, stance classification
Intended task(s)/usage(s) Evaluate models on the following task: Given a sentence and a topic, determine if the sentence is for, against or neutral in relation to the topic.
Recommended evaluation measures Krippendorff’s alpha (the official SuperLim measure), MCC, F
Dataset function(s) Training, testing
Recommended split(s) Train, dev, test (provided)
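As an illustration of one of the recommended measures above, the following is a minimal sketch of the multiclass Matthews correlation coefficient (MCC) over the three labels pro, con and non. The function name multiclass_mcc and the toy label lists are assumptions made for this example; this is not the official SuperLim evaluation code, and Krippendorff's alpha is not covered here.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(gold, predicted):
    """Multiclass MCC computed from per-label counts.

    Hypothetical helper for illustration; not part of the SuperLim tooling.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    total = len(gold)
    correct = sum(g == p for g, p in zip(gold, predicted))
    true_counts = Counter(gold)        # how often each label is the gold label
    pred_counts = Counter(predicted)   # how often each label is predicted
    labels = set(true_counts) | set(pred_counts)

    # Sum over labels of (times gold) * (times predicted).
    cross = sum(true_counts[k] * pred_counts[k] for k in labels)
    numerator = correct * total - cross
    denominator = sqrt(
        (total ** 2 - sum(v * v for v in pred_counts.values()))
        * (total ** 2 - sum(v * v for v in true_counts.values()))
    )
    # By convention, return 0.0 when the score is undefined
    # (for example, when only one label ever occurs).
    return numerator / denominator if denominator else 0.0


# Toy labels, invented for this example only.
gold_labels = ["pro", "con", "non", "pro"]
perfect = multiclass_mcc(gold_labels, ["pro", "con", "non", "pro"])
chance = multiclass_mcc(["pro", "pro", "con", "con"], ["pro", "con", "pro", "con"])
```

A perfect prediction yields a score of 1, while predictions that are uncorrelated with the gold labels yield a score near 0.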
III. DATA
Primary data* Text
Language* Swedish
Dataset in numbers* 5265 sentences split over 6 topics, 3450 train, 750 dev and 1065 test
Nature of the content* Topics: Abortion, Death penalty, Nuclear power, Marijuana legalization, Minimum wage, Cloning. Each topic has a set of associated sentences, labeled with pro, con or non in relation to the topic.
Format* Jsonl with the following keys: sentence_id = the id for each sentence, topic = the topic for each sentence, label = the label for each sentence, can be pro, con or non, sentence = the sentence itself
Tab-separated with 4 columns: the id for each sentence, topic = the topic for each sentence, label = the label for each sentence, can be pro, con or non, sentence = the sentence itself
Data source(s)* The original data comes from the AURC dataset [1] (https://github.com/trtm/AURC). For this corpus, only the in-domain topics were used.
Data collection method(s)* Collected from the Common Crawl archive. See [1]
Data selection and filtering* A subset of the original data: only the in-domain topics are used.
Data preprocessing* Sentences were machine translated. The test set was then manually corrected.
Data labeling* The sentences are labeled with pro, con or non, signifying their stance in relation to a topic.
Annotator characteristics
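The JSONL layout described under Format can be read with the Python standard library. The sketch below is only an illustration: the function name read_jsonl is hypothetical, the sample records are placeholders rather than real corpus sentences, and the id and topic values shown are assumptions about how those fields might look.

```python
import io
import json
from collections import Counter

# Keys and label values as listed in the Format field.
EXPECTED_KEYS = {"sentence_id", "topic", "label", "sentence"}
VALID_LABELS = {"pro", "con", "non"}


def read_jsonl(handle):
    """Yield one validated record per non-empty line (hypothetical helper)."""
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = EXPECTED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
        if record["label"] not in VALID_LABELS:
            raise ValueError(
                f"line {line_number}: unexpected label {record['label']!r}"
            )
        yield record


# Placeholder records standing in for a real file; the values are invented.
sample = io.StringIO(
    '{"sentence_id": "a1", "topic": "placeholder topic", "label": "pro", "sentence": "placeholder sentence 1"}\n'
    '{"sentence_id": "a2", "topic": "placeholder topic", "label": "con", "sentence": "placeholder sentence 2"}\n'
    '{"sentence_id": "a3", "topic": "placeholder topic", "label": "non", "sentence": "placeholder sentence 3"}\n'
)

records = list(read_jsonl(sample))
label_counts = Counter(record["label"] for record in records)
```

To read an actual split, the in-memory sample would be replaced by a file opened with encoding="utf-8"; the file names inside the archive are not stated in this sheet, so they need to be checked after unpacking.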
IV. ETHICS AND CAVEATS
Ethical considerations
Things to watch out for
V. ABOUT DOCUMENTATION
Data last updated* 20221215
Which changes have been made, compared to the previous version* First version
Access to previous versions
This document created* 20221215 by Anna Lindahl
This document last updated* 20220203 by Anna Lindahl
Where to look for further details
Documentation template version* v1.1
VI. OTHER
Related projects
References [1] Trautmann, D., Daxenberger, J., Stab, C., Schütze, H., & Gurevych, I. (2020, April). Fine-grained argument unit recognition and classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 9048-9056).
File: argumentation-sentences.zip (an archive with the dataset in JSONL and TSV formats and the documentation sheet)
Size: 827.04 KB
Modified: 2023-03-30
Licence: CC BY 4.0

Collection

SuperLim 2

Type

  • Corpus
  • Training and evaluation data

Language

Swedish

Size

Tokens: 0

Contact

Språkbanken
sb-info@svenska.gu.se