Gothenburg Dialogue Corpus (GDC)

GDC is a collection of 360 individual dialogues transcribed from recordings.
Gothenburg Dialogue Corpus (GDC) is a collection of 360 individual dialogues transcribed from recordings of about 25 different social activities. The corpus was started in the late 1970s to meet a growing interest in naturalistic spoken language data. The GDC data is very diverse: it varies across the different social activities with regard to punctuation, grammar, vocabulary, and the role of language and communication in human social life. The corpus consists of both audio (50%) and audio/video (50%) recordings of naturally occurring interactions.

For access, please contact data@flov.gu.se.
File: stats_GDC.txt (Word statistics: Information, CSV)
Size: 3.95 MB
Modified: 2017-03-26
Licence: CC BY 4.0 (attribution)

Type

  • Corpus

Language

Swedish

Size

Sentences: 107,700
Tokens: 1,473,608

Contact

Institutionen för filosofi, lingvistik och vetenskapsteori
data@flov.gu.se