Språkbanken Text is a part of Språkbanken.

Corpus of spoken isiXhosa

Standard reference Information

Eva-Marie Bloom Ström, Onelisa Slater, Aron Zahran, Aleksandrs Berdicevskis, Anne Schumacher (2023): Preparing a corpus of spoken Xhosa, in Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), Gothenburg and online 11–12 September 2023, pages 62–67.

Citation Information

Språkbanken Text (2024). Corpus of spoken isiXhosa (updated: 2024-11-26). [Data set]. Språkbanken Text. https://doi.org/10.23695/xrsg-mp07
A corpus of transcribed and annotated recordings of spoken Xhosa.

The Corpus of Spoken isiXhosa

The Corpus of Spoken isiXhosa consists of transcribed and annotated recordings of spoken Xhosa [xho]. The recordings have been made in the Eastern Cape in South Africa from 2015 onwards. The transcribed texts are annotated with morpheme-by-morpheme glosses, part-of-speech tags, and free English translations.

The recordings and the annotations of Xhosa data have been made as part of three different research projects led by senior lecturer Eva-Marie Bloom Ström at the University of Gothenburg. All projects, including the ongoing ‘How do words get in order? The role of speaker-hearer interaction in languages of southern Africa’, were funded by the Swedish Research Council.

The Corpus has been developed in collaboration with Språkbanken Text.

For more on annotation, preparation of data, and acknowledgements, see:

  • Bloom Ström, E.-M., Slater, O., Zahran, A., Berdicevskis, A., & Schumacher, A. (2023). Preparing a corpus of spoken Xhosa. Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), 62–67. https://aclanthology.org/2023.clasp-1.7

For questions about the corpus:
Eva-Marie Bloom Ström eva-marie.strom@gu.se

If you notice any errors or inconsistencies in annotations, please report them to this email address.

Main contributors:

  • Eva-Marie Bloom Ström
    Senior Lecturer, University of Gothenburg
  • Onelisa Slater
    MA, Rhodes University
  • Aron Zahran
    PhD, Inalco/Llacan (CNRS) & Ghent University

Type

  • Corpus

Language

  • Xhosa [xho]

Size

Sentences: 1,347
Tokens: 7,039

Created

2024-05-08

Updated

2024-11-26

Contact

Språkbanken Text
sb-info@svenska.gu.se