The Corpus of Spoken isiXhosa
The Corpus of Spoken isiXhosa consists of transcribed and annotated recordings of spoken Xhosa [xho]. The recordings have been made in the Eastern Cape in South Africa from 2015 onwards. The transcribed texts are annotated with morpheme-by-morpheme glosses, part-of-speech tags, and free English translations.
The recordings and annotations of the Xhosa data were made as part of three research projects led by senior lecturer Eva-Marie Bloom Ström at the University of Gothenburg. All projects, including the ongoing ‘How do words get in order? The role of speaker-hearer interaction in languages of southern Africa’, were funded by the Swedish Research Council.
The Corpus has been developed in collaboration with Språkbanken Text.
For more on annotation, preparation of data, and acknowledgements see:
- Bloom Ström, E.-M., Slater, O., Zahran, A., Berdicevskis, A., & Schumacher, A. (2023). Preparing a corpus of spoken Xhosa. Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), 62–67. https://aclanthology.org/2023.clasp-1.7
For questions about the corpus:
Eva-Marie Bloom Ström, eva-marie.strom@gu.se
If you notice any errors or inconsistencies in annotations, please report them to this email address.
Main contributors:
- Eva-Marie Bloom Ström
  Senior Lecturer, University of Gothenburg
- Onelisa Slater
  MA, Rhodes University
- Aron Zahran
  PhD, Inalco/Llacan (CNRS) & Ghent University