
Corpus of spoken isiXhosa

Standard reference

Eva-Marie Bloom Ström, Onelisa Slater, Aron Zahran, <a href='/om/personal/sasha'>Aleksandrs Berdicevskis</a>, <a href='/om/personal/anne'>Anne Schumacher</a> (2023): <a href="https://gup.ub.gu.se/publication/328710?lang=sv">Preparing a corpus of spoken Xhosa</a>, in <em>Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), Gothenburg and online 11–12 September 2023 / Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, Simon Dobnik (Editors)</em>, pages <em>62-67</em> <a href="https://spraakbanken.gu.se/forskning/publikationer/bibtex/328710"> <img src="https://spraakbanken.gu.se/modules/custom/sb_publications/assets/bibtex.png" alt="BibTeX" class="inline"/> </a>

Data citation

Språkbanken Text (2025). Corpus of spoken isiXhosa (updated: 2025-04-23). [Data set]. Språkbanken Text. https://doi.org/10.23695/xrsg-mp07
A corpus of transcribed and annotated recordings of spoken Xhosa.

The Corpus of Spoken isiXhosa

The Corpus of Spoken isiXhosa consists of transcribed and annotated recordings of spoken Xhosa [xho]. The recordings have been made in the Eastern Cape in South Africa from 2015 onwards. The transcribed texts are annotated with morpheme-by-morpheme glosses, part-of-speech tags, and free English translations.

The recordings and the annotations of the Xhosa data were made as part of three research projects led by Senior Lecturer Eva-Marie Bloom Ström at the University of Gothenburg. All three projects, including the ongoing ‘How do words get in order? The role of speaker-hearer interaction in languages of southern Africa’, were funded by the Swedish Research Council.

The Corpus has been developed in collaboration with Språkbanken Text.

A user guide and more extensive information about the corpus data can be found in the Corpus of Spoken isiXhosa Manual [PDF].

For more on annotation, preparation of data, and acknowledgements see:

  • Bloom Ström, E.-M., Slater, O., Zahran, A., Berdicevskis, A., & Schumacher, A. (2023). Preparing a corpus of spoken Xhosa. Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), 62–67. https://aclanthology.org/2023.clasp-1.7

For questions about the corpus, contact Eva-Marie Bloom Ström (eva-marie.strom@gu.se).

If you notice any errors or inconsistencies in annotations, please report them to this email address.

Main contributors:

  • Eva-Marie Bloom Ström, Senior Lecturer, University of Gothenburg
  • Onelisa Slater, MA, Rhodes University
  • Aron Zahran, PhD, Inalco/Llacan (CNRS) & Ghent University

Available via

Access Platform License
CC-BY-4.0

Download

File Size Modified License
xhosa.xml.bz2 (corpus, XML) 295.86 KB 2026-03-09 CC-BY-4.0
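
The download is a bzip2-compressed XML file, so it can be read without unpacking it to disk first. The following is a minimal sketch in Python that streams the file and tallies how often each XML element name occurs. It makes no assumption about the corpus schema, which is not described on this page; the function name `count_elements` is hypothetical and chosen only for illustration.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path):
    """Tally XML element names in a bz2-compressed XML file.

    Illustrative helper only: it reports whatever element names the
    file contains and assumes nothing about the corpus schema.
    """
    counts = Counter()
    # Binary mode lets the XML parser handle the declared encoding itself.
    with bz2.open(path, "rb") as stream:
        # "end" events fire once per element, after it has been fully read.
        for _event, element in ET.iterparse(stream, events=("end",)):
            counts[element.tag] += 1
            element.clear()  # release content of elements already counted
    return counts
```

Assuming the file has been saved to the working directory, `count_elements("xhosa.xml.bz2").most_common()` lists the element names by frequency. Whichever elements turn out to mark tokens and sentences can then be compared against the totals given under Size on this page.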

Type

  • Corpus

Language

Xhosa

Size

Tokens: 8,688
Sentences: 1,890

Created

2024-05-08

Updated

2025-04-23

Contact

sb-info@svenska.gu.se