Speaker: Herbert Lange
Title: Semi-automatic quality assurance for audiovisual corpus data
Abstract:
Gathering high-quality language data is important for linguistic research. This is a particular challenge for audiovisual data, e.g. in language documentation, but also for learner and sign language data. The data has to be transcribed and annotated, and the annotations must be understandable if the data is to be reused later. In the QUEST project (QUality ESTablished), we developed processes and criteria to validate and improve audiovisual corpus data and, where possible, implemented automatic validation procedures.
I will present a semi-automatic review process to improve and certify the quality of corpus data, as well as a concrete implementation of the relevant criteria for language documentation.