
CLT seminar

Date

20 October 2022, 10:30–11:30

Location

C442

Access

Open to the public

Speaker: Herbert Lange

Title: Semi-automatic quality assurance for audiovisual corpus data

Abstract:
Gathering high-quality language data is important for linguistic research. This is a particular challenge for audiovisual data, for example in language documentation, but also for learner and sign language data. The data has to be transcribed and annotated, and the annotations must be understandable if the data is to be reused later. In the QUEST project (QUality ESTablished), we developed processes and criteria for validating and improving audiovisual corpus data, and implemented automatic validation procedures where possible.
I will present a semi-automatic review process for improving and certifying the quality of corpus data, together with a concrete implementation of the relevant criteria for language documentation.