
The CRATER project

Corpus Resources and Terminology Extraction (CRATER) is a project involving three languages: English, French and Spanish. The corpus consists entirely of technical texts from the International Telecommunication Union (ITU) [McEnery et al. 1995]. The project is a cooperative effort between Lancaster University, England, and the Universidad Autónoma de Madrid, Spain. The corpus is complete and contains 5.5 million words. The project uses a number of alignment algorithms, ranging from sentence alignment to word alignment [McEnery et al. 1995, pp. 199-200]. The texts carry part-of-speech and morphological annotation.



Daniel Ridings
Sun Mar 31 09:05:43 METDST 1996