
Linköping

The Linköping Parallel Corpus consists of texts translated from English into Swedish. The texts come from manuals and novels. The manuals amount to 525,000 words in each of English and Swedish, and the novels to approximately 280,000 words in each language. There is also a small section of machine-translated text, the so-called SICS ATIS-Corpus, consisting of 2,500 words in each language. The corpus as a whole adds up to about 1,600,000 words, all tagged in SGML.

Around 20,000 words have been aligned using two different alignment programs: LinAlign, developed locally, and TAlign, a commercial product from the company TRADOS. This part is also tagged according to the Ahrenberg-Merkel taxonomy (AM95Kod), which records information such as structural equivalence, content equivalence, changes in characteristic features, shifts in meaning, and transpositions of primary segments. The project leader is Lars Ahrenberg.
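As a rough check on the stated total, the figures above can be summed. Each count is given per language, so it is doubled to cover both English and Swedish. The sum comes slightly above the quoted figure, which is consistent with the total being given as an approximation:

```latex
\[
2 \times 525{,}000 + 2 \times 280{,}000 + 2 \times 2{,}500
  = 1{,}050{,}000 + 560{,}000 + 5{,}000
  = 1{,}615{,}000 \approx 1{,}600{,}000
\]
```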



Daniel Ridings
Sun Mar 31 09:05:43 METDST 1996