The Linköping Parallel Corpus consists of texts translated from
English into Swedish. The texts are from manuals and novels. The
manuals amount to 525,000 words in each of English and Swedish.
The novels contribute approximately 280,000 words in
each language. There is also a small part with machine-translated
text, the so-called SICS ATIS-Corpus, consisting of 2,500 words in each language
pair. The overall corpus adds up to about 1,600,000 words, all tagged
in SGML.
Around 20,000 words have been aligned using
two different alignment programs: LinAlign, a locally developed
program, and TAlign, a commercial product from
the TRADOS company. This part is also tagged according to the
Ahrenberg-Merkel taxonomy, AM95Kod, which includes information
such as structural equivalence, content equivalence,
changes in characteristic features, shifts in meaning and transpositions of
primary segments. The project leader is Lars
Ahrenberg.