
PEDANT

Now that some of the related projects have been presented, we will turn our attention to PEDANT. We will be dealing with some of the same aspects--language pairs, text classification, storage, etc.--but we are now in a position to go into greater detail. With the exception of the LINGUA and INTERSECT projects, it is rarely explained how the results of alignment are made available. The LINGUA project is producing user-friendly parallel concordances, and CRATER is concentrating on tools and methods. The Scandinavian work is focusing on contrastive studies. Parallel texts offer a wealth of material for such studies if they are used judiciously, but we are not told how they are being used. Are there tools that can search for specified criteria and return the hits together with the matching passages in the other half of the language pair?

We are often informed that the material is being annotated with TEI-conformant mark-up, but we are not always told how the mark-up is being used. Once again, the LINGUA project has provided the most details. The Norwegian project explains how it links source and target pairs by means of a corresp attribute in the <s> tag, e.g., <s id=ST1.1.1.s1 corresp=ST1T.1.1.s1> (Hofland95; Hofland95a). The first attribute, id=, identifies the sentence being tagged, and the second attribute, corresp=, points to the translation in the target text. We are not, however, told how this information is being used. To our knowledge, the only application that understands this annotation is an SGML parser. The parser will check that the identifiers referred to actually exist, but it does no more than that.
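To make the question about search tools concrete, the following is a minimal sketch of how the id/corresp linking could be used to return a hit together with its translation. It is not taken from any of the projects discussed. The sample sentences, the presence of a corresp attribute on the target side, and the names parse_sentences and search_with_translation are all assumptions made for illustration; only the form of the <s> tag comes from the example quoted above.

```python
import re

# Hypothetical sample data. Only the tag format follows the quoted example;
# the sentence text is invented for illustration.
SOURCE = """
<s id=ST1.1.1.s1 corresp=ST1T.1.1.s1>Det var en gang en konge.</s>
<s id=ST1.1.1.s2 corresp=ST1T.1.1.s2>Han hadde tre barn.</s>
"""
TARGET = """
<s id=ST1T.1.1.s1 corresp=ST1.1.1.s1>Once upon a time there was a king.</s>
<s id=ST1T.1.1.s2 corresp=ST1.1.1.s2>He had three children.</s>
"""

# Matches one <s ...>...</s> element with unquoted SGML-style attributes.
S_ELEMENT = re.compile(r"<s\s+([^>]*)>(.*?)</s>", re.DOTALL)
ATTRIBUTE = re.compile(r"(\w+)=([^\s>]+)")


def parse_sentences(text):
    """Map each sentence id to a (corresp, sentence text) pair."""
    sentences = {}
    for attrs, body in S_ELEMENT.findall(text):
        fields = dict(ATTRIBUTE.findall(attrs))
        sentences[fields["id"]] = (fields.get("corresp"), body.strip())
    return sentences


def search_with_translation(word, source, target):
    """Return (id, source sentence, target sentence) for each source hit."""
    hits = []
    for sid, (corresp, body) in source.items():
        if word.lower() in body.lower():
            # Follow the corresp pointer; None if the target id is missing.
            translation = target.get(corresp, (None, None))[1]
            hits.append((sid, body, translation))
    return hits


hits = search_with_translation(
    "konge", parse_sentences(SOURCE), parse_sentences(TARGET)
)
for sid, src, tgt in hits:
    print(sid, "|", src, "|", tgt)
```

A real tool would use a proper SGML parser rather than regular expressions, but the lookup itself is no more than following the corresp pointer from one half of the pair to the other.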

We decided from the outset that the results of our work should be easily accessible to others and that the data should provide fine-grained linguistic detail as well. These two requirements have resulted in two methods of presentation and storage: we store our data in a relational database for interactive retrieval, and we store the same data as a tagged, TEI-conformant corpus. The two are not, however, totally independent of each other. There is a one-to-one relationship between the two formats, which enables us to keep them synchronized with a minimum of effort.
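The one-to-one relationship between the two formats can be illustrated with a small sketch. It does not describe the actual PEDANT database: the table name sentence, its three columns, the sample rows, and the helper names row_to_tag and tag_to_row are all hypothetical, and the <s> notation is borrowed from the Norwegian example quoted earlier purely for illustration. The point is only that each database row maps to exactly one tagged line and back again.

```python
import re
import sqlite3

# Hypothetical tagged-line format: one <s> element per line.
TAG = re.compile(r"<s id=(\S+) corresp=([^\s>]+)>(.*)</s>")


def row_to_tag(row):
    """Render one database row as one tagged corpus line."""
    sid, corresp, text = row
    return f"<s id={sid} corresp={corresp}>{text}</s>"


def tag_to_row(line):
    """Recover the database row from one tagged corpus line."""
    match = TAG.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a tagged sentence: {line!r}")
    return match.groups()


# Hypothetical schema and data, held in memory for the example.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sentence (id TEXT PRIMARY KEY, corresp TEXT, text TEXT)"
)
db.executemany(
    "INSERT INTO sentence VALUES (?, ?, ?)",
    [
        ("ST1.1.1.s1", "ST1T.1.1.s1", "Det var en gang en konge."),
        ("ST1.1.1.s2", "ST1T.1.1.s2", "Han hadde tre barn."),
    ],
)

stored = db.execute(
    "SELECT id, corresp, text FROM sentence ORDER BY id"
).fetchall()

# Database -> tagged corpus -> database should lose nothing.
corpus_lines = [row_to_tag(row) for row in stored]
round_trip = [tag_to_row(line) for line in corpus_lines]
print(round_trip == stored)
```

Because the mapping is lossless in both directions, either format can be regenerated from the other, which is what keeps synchronization cheap.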


Daniel Ridings
Sun Mar 31 09:05:43 METDST 1996