Now that some of the related projects have been presented, we will turn
our attention to PEDANT. We will be dealing with some of the
same aspects--language pairs, text classification, storage,
etc.--but we are now in a position to go into greater detail. With
the exception of the LINGUA and INTERSECT projects, it is rarely explained how the
results of alignment are made available. The LINGUA project is
producing user-friendly parallel concordances and CRATER is
concentrating on tools and methods. The Scandinavian work is focusing
on contrastive studies, and parallel texts offer a wealth of material
if they are used judiciously, but we are not told how they are
being used. Are there tools that allow searching by specified
criteria and that return the hits together with their matches in the
other half of the language pair? We are often informed that the
material is being annotated with TEI-conformant mark-up, but we are
not always told how it is being used. Once again, the LINGUA project
has provided the most details. The Norwegian project explains how
they link source and target pairs by using an attribute corresp
in the <s> tag, i.e., <s id=ST1.1.1.s1
corresp=ST1T.1.1.s1> (Hofland95; Hofland95a).
The first attribute, id=, identifies the sentence being tagged,
and the second attribute, corresp=, points to the translation in
the target text. We are not, however, told how this
information is being used. To our knowledge the only application that
understands this annotation is an SGML parser. The parser will check
and make sure the id-references actually exist, but no more.
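
To make concrete what an application could do with this annotation beyond validating it, the following is a minimal sketch in Python. It is not taken from any of the projects discussed. It assumes that sentences are closed with </s>, that attribute values are unquoted as in the example above, and that target sentences carry the same two attributes pointing back at the source. The function names and the sample sentences used to exercise it are hypothetical.

```python
import re

# Matches <s id=... corresp=...>text</s> with unquoted attribute values,
# as in the example quoted above. Whitespace between the attributes may
# include a line break.
S_TAG = re.compile(
    r"<s\s+id=([^\s>]+)\s+corresp=([^\s>]+)\s*>(.*?)</s>", re.DOTALL
)

def read_sentences(text):
    """Return {id: (corresp, sentence_text)} for every <s> element."""
    return {
        m.group(1): (m.group(2), m.group(3).strip())
        for m in S_TAG.finditer(text)
    }

def aligned_pairs(source, target):
    """Follow each corresp pointer from the source into the target text.

    Returns (pairs, dangling): the aligned sentence pairs, and the ids of
    source sentences whose corresp value matches no id in the target.
    Reporting the dangling ids is the check a validating parser performs;
    building the pairs is the extra step a retrieval tool would need.
    """
    src = read_sentences(source)
    tgt = read_sentences(target)
    pairs, dangling = [], []
    for sid, (corresp, sentence) in src.items():
        if corresp in tgt:
            pairs.append((sentence, tgt[corresp][1]))
        else:
            dangling.append(sid)
    return pairs, dangling
```

Calling aligned_pairs on a source and a target text yields the sentence pairs that a parallel concordancer could display side by side, together with a list of pointers that could not be resolved.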

We decided from the outset that the results of our work should be easily accessible to others and that the data should provide fine-grained linguistic detail as well. These two requirements have resulted in two methods of presentation and storage: we store our data in a relational database for interactive retrieval, and we also store it as a tagged TEI-conformant corpus. The two are not, however, totally independent of each other. There is a one-to-one relationship between the two formats, which enables us to keep them synchronized with a minimum of effort.
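
The idea of a one-to-one relationship between the two formats can be illustrated with a small sketch in Python, using SQLite as a stand-in for the relational database. This is an assumption-laden illustration and not the actual PEDANT schema: the table name sentence, its three columns, and the function names are all hypothetical, and the tagged form simply reuses the id/corresp convention of the Norwegian project quoted earlier.

```python
import sqlite3

def load(conn, sentences):
    """Store (id, corresp, text) records, one row per tagged sentence."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentence ("
        "id TEXT PRIMARY KEY, corresp TEXT NOT NULL, text TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sentence VALUES (?, ?, ?)", sentences)

def translation_of(conn, sentence_id):
    """Interactive retrieval: follow corresp to the sentence it points at."""
    row = conn.execute(
        "SELECT t.text FROM sentence AS s "
        "JOIN sentence AS t ON t.id = s.corresp "
        "WHERE s.id = ?",
        (sentence_id,),
    ).fetchone()
    return row[0] if row else None

def to_tagged(conn):
    """Regenerate the tagged form, one <s> element per row, in load order."""
    rows = conn.execute("SELECT id, corresp, text FROM sentence ORDER BY rowid")
    return [f"<s id={i} corresp={c}>{t}</s>" for i, c, t in rows]
```

Because every row corresponds to exactly one tagged sentence and vice versa, either representation can be rebuilt from the other, which is what keeps synchronization cheap in a design of this kind.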