This report deals with the subject of ``parallel texts.'' In its simplest form, the term refers to a text in one language that has been translated into one or more other languages. Taken together, all of these versions are called ``parallel texts,'' since they ideally contain the same information in parallel with each other. In its most advanced form, such a collection of texts has been designed and composed according to the same principles that apply to monolingual corpora. These principles are presently in a state of flux for monolingual corpora, and even more so for multilingual ones. The present report only touches upon such questions and makes no attempt to treat them in detail.
In recent years there has been increasing interest in parallel corpora. They provide base material for a wide range of applications, from computer-assisted language learning at the school level to streamlining the creation of terminological databases for international corporations such as AT&T. They can be used to improve different types of computer-aided translation systems, and for human translators they are both a valuable reference source and a useful tool in training. Parallel texts, however, are not useful only for translation studies: projects such as LINGUA, which will be dealt with below, use them with the express aim of providing material for second-language learning, and they are finding an increasingly important place in contrastive linguistics [Aijmer and Altenberg 1995].
Språkbanken (literally: language-bank) in Göteborg gave the collection of parallel texts high priority for the academic year 1995/1996. Språkbanken already holds several collections of Swedish newspaper texts dating back to 1965, which have been systematically complemented every eleven years. In addition to the newspaper material there are two collections of novels, from 1976-77 and 1980-81. This material amounts to about 20 million running words. Many of the novels, particularly those from 1976-77, are in fact translations into Swedish, which makes the collection of parallel texts a natural extension of our present resources. At the same time, the Language Bank has the task of meeting the corpus requirements for the Swedish participation in the PAROLE project. A comparable corpus, that is, a Swedish corpus matched to corpora in other languages with respect to text types and selection criteria [Sinclair 1994, p. 12], will be built up through the department's involvement in PAROLE.
In order to place PEDANT in a wider context, the first part of this report will describe projects around the world, favoring those involving the Swedish language. The second part will introduce PEDANT: its structure, language pairs, domain coverage, storage, alignment and annotation. Some of the work currently being done on the texts will be briefly mentioned, together with plans for the future. With this report we merely want to introduce our work in such a way that future reports can build upon it and deal with specifics in more detail.