The diversity of the world's roughly 6,500 languages embodies a wealth of information on human cognition and the history of populations. As languages go extinct, the linguistic heritage of humankind increasingly resides in grammars and dictionaries, which are rapidly accumulating. Accessing this heritage requires both that the descriptions be available and that someone actually read them. Availability is a problem because publications are often difficult to access.
In this project we aim to enhance access to the world's linguistic heritage by making an existing collection of more than 9,000 PDF documents that are no longer under copyright available in a stable archive, enriched with added metadata and with computational tools for searching information within the texts. Moreover, a number of dictionaries will be converted into apps for mobile devices that can be distributed to speakers of minority languages, handing back to these speakers part of their linguistic heritage. The resulting resources, particularly the grammatical descriptions, will be used to experiment with and develop methods for the automatic extraction of linguistic features.