Background

Traditionally, researchers have studied the diversity of the world's languages by manually reading and comparing grammatical descriptions. Today, a large number of linguistic descriptions and books are readily available in digital form, but reading them all for broad comparison and analysis is far beyond the capacity of any individual. Text technology, i.e. computer-based processing of natural-language text, is now powerful enough that it could be used to harvest facts at different levels of detail within a given domain (in this case, information on the world's languages).

Project Description

In this project we use a collection of 9000 digitized grammatical descriptions, covering over a thousand languages, to significantly expand the scope for large-scale language comparison. To this end, the project will develop methods that enable computers to read grammatical descriptions and automatically extract information ("linguistic facts"). We will also explore and develop the notion of a "language profile": a structured digital representation of a language that gathers all available knowledge about it extracted from various sources.
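To make the idea of a language profile more concrete, the following is a minimal sketch of how one might be represented in Python. It is an illustration only, not the project's actual data model: the class names LinguisticFact and LanguageProfile, their fields, and the sample language, feature values and source names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinguisticFact:
    """One extracted claim about a language (hypothetical structure)."""
    feature: str  # e.g. "basic word order"
    value: str    # e.g. "SOV"
    source: str   # the grammatical description the fact was extracted from


@dataclass
class LanguageProfile:
    """All facts gathered for one language across sources (hypothetical)."""
    name: str
    facts: list[LinguisticFact] = field(default_factory=list)

    def add(self, feature: str, value: str, source: str) -> None:
        """Record one extracted fact together with its source."""
        self.facts.append(LinguisticFact(feature, value, source))

    def values(self, feature: str) -> dict[str, list[str]]:
        """Map each reported value of a feature to the sources supporting it."""
        by_value: dict[str, list[str]] = {}
        for fact in self.facts:
            if fact.feature == feature:
                by_value.setdefault(fact.value, []).append(fact.source)
        return by_value


# Invented sample data, for illustration only.
profile = LanguageProfile("Example language")
profile.add("basic word order", "SOV", "Grammar A")
profile.add("basic word order", "SOV", "Grammar B")
profile.add("basic word order", "SVO", "Grammar C")
print(profile.values("basic word order"))
```

Keeping the source alongside each fact means that disagreements between descriptions stay visible in the profile instead of being silently overwritten.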

The project is funded by the Marcus and Amalia Wallenberg (MAW) Foundation and runs from 2018-07-01 to 2022-06-30. It is carried out by two partners:

Uppsala University, Department of Linguistics and Philology, Sweden
Språkbanken Text, Department of Swedish, Multilingualism, Language Technology, University of Gothenburg, Sweden

Research Team

Harald Hammarström (Principal Investigator)
Markus Forsberg
Shafqat Mumtaz Virk

For more information, see the project homepage.

Milage: Multilingual Automated Grammar Extraction
