Traditionally, researchers have studied the diversity of the world's languages by reading and comparing grammatical descriptions manually. Nowadays, a large number of linguistic descriptions and books are easily available in digital formats. Reading them all for a broader comparison and analysis is far beyond any individual's capabilities. Text technology, i.e. computer-based processing of natural-language text, is now powerful enough that it could be used to harvest facts at different levels of detail within a given domain (in this case, information on the world's languages).
Project Description
In this project we want to use a collection of 9000 digitized grammatical descriptions covering over a thousand languages to significantly expand the ability to make large-scale language comparisons. For this purpose, the project will develop methodologies that enable computers to read grammatical descriptions and automatically extract information ("linguistic facts"). We will explore and develop the notion of a "language profile": a structured digital representation of a language that gathers all available knowledge about that language extracted from various sources.
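To make the idea of a language profile more concrete, the following is a minimal sketch of how such a structure might look. It is not the project's actual data model: the class names `LinguisticFact` and `LanguageProfile`, their fields, and the sample values are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class LinguisticFact:
    """One extracted statement about a language (hypothetical structure)."""
    feature: str   # e.g. "basic word order"
    value: str     # e.g. "SOV"
    source: str    # identifier of the description the fact was extracted from


@dataclass
class LanguageProfile:
    """All facts collected for one language, with their sources (hypothetical)."""
    language: str
    facts: list[LinguisticFact] = field(default_factory=list)

    def add_fact(self, feature: str, value: str, source: str) -> None:
        self.facts.append(LinguisticFact(feature, value, source))

    def values_for(self, feature: str) -> set[str]:
        """Return every distinct value reported for a feature."""
        return {f.value for f in self.facts if f.feature == feature}

    def sources_for(self, feature: str, value: str) -> list[str]:
        """Return the sources that support a given feature/value pair."""
        return [f.source for f in self.facts
                if f.feature == feature and f.value == value]


# Illustrative placeholder data, not real extracted facts.
profile = LanguageProfile("ExampleLanguage")
profile.add_fact("basic word order", "SOV", "grammar_A")
profile.add_fact("basic word order", "SOV", "grammar_B")
profile.add_fact("basic word order", "SVO", "grammar_C")

print(profile.values_for("basic word order"))
print(profile.sources_for("basic word order", "SOV"))
```

Keeping the source alongside each fact means that disagreements between descriptions stay visible in the profile rather than being silently resolved.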
The project is funded by the Marcus and Amalia Wallenberg (MAW) Foundation and runs from 2018-07-01 to 2022-06-30. The project is run by two partners:
- Uppsala University, Department of Linguistics and Philology, Sweden
- Språkbanken Text, Department of Swedish, Multilingualism, Language Technology, University of Gothenburg, Sweden
Research Team
- Harald Hammarström (Principal Investigator)
- Markus Forsberg
- Shafqat Mumtaz Virk
For more information, see the project homepage.