Lars Borin

I am director of Nationella språkbanken (the National Swedish Language Bank), national coordinator of Swe-Clarin and professor of natural language processing, a discipline falling under the wider umbrella of language technology (LT). LT is a multi-faceted research field. It has an obvious practical side, where the aim is to develop methods enabling computers to exhibit human-like linguistic behavior, e.g., text understanding in large-scale document processing and information access, machine translation, and automated speech-to-text and text-to-speech conversion. There is also a more theoretical aspect to LT, aiming to investigate general properties of human language, both in order to contribute to linguistic research – this being one of my own central research interests – and in order to incorporate the knowledge thus gained into increasingly sophisticated language processing systems.

Such systems are seeing increased use as research support tools in disciplines where the content of text (and speech) constitutes central research data, a rich source of information about history, society, politics, etc. Enormous amounts of digital text are produced on a daily basis, and historical texts, which make up a considerable and important part of our cultural heritage, are being digitized apace. LT can help researchers cope with this enormous material – billions of words in Swedish alone – by developing effective tools for digital humanities and social scientific research. This is another important focus of my research, in particular the questions of methodology that arise when traditional qualitative “close-reading” methods are confronted with large-scale quantitative approaches.

In order to develop high-performance and high-precision language processing systems, we need access to so-called language resources: both large amounts of text of relevant types and genres, and databases containing rich linguistic information, for instance richly structured digital lexical resources. Hence, another central aspect of my research is concerned with the development and optimal utilization of language resources for all modern and historical varieties of written Swedish.

Research interests

language technology infrastructure, digital language resources, e-science, research methodology, digital historical linguistics, lexicography, lexical semantics, language typology, digital humanities, computer-assisted language learning, multi-word expressions

Professor of natural language processing