- Associate Professor in NLP
- Ph.D. in Computer Science
- Master of Science in Engineering Mathematics
- Bachelor of Science in Mathematics
Our project Towards Automatic Detection of Language Change (2019-2022), with Simon Hengchen, Richard Johansson, and Maria Koptjevskaja Tamm, has been running for a year and a half. So far, we have published an ACL paper, organized the first-ever international workshop on computational methods for historical language change (LChange'19), given several talks, and held a few local workshops, the latest at the Alan Turing Institute. Our survey paper on Computational Approaches to Lexical Semantic Change is available on arXiv. We have also organized SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. Keep up to date with our latest activities and plans via our news list.
- lexical semantic change
- data science for the humanities
- algorithms and methods for temporal evolution
- knowledge discovery
With advances in technology and culture, and through high-impact events, our language changes. We invent new words, add or change meanings of existing words, and rename existing things. The result is a dynamic language that progresses with our needs and gives us the means to express ourselves and describe the world around us. This phenomenon is called language change.
I work on automatic detection of language change with two main objectives in mind: to help users find content in long-term archives, and to help them interpret it. When things, locations, and people have different names in the archives from those we are familiar with, we cannot find relevant documents by means of simple string matching: the strings for the modern names will not match the strings for the names stored in the archive. And even if we are able to find relevant documents, there is no guarantee that we can interpret their content. Words and expressions reflect our culture and evolve over time. Without explicit knowledge about these changes, we risk placing modern meanings on older expressions, which leads to wrong interpretations.
My focus is primarily on word sense changes and general word-to-word changes. Word sense change concerns words that stay the same over time but change their meaning, by adding one or several new senses, changing an existing sense, or losing a sense. New words, together with their new senses, also fall into this category.
“Sebastini's benefit last night at the Opera-House was overflowing with the fashionable and gay.”
The above quote was published in The Times on April 27, 1787, and contains a word that has since changed its primary meaning. Read today, the word gay will most likely be interpreted as homosexual, because that is its dominant sense today. However, this sense of the word was not introduced until the early 20th century; in this context, the word should instead be read in the sense of happy.
Word-to-word changes concern concepts that stay the same while the words used to represent them change. The general problem is difficult to solve, but it has an easier subproblem: term-to-term changes, that is, changes in the names of named entities.
“The Germans are brought nearer to Stalingrad and the command of the lower Volga.”
The above quote was published in The Times on July 18, 1942, and refers to a Russian city that figures often in the context of World War II. When discussing World War II, people speak of the city of Stalingrad or the Battle of Stalingrad; however, no city by that name can be found on a modern map. In 1961, Stalingrad was renamed Volgograd, and the new name has since replaced the old one on maps and in modern resources. Not knowing about this change leads to several problems. Knowing only about Volgograd makes the city's history inaccessible, because the documents that describe that history contain only the name Stalingrad (or Tsaritsyn, as the city was named before 1925). Conversely, knowing only about Stalingrad makes it difficult to find information about the current state and location of the city.
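The retrieval problem described above can be illustrated with a minimal sketch. It assumes a hypothetical, hand-built table linking the three names of the city mentioned in the text (Tsaritsyn until 1925, Stalingrad from 1925 to 1961, Volgograd since 1961); the names NAME_HISTORY, expand_query, name_in_year and search are illustrative only. Building such links automatically is the actual research problem, and this sketch does not attempt it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameSpan:
    name: str
    start: Optional[int]  # first year the name was in use (None = not recorded here)
    end: Optional[int]    # year the name was replaced (None = still in use)


# Hypothetical, hand-built table standing in for links a detection system would produce.
NAME_HISTORY = {
    "Volgograd": [
        NameSpan("Tsaritsyn", None, 1925),
        NameSpan("Stalingrad", 1925, 1961),
        NameSpan("Volgograd", 1961, None),
    ],
}


def _spans_for(term: str) -> list[NameSpan]:
    """Return the name history containing `term`, or an empty list if none is known."""
    for spans in NAME_HISTORY.values():
        if any(s.name == term for s in spans):
            return spans
    return []


def expand_query(term: str) -> list[str]:
    """Return every known name for the entity `term` refers to (or just `term`)."""
    spans = _spans_for(term)
    return [s.name for s in spans] if spans else [term]


def name_in_year(term: str, year: int) -> str:
    """Return the name in use in `year` for the entity that `term` refers to."""
    for s in _spans_for(term):
        if (s.start is None or s.start <= year) and (s.end is None or year < s.end):
            return s.name
    return term


def search(docs: list[str], query: str) -> list[str]:
    """Plain substring matching, but over all known names instead of the query alone."""
    terms = expand_query(query)
    return [d for d in docs if any(t in d for t in terms)]
```

With this table, searching the 1942 quote for "Volgograd" returns it, although the string "Volgograd" never occurs in it, and name_in_year("Volgograd", 1942) gives "Stalingrad". The design deliberately keeps plain string matching and only widens the set of strings, which is the simplest way to use known name changes in an archive search.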
More broadly, I am interested in the temporal evolution of all sorts of information, including opinions, topics, and graphs, as well as in textual information extraction. Recently, I have been involved in creating a billion-word corpus representing written modern Swedish, as well as a Swedish sentiment lexicon.
Synergies conference, 2020 (May 2020, Odense)
Literary studies and DH
Workshop on Digital Literacy, 2020
6th Estonian Digital Humanities Conference dhe2018
Chair local organizing committee
21st Nordic Conference on Computational Linguistics (NoDaLiDa 2017)
Digital Humanities in the Nordic Countries 2nd Conference (DHN2017)
Local organizing committee
14th Conference of the European Chapter of the Association for Computational Linguistics (EACL2014)
International Workshop on Social Media Semantics (SMS 2013)
- 1st International Workshop on Computational Approaches to Historical Language Change
- Workshop on Automatic Detection of Language Change
- The first Swedish national SWE-CLARIN workshop
- Semantic technologies for research in the humanities and social sciences (STRiX)
SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
Schlechtweg, Dominik; McGillivray, Barbara; Hengchen, Simon; Dubossarsky, Haim; Tahmasebi, Nina
SemEval-2020 Task description paper
Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection
Tahmasebi, Nina; Hengchen, Simon; Schlechtweg, Dominik; McGillivray, Barbara; Dubossarsky, Haim