- Associate Professor in NLP
- Ph.D. in Computer Science
- Master of Science in Engineering Mathematics
- Bachelor of Science in Mathematics
We have been awarded a large research program by RJ, Change is Key!, which starts in 2022 and spans six years. Within this program, we will develop state-of-the-art computational methods to detect different aspects of semantic change and variation, to facilitate and simplify research in the text-based humanities and social sciences, historical linguistics, and lexicography. The program has two main aims. The first is to develop corpus-based methods for detecting semantic change (over time) and variation (across social groups and media types). The second is to collaborate with researchers from the social sciences, gender studies, and literature to answer their research questions; together we will develop methods and research methodology for their specific needs. By identifying and handling changes in language automatically, we can ensure a correct interpretation of our language and facilitate studies of our contemporary and historical societies. We have 11 participating researchers and one research engineer from six participating universities, including IMS Stuttgart, Queen Mary University of London, University of Lund, the Institute for Analytical Sociology (IAS) at Linköping University, and KU Leuven. The PI is Nina Tahmasebi at the University of Gothenburg.
Our project Towards Automatic Detection of Language Change (2019-2022), with Simon Hengchen, Dominik Schlechtweg, and Maria Koptjevskaja Tamm, has been running since 2019. So far, we have published an ACL paper, organized the first-ever international workshop on computational methods for historical language change (LChange'19) and two follow-ups, given several talks, and held a few local workshops, the latest at the Alan Turing Institute. Our survey paper on Computational Approaches to Lexical Semantic Change can be found on arXiv. We have also run SemEval-2020 Task 1 on Unsupervised Lexical Semantic Change Detection. Keep up to date with our latest activities and plans via our news-list.
Research interests
- lexical semantic change
- data science for the humanities
- algorithms and methods for temporal evolution
- knowledge discovery
With advances in technology and culture, and through high-impact events, our language changes. We invent new words, add or change meanings of existing words, and rename existing things. This results in a dynamic language that progresses with our needs and lets us express ourselves and describe the world around us. The phenomenon is called language change.
I work on automatic detection of language changes with two main objectives in mind: to help users find content in long-term archives, and to help them interpret it. When things, locations, and people have different names in the archives from those we are familiar with, we cannot find relevant documents by means of simple string matching techniques, because the modern names do not match the names stored in the archive. And even if we are able to find relevant documents, there is no guarantee that we can interpret the content. Words and expressions reflect our culture and evolve over time. Without explicit knowledge about these changes, we risk placing modern meanings on these expressions, which leads to wrong interpretations.
My focus is primarily on word sense changes and general word-to-word changes. Word sense change relates to words that stay the same over time but change their meaning, either by adding one or several new senses, changing an existing sense, or losing a sense. New words with their new senses also fall into this category.
“ Sebastini's benefit last night at the Opera-House was overflowing with the fashionable and gay. ”
The above quote was published on April 27, 1787 in The Times and contains a word that has since changed its primary meaning. When read today, the word gay will most likely be interpreted as homosexual, because that sense is now the dominant one. However, this particular sense of the word was not introduced until the early 20th century; in this context, the word should be interpreted in the sense of happy.
Word-to-word changes relate to concepts that stay the same while the words used to represent them change. The general problem is difficult to solve but has an easier subproblem, namely term-to-term changes, which relate to named entity changes.
“ The Germans are brought nearer to Stalingrad and the command of the lower Volga. ”
The above quote was published on July 18, 1942 in The Times and refers to the Russian city that figures often in the context of World War II. In reference to World War II, people speak of the city of Stalingrad or the Battle of Stalingrad. However, the city cannot be found on a modern map: in 1961, Stalingrad was renamed Volgograd, and the old name has since been replaced on maps and in modern resources. Not knowing of this change leads to several problems. Knowing only about Volgograd means that the history of the city becomes inaccessible, as documents that describe its history only contain the name Stalingrad (or Tsaritsyn, as the city was named before 1925). Conversely, knowing only about Stalingrad makes it difficult to find information about the current state and location of the city.
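The retrieval problem described above can be illustrated with a minimal sketch. It assumes a small hand-written table of name variants; the table, the function names, and the second toy document are made up for illustration and are not taken from any published method or dataset. Automatically finding such variants is the hard part and is not shown here.

```python
# Hypothetical, hand-written table: each entity maps to the names it has
# carried over time (Tsaritsyn before 1925, Stalingrad until 1961).
NAME_VARIANTS = {
    "Volgograd": ["Tsaritsyn", "Stalingrad", "Volgograd"],
}


def expand_query(term):
    """Return the term together with all known variants of the same entity."""
    for names in NAME_VARIANTS.values():
        if term in names:
            return list(names)
    return [term]  # no known variants: fall back to the term itself


def search(documents, term, expand=False):
    """Plain substring search, optionally expanded with name variants."""
    terms = expand_query(term) if expand else [term]
    return [
        doc for doc in documents
        if any(t.lower() in doc.lower() for t in terms)
    ]


documents = [
    # Quote from The Times, July 18, 1942.
    "The Germans are brought nearer to Stalingrad and the command of the lower Volga.",
    # Invented modern sentence, for illustration only.
    "Volgograd is a city on the Volga.",
]

# Simple string matching on the modern name misses the historical document.
print(search(documents, "Volgograd"))
# With the variants added to the query, both documents are found.
print(search(documents, "Volgograd", expand=True))
```

The sketch only covers finding documents. Interpreting them correctly, as in the word sense example earlier, requires knowledge about meaning change that a lookup table of names does not provide.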
Broadly, I am interested in the temporal evolution of all sorts of information, including opinions, topics, and graphs, and in textual information extraction. Lately, I have been involved in creating a billion-word corpus for modern Swedish, intended as a representation of written, modern Swedish, as well as in creating a Swedish sentiment lexicon.
Professional Duties
Keynote
Synergies conference, 2020 (May 2020, Odense)
Literary studies and DH
Synergies conference talk on YouTube
Workshop on Digital Literacy, 2020
Keynote
6th Estonian Digital Humanities Conference dhe2018
Chair local organizing committee
21st Nordic Conference on Computational Linguistics (NoDaLiDa 2017)
http://nodalida2017.se/
Programme committee
Digital Humanities in the Nordic Countries 2nd Conference (DHN2017)
http://dhn2017.eu/
Local organizing committee
14th Conference of the European Chapter of the Association for Computational Linguistics (EACL2014)
http://eacl2014.org/
Workshop chair
- 3rd International Workshop on Computational Approaches to Historical Language Change (LChange'22)
https://languagechange.org/events/2022-acl-lcworkshop/
- 2nd International Workshop on Computational Approaches to Historical Language Change (LChange'21)
https://languagechange.org/events/2021-acl-lcworkshop/
- 1st International Workshop on Computational Approaches to Historical Language Change
https://languagechange.org/events/2019-acl-lcworkshop/
- Workshop on Automatic Detection of Language Change
workshop-automatic-detection-language-change
Test set / datasets
DWUG SV: Diachronic Word Usage Graphs for Swedish
Schlechtweg, Dominik; Tahmasebi, Nina; Hengchen, Simon; Dubossarsky, Haim; McGillivray, Barbara
DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages
SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
Schlechtweg, Dominik; McGillivray, Barbara; Hengchen, Simon; Dubossarsky, Haim; Tahmasebi, Nina
SemEval-2020 Task description paper
Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection
Tahmasebi, Nina; Hengchen, Simon; Schlechtweg, Dominik; McGillivray, Barbara; Dubossarsky, Haim
Word Sense Change Test Set
Tahmasebi and Risse: Finding Individual Word Sense Changes and their Delay in Appearance. RANLP 2017
Named Entity Evolution Dataset
Tahmasebi, Gossen, Kanhabua, Holzmann and Risse: NEER: An Unsupervised Method for Named Entity Evolution Recognition. COLING 2012