Education
Doctor of Philosophy in Natural Language Processing
Master of Arts in Computational Linguistics
Bachelor of Electronics and Computer Engineering
Research interests
- Multilingual Natural Language Generation
- Lexical Semantics
- Knowledge Representation
- Digital Humanities
Research description
My research interests in language technology span textual analysis, lexical semantics, multilingual natural language generation, and knowledge representation standards. I have specific expertise in developing natural language applications and resources. Recently, I have been involved in several digital humanities projects within SWE-CLARIN. Since January 2019, I have been working on a research infrastructure project in collaboration with Kungliga biblioteket (KB), the National Library of Sweden. The project aims to improve KB's Optical Character Recognition (OCR) process, especially for the digitisation of newspapers. We focus on correcting OCR errors with the help of electronic dictionaries and word lists automatically extracted from corpora. Earlier experiments on improving OCR with specific word lists, carried out in the project A free cloud service for OCR, demonstrated the usefulness of this approach when applied to historical material.
Another research project I am part of is the Swedish FrameNet (SweFN++), a lexical-semantic resource based on the theory of frame semantics, which has been expanded from, and constructed in line with, the Berkeley FrameNet. A large part of my work within the project involves the development of domain-specific semantic frames, semantic and syntactic annotation of examples, and automatic production of texts from FrameNet data. One focus of my work is developing NLP applications that exploit FrameNet data; for example, we developed an application that generates a computational multilingual FrameNet-based grammar and lexicon from FrameNet-annotated corpora.
Other NLP projects related to my research interests in which I have participated include:
The Swedish constructicon project, SweCcn: a large electronic database of Swedish constructions, developed as an extension of the Swedish FrameNet. My work focused on developing an automatic approach based on the resource grammar library provided by Grammatical Framework. We acquired a computational construction grammar from the Swedish constructicon in order to extend and improve the Swedish resource grammar library. Another line of my work concerned the use of statistical methods to validate constructions targeted at second-language (L2) learners.
The Linked Open Data project (LTLOD@SB), in which we published four Swedish lexical-semantic resources available at Språkbanken as RDF using Lemon.
I was the leader of the work package "Case study: Cultural Heritage" in the MOLTO EU project, coordinated at the Department of Computer Science and Engineering at Chalmers. I worked mainly with texts from the cultural heritage domain in 15 languages. My contributions to the project were multilingual natural language generation from Semantic Web ontologies in Grammatical Framework (GF), and multilingual knowledge extraction from Wikipedia articles. The work resulted in a multilingual system that enables interaction with digital museum libraries through natural language text.
The EU project Semantic Mining in Biomedicine, coordinated at the University of Gothenburg. My work in the project involved experiments with semantic mining techniques to extract texts from Medline written for different groups of readers, and natural language generation of biomedical texts in three languages: English, Swedish and French.
Teaching
- Main lecturer for the Language Technology Resources course offered by the Master's programme in Language Technology, 2018--2023
- Guest lecturer on OCR for the Master's programme in Digital Humanities, 2018
- Lecturer on lexical semantics for the course Natural Language Processing, 2013--2015, 2017--2021
- Lecturer on finite-state technology for the course Natural Language Processing, 2015