Shafqat Virk

I am a computer scientist who works with natural language processing (NLP) and language technology for digital humanities and its underlying disciplines. In my recent research, I have focused on the use of language technology, particularly for (1) the automatic detection of semantic change in languages, (2) the automatic extraction and structured representation of typological linguistic data from natural language descriptions, and (3) the digitization and accessibility of rich resources on natural languages of the world.

During my PhD, I worked with the Grammatical Framework group at the University of Gothenburg and developed resource grammars for six Indo-Iranian languages: Hindi, Urdu, Persian, Punjabi, Nepali, and Sindhi. These grammars take the form of software libraries and encode different syntactic and lexical aspects of those languages.

As a postdoctoral researcher at the Institute of Information Science at Academia Sinica, Taiwan, I worked on PropBank-based semantic role labeling and information extraction systems for English and Chinese. I was also involved in the French FrameNet project at IRIT, France, where we worked on a FrameNet-based semantic role labeling system and on extending the relational structure of the English FrameNet.

I have been working as a researcher at Språkbanken since 2015. Here, I have been involved in a number of projects, most notably South Asia as a Linguistic Area, DReaM, and Milage. Currently, I am involved in a six-year project, ChangeIsKey, in which we aim to develop state-of-the-art methods and tools to automatically detect semantic change (over time) in natural languages.

Projects - current

Change is Key!

This program has two main aims. The first is to develop corpus-based methods for detecting semantic change (over time) and variation (across social groups and media). This will create general tools for the study and detection of language change at large scale and directly benefit historical linguistics and lexicography. The second is to collaborate with researchers from the social sciences, gender studies, and literature to answer their research questions. We will develop tools, evaluation data, and research methodology for their specific needs.

Projects - finished

Publications

2024

Pauline Sander, Simon Hengchen, Wei Zhao, Xiaocheng Ma, Emma Sköldberg, Shafqat Virk, Dominik Schlechtweg (2024): The DURel Annotation Tool, in Book of Abstracts of the Workshop Large Language Models and Lexicography, 8 October 2024 Cavtat, Croatia / Simon Krek (ed.)
Emma Sköldberg, Shafqat Virk, Pauline Sander, Simon Hengchen, Dominik Schlechtweg (2024): Revealing Semantic Variation in Swedish Using Computational Models of Semantic Proximity – Results From Lexicographical Experiments, in Lexicography and Semantics. Proceedings of the XXI EURALEX International Congress 8–12 October 2024 Cavtat, Croatia (eds. Kristina Š. Despot, Ana Ostroški Anić & Ivana Brač)
Dominik Schlechtweg, Shafqat Virk, Pauline Sander, Emma Sköldberg, Lukas Theuer Linke, Tuo Zhang, Nina Tahmasebi, Sabine Schulte im Walde (2024): The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change, in Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, March 17-22, 2024, St. Julians, Malta

2023

Shafqat Virk, Per Klang, Lars Borin, Anju Saxena (2023): LingFN: A Framenet for the Linguistic Domain, in Computational Linguistics and Intelligent Text Processing, 20th International Conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, Revised Selected Papers, Part I / Alexander Gelbukh (ed.), pages 367-379
Claes Ohlsson, Shafqat Virk, Nina Tahmasebi (2023): Going to the market together. A presentation of a mixed methods project, in TwinTalks Workshop at DH2023, 10 July, Graz, Austria

2022

Elena Volodina, Dana Dannélls, Aleksandrs Berdicevskis, Markus Forsberg, Shafqat Virk (2022): Live and Learn – Festschrift in honor of Lars Borin

2021

Lars Borin, Anju Saxena, Shafqat Virk, Bernard Comrie (2021): Swedish FrameNet++ and comparative linguistics, in The Swedish FrameNet++: Harmonization, integration, method development and practical language technology applications / editor(s): Dana Dannélls, Lars Borin and Karin Friberg Heppin, pages 139–165
Shafqat Virk, Dana Dannélls, Lars Borin, Markus Forsberg (2021): A Data-Driven Semi-Automatic Framenet Development Methodology, in Proceedings of the International Conference on Recent Advances in Natural Language Processing, 1–3 September, 2021 / Edited by Galia Angelova, Maria Kunilovskaya, Ruslan Mitkov, Ivelina Nikolova-Koleva
Shafqat Virk, Daniel Foster, Azam Sheikh Muhammad, Raheela Saleem (2021): A Deep Learning System for Automatic Extraction of Typological Linguistic Information from Descriptive Grammars, in Proceedings of Recent Advances in Natural Language Processing, Sep 1–3, 2021 / Edited by Galia Angelova, Maria Kunilovskaya, Ruslan Mitkov, Ivelina Nikolova-Koleva
Shafqat Virk, Dana Dannélls, Azam Sheikh Muhammad (2021): A Novel Machine Learning Based Approach for Post-OCR Error Detection, in Proceedings of the International Conference on Recent Advances in Natural Language Processing, 1–3 September, 2021 / Edited by Galia Angelova, Maria Kunilovskaya, Ruslan Mitkov, Ivelina Nikolova-Koleva
Dana Dannélls, Shafqat Virk (2021): A Supervised Machine Learning Approach for Post-OCR Error Detection for Historical Text, in Linköping Electronic Press Workshop and Conference Collection. Selected contributions from the Eighth Swedish Language Technology Conference (SLTC-2020), 25-27 November, 2020
Lars Borin, Anju Saxena, Shafqat Virk, Bernard Comrie (2021): A bird’s-eye view on South Asian languages through LSI: Areal or genetic relationships?, in Journal of South Asian Languages and Linguistics, volume 7, issue 2, pages 151-185

2020

Shafqat Virk, Harald Hammarström, Markus Forsberg, Søren Wichmann (2020): The DReaM Corpus: A Multilingual Annotated Corpus of Grammars for the World’s Languages, in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, 11–16 May 2020 / Editors : Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Søren Wichmann, Shafqat Virk (2020): Towards a data-driven network of linguistic terms, in Swedish Language Technology Conference (SLTC)
Dana Dannélls, Shafqat Virk (2020): OCR Error Detection on Historical Text Using Uni-Feature and Multi-Feature Based Machine Learning Models, in Swedish Language Technology Conference (SLTC), 25-27 November 2020, University of Gothenburg
Shafqat Virk, Harald Hammarström, Lars Borin, Markus Forsberg, Søren Wichmann (2020): From Linguistic Descriptions to Language Profiles, in Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020). Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020 / Edited by : Maxim Ionov, John P. McCrae, Christian Chiarcos, Thierry Declerck, Julia Bosque-Gil, and Jorge Gracia

2019

Shafqat Virk, Azam Sheikh Muhammad, Lars Borin, Muhammad Irfan Aslam, Saania Iqbal, Nazia Khurram (2019): Exploiting frame semantics and frame-semantic parsing for automatic extraction of typological information from descriptive grammars of natural languages, in 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria, 2-4 September 2019

2018

Shafqat Virk, K. V. S. Prasad (2018): Towards Hindi/Urdu FrameNets via the Multilingual FrameNet, in Proceedings of the LREC 2018 Workshop. International FrameNet Workshop 2018: Multilingual Framenets and Constructicons, 12 May 2018 – Miyazaki, Japan / Edited by Tiago Timponi Torrent, Lars Borin and Collin F. Baker
Lars Borin, Shafqat Virk, Anju Saxena (2018): Language technology for digital linguistics: Turning the Linguistic Survey of India into a rich source of linguistic information, in Lecture Notes in Computer Science. Computational Linguistics and Intelligent Text Processing, 18th International Conference, CICLing 2017, Budapest, Hungary, April 17–23, 2017
Per Malm, Shafqat Virk, Lars Borin, Anju Saxena (2018): LingFN: Towards a framenet for the linguistics domain, in Proceedings : LREC 2018 Workshop, International FrameNet Workshop 2018. Multilingual Framenets and Constructicons, May 12, 2018, Miyazaki, Japan / Edited by Tiago Timponi Torrent, Lars Borin and Collin F. Baker
Lars Borin, Shafqat Virk, Anju Saxena (2018): Many a little makes a mickle - infrastructure component reuse for a massively multilingual linguistic study, in Selected papers from the CLARIN Annual Conference 2017, Budapest, 18–20 September 2017

2017

Shafqat Virk, Lars Borin, Anju Saxena, Harald Hammarström (2017): Automatic extraction of typological linguistic features from descriptive grammars, in Text, Speech, and Dialogue 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017, Proceedings / edited by Kamil Ekštein, Václav Matoušek
Harald Hammarström, Shafqat Virk, Markus Forsberg (2017): Poor man's OCR post-correction: Unsupervised recognition of variant spelling applied to a multilingual document collection, in DATeCH2017, Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage, Göttingen, Germany — June 01 - 02, 2017

2016

Lars Borin, Shafqat Virk, Anju Saxena (2016): Towards a Big Data View on South Asian Linguistic Diversity, in WILDRE-3 – 3rd Workshop on Indian Language Data: Resources and Evaluation

2014

Shafqat Virk, K. V. S. Prasad, Aarne Ranta, Krasimir Angelov (2014): Developing an interlingual translation lexicon using WordNets and Grammatical Framework, in Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

2013

Shafqat Virk (2013): Computational Linguistics Resources for Indo-Iranian Languages

2012

K. V. S. Prasad, Shafqat Virk (2012): Computational evidence that Hindi and Urdu share a grammar but not the lexicon, in 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP), collocated with COLING 12
Olga Caprotti, Aarne Ranta, Krasimir Angelov, Ramona Enache, John J. Camilleri, Dana Dannélls, Grégoire Détrez, Thomas Hallgren, K. V. S. Prasad, Shafqat Virk (2012): High-quality translation: Molto tools and applications, in The fourth Swedish Language Technology Conference (SLTC)
Shafqat Virk (2012): Computational Grammar Resources for Indo-Iranian Languages
Shafqat Virk, Elnaz Abolahrar (2012): An Open Source Persian Computational Grammar, in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

2011

Shafqat Virk, Muhammad Humayoun, Aarne Ranta (2011): An Open-Source Punjabi Resource Grammar, in Proceedings of RANLP-2011, Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12-14 September, 2011, pages 70-76

2010

Shafqat Virk, Muhammad Humayoun, Aarne Ranta (2010): An Open Source Urdu Resource Grammar, in Proceedings of the 8th Workshop on Asian Language Resources (Coling 2010 workshop)

Position

Researcher

Academic title

associate professor

Email

shafqat.virk@svenska.gu.se

Phone number

+46 (0)31 786 1093

Blog and news items

2020-04-07 Blog: A multilingual annotated corpus of world's natural language descriptions

Page manager: sb-webb