Shafqat Mumtaz Virk

I am a researcher at Språkbanken, University of Gothenburg. My main research interests are language engineering, the development of computational linguistic resources, semantic parsing, and information extraction. During my PhD, I worked with the Grammatical Framework group at the University of Gothenburg and developed resource grammars for six Indo-Iranian languages: Hindi, Urdu, Persian, Punjabi, Nepali, and Sindhi. These grammars take the form of software libraries and encode various syntactic and lexical aspects of those natural languages.

Recently, I have mostly been involved in semantic parsing and information extraction. As a postdoctoral researcher at Academia Sinica, Taiwan, I worked on a PropBank-based semantic role labeling system and an information extraction system for English and Chinese. I was also involved in the French FrameNet project at IRIT, France, where we worked on a FrameNet-based semantic role labeling system and on extending the relational structure of English FrameNet.

At Språkbanken, I am involved in the South Asia as a Linguistic Area project, whose primary goal is to systematically investigate the claim that South Asia is a classic linguistic area. Using Grierson's Linguistic Survey of India (LSI; 1903–1927) as the primary data source, we are exploring computational methods for automatically extracting information about the grammatical features of the South Asian languages described in the LSI. Relying largely on semantic parsing and open information extraction techniques, we are developing methodologies and tools that we expect to be useful to a wider audience.



  • Shafqat Virk, Harald Hammarström, Lars Borin, Markus Forsberg, Søren Wichmann (2020): From Linguistic Descriptions to Language Profiles, in Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020), Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020, edited by Maxim Ionov, John P. McCrae, Christian Chiarcos, Thierry Declerck, Julia Bosque-Gil, and Jorge Gracia.
  • Shafqat Virk, Harald Hammarström, Markus Forsberg, Søren Wichmann (2020): The DReaM Corpus: A Multilingual Annotated Corpus of Grammars for the World's Languages, in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, 11–16 May 2020, edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis.
  • Shafqat Virk, Muhammad Humayoun, Aarne Ranta (2011): An Open-Source Punjabi Resource Grammar, in Proceedings of RANLP-2011, Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12–14 September 2011, pages 70–76.

  • +46 (0)31 786 1093