Språkbanken Text is a part of Språkbanken.

Project Activities

Organization of international workshops

  1. International workshop on typological profiles of language families of South Asia, Uppsala University, 15-16 September 2016.
  2. Grammar Data Mining (GDM): Extracting Linguistic Features From Grammatical Descriptions, Varna, Bulgaria, 5-6 September 2019.

Invited presentations

  1. Anju Saxena was invited to deliver a plenary lecture at the 32nd South Asian languages analysis roundtable (SALA-32) with the title "Indo-Aryan in typological and areal perspective". Venue: Universidade de Lisboa, Portugal. 27-29 April 2016.
  2. Anju Saxena was invited to deliver a keynote lecture at the International Workshop on Munda Linguistics with the title "Critical evaluation of the Munda influence on Himalayan languages". Venue: Deccan College, India. 15-16 March 2017.
  3. Anju Saxena. South Asia as a linguistic area? Exploring big-data methods in areal and genetic linguistics – A project presentation. Christian-Albrechts-Universität zu Kiel. 19 April 2018

Publications

  1. Shafqat Virk, Harald Hammarström, Lars Borin, Markus Forsberg, Søren Wichmann (2020): From Linguistic Descriptions to Language Profiles, in Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020). Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020 / Edited by Maxim Ionov, John P. McCrae, Christian Chiarcos, Thierry Declerck, Julia Bosque-Gil, and Jorge Gracia
  2. Shafqat Virk, Azam Sheikh Muhammad, Lars Borin, Muhammad Irfan Aslam, Saania Iqbal, Nazia Khurram (2019): Exploiting frame semantics and frame-semantic parsing for automatic extraction of typological information from descriptive grammars of natural languages, in 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria, 2-4 September 2019
  3. Lars Borin, Shafqat Virk, Anju Saxena (2018): Language technology for digital linguistics: Turning the Linguistic Survey of India into a rich source of linguistic information, in Lecture Notes in Computer Science. Computational Linguistics and Intelligent Text Processing, 18th International Conference, CICLing 2017, Budapest, Hungary, April 17–23, 2017
  4. Lars Borin, Shafqat Virk, Anju Saxena (2018): Many a little makes a mickle - infrastructure component reuse for a massively multilingual linguistic study, in Selected papers from the CLARIN Annual Conference 2017, Budapest, 18–20 September 2017
  5. Per Malm, Shafqat Virk, Lars Borin, Anju Saxena (2018): LingFN: Towards a framenet for the linguistics domain, in Proceedings: LREC 2018 Workshop, International FrameNet Workshop 2018. Multilingual Framenets and Constructicons, May 12, 2018, Miyazaki, Japan / Edited by Tiago Timponi Torrent, Lars Borin and Collin F. Baker
  6. Shafqat Virk, K.V.S. Prasad (2018): Towards Hindi/Urdu FrameNets via the Multilingual FrameNet, in Proceedings of the LREC 2018 Workshop. International FrameNet Workshop 2018: Multilingual Framenets and Constructicons, 12 May 2018, Miyazaki, Japan / Edited by Tiago Timponi Torrent, Lars Borin and Collin F. Baker
  7. Per Malm, Malin Ahlberg, Dan Rosén (2018): Uneek: a Web Tool for Comparative Analysis of Annotated Texts, in Proceedings of the LREC 2018 Workshop. International FrameNet Workshop 2018: Multilingual Framenets and Constructicons, 7-12 May 2018, Miyazaki, Japan / Edited by Tiago Timponi Torrent, Lars Borin and Collin F. Baker
  8. Harald Hammarström, Shafqat Virk, Markus Forsberg (2017): Poor man's OCR post-correction: Unsupervised recognition of variant spelling applied to a multilingual document collection, in DATeCH2017, Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage, Göttingen, Germany, June 1-2, 2017
  9. Shafqat Virk, Lars Borin, Anju Saxena, Harald Hammarström (2017): Automatic extraction of typological linguistic features from descriptive grammars, in Text, Speech, and Dialogue: 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017, Proceedings / Edited by Kamil Ekštein, Václav Matoušek
  10. Saxena, Anju (ed.) 2017. Typological profiles of language families of South Asia. Special theme issue. Journal of South Asian languages and linguistics 4.1.
  11. Lars Borin, Shafqat Virk, Anju Saxena (2016): Towards a Big Data View on South Asian Linguistic Diversity, in WILDRE-3 – 3rd Workshop on Indian Language Data: Resources and Evaluation

Conference and workshop presentations

  1. Lars Borin, Anju Saxena, Bernard Comrie, Shafqat Mumtaz Virk. South Asian linguistic relationships seen through the LSI. Workshop on South Asia as a “Sprachbund”? Advances in the study of language contact in South Asia. SALA-35. South Asian languages analysis round table. INALCO-Paris. 29-31 October 2019
  2. Shafqat Mumtaz Virk, Azam Sheikh Muhammad, Lars Borin, Muhammad Irfan Aslam, Saania Iqbal, and Nazia Khurram. Exploiting Frame Semantics and Frame-Semantic Parsing for Automatic Extraction of Typological Information from Descriptive Grammars of Natural Languages. Proceedings of the Recent Advances in Natural Language Processing (RANLP), 2-6 September 2019, Varna, Bulgaria.
  3. Anna Sjöberg and Anju Saxena. The use of the copula in non-copula constructions of South Asia. SLE 2019. 52nd annual meeting of the Societas Linguistica Europaea. Leipzig University. 21-24 August 2019
  4. Anna Sjöberg and Anju Saxena. The use of the copula in non-copula constructions in the languages of South Asia. ALT19. The 13th conference of the Association for Linguistic Typology. University of Pavia. 4-6 September 2019
  5. Shafqat Mumtaz Virk, Per Malm, Lars Borin, Anju Saxena: LingFN: A framenet for the linguistic domain. 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), April 7 to 13, 2019, La Rochelle, France.
  6. Per Malm, Shafqat Mumtaz Virk, Lars Borin, Anju Saxena: LingFN: Towards a framenet for the linguistics domain. FrameNet Workshop 2018: Multilingual FrameNets and Constructicons, collocated with LREC, in Miyazaki, Japan, 2018.
  7. Per Malm, Malin Ahlberg, Dan Rosén: Uneek: A web tool for linguistic analysis. The International FrameNet Workshop 2018: Multilingual FrameNets and Constructicons, collocated with LREC, in Miyazaki, Japan, 2018.
  8. Shafqat Mumtaz Virk, and K.V.S. Prasad. Towards Hindi/Urdu FrameNets via the Multilingual FrameNet. The International FrameNet Workshop 2018: Multilingual FrameNets and Constructicons, collocated with LREC, in Miyazaki, Japan, 2018.
  9. Harald Hammarström, Shafqat Virk and Markus Forsberg. Poor Man’s OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection. DATeCH International Conference and the Digitisation Days, 1-2 June 2017, Göttingen, Germany.
  10. Shafqat Mumtaz Virk, Lars Borin, Anju Saxena and Harald Hammarström. Automatic Extraction of Typological Linguistic Features from Descriptive Grammars. 20th International Conference on Text, Speech and Dialogue (TSD), 27-31 August 2017, Prague.
  11. Lars Borin, Shafqat Mumtaz Virk and Anju Saxena. Language Technology for Digital Linguistics: Turning the Linguistic Survey of India Into a Rich Source of Linguistic Information, 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), April 17 to 23, 2017, Budapest, Hungary.
  12. Lars Borin, Shafqat Mumtaz Virk and Anju Saxena. Exploring Big-Data Methods for Large-Scale Comparative Linguistic Research. The Sixth Swedish Language Technology Conference (SLTC) Umeå University, 17-18 November, 2016.
  13. Anju Saxena and Lars Borin. Indo-Aryan within an Areal Perspective. International workshop on typological profiles of language families of South Asia. Uppsala University. 2016.
  14. Lars Borin, Shafqat Mumtaz Virk and Anju Saxena. Towards a Big Data View on South Asian Linguistic Diversity. 3rd Workshop on Indian Language Data: Resources and Evaluation. LREC 2016. Slovenia. 24 May 2016.

Project meetings and collaborative activities

  1. Project Meetings

    1. September 2015: University of Iceland. Participants: Lars Borin, Anju Saxena, Bernard Comrie, Peter Austin, Auður Hauksdóttir
    2. January 2016: Uppsala University. Participants: Lars Borin, Anju Saxena, Shafqat Mumtaz Virk
    3. May 2016: University of Iceland. Participants: Lars Borin, Anju Saxena, Bernard Comrie, Peter Austin, Auður Hauksdóttir
    4. September 2016: Uppsala University and University of Gothenburg. Participants: Lars Borin, Bernard Comrie, Scott DeLancey, Colette Grinevald, Anju Saxena, Shafqat Mumtaz Virk
    5. April 2017: University of Iceland. Participants: Lars Borin, Bernard Comrie, Anju Saxena. Theme: questionnaire for the second round

  2. In addition, GU/UU members have project meetings once every week/fortnight.
  3. Collaborative Activities

    1. Thórhallur Eythórsson (University of Iceland) contacted the project about collaboration. Meetings: September 2015, May 2016. This has resulted in student funding for three years, with Thórhallur Eythórsson as the project leader.

Master theses

  1. Anna Sjöberg. The use of the copula in non-copula constructions in the languages of South Asia. Uppsala University. 2018
  2. Muhammad Irfan, Semantic Frame Based Automatic Extraction of Typological Information from Descriptive Grammars, Institutionen för kommunikation och information, Examensarbete i datavetenskap, Högskolan i Skövde. May 2019.
  3. Daniel Foster, Automatic Frame-Semantic Parsing for Linguistic Descriptions: Extracting typological linguistic information from unstructured text, MS Thesis, Department of Swedish, University of Gothenburg, September 2019.

Other presentations

  1. Shafqat Mumtaz Virk, Lars Borin, and Anju Saxena: South Asia as a Linguistic Area? Exploring Big-Data Methods in Areal and Genetic Linguistics, Culturomics Partner Meeting, 2016-04-21, Språkbanken, Gothenburg.
  2. Lars Borin, Shafqat Mumtaz Virk, Anju Saxena, South Asia as a linguistic area? Exploring big-data methods in areal and genetic linguistics, Språkbanken Kick-off Meeting, 2016-08-25, Gothenburg.
  3. Shafqat Mumtaz Virk, Lars Borin, and Anju Saxena: Linguistic information extraction and visualization: Some preliminary results from the LSI, Culturomics Partner Meeting, 2016-11-24, Chalmers University, Gothenburg.
  4. Building linguistic maps for South Asian languages. Wogel seminar. Department of Linguistics and Philology, Uppsala University. 16 March 2018
  5. LingFN: Towards a FrameNet for the Linguistics Domain, GIFT University, Punjab, Pakistan, 2019-01-05.
  6. LingFN: a FrameNet for the Linguistics Domain, CLT-Retreat 2019-05-08.