
Taraka Rama

PhD Student


I have been enrolled in GSLT as a PhD student since September 16th, 2010. My interests are in the creation of language resources, unsupervised methods, and computational historical linguistics. I am also a member of CLT, the Centre for Language Technology at Göteborg. I am part of the Digital Areal Linguistics (DAL) project.



Comparative evaluation of string similarity measures for automatic language classification.


Properties of phoneme N-grams across the world’s language families.

Publications before 2010

A Computational Model of the Phonetic Space and Its Applications. In process

Anil Kumar Singh, Sethuramalingam Subramaniam and Taraka Rama. 2010 Transliteration as Alignment vs. Transliteration as Generation for the Purpose of Crosslingual Information Retrieval. Traitement Automatique des Langues, Special Issue on Multilingualism and NLP. Vol. 51, Number 2. 2010. [pdf][Bibtex]

Taraka Rama, Sudheer Kolachina and Lakshmi Bai B. 2009 Quantitative methods for Phylogenetic Inference in Historical Linguistics: An experimental case study of South Central Dravidian. Indian Linguistics, Vol. 70, 2009.[pdf]

Karthik Gali, Sriram Venkatapathy and Taraka Rama. 2009 From Factorial to Quadratic Time Complexity for Sentence Realization using Nearest Neighbour Algorithm. STIL 2009, Brazil.

Taraka Rama, Anil Kumar Singh. 2009 From Bag of Languages to Family Trees from Noisy Corpus. RANLP 2009, Borovets, Bulgaria.[pdf]

Taraka Rama, Karthik Gali. 2009 Modeling Transliteration as a Phrase Based Statistical Machine Translation Problem, NEWS 2009, ACL-IJCNLP 2009, Singapore [pdf]

Taraka Rama, Anil Kumar Singh and Sudheer Kolachina. 2009 Modeling Letter to Phoneme Conversion as a Phrase Based Statistical Machine Translation Problem with Minimum Error Rate Training. NAACL HLT 2009 Student Research Workshop, Boulder, Colorado, USA. [pdf]

Taraka Rama, Karthik Gali and Avinesh PVS. 2008 Does Syntactic Knowledge help English-Hindi SMT? Proceedings of the NLP Tools contest, ICON 2008. [pdf]


I maintain this page of references on computational historical linguistics. It is updated every day.

I worked on cognate identification and phylogeny in my Master's thesis. The contributions of this thesis are as follows:

  • Applied SMT techniques to the transliteration shared task (IJCNLP 2009) for the language pair Hindi-English.
  • Applied SMT techniques to the Letter to Phoneme task for English, German, and French.
  • Applied phylogenetic techniques such as Maximum Parsimony and Bayesian techniques to the character data of South-Central Dravidian languages.
  • Developed a linguistically well-formulated model for Indian languages and applied it to corpora to compute the distances between 9 Indian languages. The model can be applied to any phonetic transcription system.
  • The work on Dravidian languages formed the foundation for providing data from the Etymological dictionary of Dravidian languages.

I am a member of the ASJP Consortium.

My first supervisor is Lars Borin.
My second supervisor is Søren Wichmann.
My third supervisor is Markus Forsberg.

I spent the spring of 2012 at the Max Planck Institute for Evolutionary Anthropology, Leipzig.

I spent two months (March and April of 2013) as a Guest Researcher at Alpha-Informatica.

Sanchay is an interesting open-source NLP toolkit for South Asian languages.


Taraka Rama 2014. Vocabulary lists in computational historical linguistics University of Gothenburg BibTeX

Taraka Rama, Lars Borin 2014. N-Gram Approaches to the Historical Dynamics of Basic Vocabulary Journal of Quantitative Linguistics [pdf] BibTeX

Taraka Rama, Prasanth Kolachina, Sudheer Kolachina 2013. Two methods for automatic cognate identification. Quantitative Investigations in Theoretical Linguistics. 76 [pdf] BibTeX

Taraka Rama, Sudheer Kolachina 2013. Distance-based Phylogenetic Inference Algorithms in the Subgrouping of Dravidian Languages Approaches to Measuring Linguistic Differences. De Gruyter Mouton BibTeX

Taraka Rama 2013. Phonotactic diversity predicts the time depth of the world's language families PLoS ONE [pdf] BibTeX

Taraka Rama, Prasanth Kolachina 2012. How good are typological distances for determining genealogical relationships among languages? Proceedings of the 24th International Conference on Computational Linguistics [pdf] BibTeX

Taraka Rama 2012. N-gram approaches to the historical dynamics of basic vocabulary Preproceedings of Computational approaches to the study of dialectal and typological variation [pdf] BibTeX

Søren Wichmann, Eric Holman, Taraka Rama, Robert S. Walker 2012. Correlates of reticulation in linguistic phylogenies Language Dynamics and Change. [pdf] BibTeX

Taraka Rama, Lars Borin 2012. Properties of phoneme N-grams across the world’s language families Proceedings of the Fourth Swedish Language Technology Conference (SLTC) BibTeX

Taraka Rama, Lars Borin 2011. Estimating Language Relationships from a Parallel Corpus. A Study of the Europarl Corpus NEALT Proceedings Series (NODALIDA 2011 Conference Proceedings). 161-167 [pdf] BibTeX

Søren Wichmann, Taraka Rama, Eric Holman 2011. Phonological diversity, word length, and population sizes across languages: The ASJP evidence Linguistic Typology [pdf] BibTeX

Sudheer Kolachina, Taraka Rama, Lakshmi Bai 2011. Maximum parsimony method in the subgrouping of Dravidian languages Quantitative Investigations in Theoretical Linguistics [pdf] BibTeX

All publications as BibTeX

Taraka Rama. Presentation at Licentiate Seminar. January 2014.[pdf]

Taraka Rama. Comparative study of string similarity and vector similarity measures for Bulgarian dialect classification. Poster at Language Diversity Congress, Computational Issues in Studying Language Diversity: Storage, Analysis and Inference. July 2013.

Taraka Rama, Prasanth Kolachina. How good are typological distances for determining genealogical relationships among languages? Poster at COLING 2012, IIT Mumbai. [pdf]

Taraka Rama. N-gram approaches to the historical dynamics of basic vocabulary. ESSLLI workshop on computational approaches to typological and dialectological variation. 2012 [pdf]

Taraka Rama (Joint work with Søren Wichmann and Eric W. Holman). Correlates of reticulation in linguistic phylogenies. CLT fall seminar. 2012 [pdf]

Taraka Rama, Lars Borin. Properties of phoneme N-grams across the world’s language families. Fourth Swedish Language Technology Conference, University of Lund. 2012 [pdf]

First year at GSLT. 2011 [pdf]

Taraka Rama, Sudheer Kolachina. Distance-based algorithms in the subgrouping of Dravidian languages. Workshop on comparing approaches to measuring linguistic differences. 2011 [pdf]

Søren Wichmann, Taraka Rama, Eric W. Holman. Phonological diversity, Mean word length and Population Sizes across the world's languages. CLT Retreat 2011 [pdf]

Sudheer Kolachina, Taraka Rama. Revisiting Unchanged Cognates as criterion in Linguistic Subgrouping. ICHL, Osaka 2011.[pdf]

Taraka Rama, Lars Borin. Estimating language distances from parallel corpus. A study of Europarl corpus NODALIDA 2011. Latvia. [pdf]

Sudheer Kolachina, Taraka Rama, Lakshmi Bai. Maximum Parsimony for subgrouping in Dravidian. QITL, Berlin 2011. [pdf]

Taraka Rama. Explorations in Phoneme N-grams for Automatic Language Classification. CLT Seminar, March 2011. [pdf]

Contact information

Taraka Rama

Språkbanken, Göteborgs universitet, Box 200
405 30 Göteborg

Visiting address:
Lennart Torstenssonsgatan 8

070 3728519
