Natural Language Processing
I have been enrolled at GSLT as a PhD student since September 16th, 2010. My interests are in computational historical linguistics, the creation of language resources, and unsupervised methods. I am also affiliated with the Centre for Language Technology in Göteborg.
I am a research assistant in the Digital Areal Linguistics project.
I am a member of the ASJP Consortium.
My Google Scholar page has the full list of publications.
I am writing my thesis and am interested in taking up a research position in the next two years. I am broadly interested in language evolution, in collecting more data, and in applying computationally intensive methods to model language change.
I contribute code to the Quantitative historical linguistics project. Clone it here.
with Søren Wichmann. Jackknifing the black sheep: ASJP classification performance and Austronesian. For the proceedings of the symposium "Let's talk about trees", National Museum of Ethnology, Osaka, Febr. 9-10, 2013. pdf
with Lars Borin. Comparative evaluation of string similarity measures for automatic language classification. pdf
(Single author) Gap-weighted subsequences for automatic cognate identification and phylogenetic inference. pdf
with Lars Borin. Properties of phoneme N-grams across the world’s language families. pdf
A Computational Model of the Phonetic Space and Its Applications. In process
Anil Kumar Singh, Sethuramalingam Subramaniam and Taraka Rama. 2010 Transliteration as Alignment vs. Transliteration as Generation for the Purpose of Crosslingual Information Retrieval. Traitement Automatique des Langues, Special Issue on Multilingualism and NLP. Vol. 51, Number 2. 2010. [pdf][Bibtex]
Taraka Rama, Sudheer Kolachina and Lakshmi Bai B. 2009 Quantitative methods for Phylogenetic Inference in Historical Linguistics: An experimental case study of South Central Dravidian. Indian Linguistics, Vol. 70, 2009.[pdf]
Karthik Gali, Sriram Venkatapathy and Taraka Rama. 2009 From Factorial to Quadratic Time Complexity for Sentence Realization using Nearest Neighbour Algorithm. STIL 2009, Brazil
Taraka Rama, Anil Kumar Singh. 2009 From Bag of Languages to Family Trees from Noisy Corpus. RANLP 2009, Borovets, Bulgaria.[pdf]
Taraka Rama, Karthik Gali. 2009 Modeling Transliteration as a Phrase Based Statistical Machine Translation Problem, NEWS 2009, ACL-IJCNLP 2009, Singapore [pdf]
Taraka Rama, Anil Kumar Singh and Sudheer Kolachina. 2009 Modeling Letter to Phoneme Conversion as a Phrase Based Statistical Machine Translation Problem with Minimum Error Rate Training. NAACL HLT 2009 Student Research Workshop, Boulder, Colorado, USA [pdf]
Taraka Rama, Karthik Gali and Avinesh PVS. 2008 Does Syntactic Knowledge help English-Hindi SMT? Proceedings of the NLP Tools contest, ICON 2008.[pdf]
I maintain a page of references on computational historical linguistics: http://lingphylo.wikidot.com/start
I worked on cognate identification and phylogeny for my Master's thesis.
The contributions of this thesis are as follows:
I spent the spring of 2012 at the Max Planck Institute for Evolutionary Anthropology, Leipzig.
I spent two months (March and April of 2013) as a Guest Researcher at Alpha-Informatica.
Sanchay is an interesting open-source NLP toolkit for South Asian languages.
Here is the bibliography of my licentiate thesis. I tried to link recent work in computational historical linguistics with work done in traditional historical linguistics. The thesis in its complete form is not yet available for download. It consists of a kappa (introductory summary) and five papers. I tried to include many examples from Dravidian languages. If you ever cite it, please cite it as given in the bibtex.
Taraka Rama, Lars Borin 2015.
Comparative evaluation of string similarity measures for automatic language classification.
Sequences in Language and Text. De Gruyter Mouton
Taraka Rama, Lars Borin 2014.
N-Gram Approaches to the Historical Dynamics of Basic Vocabulary
Journal of Quantitative Linguistics
Taraka Rama 2014.
Vocabulary lists in computational historical linguistics
University of Gothenburg
Lars Borin, Anju Saxena, Taraka Rama, Bernard Comrie 2014.
Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics
Proceedings of LREC 2014. 3137-3144
Taraka Rama, Sudheer Kolachina 2013.
Distance-based Phylogenetic Inference Algorithms in the Subgrouping of Dravidian Languages
Approaches to Measuring Linguistic Differences. De Gruyter Mouton
Taraka Rama 2013.
Phonotactic diversity predicts the time depth of the world's language families
Taraka Rama, Prasanth Kolachina 2012.
How good are typological distances for determining genealogical relationships among languages?
Proceedings of the 24th International Conference on Computational Linguistics
Taraka Rama, Lars Borin 2012.
Properties of phoneme N-grams across the world’s language families
Proceedings of the Fourth Swedish Language Technology Conference (SLTC)
Søren Wichmann, Eric Holman, Taraka Rama, Robert S. Walker 2012.
Correlates of reticulation in linguistic phylogenies
Language Dynamics and Change.
Taraka Rama 2012.
N-gram approaches to the historical dynamics of basic vocabulary
Preproceedings of Computational approaches to the study of dialectal and typological variation
Søren Wichmann, Taraka Rama, Eric Holman 2011.
Phonological diversity, word length, and population sizes across languages: The ASJP evidence
Sudheer Kolachina, Taraka Rama, Lakshmi Bai 2011.
Maximum parsimony method in the subgrouping of Dravidian languages
Quantitative Investigations in Theoretical Linguistics
Taraka Rama, Lars Borin 2011.
Estimating Language Relationships from a Parallel Corpus. A Study of the Europarl Corpus
NEALT Proceedings Series (NODALIDA 2011 Conference Proceedings). 161-167
I have not yet had the chance to teach, but I hope to be able to sometime.
Taraka Rama. Presentation at Licentiate Seminar. January 2014.[pdf]
Taraka Rama. Comparative study of string similarity and vector similarity measures for Bulgarian dialect classification. Poster at Language Diversity Congress, Computational Issues in Studying Language Diversity: Storage, Analysis and Inference. July 2013.
Taraka Rama, Prasanth Kolachina. How good are typological distances for determining genealogical relationships among languages? Poster at COLING 2012, IIT Mumbai. [pdf]
Taraka Rama. N-gram approaches to the historical dynamics of basic vocabulary. ESSLLI workshop on computational approaches to typological and dialectological variation. 2012 [pdf]
Taraka Rama (Joint work with Søren Wichmann and Eric W. Holman). Correlates of reticulation in linguistic phylogenies. CLT fall seminar. 2012 [pdf]
Taraka Rama, Lars Borin. Properties of phoneme N-grams across the world’s language families. Fourth Swedish Language Technology Conference, University of Lund. 2012 [pdf]
First year at GSLT. 2011 [pdf]
Taraka Rama, Sudheer Kolachina. Distance-based algorithms in the subgrouping of Dravidian languages. Workshop on comparing approaches to measuring linguistic differences. 2011 [pdf]
Søren Wichmann, Taraka Rama, Eric W. Holman. Phonological diversity, Mean word length and Population Sizes across the world's languages. CLT Retreat 2011 [pdf]
Sudheer Kolachina, Taraka Rama. Revisiting Unchanged Cognates as criterion in Linguistic Subgrouping. ICHL, Osaka 2011.[pdf]
Taraka Rama, Lars Borin. Estimating language distances from parallel corpus. A study of Europarl corpus. NODALIDA 2011, Latvia. [pdf]
Sudheer Kolachina, Taraka Rama, Lakshmi Bai. Maximum Parsimony for subgrouping in Dravidian. QITL, Berlin 2011. [pdf]
Taraka Rama. Explorations in Phoneme N-grams for Automatic Language Classification. CLT Seminar, March 2011. [pdf]