	title        = {Arabicized and Romanized Berber Automatic Identification},
	abstract     = {We present an automatic language identification tool for both Arabicized Berber (Berber written in the Arabic script) and Romanized Berber (Berber written in the Latin script). The focus is on short texts (social media content). We use a supervised machine learning method with character-based and word-based n-gram models as features. We also describe the corpora used in this paper. For both Arabicized and Romanized Berber, character-based 5-grams score best, giving an F-score of 99.50%.},
	booktitle    = {Proceedings of TICAM 2016},
	author       = {Adouane, Wafia and Semmar, Nasredine and Johansson, Richard},
	year         = {2016},
	publisher    = {IRCAM},
	address      = {Morocco},