Corpus of Somali Wikipedia

Summary
Resource type: Corpus
Language: Somali
Tokens: 869,335
Sentences: 42,514

Download
wikipedia-so.xml.bz2: corpus (XML), licence: CC BY 4.0 (attribution)
stats_WIKIPEDIA-SO.txt: token frequencies (CSV), licence: CC BY 4.0 (attribution)