

A multilingual corpus of linguistic descriptions of the world's natural languages.

This resource contains a multilingual, digitized version of thousands of documents describing the natural languages of the world. The corpus is annotated with various metadata-, word-, and text-level attributes, and is password-protected for copyright reasons. More details about the data and annotations can be found in the standard reference given below.

There is also an openly available part of the corpus which can be found here.

Standard reference:
Shafqat Virk, Harald Hammarström, Markus Forsberg, Søren Wichmann (2020): The DReaM Corpus: A Multilingual Annotated Corpus of Grammars for the World’s Languages, in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, 11–16 May 2020. Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis.


  • Corpus




Sentences: 34,350,897
Tokens: 225,617,801


Språkbanken Text