Many efforts to handle older historical language materials run into the combined problems of a low-resource language with a high degree of variation. By "low resource", we mean that there are few or no electronic resources, such as annotated corpora, computational morphologies, or parsers, that we can use as processing tools or to develop such tools. "High variation" refers to differences between materials in terms of orthography, punctuation conventions, vocabulary, morphological distinctions, word order, etc. These differences may, for instance, be due to a low level of standardization, but also to materials that are nominally from the same language stage lying far apart in time and/or space. Either of these problems on its own presents a challenge for standard data-driven techniques, but together they make applying such techniques highly problematic.
Even though a historical language may be considered low-resource from a natural language processing point of view, it may be an actively studied language and well described in grammars and dictionaries, in fields like philology and historical linguistics. This suggests that processing historical material should not just be a matter of trying to overcome our field's technical/methodological challenges, but also of crossing into other fields studying these materials and collaborating with their experts, so that we may take advantage of their expertise and resources. These latter, however, typically belong to a much more knowledge-driven tradition than the data-driven models that are dominant in present-day natural language processing.
In this half-day workshop, we aim to bring together researchers working on processing historical materials, with a particular focus on work that investigates the combination of data-driven and knowledge-driven modelling. By "processing", we mean a wide range of text processing tasks from different angles and at different levels, be it creating transcriptions and editions of manuscripts, constructing lexica, tagging, parsing, or content-oriented processing such as semantic parsing and information extraction.
The workshop is open to researchers working on historical material from any language area, and to researchers in computational linguistics as well as computational philology. Our primary focus is on older historical text, for instance from times corresponding to the European medieval period or older; however, we welcome contributions on newer historical material, too.
The workshop is organized in conjunction with Nodalida 2017 in Gothenburg, on the afternoon of May 22.
We are happy to announce that Michelle Waldispühl (University of Gothenburg) will be giving a keynote speech.
The workshop on Processing Historical Language is organized as part of the project Methods for the automatic analysis of text in digital historical resources, funded by the Marcus and Amalia Wallenberg Foundation (MAW 2012.0146 to Gerlof Bouma and Yvonne Adesam).
The organizers can be contacted at firstname.lastname@example.org. This address should not be used for submissions, which are handled through example.org. See the instructions for authors in the call for papers.