The Swedish language of the Middle Ages, Old Swedish (ca. 1225–1526), is preserved in manuscripts, letters and early printed books. These documents are valuable to a wide range of researchers: linguists interested in how Swedish changed during that period, law scholars who want to explore mediaeval laws, theologians who study early translations of Bible texts, and medical historians who are interested in mediaeval folk healing.
In the MAþiR project (Methods for the automatic Analysis of Text in digital Historical Resources), we create tools for automatic linguistic analysis of Old Swedish. The project is related to Diabase, Språkbanken's historical resource effort, and lies in the field of computational linguistics and natural language processing. By adding grammatical information to digitized Old Swedish texts, we can facilitate studies of this cultural heritage and enable new ways to explore it.
Developing tools for Old Swedish is a demanding task, even with the best computational linguistic methods, because of the properties of the Old Swedish texts. First, the language of the time was changing, for example in word order and inflection. Second, there was no orthographic standard in the modern sense, so the same word could be spelled in many different ways. The word "maþir", meaning man or human, was for example also spelled "mæþr", "mander" or "meþer", and different spellings could even occur within the same paragraph. Third, the language varies between texts: some 300 years separate the earliest and the latest texts, and they come from different geographical areas and belong to different genres. Fourth, most automatic methods require either a very detailed computational description of the language or a large amount of text that has already been linguistically annotated and can serve as training material for the computer. Neither is currently available for Old Swedish. The core of the MAþiR project is exploring ways of handling these challenges in the Old Swedish texts.
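To make the spelling-variation problem concrete, the following is a minimal sketch of one generic way to relate variant spellings to a dictionary form, using plain edit distance (Levenshtein distance). It is an illustration only and is not presented as the method used in the project. The function names and the small headword list are hypothetical; only the four spellings of "maþir" come from the text above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def closest_headword(form: str, headwords: list[str]) -> str:
    """Return the headword with the smallest edit distance to form.
    On ties, the first such headword in the list is returned."""
    return min(headwords, key=lambda h: edit_distance(form, h))


# Hypothetical headword list, for illustration only.
HEADWORDS = ["maþir", "kona", "barn"]

# Spelling variants of "maþir" mentioned in the text.
VARIANTS = ["mæþr", "mander", "meþer"]

if __name__ == "__main__":
    for variant in VARIANTS:
        match = closest_headword(variant, HEADWORDS)
        print(variant, "->", match, "distance", edit_distance(variant, match))
```

With this toy list, all three variants map to "maþir". In a realistic dictionary many headwords lie within a small edit distance of each other, so raw edit distance alone would produce ambiguous matches. This is one reason why the lack of a spelling standard makes automatic analysis difficult.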