Background
The project En fri molntjänst för OCR ('A free cloud service for OCR'), funded by the National Library of Sweden (51-KB709-2012), ran from 1 September 2013 until 31 August 2014. The project was a collaboration between the University Library of the University of Gothenburg and Språkbanken at the Department of Swedish, Multilingualism, Language Technology of the University of Gothenburg.
Project description
The project aims to create a prototype Optical Character Recognition (OCR) web service for processing old Swedish texts printed in a blackletter (fraktur) or roman typeface, using one of two open source OCR engines. Our ultimate goal is to provide a service that lets libraries, museums and archives upload any digitized document and retrieve high-quality OCRed text, regardless of the quality of the print.
In the project, we have evaluated two open source OCR engines, OCRopus and Tesseract, and further developed one of them, OCRopus. Using OCRopus, we have set up an open web service for OCR that can handle Swedish blackletter print as well as roman type print. The pilot cloud service and web API can be found here. The material and tools developed in the project are freely available for download from this site under the CC-BY license.
Institutes/organisations
Språkbanken
Universitetsbiblioteket