The project En fri molntjänst för OCR 'A free cloud service for OCR', funded by the National Library of Sweden (51-KB709-2012), ran from 1 September 2013 to 31 August 2014. It was a collaboration between the University Library of the University of Gothenburg and Språkbanken 'the Swedish Language Bank' at the Department of Swedish of the University of Gothenburg. Its aim was to create a prototype Optical Character Recognition (OCR) web service for processing old Swedish texts printed in a blackletter (fraktur) or roman typeface, using one of two open source OCR engines. Our ultimate goal is to provide a service that lets libraries, museums and archives upload any digitized document and retrieve high-quality OCRed text, regardless of the quality of the print.
In the project, we evaluated two open source OCR engines, OCRopus and Tesseract, and further developed one of them, OCRopus. Using OCRopus, we have set up an open web service for OCR that can handle Swedish blackletter print as well as roman type print. The pilot cloud service and web API can be found here. The material and tools developed in the project are freely available for download from this site under the CC-BY license.
Extensions to OCRopus
Trained OCRopus character models for Swedish
Trained Tesseract character models for Swedish
Borin, Lars, Bouma, Gerlof and Dannélls, Dana (2016): A free cloud service for OCR / En fri molntjänst för OCR. (GU-ISS 2016-01) Department of Swedish, University of Gothenburg.