Språkbanken Text is a department within Språkbanken.

Swedish newspapers 1818-1870

Citation Information

Språkbanken Text (2020). Swedish newspapers 1818-1870 (updated: 2020-05-26). [Data set]. Språkbanken Text. https://doi.org/10.23695/9bnq-xc71
A selection of Swedish newspapers printed between 1818 and 1870 from the collections of Kungliga biblioteket (KB). Intended for OCR analysis.

Svenska tidningar 1818–1870 contains a selection of digitized Swedish newspapers from 1818 to 1870. It is part of the so-called Kubhist corpus, which was digitized at Kungliga biblioteket (KB). One newspaper was randomly selected from each year, and from each newspaper two pages were selected: the second and the fourth. All pages were processed with automatic document layout analysis, in which each segment on the digitized page was framed and numbered. Each segment was processed with ABBYY FineReader version 11 and was also manually transcribed by a transcription company that specializes in double-keying.
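Each segment has both an ABBYY FineReader 11 reading and a double-keyed manual transcription. If both are available side by side, a natural use is to score OCR quality segment by segment against the transcription. The sketch below assumes character error rate (CER) as the metric, computed as Levenshtein edit distance divided by the length of the reference transcription. The helper names and the segment strings are hypothetical, and this page does not describe the corpus's actual file format.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(ocr: str, reference: str) -> float:
    """Character error rate of one segment: edits / reference length."""
    if not reference:
        raise ValueError("empty reference transcription")
    return levenshtein(ocr, reference) / len(reference)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Micro-averaged CER: total edits over total reference characters."""
    edits = sum(levenshtein(ocr, ref) for ocr, ref in pairs)
    chars = sum(len(ref) for _, ref in pairs)
    return edits / chars


# Hypothetical (OCR output, double-keyed transcription) pairs; the real
# file format of the corpus is not described on this page.
segments = [
    ("Stockhohn den 3 Mars", "Stockholm den 3 Mars"),
    ("Kongl. Maj:ts", "Kongl. Maj:ts"),
]

for ocr, ref in segments:
    print(f"{cer(ocr, ref):.3f}  {ref}")
print(f"corpus CER: {corpus_cer(segments):.3f}")
```

Micro-averaging gives long segments more weight than a plain mean of the per-segment CER values would. Which of the two to report is a design choice.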

This particular subset contains 106 pages, 5,059 segments and 186,013 words in total.
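The page count follows from the sampling design. The years 1818 through 1870 inclusive make 53 years, and two pages per year give 53 × 2 = 106 pages, which matches the total above. The short Python check below reproduces that count and derives average densities from the stated totals. These averages are simple means computed here; the page does not report them for the corpus.

```python
# Sampling design as described: one newspaper per year from 1818 to 1870
# (inclusive) and, from each, the second and fourth page.
FIRST_YEAR, LAST_YEAR = 1818, 1870
PAGES = (2, 4)

sample = [(year, page)
          for year in range(FIRST_YEAR, LAST_YEAR + 1)  # range end is exclusive
          for page in PAGES]
years = LAST_YEAR - FIRST_YEAR + 1

print(years, "years ->", len(sample), "pages")  # 53 years -> 106 pages

# Averages derived from the stated totals (not per-page measurements).
SEGMENTS, WORDS = 5_059, 186_013
print(f"{SEGMENTS / len(sample):.1f} segments per page")  # 47.7
print(f"{WORDS / len(sample):.1f} words per page")        # 1754.8
print(f"{WORDS / SEGMENTS:.1f} words per segment")        # 36.8
```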

It was produced as part of the project Evaluation and refinement of an enhanced OCR-process for mass digitisation, financed by RJ (dnr IN18-0940:1) for the period 2019–2020.

File

Size: 458.22 MB
Modified: 2020-05-26
Licence: CC BY 4.0 (attribution)

Type

  • Corpus
  • Training and evaluation data

Language

Swedish

Size

Tokens: 186,013

Keywords

  • fraktur
  • historical newspapers
  • OCR
  • reference text

Updated

2020-05-26

Contact

Språkbanken Text
sb-info@svenska.gu.se