Språkbanken Text is a part of Språkbanken.

Kubhist

Citation Information

Språkbanken Text. Kubhist [Data set]. Språkbanken Text. https://doi.org/10.23695/h7qd-bj40
Diachronic collection of Swedish historical newspaper texts from the period of 1749–1926

The corpus collection Kubhist is a result of the Digidaily project, which ran from 2010 to 2014.

The collection is split into parts, by source periodical and decade.

Annotation

The data in Kubhist is enriched with metadata such as the price of an issue, printing location, periodicity, political tendency and publication timespan. This information comes from the Nya Lundstedt dagstidningar database at the National Library of Sweden.

Caveats

Text quality is highly variable, due in part to uneven printing and stains on the originals that were OCRed. Many issues open with strange character sequences that result from the OCR attempting to interpret title ornaments as text. The digital text becomes readable once it reaches the part of the page where the articles start.
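
As a rough illustration of working around the noisy issue openings described above, the following sketch drops leading lines until it finds a short run of lines that look like ordinary text. It is not part of Kubhist or any Språkbanken tooling: the function names, the letter-ratio heuristic and the thresholds (`min_ratio`, `min_length`, `run`) are all assumptions chosen for illustration, and they would need tuning against real issues.

```python
def alphabetic_ratio(line: str) -> float:
    """Share of the non-space characters in a line that are letters."""
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def skip_leading_noise(lines, min_ratio=0.8, min_length=20, run=2):
    """Return the lines from the first run of `run` consecutive text-like lines.

    A line counts as text-like when it is at least `min_length` characters
    long (after stripping) and at least `min_ratio` of its non-space
    characters are letters. All thresholds are illustrative assumptions.
    Returns an empty list when no such run is found.
    """
    streak = 0
    for i, line in enumerate(lines):
        if len(line.strip()) >= min_length and alphabetic_ratio(line) >= min_ratio:
            streak += 1
            if streak == run:
                # Keep everything from the first line of the run onwards.
                return lines[i - run + 1:]
        else:
            streak = 0
    return []
```

A heuristic like this only trims the ornamental header noise. It does nothing for OCR errors inside the articles themselves, and it may discard short but legitimate opening lines such as dates or headlines.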

Type

  • Corpus
  • Collection

Language

Swedish

Size

Resources: 78

Contact

Språkbanken Text
sb-info@svenska.gu.se