Språkbanken Text is a department within Språkbanken.

sv-COVID-19

Citation Information

Språkbanken Text (2023). sv-COVID-19 (updated: 2023-05-29). [Data set]. Språkbanken Text. https://doi.org/10.23695/k6fh-4f59
A compilation of various articles related to the COVID-19 pandemic

sv-covid-19 is a collection of Swedish news texts, scientific and popular science articles, and posts from certain blogs and social media such as Flashback and Twitter, published from the beginning of the coronavirus pandemic (early 2020) onward. The latest version of the corpus consists of approximately eight million words and 9,000 articles. The corpus covers a range of text types and stylistic levels. The texts are annotated with part-of-speech tags, morphological analysis and lemmas, as well as some structural and functional information, such as author names.

File                    Description                            Size      Modified    Licence
sv-covid-19.xml.bz2     Scrambled version of the corpus (XML)  200.6 MB  2023-05-29  CC BY 4.0
stats_sv-covid-19.csv   Word statistics (CSV)                  12.47 MB  2023-05-29  CC BY 4.0
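
The compressed XML file is large, so it may be easier to stream it than to load it whole. The sketch below is one possible way to do that with the Python standard library: it counts sentences, tokens and lemma values while reading. The element and attribute names used as defaults (`sentence`, `token`, `lemma`) and the function name `count_lemmas` are assumptions for illustration and are not stated on this page; inspect the file and adjust them to the actual markup.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_lemmas(source, sentence_tag="sentence", token_tag="token",
                 lemma_attr="lemma"):
    """Stream an XML corpus and count sentences, tokens and lemma values.

    `source` is a file name or a binary file object. The tag and attribute
    names are assumed defaults, not confirmed by the dataset description.
    """
    sentences = 0
    tokens = 0
    lemmas = Counter()
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == token_tag:
            tokens += 1
            lemma = elem.get(lemma_attr)
            if lemma:
                lemmas[lemma] += 1
        elif elem.tag == sentence_tag:
            sentences += 1
            # Drop the already-counted token children to limit memory use.
            elem.clear()
    return sentences, tokens, lemmas


if __name__ == "__main__":
    # Assumes the file has been downloaded to the working directory.
    with bz2.open("sv-covid-19.xml.bz2", "rb") as fh:
        n_sentences, n_tokens, lemma_counts = count_lemmas(fh)
    print(n_sentences, n_tokens)
    print(lemma_counts.most_common(10))
```

If the assumed names match the file, the sentence and token totals can be compared with the figures listed under Size as a sanity check.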

Type

  • Corpus

Language

Swedish

Size

Sentences: 488,246
Tokens: 8,130,201

Keywords

  • news texts
  • social media
  • scientific articles
  • medical articles

Updated

2023-05-29

Contact

Språkbanken
sb-info@svenska.gu.se