
sv-COVID-19

A compilation of various articles related to the COVID-19 pandemic
sv-covid-19 is a collection of Swedish news texts, scientific and popular science articles, and articles from certain blogs and social media such as Flashback and Twitter, all published from the beginning of the coronavirus pandemic (early 2020) onwards. The latest version of the corpus consists of approximately eight million words and 9,000 articles. The corpus covers a range of text types and stylistic levels. The texts are annotated with part-of-speech tags, morphological analysis and lemmas, as well as some structural and functional information, such as author names.


File                     Size       Modified     Licence
sv-covid-19.xml.bz2      200.6 MB   2023-05-29   CC BY 4.0 (attribution)
    Scrambled version of the corpus (XML)
stats_sv-covid-19.csv    12.47 MB   2023-05-29   CC BY 4.0 (attribution)
    Word statistics (CSV)
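
Because the corpus is distributed as a bz2-compressed XML file, it can be read as a stream without unpacking it to disk. The following is a minimal sketch in Python using only the standard library. The element name `w` and the attribute names `pos` and `lemma` are assumptions about the markup, not something this page states, and the function name `count_attribute` is hypothetical; inspect the first lines of the file and adjust the names to what it actually contains.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_attribute(path, tag="w", attr="lemma"):
    """Count the values of `attr` on every `tag` element in a .xml.bz2 file.

    `tag` and `attr` default to assumed names; check them against the file.
    """
    counts = Counter()
    # bz2.open decompresses on the fly, so the archive is never fully unpacked.
    with bz2.open(path, "rb") as handle:
        # iterparse yields each element once its closing tag has been read.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == tag:
                value = elem.get(attr)
                if value:
                    counts[value] += 1
            # Release the element's contents once it has been processed.
            elem.clear()
    return counts
```

With the downloaded file in the working directory, `count_attribute("sv-covid-19.xml.bz2", attr="pos").most_common(10)` would list the ten most frequent part-of-speech tags, provided the assumed names match the file.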

Type

  • Corpus

Language

Swedish

Size

Sentences: 488,246
Tokens: 8,130,201

Keywords

  • news texts
  • social media
  • scientific articles
  • medical articles

Contact

Språkbanken
sb-info@svenska.gu.se