Språkbanken Text is a part of Språkbanken.

SIC2 - Stockholm Internet Corpus

Data citation

Östling, Robert, Sjons, Johan, Bjerva, Johannes, & Berdicevskis, Aleksandrs (2020). SIC2 - Stockholm Internet Corpus (updated: 2020-11-25). [Data set]. Språkbanken Text. https://doi.org/10.23695/se5f-d274

The Stockholm Internet Corpus 2 (SIC2) contains Swedish blog posts annotated with part of speech, morphological features, and named entities. Annotation was done by Robert Östling, Johan Sjons, and Johannes Bjerva. Version 2 was created by Aleksandrs Berdicevskis, who made minor changes to the annotation and the format. It uses an extended CoNLL-U format, which is described in the readme. The original version 1 is available separately. The corpus is distributed under the Creative Commons Attribution-ShareAlike 3.0 Unported license.
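
For readers who want to work with the CoNLL-U side of the corpus, the following is a minimal sketch of a reader for tab-separated CoNLL-U lines. It assumes the ten standard CoNLL-U columns come first and that any additional columns from the extended format follow them. This page does not say what the extra columns contain, so the sketch keeps them as an unlabeled list under a hypothetical `extra` key rather than guessing their names. The function name `read_sentences` is likewise illustrative.

```python
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U columns. Any columns beyond these are treated as
# unnamed extras, because the extended format is not specified on this page.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Dict[str, object]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, object]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines are skipped in this sketch.
            continue
        columns = line.split("\t")
        token: Dict[str, object] = dict(zip(CONLLU_FIELDS, columns))
        # Assumption: extension columns, if any, follow the standard ten.
        token["extra"] = columns[len(CONLLU_FIELDS):]
        sentence.append(token)
    # Emit the last sentence if the input does not end with a blank line.
    if sentence:
        yield sentence
```

Passing an open text file (UTF-8) to `read_sentences` yields sentences lazily, so the named-entity or morphological columns can be inspected token by token once the readme has confirmed which column holds what.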

Accessible through

Access Platform Licence
CC BY 4.0

Download

File            Description            Size       Modified    Licence
sic2.xml.bz2    Corpus (XML)           262.36 KB  2020-11-25  CC BY 4.0
stats_sic2.csv  Word statistics (CSV)  177.44 KB  2021-08-12  CC BY 4.0
sic2.zip        Corpus (XML)           2.18 KB    2020-11-17  CC BY 4.0
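
As a first look at the compressed XML download, the sketch below streams a `.xml.bz2` file and tallies how often each element tag occurs, without loading the whole file into memory. The element names used in `sic2.xml.bz2` are not listed on this page, so the code makes no assumption about them and simply reports whatever tags it finds. The function name `count_elements` and the local path passed to it are illustrative.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path: str) -> Counter:
    """Return a tally of element tags found in a bz2-compressed XML file."""
    counts: Counter = Counter()
    # bz2.open decompresses on the fly; iterparse reads from the binary stream.
    with bz2.open(path, "rb") as handle:
        for _event, element in ET.iterparse(handle, events=("end",)):
            counts[element.tag] += 1
            # Release the element's content to keep memory use low.
            element.clear()
    return counts
```

Calling `count_elements("sic2.xml.bz2")` on a downloaded copy would show which tags the file uses. If it marks sentences and tokens with dedicated elements, their counts could then be compared with the sentence and token figures listed under Size.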

Type

  • Corpus
  • Training and evaluation data

Language

Swedish

Size

Sentences: 892
Tokens: 13,562

Creators

  • Östling, Robert
  • Sjons, Johan
  • Bjerva, Johannes
  • Berdicevskis, Aleksandrs

Updated

2020-11-25

Contact

sb-info@svenska.gu.se