Språkbanken Text is a department within Språkbanken.

SIC2 - Stockholm Internet Corpus

Citation Information

Östling, Robert, Sjons, Johan, Bjerva, Johannes, & Berdicevskis, Aleksandrs (2020). SIC2 - Stockholm Internet Corpus (updated: 2020-11-25). [Data set]. Språkbanken Text. https://doi.org/10.23695/se5f-d274

The Stockholm Internet Corpus 2 (SIC2) contains Swedish blog posts annotated with part of speech, morphological features, and named entities. The annotation was done by Robert Östling, Johan Sjons and Johannes Bjerva. Version 2 was created by Aleksandrs Berdicevskis, who made minor changes to the annotation and the format: version 2 uses an extended CoNLL-U format, described in the readme. The original version 1 is distributed separately. The corpus is distributed under the Creative Commons Attribution-ShareAlike 3.0 Unported license.
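Since version 2 is described as extended CoNLL-U, a minimal reader may help. The sketch below assumes the data is available as CoNLL-U text (the download list only shows XML and ZIP files, so any file name such as `sic2.conllu` is hypothetical). It reads the ten standard CoNLL-U columns, splits FEATS into a dictionary, and keeps any further columns unparsed under `extra`, because this page does not say what the extension adds.

```python
from typing import Dict, Iterable, List

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

Token = Dict[str, object]


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a FEATS value such as 'Definite=Def|Number=Sing' into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def parse_conllu(lines: Iterable[str]) -> List[List[Token]]:
    """Group token lines into sentences; a blank line ends a sentence.

    The first ten columns are read as standard CoNLL-U. Any further
    columns (an assumed form of the 'extended' format) are kept,
    unparsed, under the key 'extra'.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment lines such as '# sent_id = ...'
        cols = line.split("\t")
        token: Token = dict(zip(CONLLU_FIELDS, cols[:10]))
        token["feats"] = parse_feats(cols[5] if len(cols) > 5 else "_")
        token["extra"] = cols[10:]
        current.append(token)
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences
```

For example, `parse_conllu(open("sic2.conllu", encoding="utf-8"))` would return one list of token dictionaries per sentence, with `token["upos"]` holding the part-of-speech tag and `token["feats"]` the morphological features.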

Download

File            Description              Size       Modified    Licence
sic2.xml.bz2    Corpus (XML)             262.36 KB  2020-11-25  CC BY 4.0
stats_sic2.csv  Word statistics (CSV)    177.44 KB  2021-08-12  CC BY 4.0
sic2.zip        Corpus (XML)             -          -           CC BY 4.0
readme.txt      Readme (TXT)             2.18 KB    2020-11-17  CC BY 4.0
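The compressed XML file can be inspected without unpacking it to disk. The sketch below uses Python's standard `bz2` and `xml.etree.ElementTree` modules to stream `sic2.xml.bz2` and count how often each element name occurs. This page does not document the XML schema, so the code assumes no particular element names; it is only a first look before writing a proper reader.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_xml_tags(source) -> Counter:
    """Stream a bzip2-compressed XML file and tally element names.

    `source` may be a path or a binary file object, as accepted by
    bz2.open. No particular element names are assumed.
    """
    counts: Counter = Counter()
    with bz2.open(source, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # drop text and children already counted
    return counts
```

Assuming the file has been downloaded to the working directory, `count_xml_tags("sic2.xml.bz2").most_common(10)` lists the ten most frequent element names.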

Type

  • Corpus
  • Training and evaluation data

Language

Swedish

Size

Sentences: 892
Tokens: 13,562
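The size figures above could be cross-checked against a CoNLL-U copy of the data. A minimal sketch, assuming that a sentence is a blank-line-separated block and that a token is a line with an integer ID; multiword-token ranges such as `3-4` and empty nodes such as `5.1` are skipped, following the CoNLL-U specification. Whether the figures on this page were computed the same way is an assumption, so small differences are possible.

```python
from typing import Iterable, Tuple


def count_sentences_and_tokens(lines: Iterable[str]) -> Tuple[int, int]:
    """Count sentences (blank-line-separated blocks) and word tokens."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.strip()
        if not line:
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments are not tokens
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        # Skip multiword-token ranges ("3-4") and empty nodes ("5.1").
        if "-" in token_id or "." in token_id:
            continue
        tokens += 1
    if in_sentence:  # the file may not end with a blank line
        sentences += 1
    return sentences, tokens
```

With the data in a hypothetical file `sic2.conllu`, calling `count_sentences_and_tokens(open("sic2.conllu", encoding="utf-8"))` returns a `(sentences, tokens)` pair to compare with the figures above.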

Creators

  • Östling, Robert
  • Sjons, Johan
  • Bjerva, Johannes
  • Berdicevskis, Aleksandrs

Updated

2020-11-25

Contact

Språkbanken
sb-info@svenska.gu.se