Språkbanken Text is a part of Språkbanken.

Blog mix 2004

Data citation

Språkbanken Text (2017). Blog mix 2004 (updated: 2017-02-14). [Data set]. Språkbanken Text. https://doi.org/10.23695/8fq1-m879
Material from a selection of Swedish blogs, updated regularly.

The blogs in the blog mix are selected from the lists "Most visited private blogs" and "Most visited professional blogs", and from the local lists for different regions, at bloggportalen.se.

Additional information, such as the location and age of the blogger, is also retrieved from Bloggportalen. The material has not been manually checked, so it may contain spam. Some English-language blogs have been removed when discovered, and some blogs could not be added for technical reasons.

The corpus covers the period from the first to the latest entries of the selected blogs, and it is continually updated.

Accessible through

Access Platform Licence
CC BY 4.0 (attribution)

Download

File                      Size      Modified      Licence
bloggmix2004.xml.bz2      9.03 MB   2017-02-14    CC BY 4.0 (attribution)
    Scrambled version of the corpus (XML)
stats_BLOGGMIX2004.txt    2.85 MB   2017-02-19    CC BY 4.0 (attribution)
    Word statistics (CSV)
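
The corpus file is a bzip2-compressed XML file, so it can be read as a stream without unpacking it first. The following is a minimal Python sketch of that idea, not an official reader. The function name `count_sentences_and_tokens` is hypothetical, and the element names `sentence` and `w` are assumptions about the XML layout that this page does not confirm; check them against the downloaded file and adjust the arguments if needed.

```python
import bz2
import xml.etree.ElementTree as ET


def count_sentences_and_tokens(path, sentence_tag="sentence", token_tag="w"):
    """Stream a bz2-compressed XML corpus and count sentence and token elements.

    The default tag names are assumptions, not taken from the dataset page.
    """
    sentences = 0
    tokens = 0
    with bz2.open(path, "rb") as handle:
        # iterparse reports each element once its closing tag has been read,
        # so the file is processed incrementally rather than loaded at once.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == token_tag:
                tokens += 1
            elif elem.tag == sentence_tag:
                sentences += 1
                # Discard the finished sentence's children to limit memory use.
                elem.clear()
    return sentences, tokens
```

Calling `count_sentences_and_tokens("bloggmix2004.xml.bz2")` on the downloaded file returns a `(sentences, tokens)` pair. If the assumed tag names match the file, the result can be compared with the figures listed under Size.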

Collection

Type

  • Corpus

Language

Swedish

Size

Sentences: 39,382
Tokens: 638,967

Updated

2017-02-14

Contact

Språkbanken
sb-info@svenska.gu.se