Hoppa till huvudinnehåll
Språkbanken Text är en avdelning inom Språkbanken.

New blog corpus in Korp

27 november 2020
SIC2, a small corpus of blogs with gold part-of-speech, morphosyntactic and named-entity tags, as well as basic info about the authors, has been added to Korp.

Any corpus is a welcome addition to Korp, but gold corpora (those where the annotation quality has been manually controlled) are particularly valuable. We have now added SIC2, a slightly modified version of the Stockholm Internet Corpus, originally created by Robert Östling et al. SIC2 is a small corpus of blogs, but it has gold part-of-speech, morphosyntactic and named-entity tags (SUC-style). In addition, basic information about the authors is also available. The corpus is downloadable.

The integration of SIC2 into Korp served also as a test drive for the new version of our annotation pipeline Sparv, to be released very soon.