Any corpus is a welcome addition to Korp, but gold corpora (those whose annotation quality has been manually checked) are particularly valuable. We have now added SIC2, a slightly modified version of the Stockholm Internet Corpus, originally created by Robert Östling et al. SIC2 is a small corpus of blogs, but it has gold part-of-speech, morphosyntactic and named-entity tags (SUC-style). Basic information about the authors is also available. The corpus is downloadable.
The integration of SIC2 into Korp also served as a test drive for the new version of our annotation pipeline, Sparv, which will be released very soon.