The release of the Eukalyptus Treebank of Written Swedish, v1.0.0


Eukalyptus contains almost 100,000 tokens of contemporary written Swedish drawn from five text types: novels, news texts, Wikipedia articles, blog texts, and Europarl proceedings. The texts have been manually annotated with lemmata, word senses, parts of speech, multi-word units, and syntactic structure (constituents with grammatical functions).

The treebank – source texts and annotations – is released under a CC BY-SA 4.0 license, and is currently distributed in the TIGER-XML format.
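
For readers new to TIGER-XML, the following is a minimal sketch of how sentences, tokens, and constituents might be read with Python's standard library. The embedded sample is hand-written for illustration and is not taken from Eukalyptus; the attribute names (`word`, `pos`), the category and function labels, and the helper `read_sentences` are assumptions, so check the documentation in the download archive for the attributes the treebank actually uses.

```python
import xml.etree.ElementTree as ET

# Hand-written TIGER-XML fragment for illustration only. It is NOT from
# Eukalyptus; the attribute names and labels are assumptions.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Hon" pos="PRON"/>
          <t id="s1_2" word="sover" pos="VERB"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="S">
            <edge label="SB" idref="s1_1"/>
            <edge label="HD" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(xml_text):
    """Return one dict per <s> element with its tokens and constituents."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # Terminals (<t>) are the tokens; their annotations are XML attributes.
        tokens = [(t.get("id"), t.get("word"), t.get("pos")) for t in s.iter("t")]
        # Nonterminals (<nt>) are constituents; each <edge> points to a child
        # node by id and carries the child's grammatical function as its label.
        constituents = []
        for nt in s.iter("nt"):
            children = [(e.get("label"), e.get("idref")) for e in nt.findall("edge")]
            constituents.append((nt.get("id"), nt.get("cat"), children))
        sentences.append(
            {"id": s.get("id"), "tokens": tokens, "constituents": constituents}
        )
    return sentences


if __name__ == "__main__":
    for sentence in read_sentences(SAMPLE):
        print(sentence["id"], [word for _, word, _ in sentence["tokens"]])
```

For a full corpus file, `ET.parse(path).getroot()` could replace `ET.fromstring`, with the same traversal applied to the result.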

For download details, please visit:

The download archive also contains documentation and publications related to the design of Eukalyptus.

We hope you find Eukalyptus useful in your work.