The release of the Eukalyptus Treebank of Written Swedish, v1.0.0

3 October 2021

Eukalyptus contains almost 100,000 tokens of contemporary written Swedish from five text types: novels, news texts, Wikipedia articles, blog texts and Europarl proceedings. The texts have been manually annotated with lemmata, word senses, parts of speech, multi-word units, and syntactic structure (constituents with grammatical functions).

The treebank – source texts and annotations – is released under a CC BY-SA 4.0 license, and is currently distributed in the TIGER-XML format.
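
For readers new to TIGER-XML, the following is a minimal sketch of how such a file can be read with Python's standard library. It assumes the general TIGER-XML layout (sentences as s elements, tokens as t elements, constituents as nt elements with labelled edge children). The sample sentence, the attribute names word and pos, and the labels S, SB and HD are illustrative placeholders, not taken from Eukalyptus; the documentation in the download archive describes the features and labels actually used.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the general TIGER-XML layout.
# It is NOT taken from Eukalyptus; attribute names and labels are placeholders.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Hon" pos="PN"/>
          <t id="s1_2" word="sover" pos="VB"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="S">
            <edge label="SB" idref="s1_1"/>
            <edge label="HD" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(root):
    """Yield (sentence id, tokens, constituents) for each <s> element."""
    for s in root.iter("s"):
        # Terminals: one tuple (id, word form, part of speech) per token.
        tokens = [(t.get("id"), t.get("word"), t.get("pos")) for t in s.iter("t")]
        # Nonterminals: category plus (function label, child id) for each edge.
        constituents = [
            (nt.get("cat"),
             [(e.get("label"), e.get("idref")) for e in nt.findall("edge")])
            for nt in s.iter("nt")
        ]
        yield s.get("id"), tokens, constituents


sentences = list(read_sentences(ET.fromstring(SAMPLE)))
for sent_id, tokens, constituents in sentences:
    print(sent_id, [word for _, word, _ in tokens], constituents)
```

To read a file from the archive instead of the inline sample, pass ET.parse(path).getroot() to read_sentences, where path is the location of the extracted TIGER-XML file.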

For download details, please visit:

The download archive also contains documentation and publications related to the design of Eukalyptus.

We hope you find Eukalyptus useful in your work.