Eukalyptus contains almost 100,000 tokens of contemporary written Swedish drawn from several text types and genres (novels, news texts, Wikipedia articles, blog texts and Europarl proceedings). The texts have been manually annotated with lemmata, word senses, parts of speech, multi-word units, and syntactic structure (constituents with grammatical functions).
The treebank (source texts and annotations) is released under a CC BY-SA 4.0 license and is currently distributed in the TIGER-XML format.
For download details, please visit:
The download archive also contains documentation and publications related to the design of Eukalyptus.
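As a rough illustration of what working with TIGER-XML can look like, the sketch below reads tokens and constituents using only the Python standard library. It parses a small hand-written sample rather than the real treebank: the sentence, the IDs, the attribute names on the token elements (word, pos), and the category and function labels are all assumptions made for the example. The actual Eukalyptus files may use different attributes and label sets, so please check the documentation in the download archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the general TIGER-XML layout.
# The sentence, IDs, attribute names and labels are invented for
# illustration; the real Eukalyptus files may differ.
SAMPLE = """\
<corpus id="sample">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Hon" pos="PN"/>
          <t id="s1_2" word="sover" pos="VB"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="S">
            <edge label="SB" idref="s1_1"/>
            <edge label="HD" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>
"""


def read_sentences(root):
    """Return one dict per <s> element with its tokens and constituents."""
    sentences = []
    for s in root.iter("s"):
        # Terminals are the tokens; their annotations are XML attributes.
        tokens = [(t.get("id"), t.get("word"), t.get("pos")) for t in s.iter("t")]
        # Nonterminals are constituents; each edge points to a child node
        # and carries that child's grammatical function as its label.
        constituents = [
            (
                nt.get("id"),
                nt.get("cat"),
                [(e.get("label"), e.get("idref")) for e in nt.findall("edge")],
            )
            for nt in s.iter("nt")
        ]
        sentences.append(
            {"id": s.get("id"), "tokens": tokens, "constituents": constituents}
        )
    return sentences


sentences = read_sentences(ET.fromstring(SAMPLE))
for sentence in sentences:
    print(sentence["id"], [word for _, word, _ in sentence["tokens"]])
```

To try the same function on a downloaded file, ET.parse(path).getroot() can be passed in place of ET.fromstring(SAMPLE), where path is the location of a TIGER-XML file from the archive.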
We hope you find Eukalyptus useful in your work.