The release of the Eukalyptus Treebank of Written Swedish, v1.0.0

3 October 2021

Eukalyptus contains almost 100,000 tokens of contemporary written Swedish from five text types: novels, news texts, Wikipedia articles, blog texts and Europarl proceedings. The texts have been manually annotated with lemmata, word senses, parts of speech, multi-word units, and syntactic structure (constituents with grammatical functions).

The treebank – source texts and annotations – is released under a CC BY-SA 4.0 license, and is currently distributed in the TIGER-XML format.
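
For readers new to TIGER-XML, the following is a minimal sketch of how a file in that format might be read with Python's standard library. The sample sentence, its part-of-speech tags and its function labels are invented for illustration and are not taken from Eukalyptus. The element and attribute names (`s`, `t`, `nt`, `edge`, `word`, `cat`, `label`, `idref`) follow the general TIGER-XML layout; the attributes actually used in the Eukalyptus files may differ, so please check the documentation in the download archive.

```python
import xml.etree.ElementTree as ET

# Invented two-word sample in the general TIGER-XML layout.
# The tags and labels here are placeholders, not Eukalyptus annotation.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Hon" pos="PN"/>
          <t id="s1_2" word="sover" pos="VB"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="S">
            <edge label="SB" idref="s1_1"/>
            <edge label="HD" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(xml_text):
    """Return one dict per sentence with its words and labelled edges."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # Map terminal ids to word forms, in document order.
        words_by_id = {t.get("id"): t.get("word") for t in s.iter("t")}
        edges = []
        for nt in s.iter("nt"):
            for edge in nt.findall("edge"):
                target = edge.get("idref")
                # Show the word for terminal children, the raw id otherwise.
                child = words_by_id.get(target, target)
                edges.append((nt.get("cat"), edge.get("label"), child))
        sentences.append({
            "id": s.get("id"),
            "words": list(words_by_id.values()),
            "edges": edges,
        })
    return sentences


sentences = read_sentences(SAMPLE)
for sentence in sentences:
    print(sentence["id"], " ".join(sentence["words"]))
    for cat, label, child in sentence["edges"]:
        print(f"  {cat} -[{label}]-> {child}")
```

For a real file, `ET.parse(path).getroot()` could replace `ET.fromstring`, and `ET.iterparse` may be preferable if the file is large.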
 

For download details, please visit:
https://spraakbanken.gu.se/en/resources/eukalyptus

The download archive also contains documentation and publications related to the design of Eukalyptus.

We hope you find Eukalyptus useful in your work.