Introduction
Swesaurus is a free Swedish wordnet based on so-called fuzzy synonym sets (or fuzzy synsets). It reuses information about lexical-semantic relations from a number of freely available lexical resources for Swedish:
- SALDO - a lexico-semantic resource for Swedish
- Synlex - a Swedish list of synonyms with level of synonymy
- SDB (Semantic database) - a lexical database
- Wiktionary - a web-based project for collaboratively creating a free lexicon
- Princeton Core WordNet
We have explored two approaches for mapping Synlex synonym pairs to different word senses in SALDO. The first uses transitive closure. A relation is transitive if, whenever it holds between A and B and between B and C, it also holds between A and C. For example, if we know that an elephant is larger than a camel, and that a camel is larger than a cat, then we automatically know that an elephant is larger than a cat. Synonymy is a transitive relation, which means that if we know that A is synonymous with B, and B with C, then A is automatically synonymous with C. In other words, the synonymy pair A-C can be deduced from the explicitly stated synonymy pairs A-B and B-C. The transitive closure is the set of objects we obtain by following all such transitive links from one or several of the words. We can think of the transitive closure as a 'chain' of word senses, where each link of the chain is given by a synonymy pair.
The result was a collection of reasonable synonym sets, but also one synonym set with several thousand senses. We reduced this large set by requiring that a synset can only contain words of the same part of speech, but it still contains thousands of senses.
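The transitive-closure idea can be sketched in a few lines of Python. This is only an illustration, not the code used to build Swesaurus: the function name `closure_synsets`, the optional `pos` mapping, and the example words and part-of-speech tags are all made up here. It groups senses with a union-find structure, so that every chain of synonym pairs ends up in one set, and optionally ignores pairs whose members have different parts of speech.

```python
from collections import defaultdict


def closure_synsets(pairs, pos=None):
    """Group senses into sets via the transitive closure of synonym pairs.

    pairs: iterable of (sense_a, sense_b) tuples (hypothetical identifiers).
    pos:   optional dict mapping a sense to its part of speech; when given,
           pairs whose members differ in part of speech are ignored.
    """
    parent = {}

    def find(x):
        # Return the representative of x's group, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        if pos is not None and pos.get(a) != pos.get(b):
            continue  # same-part-of-speech restriction
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two chains

    groups = defaultdict(set)
    for sense in list(parent):
        groups[find(sense)].add(sense)
    return sorted(sorted(group) for group in groups.values())


# Made-up example data, not taken from Synlex or SALDO.
example_pairs = [
    ("glad", "lycklig"),
    ("lycklig", "munter"),
    ("bil", "fordon"),
    ("lycklig", "lycka"),
]
example_pos = {
    "glad": "av", "lycklig": "av", "munter": "av",
    "bil": "nn", "fordon": "nn", "lycka": "nn",
}
```

Calling `closure_synsets(example_pairs)` puts "lycka" in the same set as the adjectives, because one cross-category pair is enough to link the chains; passing `example_pos` as the second argument drops that link. The same mechanism explains how a handful of loose pairs can merge many sets into one very large one.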
The second method for connecting synonymy pairs uses cliques. A clique is a set of words which are all synonyms of each other. Calculating cliques did not result in abnormally large synsets, but raised other problems. For example, some word senses appear in more than one synset, which conflicts with the wordnet notion of a word sense. We are experimenting with different ways of dealing with this. For example, starting with the cliques that share word senses, we can deduce the missing synonym pairs which, if they existed, would collapse those cliques into a single clique. This results in pairs of good quality.
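The clique approach and the deduction of missing pairs can be illustrated with a small Python sketch. It is an assumption-laden example rather than the project's implementation: the function names `build_graph`, `maximal_cliques` and `missing_pairs` are invented here, the sense identifiers are placeholders, and the basic Bron-Kerbosch algorithm is simply one standard way of enumerating maximal cliques.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(pairs):
    """Build an undirected synonymy graph from (sense_a, sense_b) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch algorithm."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


def missing_pairs(clique_a, clique_b, graph):
    """Pairs that would have to be synonyms for two overlapping cliques
    to collapse into one; empty if the cliques share no sense."""
    if not clique_a & clique_b:
        return set()
    merged = sorted(clique_a | clique_b)
    return {
        frozenset((u, v))
        for u, v in combinations(merged, 2)
        if v not in graph[u]
    }


# Placeholder senses: a, b, c form a clique, as do b, c, d; only a-d is missing.
example_graph = build_graph(
    [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]
)
example_cliques = maximal_cliques(example_graph)
```

In this toy graph the senses "b" and "c" appear in both cliques, and `missing_pairs` reports the single pair a-d as the one that would merge them. Basic Bron-Kerbosch can be slow on large dense graphs, so a variant with pivoting may be preferable for real data.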
Download the development version of Swesaurus
We have made the synsets from the experiments with SALDO and Synlex available. Synlex was created by letting users of a web lexicon rate synonym pairs on a scale where 0 meant that the words were not synonyms and 5 meant that they had the same meaning. The 60% below refers to the set of word pairs which received an average rating of 3 or higher, while 100% means that the word pairs received an average rating of 5.