Swesaurus

A Swedish WordNet

Introduction

Swesaurus is a free Swedish wordnet based on so-called fuzzy synonym sets (or fuzzy synsets). It reuses information about lexical-semantic relations from a number of freely available lexical resources for Swedish:

  • SALDO - a lexico-semantic resource for Swedish
  • Synlex - a Swedish list of synonyms with level of synonymy
  • SDB (Semantic database) - a lexical database
  • Wiktionary - a web-based project for collaboratively creating a free lexicon
  • Princeton Core WordNet

We have explored two approaches for mapping Synlex synonym pairs to different word senses in SALDO. The first uses the transitive closure. A relation is transitive if, whenever it holds between A and B and between B and C, it also holds between A and C. For example, if we know that an elephant is larger than a camel, and that a camel is larger than a cat, then we automatically know that an elephant is larger than a cat. Synonymy is treated here as a transitive relation: if A is synonymous with B, and B with C, then A is synonymous with C. In other words, the synonymy pair A-C can be deduced from the explicitly stated synonymy pairs A-B and B-C. The transitive closure is the set of all pairs obtained by applying transitivity repeatedly to the stated pairs. We can think of it as a 'chain' of word senses, where each link of the chain is given by a synonymy pair, and every sense on the same chain ends up in the same synset. The result was a number of reasonable synonym sets, but also one synonym set with several thousand senses. We reduced this large set by requiring that a synset can only contain words of the same part of speech, but it still contains thousands of senses.
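The chaining described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build Swesaurus: the function name `synonym_chains` and the sense labels are hypothetical, and the grouping is done with a union-find structure, which yields the same groups as the transitive closure of a symmetric relation.

```python
from collections import defaultdict


def synonym_chains(pairs):
    """Group word senses that are linked by any chain of synonym pairs.

    `pairs` is an iterable of (sense_a, sense_b) tuples. The sense labels
    are hypothetical placeholders, not actual SALDO identifiers.
    """
    parent = {}

    def find(x):
        # Return the representative of x's group, shortening paths on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two chains

    groups = defaultdict(set)
    for sense in list(parent):
        groups[find(sense)].add(sense)
    return sorted(sorted(group) for group in groups.values())


# A-B and B-C are stated explicitly; A-C follows by transitivity.
example_pairs = [("A", "B"), ("B", "C"), ("D", "E")]
chains = synonym_chains(example_pairs)
print(chains)  # [['A', 'B', 'C'], ['D', 'E']]
```

A single weak link is enough to merge two otherwise unrelated groups, which is one way to see how a chain can grow to thousands of senses. The part-of-speech restriction could be approximated by running the function separately on the pairs of each part of speech.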

The second method for connecting synonymy pairs uses cliques. A clique is a set of words which are all synonyms of each other. Calculating cliques did not result in abnormally large synsets, but it raised other problems. For example, some word senses appear in more than one synset, which conflicts with the wordnet notion of a word sense, where each sense belongs to exactly one synset. We are experimenting with different ways of dealing with this. For example, starting with cliques that share word senses, we can deduce the missing synonym pairs which, if they existed, would collapse those cliques into a single clique. The pairs deduced this way are of good quality.
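As a rough illustration of the clique approach, the sketch below enumerates maximal cliques with the basic Bron-Kerbosch algorithm and lists the pairs that would be needed to merge two overlapping cliques. The algorithm choice, the function names (`build_graph`, `maximal_cliques`, `missing_pairs`) and the sense labels are assumptions made for this example; the text does not say how Swesaurus computes its cliques.

```python
from itertools import combinations


def build_graph(pairs):
    """Build an undirected synonym graph as a dict of neighbour sets."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


def missing_pairs(clique_a, clique_b, graph):
    """Pairs that would have to exist for the two cliques to form one clique."""
    merged = sorted(clique_a | clique_b)
    return [(a, b) for a, b in combinations(merged, 2) if b not in graph[a]]


# Hypothetical senses: B and C each appear in two cliques.
graph = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")])
cliques = maximal_cliques(graph)
candidates = missing_pairs(frozenset("ABC"), frozenset("BCD"), graph)
print(candidates)  # [('A', 'D')]
```

Here the cliques {A, B, C} and {B, C, D} share the senses B and C, and the single deduced pair A-D would collapse them into one clique of four senses.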

Download the development version of Swesaurus

File: swesaurus.xml (lexicon, LMF)
Size: 12.16 MB
Modified: 2017-09-19
Licence: CC BY 4.0 (attribution)

Type

  • Lexicon

Language

Swedish

Size

Entries: 15,010

Contact

Språkbanken
sb-info@svenska.gu.se