swe-lexical_classes_text-sparv-blingbring

Citation

Språkbanken Text (2017). swe-lexical_classes_text-sparv-blingbring (updated: 2017-09-21). [Analysis]. Språkbanken Text.

Standard reference

Lars Borin, Luis Nieto Piña, Richard Johansson (2015): Here be dragons? The perils and promises of inter-resource lexical-semantic mapping, in Linköping Electronic Conference Proceedings. Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities. Workshop at NODALIDA, May 11, 13-18 2015, Vilnius, volume 112, pages 1-11

Lexical classes from Blingbring at text level

Tokens are looked up in Blingbring to enrich them with information about their lexical classes. Each text is then enriched with lexical class information derived from the classes assigned to the tokens it contains.
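As a rough illustration of the token lookup and per-text aggregation, the Python sketch below maps each token's SALDO sense to a set of lexical classes and counts, for one text, how many tokens carry each class. The lexicon entries, the class labels (`ANIMAL`, `CUNNING`, `COLOUR_RED`) and the function names are invented for illustration; Sparv's actual lookup, and how it weights tokens linked to several classes, may differ. Here each token simply counts once toward every class it is linked to.

```python
from __future__ import annotations

from collections import Counter

# Toy lexicon, invented for illustration only: SALDO-style sense IDs mapped
# to sets of made-up class labels. Real Blingbring links and classes differ.
TOY_LEXICON: dict[str, set[str]] = {
    "räv..1": {"ANIMAL", "CUNNING"},
    "hund..1": {"ANIMAL"},
    "röd..1": {"COLOUR_RED"},
}


def token_classes(sense_id: str) -> set[str]:
    """Return the classes linked to a sense; unknown senses get none."""
    return TOY_LEXICON.get(sense_id, set())


def text_class_counts(senses: list[str]) -> Counter[str]:
    """Count, for one text, how many tokens carry each class."""
    counts: Counter[str] = Counter()
    for sense in senses:
        # A token linked to several classes adds one to each of them.
        counts.update(token_classes(sense))
    return counts


counts = text_class_counts(["räv..1", "hund..1", "och..1"])
print(counts)  # Counter({'ANIMAL': 2, 'CUNNING': 1})
```

The sense `och..1` is not in the toy lexicon, so it contributes nothing, while `räv..1` adds one to both of its classes.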

The Blingbring frequency model (trained on Göteborgsposten 2008, SUC 3.0 and Bonniersromaner I (1976–77)) is used as a reference for ranking the Blingbring classes that occur in each text. Using the token-level lexical class information, the analysis calculates and assigns the most relevant classes to each text: the classes are filtered and ranked by their frequency and by their dominance relative to the reference material.

Dominance refers to how prominent a lexical class is in a given text relative to the reference material. It is derived by comparing how often the class is observed in the text with how often it would be expected to occur, given its relative frequency in the reference material.
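The exact dominance formula and filtering thresholds are not given here, so the following is only a minimal Python sketch under stated assumptions: dominance is taken to be the observed count of a class divided by the count expected from its relative frequency in the reference material, and classes below a hypothetical `min_count` are dropped before the top `top_n` are kept. The function names, parameters and reference frequencies are all made up for illustration.

```python
from __future__ import annotations

from collections import Counter


def dominance(count: int, text_tokens: int, reference_rel_freq: float) -> float:
    """Observed count divided by the count expected from the reference.

    Assumed formula for illustration: 1.0 means the class is exactly as
    frequent as in the reference; larger values mean over-representation.
    """
    expected = reference_rel_freq * text_tokens
    return count / expected


def rank_classes(
    counts: Counter[str],
    text_tokens: int,
    reference: dict[str, float],
    min_count: int = 1,
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Filter by a hypothetical minimum count, then rank by dominance."""
    scored = [
        (cls, dominance(n, text_tokens, reference[cls]))
        for cls, n in counts.items()
        # Skip rare classes and classes missing from the reference model.
        if n >= min_count and reference.get(cls, 0.0) > 0.0
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Invented counts for a 15-token text and invented reference frequencies.
counts = Counter({"ANIMAL": 3, "COLOUR_RED": 1, "MOTION": 1})
reference = {"ANIMAL": 0.002, "COLOUR_RED": 0.0005, "MOTION": 0.01}
print(rank_classes(counts, text_tokens=15, reference=reference))
print(rank_classes(counts, text_tokens=15, reference=reference, min_count=2))
```

With these invented numbers, `COLOUR_RED` ranks first (about 133) on the strength of a single token, ahead of `ANIMAL` (about 100). Setting `min_count=2` leaves only `ANIMAL`, which illustrates why filtering on frequency is a useful complement to a pure ratio: classes that are rare in the reference can otherwise dominate on very little evidence.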

Blingbring (version 0.2) is based on the content of Bring's Svenskt ordförråd ordnat i begreppsklasser [The Swedish vocabulary arranged into conceptual classes] (1930). The entries in Blingbring have been linked to the corresponding SALDO word sense entries. The linkages are ambiguous in many cases, but disambiguation is planned for future versions of Blingbring.

Example

This analysis is used with Sparv. Check out Sparv's quick start guide to get started!

To use this analysis, add the following line under export.annotations in the Sparv corpus configuration file:

- <text>:lexical_classes.blingbring  # Lexical classes for text chunks from Blingbring

For more info on how to use Sparv, check out the Sparv documentation.

Example output:

<text blingbring="|brunt:352.54|uttryckslöshet:140.741|rött:135.333|">
  <token>Rödräv</token>
  <token>eller</token>
  <token>vanlig</token>
  <token>räv</token>
  <token>är</token>
  <token>ett</token>
  <token>hunddjur</token>
  <token>och</token>
  <token>den</token>
  <token>mest</token>
  <token>förekommande</token>
  <token>arten</token>
  <token>i</token>
  <token>rävsläktet</token>
  <token>.</token>
</text>
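In the example output, the `blingbring` attribute stores the selected classes as a `|`-delimited set, each class name followed by `:` and a numeric score. A minimal Python sketch for reading it back with the standard library's `xml.etree.ElementTree` is shown here, using an abridged copy of the sample. The function name `parse_blingbring` is hypothetical and not part of Sparv, and treating a lone `|` as an empty set is an assumption.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example output.
SAMPLE = """<text blingbring="|brunt:352.54|uttryckslöshet:140.741|rött:135.333|">
  <token>Rödräv</token>
  <token>räv</token>
</text>"""


def parse_blingbring(value: str) -> list[tuple[str, float]]:
    """Split a '|'-delimited attribute into (class, score) pairs, keeping order."""
    pairs = []
    for item in value.strip("|").split("|"):
        if not item:
            continue  # assumed: an empty set is written as a lone "|"
        # rpartition splits on the last ":", so the score is always the tail.
        name, _, score = item.rpartition(":")
        pairs.append((name, float(score)))
    return pairs


root = ET.fromstring(SAMPLE)
classes = parse_blingbring(root.get("blingbring", ""))
print(classes)
# [('brunt', 352.54), ('uttryckslöshet', 140.741), ('rött', 135.333)]
```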

Other references

Lars Borin, Jens Allwood, Gerard de Melo (2014): Bring vs. MTRoget: Evaluating automatic thesaurus translation, in Proceedings of LREC 2014, May 26-31, 2014, Reykjavik, Iceland