sbx-swe-readability-sparv-ovix

Analysis citation information

Språkbanken Text (2018). sbx-swe-readability-sparv-ovix (updated: 2018-03-28). [Analysis]. Språkbanken Text. https://doi.org/10.23695/v8w3-pb64
Annotation of Swedish texts with OVIX values, which indicate the difficulty of the texts.

OVIX (ordvariationsindex, "word variation index") is a readability measure based on the number of distinct words (types) relative to the total number of words (tokens) in the text chunk.

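OVIX is calculated as log(tokens) / log(2 - (log(types) / log(tokens))), where tokens is the total number of words in the text chunk and types is the number of distinct words.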

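A high value can be interpreted as the text frequently introducing new words to the reader, whereas a low value may indicate a monotonous text. If no word is repeated, types equals tokens, the denominator becomes log(1) = 0, and the value is reported as inf.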
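The formula above can be sketched in a few lines of Python. This is a minimal illustration, not Sparv's own code: the function name ovix is hypothetical, and the sketch assumes that the input is a list of word tokens and that word forms are compared exactly as written (no lowercasing or lemmatization).

```python
import math


def ovix(tokens):
    """Return the OVIX value for a sequence of word tokens (illustrative sketch)."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))  # distinct word forms
    # With no repeated word, log(types) / log(tokens) is 1 and the
    # denominator log(2 - 1) is 0, so the value is treated as infinity.
    # This branch also covers empty and one-token input.
    if n_types == n_tokens:
        return math.inf
    return math.log(n_tokens) / math.log(2 - math.log(n_types) / math.log(n_tokens))


print(ovix(["Det", "här", "är", "en", "enkel", "mening"]))  # no word repeats
print(ovix(["en", "katt", "och", "en", "hund"]))            # "en" occurs twice
```

The first call has no repeated word and therefore yields infinity; the second yields a finite value because one word occurs twice.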

Example

This analysis is used with Sparv. Check out Sparv's quick start guide to get started!

To use this analysis, add the following line under export.annotations in the Sparv corpus configuration file:

- <text>:readability.ovix  # OVIX values for text chunks

For more info on how to use Sparv, check out the Sparv documentation.

Example output:

<text ovix="inf">
  <token>Det</token>
  <token>här</token>
  <token>är</token>
  <token>en</token>
  <token>enkel</token>
  <token>mening</token>
  <token>.</token>
</text>
<text ovix="94.13">
  <token>LIX</token>
  <token>(</token>
  <token>Björnsson</token>
  <token>,</token>
  <token>1968</token>
  <token>)</token>
  <token>är</token>
  <token>ett</token>
  <token>läsbarhetsvärde</token>
  <token>beräknat</token>
  <token>på</token>
  <token>genomsnittligt</token>
  <token>antal</token>
  <token>ord</token>
  <token>per</token>
  <token>mening</token>
  <token>och</token>
  <token>andel</token>
  <token>långa</token>
  <token>ord</token>
  <token>(</token>
  <token>över</token>
  <token>sex</token>
  <token>bokstäver</token>
  <token>långa</token>
  <token>)</token>
  <token>.</token>
</text>
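The two ovix values in the example output appear to be reproducible from the formula when punctuation tokens are left out of the counts. That punctuation is excluded is an inference from the numbers, not something stated in this description, and the helper name ovix_from_counts is hypothetical.

```python
import math


def ovix_from_counts(n_tokens, n_types):
    """OVIX from a token count and a type count (illustrative sketch)."""
    if n_types == n_tokens:
        # log(2 - 1) = 0 in the denominator, so the value is infinite.
        return math.inf
    return math.log(n_tokens) / math.log(2 - math.log(n_types) / math.log(n_tokens))


# First text: 6 word tokens, all distinct (the final "." is left out).
first = ovix_from_counts(6, 6)

# Second text: 27 <token> elements, 6 of them punctuation marks, leaving 21.
# "ord" and "långa" each occur twice, so 19 of the 21 are distinct.
second = ovix_from_counts(21, 19)

print(first, round(second, 2))
```

Under these assumptions the first value is infinite and the second comes out at about 94.13, matching the attributes shown above. Counting all 27 token elements (23 distinct) would instead give roughly 69.4.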

Type

  • Analysis

Task

  • readability measures

Unit

  • text

Created

2018-03-28

Updated

2018-03-28

Contact

sb-info@svenska.gu.se