Corpus description

Each corpus in Strix has a default metadata structure. Some corpora also include additional annotations at the word and text levels. The example below shows the basic structure of a corpus description, using the Swedish party programs and election manifestos corpus.


Swedish party programs and election manifestos
  • Mode: Modern
  • Documents: 349
  • Corpus Size: 2,099,602 tokens
  • Word attributes:
    • Lemgram
    • Sense
    • Compound word forms
    • Compound lemgrams
    • Dependency relation
    • Dephead
    • Ref
    • Sentiment label
    • Text classification (blingbring)
    • Text classification (swefn)
    • Baseform
    • Msd
    • Part-of-speech
  • Text attributes:
    • Text classification (blingbring)
    • Text classification (swefn)
    • Readability measure (LIX)
    • Readability measure (ovix)
    • Readability measure (nk)
    • Id
    • Party
    • Type
    • Year
  • Structural attributes:
    • Name tag:
      • Expression
      • Name
      • Type
      • Subtype
    • Sentence
    • Location
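For readers who want to work with such a description programmatically, the listing above can be sketched as a small data structure. This is only an illustration: the `CorpusDescription` class, its field names, and its helper methods are hypothetical and are not part of Strix or of any export format it provides. The values are copied from the listing.

```python
from dataclasses import dataclass, field


@dataclass
class CorpusDescription:
    """Hypothetical container mirroring the listing above (not Strix's own schema)."""
    name: str
    mode: str
    documents: int
    tokens: int
    word_attributes: list[str] = field(default_factory=list)
    text_attributes: list[str] = field(default_factory=list)
    # Structural attribute name -> its sub-attributes (empty list if it has none).
    structural_attributes: dict[str, list[str]] = field(default_factory=dict)

    def mean_tokens_per_document(self) -> float:
        """Average document length in tokens."""
        return self.tokens / self.documents

    def shared_attributes(self) -> set[str]:
        """Attribute labels that appear at both the word and the text level."""
        return set(self.word_attributes) & set(self.text_attributes)


corpus = CorpusDescription(
    name="Swedish party programs and election manifestos",
    mode="Modern",
    documents=349,
    tokens=2_099_602,
    word_attributes=[
        "Lemgram", "Sense", "Compound word forms", "Compound lemgrams",
        "Dependency relation", "Dephead", "Ref", "Sentiment label",
        "Text classification (blingbring)", "Text classification (swefn)",
        "Baseform", "Msd", "Part-of-speech",
    ],
    text_attributes=[
        "Text classification (blingbring)", "Text classification (swefn)",
        "Readability measure (LIX)", "Readability measure (ovix)",
        "Readability measure (nk)", "Id", "Party", "Type", "Year",
    ],
    structural_attributes={
        "Name tag": ["Expression", "Name", "Type", "Subtype"],
        "Sentence": [],
        "Location": [],
    },
)

if __name__ == "__main__":
    print(f"{corpus.name}: {corpus.documents} documents")
    print(f"Mean tokens per document: {corpus.mean_tokens_per_document():.1f}")
    print("Available at both word and text level:", sorted(corpus.shared_attributes()))
```

Keeping the three attribute levels in separate fields mirrors the listing and makes it easy to see, for instance, that the two text classification attributes are available both per word and per text.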

This metadata structure summarizes what each corpus in Strix offers: its mode, its size in documents and tokens, and the word, text, and structural attributes available for analysis. For details on how to use these attributes, refer to the relevant sections of the documentation.