Corpus description

Each corpus in Strix has a default metadata structure. Some corpora may also include additional annotations at the word and text levels. Below is an example of the basic structure of a corpus, using the Swedish party programs and election manifestos corpus as a reference.

Swedish party programs and election manifestos

Mode: Modern
Documents: 349
Corpus Size: 2,099,602 tokens

Word attributes:

Lemgram
Sense
Compound word forms
Compound lemgrams
Dependency relation
Dephead
Ref
Sentiment label
Text classification (blingbring)
Text classification (swefn)
Baseform
Msd
Part-of-speech

Text attributes:

Text classification (blingbring)
Text classification (swefn)
Readability measure (LIX)
Readability measure (ovix)
Readability measure (nk)
Id
Party
Type
Year

Structural attributes:

Name tag:
- Expression
- Name
- Type
- Subtype
Sentence
Location

This metadata structure summarizes the attributes available for each corpus in Strix, so users can see which analyses a corpus supports at the word and text levels. For more information about how to use these attributes, refer to the relevant sections in the documentation.
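
For readers who want to work with such a description programmatically, the listing above can be mirrored in a small data structure. The following is a minimal sketch only: the class name `CorpusDescription`, its field names, and the helper methods are hypothetical and are not part of Strix or any Strix API. The values are copied from the example corpus above, and treating Sentence and Location as top-level structural attributes is one reading of the list.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CorpusDescription:
    """Hypothetical container for a corpus description (not a Strix API)."""

    name: str
    mode: str
    documents: int
    tokens: int
    word_attributes: list[str] = field(default_factory=list)
    text_attributes: list[str] = field(default_factory=list)
    # Maps a structural attribute to its sub-attributes (empty if none).
    structural_attributes: dict[str, list[str]] = field(default_factory=dict)

    def tokens_per_document(self) -> float:
        """Average number of tokens per document."""
        return self.tokens / self.documents

    def shared_attributes(self) -> set[str]:
        """Attributes listed at both the word and the text level."""
        return set(self.word_attributes) & set(self.text_attributes)


party_programs = CorpusDescription(
    name="Swedish party programs and election manifestos",
    mode="Modern",
    documents=349,
    tokens=2_099_602,
    word_attributes=[
        "Lemgram", "Sense", "Compound word forms", "Compound lemgrams",
        "Dependency relation", "Dephead", "Ref", "Sentiment label",
        "Text classification (blingbring)", "Text classification (swefn)",
        "Baseform", "Msd", "Part-of-speech",
    ],
    text_attributes=[
        "Text classification (blingbring)", "Text classification (swefn)",
        "Readability measure (LIX)", "Readability measure (ovix)",
        "Readability measure (nk)", "Id", "Party", "Type", "Year",
    ],
    structural_attributes={
        "Name tag": ["Expression", "Name", "Type", "Subtype"],
        "Sentence": [],
        "Location": [],
    },
)

print(f"{party_programs.name}: {party_programs.documents} documents")
print(f"Average tokens per document: {party_programs.tokens_per_document():.0f}")
print("Listed at both word and text level:", sorted(party_programs.shared_attributes()))
```

A structure like this makes it easy to compare corpora, for example to check which attributes are available at both the word and the text level before planning an analysis.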
