Analysis of SALDO wordform compounds
Each token is looked up in the SALDO lexicon together with its POS tag, and tokens recognized as compounds are annotated with the wordforms of their parts. More information (in Swedish) can be found in the Språkbanken Text FAQ.
This analysis is used with Sparv. Check out Sparv's quick start guide to get started!
To use this analysis, add the following line under export.annotations in the Sparv corpus configuration file:
- <token>:saldo.compwf # Compound analysis using wordforms
For more info on how to use Sparv, check out the Sparv documentation.
Example output:
<token compwf="|">Språkbanken</token>
<token compwf="|">Text</token>
<token compwf="|">är</token>
<token compwf="|">en</token>
<token compwf="|forsknings+infrastruktur|">forskningsinfrastruktur</token>
<token compwf="|">för</token>
<token compwf="|">språkliga</token>
<token compwf="|">data</token>
<token compwf="|">och</token>
<token compwf="|">en</token>
<token compwf="|språk+teknologisk|">språkteknologisk</token>
<token compwf="|forsknings+enhet|">forskningsenhet</token>
<token compwf="|">.</token>
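To work with this output downstream, the compwf values need to be split into their parts. The following is a minimal sketch in Python, not part of Sparv itself. It assumes that a value is a pipe-delimited set in which "|" alone means no compound analysis, that "+" separates the parts of one analysis, and that several analyses may appear between pipes. The function name parse_compwf and the wrapping <sentence> element are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def parse_compwf(value: str) -> List[List[str]]:
    """Split a compwf value into analyses, each a list of compound parts.

    Assumption: "|" delimits analyses and "+" delimits the parts of one
    analysis, so "|" alone yields an empty list.
    """
    return [analysis.split("+") for analysis in value.split("|") if analysis]


# Hypothetical wrapper element; the token lines are taken from the example output.
sample = """<sentence>
<token compwf="|">en</token>
<token compwf="|forsknings+infrastruktur|">forskningsinfrastruktur</token>
<token compwf="|språk+teknologisk|">språkteknologisk</token>
</sentence>"""

root = ET.fromstring(sample)

# Map each token's text to its list of compound analyses.
compounds = {
    token.text: parse_compwf(token.get("compwf", ""))
    for token in root.iter("token")
}

for word, analyses in compounds.items():
    print(word, analyses)
```

Tokens without a compound analysis map to an empty list, so they can be filtered out with a simple truthiness check.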