Named entity recognition (NER) identifies textual mentions of named entities that belong to a predefined set of categories, such as persons, locations, and time expressions. HFST-SweNER is based on the conversion, modelling and adaptation of a Swedish NER system from a hybrid environment to the Helsinki Finite-State Transducer Technology (HFST) platform. HFST-SweNER is a full-fledged open-source implementation that supports a variety of generic named entity types and consists of multiple reusable resource layers, such as various n-gram-based named entity lists (gazetteers).
Citation
Språkbanken Text (2020). swe-namedentity-swener (updated: 2020-05-13). [Analysis]. Språkbanken Text.
Named entity recognition (NER) recognises named entities such as locations, persons and time expressions in text.
Example
This analysis is used with Sparv. Check out Sparv's quick start guide to get started!
To use this analysis, add the following lines under export.annotations in the Sparv corpus configuration file:
- swener.ne # Named entity segments from SweNER
- swener.ne:swener.name # Names in SweNER named entities
- swener.ne:swener.ex # Named entity expressions from SweNER
- swener.ne:swener.type # Named entity types from SweNER
- swener.ne:swener.subtype # Named entity subtypes from SweNER
For more info on how to use Sparv, check out the Sparv documentation.
Example output:
<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
<token>Alfred</token>
<token>Bernhard</token>
<token>Nobel</token>
</ne>
<token>,</token>
<token>född</token>
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
<token>21</token>
<token>oktober</token>
<token>1833</token>
</ne>
<token>i</token>
<ne ex="ENAMEX" name="Stockholm" subtype="PPL" type="LOC">
<token>Stockholm</token>
</ne>
<token>,</token>
<ne ex="ENAMEX" name="Italien" subtype="PPL" type="LOC">
<token>Italien</token>
</ne>
<token>,</token>
<token>var</token>
<token>en</token>
<token>svensk</token>
<token>kemist</token>
<token>och</token>
<token>stiftare</token>
<token>av</token>
<ne ex="ENAMEX" name="Nobelpriset" subtype="PRZ" type="OBJ">
<token>Nobelpriset</token>
</ne>
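The named entities in output like the example above can be collected with Python's standard library. The following is a minimal sketch, not part of Sparv or HFST-SweNER: the `<text>` root element, the `SAMPLE` string and the `extract_entities` helper are assumptions made for illustration, and a real Sparv export may nest the `ne` elements inside other elements such as sentences.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example output. The <text> root is an assumption,
# added only so that the fragment parses as a single XML document.
SAMPLE = """<text>
<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
<token>Alfred</token>
<token>Bernhard</token>
<token>Nobel</token>
</ne>
<token>,</token>
<token>född</token>
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
<token>21</token>
<token>oktober</token>
<token>1833</token>
</ne>
<token>i</token>
<ne ex="ENAMEX" name="Stockholm" subtype="PPL" type="LOC">
<token>Stockholm</token>
</ne>
</text>"""


def extract_entities(xml_text):
    """Return one dict per <ne> element, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entities = []
    # iter() walks the whole tree, so <ne> elements are found at any depth.
    for ne in root.iter("ne"):
        entities.append({
            "name": ne.get("name"),
            "ex": ne.get("ex"),
            "type": ne.get("type"),
            "subtype": ne.get("subtype"),
            "tokens": [tok.text for tok in ne.iter("token")],
        })
    return entities


if __name__ == "__main__":
    for entity in extract_entities(SAMPLE):
        print(entity["type"], entity["subtype"], entity["name"])
```

Tokens outside an `ne` element, such as the comma and "född", are skipped because only the children of each `ne` element are read.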
Evaluation results
The F-score ranges from 27.48% to 91.33%, depending on the named entity category.
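For readers unfamiliar with the metric, an F-score combines precision and recall into a single number. The sketch below assumes the balanced F1 variant (the harmonic mean of the two); the page does not state which variant or matching criterion was used in the evaluation, and the `f_score` helper and the input values are hypothetical, not taken from the SweNER evaluation.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        # Avoid division by zero when nothing was predicted or matched.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical precision and recall for one entity category.
    print(round(f_score(0.9, 0.8), 4))
```

Because the harmonic mean is pulled towards the lower of the two values, a category with high precision but low recall (or the reverse) still receives a low F-score.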
Other references
Dimitrios Kokkinakis. 2004. Reducing the effect of name explosion
Download HFST-SweNER: https://www.kielipankki.fi/download/HFST-SweNER/