Språkbanken Text is a part of Språkbanken.

swe-namedentity-swener

Citation Information

Språkbanken Text (2020). swe-namedentity-swener (updated: 2020-05-13). [Analysis]. Språkbanken Text.
Named entity recognition (NER) recognises named entities such as locations, persons and time expressions in text.

Named entity recognition (NER) recognises textual mentions of named entities that belong to a predefined set of categories, such as persons, locations and time expressions. HFST-SweNER is a Swedish NER system that was converted, remodelled and adapted from a hybrid environment to the Helsinki Finite-State Transducer Technology (HFST) platform. It is a full-fledged open-source implementation that supports a variety of generic named entity types and consists of multiple reusable resource layers, such as various n-gram-based named entity lists (gazetteers).

Example

This analysis is used with Sparv. Check out Sparv's quick start guide to get started!

To use this analysis, add the following lines under export.annotations in the Sparv corpus configuration file:

- swener.ne  # Named entity segments from SweNER
- swener.ne:swener.name  # Names in SweNER named entities
- swener.ne:swener.ex  # Named entity expressions from SweNER
- swener.ne:swener.type  # Named entity types from SweNER
- swener.ne:swener.subtype  # Named entity subtypes from SweNER

For more info on how to use Sparv, check out the Sparv documentation.

Example output:

<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
  <token>Alfred</token>
  <token>Bernhard</token>
  <token>Nobel</token>
</ne>
<token>,</token>
<token>född</token>
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
  <token>21</token>
  <token>oktober</token>
  <token>1833</token>
</ne>
<token>i</token>
<ne ex="ENAMEX" name="Stockholm" subtype="PPL" type="LOC">
  <token>Stockholm</token>
</ne>
<token>,</token>
<ne ex="ENAMEX" name="Italien" subtype="PPL" type="LOC">
  <token>Italien</token>
</ne>
<token>,</token>
<token>var</token>
<token>en</token>
<token>svensk</token>
<token>kemist</token>
<token>och</token>
<token>stiftare</token>
<token>av</token>
<ne ex="ENAMEX" name="Nobelpriset" subtype="PRZ" type="OBJ">
  <token>Nobelpriset</token>
</ne>
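
The output above can be post-processed with any XML parser. The following is a minimal sketch in Python, not part of Sparv or HFST-SweNER: it assumes the export is available as a well-formed XML string, and it wraps a shortened copy of the example in a `<text>` root element (an assumption made only so that the fragment parses). The function name `extract_entities` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example output. The <text> root element is an
# assumption added here so that the fragment is well-formed XML.
SAMPLE = """<text>
<ne ex="ENAMEX" name="Alfred Bernhard Nobel" subtype="HUM" type="PRS">
  <token>Alfred</token>
  <token>Bernhard</token>
  <token>Nobel</token>
</ne>
<token>,</token>
<token>född</token>
<ne ex="TIMEX" name="21 oktober 1833" subtype="DAT" type="TME">
  <token>21</token>
  <token>oktober</token>
  <token>1833</token>
</ne>
<token>i</token>
<ne ex="ENAMEX" name="Stockholm" subtype="PPL" type="LOC">
  <token>Stockholm</token>
</ne>
</text>"""


def extract_entities(xml_text):
    """Return one dict per <ne> element, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entities = []
    for ne in root.iter("ne"):
        entities.append({
            "name": ne.get("name"),        # swener.name
            "ex": ne.get("ex"),            # swener.ex
            "type": ne.get("type"),        # swener.type
            "subtype": ne.get("subtype"),  # swener.subtype
            # Tokens that make up the named entity segment.
            "tokens": [tok.text for tok in ne.iter("token")],
        })
    return entities


if __name__ == "__main__":
    for entity in extract_entities(SAMPLE):
        print(entity["type"], entity["subtype"], entity["name"])
```

Each dictionary mirrors the four attributes requested in the configuration, so the result can be filtered by `type` or `subtype` without further parsing.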

Evaluation results

F-score between 27.48% and 91.33%, depending on the named entity category

Other references

Type

  • Analysis

Task

  • named entity recognition

Tool

HFST-SweNER

Model

Included in the tool

Created

2014-07-04

Updated

2020-05-13

Contact

Språkbanken Text
sb-info@svenska.gu.se