Sparv Classes

Sparv classes are used to represent various types of data in Sparv, such as source files, models, and input and output annotations. By using Sparv classes in the signatures of processors, Sparv knows the inputs and outputs of each processor and can build a dependency graph to determine the order in which processors should be run to produce the desired output. Additionally, Sparv classes provide methods for reading and writing annotations, allowing annotators to handle annotation files without needing to understand Sparv's internal data format. Below is a list of all available Sparv classes, including their parameters, properties, and public methods.
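As a rough illustration of the dependency resolution described above, the sketch below orders three hypothetical processors by their declared inputs and outputs. The processor and annotation names are invented for the example, and the standard library's graphlib module stands in for whatever mechanism Sparv uses internally.

```python
from graphlib import TopologicalSorter

# Hypothetical processors: name -> (inputs, outputs). All names are illustrative.
processors = {
    "tokenize": ({"text"}, {"token"}),
    "tag_pos": ({"token"}, {"token:pos"}),
    "export_xml": ({"token", "token:pos"}, {"export.xml"}),
}


def run_order(processors):
    """Order processors so that each one runs after the producers of its inputs."""
    # Map every output to the processor that produces it.
    producer = {out: name for name, (_, outs) in processors.items() for out in outs}
    # A processor depends on the producers of its inputs.
    # Inputs without a producer (here "text") are treated as source data.
    graph = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in processors.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = run_order(processors)
print(order)
```

Because each processor only declares what it reads and writes, the run order never has to be stated explicitly.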

Classes used as default input for annotator functions.

AllSourceFilenames

AllSourceFilenames()

List with names of all source files.

This class provides an iterable containing the names of all source files. It is commonly used by exporter functions that need to combine annotations from multiple source files.

__getitem__

__getitem__(index)

Return item at index.

__len__

__len__()

Return number of source files.

Annotation

Annotation(name='', source_file=None)

Regular Annotation tied to one source file.

This class represents a regular annotation tied to a single source file. It is used when an annotation is required as input for a function, for example, Annotation("<token:word>").

PARAMETER DESCRIPTION
name

Name of the annotation.

TYPE: str DEFAULT: ''

source_file

Source file for the annotation.

TYPE: str | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__iter__

__iter__()

Get an iterator of values from the annotation.

This is a convenience method equivalent to read().

RETURNS DESCRIPTION
Iterator[str]

An iterator of values from the annotation.

__len__

__len__()

Get the number of values in the annotation.

RETURNS DESCRIPTION
int

The number of values in the annotation.

create_empty_attribute

create_empty_attribute()

Return a list filled with None of the same size as this annotation.

exists

exists()

Return True if annotation file exists.

get_child_values

get_child_values(
    child, append_orphans=False, orphan_alert=False
)

Get values of children of this annotation.

PARAMETER DESCRIPTION
child

Child annotation.

TYPE: BaseAnnotation

append_orphans

If True, append orphans to the end.

TYPE: bool DEFAULT: False

orphan_alert

If True, log a warning when a child has no parent.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Iterator[Iterator]

An iterator with one element for each parent. Each element is an iterator of values in the child annotation. If append_orphans is True, the last element is an iterator of orphans.
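The shape of this return value can be sketched with plain lists. The helper below is hypothetical and only mimics the grouping; it takes child indices per parent and a list of orphan indices as given, and does not reproduce how Sparv reads annotation files.

```python
def group_child_values(children, orphans, values, append_orphans=False):
    """Yield one iterator of child values per parent, with orphans last if requested."""
    for child_indices in children:
        yield (values[i] for i in child_indices)
    if append_orphans:
        yield (values[i] for i in orphans)


# Made-up data: two sentences covering four tokens, and a fifth token without a parent.
children = [[0, 1], [2, 3]]
orphans = [4]
words = ["Hello", "world", "Good", "morning", "!"]

grouped = [
    list(group)
    for group in group_child_values(children, orphans, words, append_orphans=True)
]
print(grouped)
```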

get_children

get_children(child, orphan_alert=False)

Get children of this annotation.

PARAMETER DESCRIPTION
child

Child annotation.

TYPE: BaseAnnotation

orphan_alert

If True, log a warning when a child has no parent.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
tuple[list, list]

A tuple of two lists. The first is a list with n (= total number of parents) elements, where every element is a list of indices in the child annotation. The second is a list of orphans, i.e., indices in the child annotation that have no parent. Both parents and children are sorted according to their position in the source file.
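To make the two returned lists concrete, the sketch below computes them for made-up character spans. It assumes that a child belongs to a parent when the child's span lies inside the parent's span; the function name and the matching rule are illustrative, not Sparv's implementation.

```python
def children_by_containment(parent_spans, child_spans):
    """Return (children, orphans) as lists of indices into child_spans."""
    children = [[] for _ in parent_spans]
    orphans = []
    for child_index, (start, end) in enumerate(child_spans):
        for parent_index, (p_start, p_end) in enumerate(parent_spans):
            if p_start <= start and end <= p_end:
                children[parent_index].append(child_index)
                break
        else:
            # No parent span contains this child.
            orphans.append(child_index)
    return children, orphans


# Made-up (start, end) spans, already sorted by position.
sentences = [(0, 11), (12, 24)]
tokens = [(0, 5), (6, 11), (12, 16), (17, 24), (25, 26)]

children, orphans = children_by_containment(sentences, tokens)
print(children, orphans)
```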

get_parents

get_parents(parent, orphan_alert=False)

Get parents of this annotation.

PARAMETER DESCRIPTION
parent

Parent annotation.

TYPE: BaseAnnotation

orphan_alert

If True, log a warning when a child has no parent.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
list

A list with n (= total number of children) elements, where every element is an index in the parent annotation, or None when no parent is found.
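The inverse mapping can be pictured the same way. The sketch below assumes span containment decides parenthood and uses invented spans; it shows only the shape of the result, with None marking a child that has no parent.

```python
def parents_by_containment(parent_spans, child_spans):
    """Return one parent index (or None) for every child span."""
    parents = []
    for start, end in child_spans:
        match = next(
            (
                index
                for index, (p_start, p_end) in enumerate(parent_spans)
                if p_start <= start and end <= p_end
            ),
            None,  # No containing parent span was found.
        )
        parents.append(match)
    return parents


# Made-up (start, end) spans.
sentences = [(0, 11), (12, 24)]
tokens = [(0, 5), (6, 11), (12, 16), (17, 24), (25, 26)]

parents = parents_by_containment(sentences, tokens)
print(parents)
```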

get_size

get_size()

Get number of values.

Note

This method is deprecated and will be removed in future versions. Use len() instead.

RETURNS DESCRIPTION
int

The number of values in the annotation.

has_attribute

has_attribute()

Return True if the annotation has an attribute.

read

read()

Get an iterator of values from the annotation.

RETURNS DESCRIPTION
Iterator[str]

An iterator of values from the annotation.

read_attributes

read_attributes(annotations, with_annotation_name=False)

Return an iterator of tuples of multiple attributes on the same annotation.

PARAMETER DESCRIPTION
annotations

List of annotations to read attributes from.

TYPE: list[BaseAnnotation] | tuple[BaseAnnotation, ...]

with_annotation_name

If True, return attributes with annotation name.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Iterator[tuple]

An iterator of tuples of attributes.
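One way to picture the result: if each attribute is a sequence of values aligned by position, the tuples correspond to zipping those sequences. The sketch below uses plain lists with made-up values and assumes, rather than shows, how Sparv reads them.

```python
# Made-up attribute values for three tokens, aligned by position.
words = ["The", "cat", "sleeps"]
pos_tags = ["DT", "NN", "VB"]

# One tuple per token, holding one value per attribute.
rows = list(zip(words, pos_tags))
print(rows)
```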

read_parents_and_children

read_parents_and_children(parent, child)

Read parent and child annotations.

Reorders them according to span position, but keeps original index information.

PARAMETER DESCRIPTION
parent

Parent annotation.

TYPE: BaseAnnotation

child

Child annotation.

TYPE: BaseAnnotation

RETURNS DESCRIPTION
tuple[Iterator, Iterator]

A tuple of iterators for parent and child annotations.

read_spans

read_spans(decimals=False, with_annotation_name=False)

Get an iterator of spans from the annotation.

PARAMETER DESCRIPTION
decimals

If True, return spans with decimals.

TYPE: bool DEFAULT: False

with_annotation_name

If True, return spans with annotation name.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Iterator[tuple]

An iterator of spans from the annotation.

read_text

read_text()

Get the source text of the annotation.

RETURNS DESCRIPTION
Iterator[str]

An iterator of the source text of the annotation.

remove

remove()

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and attribute name.
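A minimal sketch of such a split, assuming a colon separates the span annotation from the attribute (as in the token:word part of the example above) and that the name is given without angle brackets. The helper name is hypothetical.

```python
def split_name(name: str) -> tuple[str, str]:
    """Split 'annotation:attribute' into its two parts."""
    # partition() returns an empty string for the attribute when there is no colon.
    annotation, _, attribute = name.partition(":")
    return annotation, attribute


with_attribute = split_name("token:word")
without_attribute = split_name("token")
print(with_attribute, without_attribute)
```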

AnnotationAllSourceFiles

AnnotationAllSourceFiles(name='')

Regular annotation but source file must be specified for all actions.

Like Annotation, this class represents a regular annotation, but is used as input to an annotator to require the specified annotation for every source file in the corpus. By calling an instance of this class with a source file name as an argument, you can get an instance of Annotation for that source file.

Note

All methods of this class are deprecated and will be removed in a future version of Sparv. Instead, create an instance of Annotation by passing a source file name as an argument, and use the methods of the Annotation class.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__call__

__call__(source_file)

Get an Annotation instance for the specified source file.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
Annotation

An Annotation instance for the specified source file.

__len__

__len__()

Return length of name.

create_empty_attribute

create_empty_attribute(source_file)

Return a list filled with None of the same size as this annotation.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

exists

exists(source_file)

Return True if annotation file exists.

get_child_values

get_child_values(
    source_file,
    child,
    append_orphans=False,
    orphan_alert=False,
)

Get values of children of this annotation.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

child

Child annotation.

TYPE: BaseAnnotation

append_orphans

If True, append orphans to the end.

TYPE: bool DEFAULT: False

orphan_alert

If True, log a warning when a child has no parent.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Iterator[Iterator]

An iterator with one element for each parent. Each element is an iterator of values in the child annotation. If append_orphans is True, the last element is an iterator of orphans.

get_children

get_children(source_file, child, orphan_alert=False)

Get children of this annotation.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

child

Child annotation.

TYPE: BaseAnnotation

orphan_alert

If True, log a warning when a child has no parent.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
tuple[list, list]

A tuple of two lists. The first is a list with n (= total number of parents) elements, where every element is a list of indices in the child annotation. The second is a list of orphans, i.e., indices in the child annotation that have no parent. Both parents and children are sorted according to their position in the source file.

get_parents

get_parents(source_file, parent, orphan_alert=False)

Get parents of this annotation.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

parent

Parent annotation.

TYPE: BaseAnnotation

orphan_alert

If True, log a warning when a child has no parent.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
list

A list with n (= total number of children) elements, where every element is an index in the parent annotation, or None when no parent is found.

get_size

get_size(source_file)

Get number of values.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
int

The number of values in the annotation.

has_attribute

has_attribute()

Return True if the annotation has an attribute.

read

read(source_file)

Get an iterator of values from the annotation.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
Iterator[str]

An iterator of values from the annotation.

read_attributes

read_attributes(
    source_file, annotations, with_annotation_name=False
)

Return an iterator of tuples of multiple attributes on the same annotation.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

annotations

List of annotations to read attributes from.

TYPE: list[BaseAnnotation] | tuple[BaseAnnotation, ...]

with_annotation_name

If True, return attributes with annotation name.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Iterator

An iterator of tuples of attributes.

read_spans

read_spans(
    source_file, decimals=False, with_annotation_name=False
)

Get an iterator of spans from the annotation.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

decimals

If True, return spans with decimals.

TYPE: bool DEFAULT: False

with_annotation_name

If True, return spans with annotation name.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Iterator

An iterator of spans from the annotation.

read_text

read_text(source_file)

Get the source text of the annotation.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
Iterator[str]

An iterator of the source text of the annotation.

remove

remove(source_file)

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and attribute name.

AnnotationCommonData

AnnotationCommonData(name='')

Data annotation for the whole corpus.

Like AnnotationData, this class represents an annotation with arbitrary data when used as input to an annotator. However, AnnotationCommonData is used for data that applies to the entire corpus, not tied to a specific source file.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if annotation file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

read

read()

Read arbitrary corpus-level data from the annotation file.

RETURNS DESCRIPTION
Any

The data of the annotation.

remove

remove()

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

AnnotationData

AnnotationData(name='', source_file=None)

Annotation of the data type, for one source file, not tied to spans in the corpus text.

This class represents an annotation holding arbitrary data, i.e., data that is not tied to spans in the corpus text. It is used as input to an annotator.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

source_file

The name of the source file.

TYPE: str | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if annotation file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

read

read()

Read arbitrary data from the annotation file.

RETURNS DESCRIPTION
Any

The data of the annotation.

remove

remove()

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

AnnotationDataAllSourceFiles

AnnotationDataAllSourceFiles(name='')

Data annotation but source file must be specified for all actions.

Similar to AnnotationData, this class is used for annotations holding arbitrary data, but it is used as input to an annotator to require the specified annotation for every source file in the corpus. By calling an instance of this class with a source file name as an argument, you can get an instance of AnnotationData for that source file.

Note

All methods of this class are deprecated and will be removed in a future version of Sparv. Instead, create an instance of AnnotationData by passing a source file name as an argument, and use the methods of the AnnotationData class.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__call__

__call__(source_file)

Get an AnnotationData instance for the specified source file.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
AnnotationData

An AnnotationData instance for the specified source file.

__len__

__len__()

Return length of name.

exists

exists(source_file)

Return True if annotation file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

read

read(source_file)

Read arbitrary data from annotation file.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
Any

The data of the annotation.

remove

remove(source_file)

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

AnnotationName

AnnotationName(name='', source_file=None, is_input=None)

Class representing an Annotation name.

Use this class when only the name of an annotation is needed, not the actual data. The annotation will not be added as a prerequisite for the annotator, meaning that using AnnotationName will not automatically trigger the creation of the referenced annotation.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

source_file

The name of the source file.

TYPE: str | None DEFAULT: None

is_input

Deprecated, use AnnotationName instead of setting this to False.

TYPE: bool | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

has_attribute

has_attribute()

Return True if the annotation has an attribute.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and attribute name.

Binary

Path to binary executable.

This class holds the path to a binary executable. The path can be either the name of a binary in the system's PATH, a full path to a binary, or a path relative to the Sparv data directory. It is often used to define a prerequisite for an annotator function.

PARAMETER DESCRIPTION
object

Path to the binary executable.

BinaryDir

Path to directory containing executable binaries.

This class holds the path to a directory containing executable binaries. The path can be either an absolute path or a path relative to the Sparv data directory.

PARAMETER DESCRIPTION
object

Path to the directory containing the executable binaries.

Config

Config(
    name,
    default=None,
    description=None,
    datatype=None,
    choices=None,
    pattern=None,
    min_len=None,
    max_len=None,
    min_value=None,
    max_value=None,
    const=None,
    conditions=None,
)

Class holding configuration key names.

This class represents a configuration key and optionally its default value. You can specify the datatype and allowed values, which will be used for validating the config and generating the Sparv config JSON schema.

For further information on how to use this class, see the Config Parameters section.

PARAMETER DESCRIPTION
name

The name of the configuration key.

TYPE: str

default

The optional default value of the configuration key.

TYPE: Any DEFAULT: None

description

A mandatory description of the configuration key.

TYPE: str | None DEFAULT: None

datatype

A type specifying the allowed datatype(s). Supported types are int, float, str, bool, None, type(None), list, and dict. For list and dict, you can specify the allowed types for the elements by using type arguments, like list[str] or dict[str, int]. More complex types (e.g., further nesting) than these are not supported. type(None) is used to allow None as a value. None is the default value, and means that any datatype is allowed.

TYPE: type | None DEFAULT: None

choices

An iterable of valid choices, or a function that returns such an iterable.

TYPE: Iterable | Callable | None DEFAULT: None

pattern

A regular expression matching valid values (only for the datatype str).

TYPE: str | None DEFAULT: None

min_len

An int representing the minimum length of the value.

TYPE: int | None DEFAULT: None

max_len

An int representing the maximum length of the value.

TYPE: int | None DEFAULT: None

min_value

An int or float representing the minimum numeric value.

TYPE: int | float | None DEFAULT: None

max_value

An int or float representing the maximum numeric value.

TYPE: int | float | None DEFAULT: None

const

Restrict the value to a constant.

TYPE: Any | None DEFAULT: None

conditions

A list of Config objects with conditions that must also be met.

TYPE: list[Config] | None DEFAULT: None
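The constraints above can be read as a series of checks on a configuration value. The sketch below is a simplified stand-in written for illustration: the function names are hypothetical, only datatype, choices, min_value and max_value are covered, and Sparv's own validation (which relies on a generated JSON schema) is not reproduced here.

```python
from typing import Any, get_args, get_origin


def matches_datatype(value: Any, datatype: Any) -> bool:
    """Check a value against a type such as int, list[str] or dict[str, int]."""
    if datatype is None:  # None means that any datatype is allowed.
        return True
    origin, args = get_origin(datatype), get_args(datatype)
    if origin is list:
        return isinstance(value, list) and all(isinstance(v, args[0]) for v in value)
    if origin is dict:
        return isinstance(value, dict) and all(
            isinstance(k, args[0]) and isinstance(v, args[1]) for k, v in value.items()
        )
    return isinstance(value, datatype)


def check_value(value, datatype=None, choices=None, min_value=None, max_value=None) -> bool:
    """Return True if the value satisfies every constraint that was given."""
    if not matches_datatype(value, datatype):
        return False
    if choices is not None:
        # choices may be an iterable or a function returning one.
        allowed = choices() if callable(choices) else choices
        if value not in allowed:
            return False
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


in_range = check_value(5, datatype=int, min_value=1, max_value=10)
too_large = check_value(11, datatype=int, max_value=10)
mixed_list = check_value(["a", 1], datatype=list[str])
from_callable = check_value("b", choices=lambda: ["a", "b"])
print(in_range, too_large, mixed_list, from_callable)
```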

__new__

__new__(name, *args, **kwargs)

Create a new instance of the class.

PARAMETER DESCRIPTION
cls

The class to create an instance of.

name

The name of the configuration key.

TYPE: str

*args

Additional arguments.

TYPE: Any DEFAULT: ()

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

Corpus

A string representing the name (ID) of a corpus.

Export

A string containing the path to an export directory and filename.

Represents an export file, used to define the output of an exporter function.

PARAMETER DESCRIPTION
object

The export directory and filename. The export directory must include the module name as a prefix or be equal to the module name. The filename may include the wildcard {file} which will be replaced with the name of the source file. For example: "xml_export.pretty/{file}_export.xml".
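The two rules in this description can be sketched as follows. The helper is hypothetical: it treats the part before the first slash as the export directory and uses a simple prefix test for the module name, which may be looser or stricter than the check Sparv performs.

```python
def expand_export_path(pattern: str, module: str, source_file: str) -> str:
    """Check the directory prefix and fill in the {file} wildcard."""
    directory = pattern.split("/", 1)[0]
    # The directory must equal the module name or start with it.
    if not directory.startswith(module):
        raise ValueError(f"Export directory {directory!r} must start with {module!r}")
    return pattern.replace("{file}", source_file)


expanded = expand_export_path(
    "xml_export.pretty/{file}_export.xml", "xml_export", "doc1"
)
print(expanded)
```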

ExportAnnotationNames

ExportAnnotationNames(config_name)

List of annotations to include in export.

An iterable containing annotations to be included in the export, as specified in the corpus configuration. Unlike ExportAnnotations, using this class will not add the annotations as dependencies. Use this class when you only need the annotation names, not the actual annotation files.

PARAMETER DESCRIPTION
config_name

The configuration variable specifying which annotations to include.

TYPE: str

__getitem__

__getitem__(index)

Return item at index.

Each item is a tuple of an Annotation and an optional export name.

__len__

__len__()

Return number of annotations.

ExportAnnotations

ExportAnnotations(config_name, is_input=None)

Iterable with annotations to include in export.

An iterable containing annotations to be included in the export, as specified in the corpus configuration. When using this class, annotation files for the current source file are automatically added as dependencies.

PARAMETER DESCRIPTION
config_name

The configuration variable specifying which annotations to include.

TYPE: str

is_input

Deprecated, use ExportAnnotationNames instead of setting this to False.

TYPE: bool | None DEFAULT: None

__getitem__

__getitem__(index)

Return item at index.

Each item is a tuple of an Annotation and an optional export name.

__len__

__len__()

Return number of annotations.

ExportAnnotationsAllSourceFiles

ExportAnnotationsAllSourceFiles(config_name)

List of annotations to include in export.

An iterable containing annotations to be included in the export, as specified in the corpus configuration. When using this class, annotation files for all source files will automatically be added as dependencies.

PARAMETER DESCRIPTION
config_name

The configuration variable specifying which annotations to include.

TYPE: str

__getitem__

__getitem__(index)

Return item at index.

__len__

__len__()

Return number of annotations.

ExportInput

ExportInput(val, all_files=False)

Export directory and filename pattern, used as input.

Represents the export directory and filename pattern used as input. Use this class when you need export files as input for another function.

PARAMETER DESCRIPTION
val

The export directory and filename pattern (e.g., "xml_export.pretty/{file}_export.xml").

TYPE: str

all_files

Set to True to get the export for all source files.

TYPE: bool DEFAULT: False

__new__

__new__(val, *args, **kwargs)

Create a new instance of the class.

PARAMETER DESCRIPTION
cls

The class to create an instance of.

val

The export directory and filename pattern (e.g., "xml_export.pretty/{file}_export.xml").

TYPE: str

*args

Additional arguments.

TYPE: Any DEFAULT: ()

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

HeaderAnnotations

HeaderAnnotations(config_name, source_file=None)

Header annotations to include in export.

An iterable containing header annotations from the source to be included in the export, as specified in the corpus configuration.

PARAMETER DESCRIPTION
config_name

The configuration variable that specifies which header annotations to include.

TYPE: str

source_file

Name of the source file.

TYPE: str | None DEFAULT: None

__getitem__

__getitem__(index)

Return item at index.

__len__

__len__()

Return number of annotations.

HeaderAnnotationsAllSourceFiles

HeaderAnnotationsAllSourceFiles(
    config_name, source_files=()
)

Header annotations to include in export.

An iterable containing header annotations from all source files to be included in the export, as specified in the corpus configuration. Unlike HeaderAnnotations, this class ensures that the header annotations file (created using Headers) for every source file is added as a dependency.

PARAMETER DESCRIPTION
config_name

The configuration variable specifying which header annotations to include.

TYPE: str

source_files

List of source files (internal use only).

TYPE: Iterable[str] DEFAULT: ()

__getitem__

__getitem__(index)

Return item at index.

__len__

__len__()

Return number of annotations.

Headers

Headers(source_file)

List of header annotation names.

Represents a list of header annotation names for a given source file, used as output for importers.

PARAMETER DESCRIPTION
source_file

The name of the source file.

TYPE: str

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if the headers file exists for this source file.

has_attribute

has_attribute()

Return True if the annotation has an attribute.

read

read()

Read the headers file and return a list of header annotation names.

RETURNS DESCRIPTION
list[str]

A list of header annotation names.

remove

remove()

Remove the headers file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

write

write(header_annotations)

Write the headers file with the provided list of header annotation names.

PARAMETER DESCRIPTION
header_annotations

A list of header annotation names.

TYPE: list[str]

Language

The language of the corpus.

An instance of this class contains information about the language of the corpus. This information is retrieved from the corpus configuration and is specified using an ISO 639-3 language code.

Marker

Marker(name='')

A marker indicating that something has run.

Similar to AnnotationCommonData, but typically without any actual data. Used as input. Markers are used to make sure that something has been executed. Created using OutputMarker.

PARAMETER DESCRIPTION
name

The name of the marker.

TYPE: str DEFAULT: ''

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if marker file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

read

read()

Read arbitrary corpus-level string data from the marker file.

RETURNS DESCRIPTION
Iterator[str]

An iterator with the data of the marker.

remove

remove()

Remove the marker file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

MarkerOptional

MarkerOptional(name='')

Same as Marker, but if the marker file doesn't exist, it won't be created.

This is mainly used to get a reference to a marker that may or may not exist, to be able to remove markers from connected (un)installers without triggering the connected (un)installation. Otherwise, running an uninstaller without first having run the installer would needlessly trigger the installation first.

PARAMETER DESCRIPTION
name

The name of the marker.

TYPE: str DEFAULT: ''

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if marker file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

read

read()

Read arbitrary corpus-level string data from the marker file.

RETURNS DESCRIPTION
Iterator[str]

An iterator with the data of the marker.

remove

remove()

Remove the marker file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

Model

Model(name)

Path to a model file.

Represents a path to a model file. The path can be either an absolute path, a relative path, or a path relative to the Sparv model directory. Typically used as input to annotator functions.

PARAMETER DESCRIPTION
name

The path of the model file.

TYPE: str

path property

path

Get model path.

RETURNS DESCRIPTION
Path

The path to the model file as a Path object.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

download

download(url)

Download the file from the given URL and save to the model path.

PARAMETER DESCRIPTION
url

URL to download from.

TYPE: str

read

read()

Read arbitrary string data from the model file.

RETURNS DESCRIPTION
str

The data of the model.

read_pickle

read_pickle()

Read pickled data from model file.

RETURNS DESCRIPTION
Any

The data of the model.

remove

remove(raise_errors=False)

Remove model file from disk.

PARAMETER DESCRIPTION
raise_errors

If True, raise an error if the file cannot be removed (e.g., if it doesn't exist).

TYPE: bool DEFAULT: False

ungzip

ungzip(out)

Unzip gzip file in same directory as the model file.

PARAMETER DESCRIPTION
out

Path to output file.

TYPE: str
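For orientation, decompressing a gzip file to a given output path can be done with the standard library as sketched below. This is a standalone function with made-up file names, assuming the decompressed data is written to the path given by out; it is not the method's actual implementation.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def ungzip(gz_path: Path, out: Path) -> None:
    """Decompress gz_path and write the result to out."""
    with gzip.open(gz_path, "rb") as src, out.open("wb") as dst:
        shutil.copyfileobj(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "model.txt.gz"
    out_path = Path(tmp) / "model.txt"
    # Create a small gzip file to decompress.
    with gzip.open(gz_path, "wb") as f:
        f.write(b"katt\tNN\n")
    ungzip(gz_path, out_path)
    content = out_path.read_bytes()

print(content)
```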

unzip

unzip()

Unzip zip file in same directory as the model file.

write

write(data)

Write arbitrary string data to the model file.

PARAMETER DESCRIPTION
data

The data to write.

TYPE: str

write_pickle

write_pickle(data, protocol=-1)

Dump arbitrary data to the model file in pickle format.

PARAMETER DESCRIPTION
data

The data to write.

TYPE: Any

protocol

Pickle protocol to use.

TYPE: int DEFAULT: -1
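The default protocol=-1 matches the convention of Python's pickle module, where a negative protocol selects the highest protocol available. The sketch below shows a pickle round trip with the standard library only; the data and file name are made up for illustration, and the code does not call Sparv.

```python
import pickle
import tempfile
from pathlib import Path

data = {"lemma": ["katt", "hund"], "count": 2}  # hypothetical model data

with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "example.pickle"  # hypothetical model path
    with model_file.open("wb") as f:
        pickle.dump(data, f, protocol=-1)  # -1 selects the highest protocol
    with model_file.open("rb") as f:
        loaded = pickle.load(f)

# A negative protocol produces the same bytes as pickle.HIGHEST_PROTOCOL.
same_bytes = pickle.dumps(data, protocol=-1) == pickle.dumps(
    data, protocol=pickle.HIGHEST_PROTOCOL
)
```

Files written with the highest protocol may not be readable by older Python versions, which is worth keeping in mind if models are shared between environments.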

ModelOutput

ModelOutput(name, description=None)

Same as Model, but used as the output of a model builder.

PARAMETER DESCRIPTION
name

The name of the model file.

TYPE: str

description

Description of the model.

TYPE: str | None DEFAULT: None

path property

path

Get model path.

RETURNS DESCRIPTION
Path

The path to the model file as a Path object.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

download

download(url)

Download the file from the given URL and save to the model path.

PARAMETER DESCRIPTION
url

URL to download from.

TYPE: str

read

read()

Read arbitrary string data from the model file.

RETURNS DESCRIPTION
str

The data of the model.

read_pickle

read_pickle()

Read pickled data from model file.

RETURNS DESCRIPTION
Any

The data of the model.

remove

remove(raise_errors=False)

Remove model file from disk.

PARAMETER DESCRIPTION
raise_errors

If True, raise an error if the file cannot be removed (e.g., if it doesn't exist).

TYPE: bool DEFAULT: False

ungzip

ungzip(out)

Unzip gzip file in same directory as the model file.

PARAMETER DESCRIPTION
out

Path to output file.

TYPE: str

unzip

unzip()

Unzip zip file in same directory as the model file.

write

write(data)

Write arbitrary string data to the model file.

PARAMETER DESCRIPTION
data

The data to write.

TYPE: str

write_pickle

write_pickle(data, protocol=-1)

Dump arbitrary data to the model file in pickle format.

PARAMETER DESCRIPTION
data

The data to write.

TYPE: Any

protocol

Pickle protocol to use.

TYPE: int DEFAULT: -1

Namespaces

Namespaces(source_file)

Namespace mapping (URI to prefix) for a source file.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

has_attribute

has_attribute()

Return True if the annotation has an attribute.

read

read()

Read namespace file and parse it into a dict.

RETURNS DESCRIPTION
dict[str, str]

A dict with prefixes as keys and URIs as values.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and attribute name.

write

write(namespaces)

Write namespace file.

PARAMETER DESCRIPTION
namespaces

A dict with prefixes as keys and URIs as values.

TYPE: dict[str, str]

Output

Output(
    name="", cls=None, description=None, source_file=None
)

Regular annotation or attribute used as output from an annotator function.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

cls

Optional annotation class of the output.

TYPE: str | None DEFAULT: None

description

Description of the annotation.

TYPE: str | None DEFAULT: None

source_file

The name of the source file.

TYPE: str | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if annotation file exists.

has_attribute

has_attribute()

Return True if the annotation has an attribute.

remove

remove()

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and attribute name.
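To make the return value concrete, the following sketch splits names written in the 'annotation:attribute' format. The function split_name is a hypothetical stand-in, not Sparv's implementation, and it assumes the first colon separates the annotation from the attribute.

```python
def split_name(name: str) -> tuple[str, str]:
    """Split 'annotation:attribute' into its two parts (hypothetical helper)."""
    # partition() yields an empty attribute when there is no colon.
    annotation, _, attribute = name.partition(":")
    return annotation, attribute


with_attribute = split_name("token:word")
without_attribute = split_name("sentence")
```

Here with_attribute is ("token", "word") and without_attribute is ("sentence", "").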

write

write(values)

Write the annotation to a file, overwriting any existing annotation.

All values will be converted to strings.

PARAMETER DESCRIPTION
values

A list of values.

TYPE: list

OutputAllSourceFiles

OutputAllSourceFiles(name='', cls=None, description=None)

Regular annotation or attribute used as output, but not tied to a specific source file.

Similar to Output, this class represents a regular annotation or attribute used as output, but it is used when output should be produced for every source file in the corpus. By calling an instance of this class with a source file name as an argument, you can get an instance of Output for that source file.

Note

All methods of this class are deprecated and will be removed in a future version of Sparv. Instead, create an instance of Output by passing a source file name as an argument, and use the methods of the Output class.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

cls

Optional annotation class of the output.

TYPE: str | None DEFAULT: None

description

Description of the annotation.

TYPE: str | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__call__

__call__(source_file)

Get an Output instance for the specified source file.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
Output

An Output instance for the specified source file.
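The calling convention described here is a small factory pattern: one object holds the annotation name, and calling it with a source file name yields a per-file object. The sketch below uses two hypothetical dataclasses, FileOutput and AllFilesOutput, as stand-ins; they are not Sparv classes and only demonstrate the pattern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileOutput:
    """Hypothetical stand-in for an output tied to one source file."""

    name: str
    source_file: Optional[str] = None


@dataclass
class AllFilesOutput:
    """Hypothetical stand-in for an output covering all source files."""

    name: str

    def __call__(self, source_file: str) -> FileOutput:
        # Bind the shared annotation name to one concrete source file.
        return FileOutput(self.name, source_file)


all_files = AllFilesOutput("token:word")
per_file = [all_files(source_file) for source_file in ["doc1", "doc2"]]
```

Each element of per_file can then be handled like an ordinary per-file output, which is the usage the deprecation note recommends.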

__len__

__len__()

Return length of name.

exists

exists(source_file)

Return True if annotation file exists.

has_attribute

has_attribute()

Return True if the annotation has an attribute.

remove

remove(source_file)

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and attribute name.

write

write(values, source_file)

Write the annotation to a file, overwriting any existing annotation.

PARAMETER DESCRIPTION
values

A list of values.

TYPE: list

source_file

Source file for the annotation.

TYPE: str

OutputCommonData

OutputCommonData(name='', cls=None, description=None)

Data annotation for the whole corpus.

Similar to OutputData, but for a data annotation that applies to the entire corpus.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

cls

Optional annotation class of the output.

TYPE: str | None DEFAULT: None

description

Description of the annotation.

TYPE: str | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if annotation file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

remove

remove()

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

write

write(value)

Write arbitrary corpus-level data to the annotation file.

PARAMETER DESCRIPTION
value

The data to write.

TYPE: Any

OutputData

OutputData(
    name="", cls=None, description=None, source_file=None
)

An annotation holding arbitrary data that is used as output.

This data is not tied to spans in the corpus text.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

cls

Optional annotation class of the output.

TYPE: str | None DEFAULT: None

description

Description of the annotation.

TYPE: str | None DEFAULT: None

source_file

The name of the source file.

TYPE: str | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if annotation file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

remove

remove()

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

write

write(value)

Write arbitrary data to the annotation file.

PARAMETER DESCRIPTION
value

The data to write.

TYPE: Any

OutputDataAllSourceFiles

OutputDataAllSourceFiles(
    name="", cls=None, description=None
)

Data annotation used as output, not tied to a specific source file.

Similar to OutputData, this class is used for output annotations holding arbitrary data, but it is used when output should be produced for every source file in the corpus. By calling an instance of this class with a source file name as an argument, you can get an instance of OutputData for that source file.

Note

All methods of this class are deprecated and will be removed in a future version of Sparv. Instead, create an instance of OutputData by passing a source file name as an argument, and use the methods of the OutputData class.

PARAMETER DESCRIPTION
name

The name of the annotation.

TYPE: str DEFAULT: ''

cls

Optional annotation class of the output.

TYPE: str | None DEFAULT: None

description

Description of the annotation.

TYPE: str | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__call__

__call__(source_file)

Get an OutputData instance for the specified source file.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
OutputData

An OutputData instance for the specified source file.

__len__

__len__()

Return length of name.

exists

exists(source_file)

Return True if annotation file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

read

read(source_file)

Read arbitrary string data from annotation file.

PARAMETER DESCRIPTION
source_file

Source file for the annotation.

TYPE: str

RETURNS DESCRIPTION
Any

The data of the annotation.

remove

remove(source_file)

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

write

write(value, source_file)

Write arbitrary data to annotation file.

PARAMETER DESCRIPTION
value

The data to write.

TYPE: Any

source_file

Source file for the annotation.

TYPE: str

OutputMarker

OutputMarker(name='', cls=None, description=None)

A class for creating a marker, indicating that something has run.

Similar to OutputCommonData, but typically without any actual data. Markers are used to indicate that something has been executed, often by functions that don't produce a natural output, such as installers and uninstallers.

PARAMETER DESCRIPTION
name

The name of the marker.

TYPE: str DEFAULT: ''

cls

Optional annotation class of the output.

TYPE: str | None DEFAULT: None

description

Description of the annotation.

TYPE: str | None DEFAULT: None

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

exists

exists()

Return True if annotation file exists.

has_attribute staticmethod

has_attribute()

Return False as this class does not have an attribute.

remove

remove()

Remove the annotation file.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and an empty string.

write

write(value='')

Create a marker, indicating that something has run.

This is used by functions that don't have any natural output, like installers and uninstallers.

PARAMETER DESCRIPTION
value

The data to write. Usually this should be left out.

TYPE: str DEFAULT: ''
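The marker idea can be sketched as a file whose mere existence records that a step has run. The class FileMarker and the file name below are hypothetical and do not reflect how Sparv stores markers internally; the sketch only shows the write, exists, and remove life cycle.

```python
import tempfile
from pathlib import Path


class FileMarker:
    """Hypothetical stand-in for a marker (not Sparv's implementation)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def write(self, value: str = "") -> None:
        # An empty file is enough: its existence is the signal.
        self.path.write_text(value, encoding="utf-8")

    def exists(self) -> bool:
        return self.path.is_file()

    def remove(self) -> None:
        self.path.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    marker = FileMarker(Path(tmp) / "install.marker")  # hypothetical name
    before = marker.exists()
    marker.write()  # e.g. at the end of an installer
    after_write = marker.exists()
    marker.remove()  # e.g. at the end of an uninstaller
    after_remove = marker.exists()
```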

Source

Source(source_dir='')

Path to the directory containing source files.

PARAMETER DESCRIPTION
source_dir

Path to the directory containing source files. Should usually be left blank so that Sparv determines the path automatically.

TYPE: str DEFAULT: ''

get_path

get_path(source_file, extension)

Get the path of a specific source file.

PARAMETER DESCRIPTION
source_file

The name of the source file.

TYPE: SourceFilename

extension

File extension to append to the source file.

TYPE: str

RETURNS DESCRIPTION
Path

The path to the source file.
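A minimal sketch of how a source file name and an extension might combine into a path is shown below. The function source_path is hypothetical, and it assumes the extension is passed with its leading dot and that source file names may contain subdirectories; Sparv's actual handling may differ.

```python
from pathlib import PurePosixPath


def source_path(source_dir: str, source_file: str, extension: str) -> PurePosixPath:
    """Join directory, file name, and extension (hypothetical helper)."""
    # Append rather than use with_suffix(), so dots in the name survive.
    return PurePosixPath(source_dir) / (source_file + extension)


path = source_path("source", "subdir/doc.v1", ".xml")
```

Here str(path) is "source/subdir/doc.v1.xml". PurePosixPath is used only to keep the example platform-independent.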

SourceAnnotations

SourceAnnotations(
    config_name, source_file=None, _headers=False
)

An iterable containing source annotations to include in the export, as specified in the corpus configuration.

PARAMETER DESCRIPTION
config_name

The configuration variable specifying which source annotations to include.

TYPE: str

source_file

The name of the source file.

TYPE: str | None DEFAULT: None

_headers

If True, read headers instead of source structure.

TYPE: bool DEFAULT: False

__getitem__

__getitem__(index)

Return item at index.

__len__

__len__()

Return number of annotations.

SourceAnnotationsAllSourceFiles

SourceAnnotationsAllSourceFiles(
    config_name, source_files=(), headers=False
)

Iterable with source annotations to include in export.

An iterable containing source annotations to include in the export, as specified in the corpus configuration. Unlike SourceAnnotations, this class ensures that the source annotations structure file (created using SourceStructure) for every source file is added as a dependency.

PARAMETER DESCRIPTION
config_name

The configuration variable specifying which source annotations to include.

TYPE: str

source_files

List of source file names.

TYPE: Iterable[str] DEFAULT: ()

headers

If True, read headers instead of source structure.

TYPE: bool DEFAULT: False

__getitem__

__getitem__(index)

Return item at index.

__len__

__len__()

Return number of annotations.

SourceFilename

A string representing the name of a source file.

SourceStructure

SourceStructure(source_file)

Every annotation name available in a source file.

PARAMETER DESCRIPTION
source_file

Name of the source file.

TYPE: str

annotation_name property

annotation_name

Retrieve the plain annotation name (excluding name of any attribute).

RETURNS DESCRIPTION
str

The plain annotation name without any attribute.

attribute_name property

attribute_name

Retrieve the attribute name (excluding name of the span annotation).

RETURNS DESCRIPTION
str | None

The attribute name without the name of the span annotation.

__bool__

__bool__()

Return True if name is not empty.

__len__

__len__()

Return length of name.

has_attribute

has_attribute()

Return True if the annotation has an attribute.

read

read()

Read structure file to get a list of names of annotations in the source file.

RETURNS DESCRIPTION
list[str]

A list of annotation names.

split

split()

Split the name into plain annotation name and attribute.

RETURNS DESCRIPTION
tuple[str, str]

A tuple with the plain annotation name and attribute name.

write

write(structure)

Sort the source file's structural elements and write structure file.

PARAMETER DESCRIPTION
structure

A list of annotation names.

TYPE: list[str]

SourceStructureParser

SourceStructureParser(source_dir)

Abstract class that should be implemented by an importer's structure parser.

Note

This class is intended to be used by the wizard. The wizard functionality will be deprecated in a future version.

PARAMETER DESCRIPTION
source_dir

Path to corpus source files.

TYPE: Path

get_annotations abstractmethod

get_annotations(corpus_config)

Return a list of annotations including attributes.

Each value has the format 'annotation:attribute' or 'annotation'. Plain versions of each annotation ('annotation' without attribute) must be included as well.

get_plain_annotations

get_plain_annotations(corpus_config)

Return a list of plain annotations without attributes.

Each value has the format 'annotation'.
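The relationship between the two lists can be sketched as follows: given names in the 'annotation:attribute' or 'annotation' format, the plain list keeps each annotation once, without attributes. The function plain_annotations and the sample names are hypothetical and are not taken from Sparv's code.

```python
def plain_annotations(annotations: list[str]) -> list[str]:
    """Strip attributes and drop duplicates, keeping first-seen order."""
    # dict.fromkeys() removes duplicates while preserving insertion order.
    return list(dict.fromkeys(name.partition(":")[0] for name in annotations))


# Hypothetical output of get_annotations(), including the plain versions.
annotations = ["text", "text:title", "s", "s:id", "w", "w:pos"]
plain = plain_annotations(annotations)
```

For this input, plain is ["text", "s", "w"].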

setup staticmethod

setup()

Return a list of wizard dictionaries with questions needed for setting up the class.

Answers to the questions will automatically be saved to self.answers.

Text

Text(source_file=None)

Represents the text content of a source file.

PARAMETER DESCRIPTION
source_file

The name of the source file.

TYPE: str | None DEFAULT: None

read

read()

Get the text content of the source file.

RETURNS DESCRIPTION
str

The corpus text.

write

write(text)

Write the provided text content to a file, overwriting any existing content.

PARAMETER DESCRIPTION
text

The text to write. Should be a unicode string.

TYPE: str

Wildcard

Wildcard(name, type=OTHER, description=None)

Class holding wildcard information.

Typically used in the wildcards list passed as an argument to the @annotator decorator, e.g.:

@annotator("Number {annotation} by relative position within {parent}", wildcards=[
    Wildcard("annotation", Wildcard.ANNOTATION),
    Wildcard("parent", Wildcard.ANNOTATION)
])

PARAMETER DESCRIPTION
name

The name of the wildcard.

TYPE: str

type

The type of the wildcard, by reference to the constants defined in this class (Wildcard.ANNOTATION, Wildcard.ATTRIBUTE, Wildcard.ANNOTATION_ATTRIBUTE, or Wildcard.OTHER). Wildcard.ANNOTATION is used for annotation names (spans), Wildcard.ATTRIBUTE is used for attribute names, and Wildcard.ANNOTATION_ATTRIBUTE is used for wildcards that cover both the annotation name and an attribute. Wildcard.OTHER is used for other types of wildcards, unrelated to annotations or attributes.

TYPE: int DEFAULT: OTHER

description

The description of the wildcard.

TYPE: str | None DEFAULT: None

__new__

__new__(name, *args, **kwargs)

Create a new instance of the class.

PARAMETER DESCRIPTION
cls

The class to create an instance of.

name

The name of the wildcard.

TYPE: str

*args

Additional arguments.

TYPE: Any DEFAULT: ()

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}