Sparv Classes
Sparv classes are used to represent various types of data in Sparv, such as source files, models, and input and output annotations. By using Sparv classes in the signatures of processors, Sparv knows the inputs and outputs of each processor and can build a dependency graph to determine the order in which processors should be run to produce the desired output. Additionally, Sparv classes provide methods for reading and writing annotations, allowing annotators to handle annotation files without needing to understand Sparv's internal data format. Below is a list of all available Sparv classes, including their parameters, properties, and public methods.
Classes used as default input for annotator functions.
AllSourceFilenames
¶
AllSourceFilenames()
List with names of all source files.
This class provides an iterable containing the names of all source files. It is commonly used by exporter functions that need to combine annotations from multiple source files.
Annotation
¶
Annotation(name='', source_file=None)
Regular Annotation tied to one source file.
This class represents a regular annotation tied to a single source file. It is used when an annotation is required as input for a function, for example, Annotation("<token:word>").
PARAMETER | DESCRIPTION |
---|---|
name | Name of the annotation. |
source_file | Source file for the annotation. |
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
__iter__
¶
__iter__()
Get an iterator of values from the annotation.
This is a convenience method equivalent to read().
RETURNS | DESCRIPTION |
---|---|
Iterator[str]
|
An iterator of values from the annotation. |
__len__
¶
__len__()
Get the number of values in the annotation.
RETURNS | DESCRIPTION |
---|---|
int
|
The number of values in the annotation. |
create_empty_attribute
¶
create_empty_attribute()
Return a list filled with None of the same size as this annotation.
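A minimal sketch of the pattern this method supports, using plain Python lists as stand-ins for annotation values. It does not import Sparv; `token_words` and `uppercase` are hypothetical names chosen for illustration.

```python
# Stand-in for the values of a token annotation (hypothetical data).
token_words = ["The", "cat", "sat", "."]

# Same shape as the described return value: one None per annotation value.
uppercase = [None] * len(token_words)

# Fill in values only where they apply; the remaining slots stay None.
for index, word in enumerate(token_words):
    if word.isalpha():
        uppercase[index] = word.upper()

print(uppercase)
```

Starting from a list of the right length keeps the new attribute aligned index by index with the annotation it belongs to.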
get_child_values
¶
get_child_values(
child, append_orphans=False, orphan_alert=False
)
Get values of children of this annotation.
PARAMETER | DESCRIPTION |
---|---|
child | Child annotation. |
append_orphans | If True, append orphans to the end. |
orphan_alert | If True, log a warning when a child has no parent. |
RETURNS | DESCRIPTION |
---|---|
Iterator[Iterator] | An iterator with one element for each parent. Each element is an iterator of values in the child annotation. If append_orphans is True, the last element is an iterator of orphans. |
get_children
¶
get_children(child, orphan_alert=False)
Get children of this annotation.
PARAMETER | DESCRIPTION |
---|---|
child | Child annotation. |
orphan_alert | If True, log a warning when a child has no parent. |
RETURNS | DESCRIPTION |
---|---|
tuple[list, list] | A tuple of two lists. The first one is a list with n (= total number of parents) elements where every element is a list of indices in the child annotation. The second one is a list of orphans, i.e. containing indices in the child annotation that have no parent. Both parents and children are sorted according to their position in the source file. |
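The shape of this return value can be illustrated with a small stand-alone sketch. It assumes spans are `(start, end)` character offsets and that a child belongs to the first parent whose span contains it; `group_children` is a hypothetical helper, not Sparv's implementation.

```python
def group_children(parent_spans, child_spans):
    """Return (children per parent, orphans) as lists of child indices."""
    children = [[] for _ in parent_spans]
    orphans = []
    for child_index, (c_start, c_end) in enumerate(child_spans):
        for parent_index, (p_start, p_end) in enumerate(parent_spans):
            # Assumption: a child belongs to a parent that fully contains it.
            if p_start <= c_start and c_end <= p_end:
                children[parent_index].append(child_index)
                break
        else:
            # No containing parent was found for this child.
            orphans.append(child_index)
    return children, orphans


sentences = [(0, 11), (12, 20)]
tokens = [(0, 5), (6, 11), (12, 15), (16, 20), (21, 25)]
children, orphans = group_children(sentences, tokens)
print(children, orphans)
```

The first list has one entry per parent, and the last token ends up in the orphan list because no sentence span covers it.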
get_parents
¶
get_parents(parent, orphan_alert=False)
Get parents of this annotation.
PARAMETER | DESCRIPTION |
---|---|
parent | Parent annotation. |
orphan_alert | If True, log a warning when a child has no parent. |
RETURNS | DESCRIPTION |
---|---|
list | A list with n (= total number of children) elements where every element is an index in the parent annotation, or None when no parent is found. |
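As a rough illustration of the returned list, the following stand-alone sketch maps each child span to the index of a containing parent span. The `(start, end)` span representation and the containment rule are assumptions, and `find_parents` is a hypothetical helper rather than Sparv code.

```python
def find_parents(parent_spans, child_spans):
    """Return one entry per child: the parent's index, or None if there is none."""
    parents = []
    for c_start, c_end in child_spans:
        match = None
        for parent_index, (p_start, p_end) in enumerate(parent_spans):
            # Assumption: the parent is the first span that contains the child.
            if p_start <= c_start and c_end <= p_end:
                match = parent_index
                break
        parents.append(match)
    return parents


sentence_spans = [(0, 11), (12, 20)]
token_spans = [(0, 5), (6, 11), (12, 15), (16, 20), (21, 25)]
parent_indices = find_parents(sentence_spans, token_spans)
print(parent_indices)
```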
get_size
¶
get_size()
Get number of values.
Note
This method is deprecated and will be removed in future versions. Use len() instead.
RETURNS | DESCRIPTION |
---|---|
int
|
The number of values in the annotation. |
read
¶
read()
Get an iterator of values from the annotation.
RETURNS | DESCRIPTION |
---|---|
Iterator[str]
|
An iterator of values from the annotation. |
read_attributes
¶
read_attributes(annotations, with_annotation_name=False)
Return an iterator of tuples of multiple attributes on the same annotation.
PARAMETER | DESCRIPTION |
---|---|
annotations | List of annotations to read attributes from. |
with_annotation_name | If True, return attributes with annotation name. |
RETURNS | DESCRIPTION |
---|---|
Iterator[tuple]
|
An iterator of tuples of attributes. |
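Reading several attributes in lockstep can be pictured with plain lists and `zip`. This sketch assumes each attribute holds one value per span, in the same order; the data and variable names are hypothetical and no Sparv classes are involved.

```python
# Stand-ins for three attributes on the same token annotation.
words = ["The", "cat", "sat"]
pos_tags = ["DT", "NN", "VB"]
lemmas = ["the", "cat", "sit"]

# One tuple per token, combining the value of every attribute.
rows = list(zip(words, pos_tags, lemmas))

for word, pos, lemma in rows:
    print(word, pos, lemma)
```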
read_parents_and_children
¶
read_parents_and_children(parent, child)
Read parent and child annotations.
Reorders them according to span position, but keeps original index information.
PARAMETER | DESCRIPTION |
---|---|
parent
|
Parent annotation.
TYPE:
|
child
|
Child annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
tuple[Iterator, Iterator]
|
A tuple of iterators for parent and child annotations. |
read_spans
¶
read_spans(decimals=False, with_annotation_name=False)
Get an iterator of spans from the annotation.
PARAMETER | DESCRIPTION |
---|---|
decimals | If True, return spans with decimals. |
with_annotation_name | If True, return spans with annotation name. |
RETURNS | DESCRIPTION |
---|---|
Iterator[tuple]
|
An iterator of spans from the annotation. |
read_text
¶
read_text()
Get the source text of the annotation.
RETURNS | DESCRIPTION |
---|---|
Iterator[str]
|
An iterator of the source text of the annotation. |
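The relationship between spans and source text can be sketched as follows, assuming spans are `(start, end)` character offsets with an exclusive end. That representation is an assumption made for illustration, and the variable names are hypothetical.

```python
source_text = "The cat sat."

# Hypothetical token spans as (start, end) character offsets.
token_spans = [(0, 3), (4, 7), (8, 11), (11, 12)]

# The text of each span is the slice of the source text it covers.
token_texts = [source_text[start:end] for start, end in token_spans]

print(token_texts)
```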
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and attribute name. |
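A minimal stand-alone sketch of this kind of split, assuming that a colon separates the span annotation name from the attribute name. The delimiter, the example names, and the helper `split_annotation_name` are assumptions for illustration, not Sparv's own code.

```python
def split_annotation_name(name):
    """Split 'annotation:attribute' into its two parts.

    The attribute part is an empty string when the name has no attribute.
    """
    plain_name, _, attribute = name.partition(":")
    return plain_name, attribute


with_attribute = split_annotation_name("segment.token:misc.word")
without_attribute = split_annotation_name("segment.sentence")

# An attribute_name-style value: None instead of an empty string.
attribute_or_none = without_attribute[1] or None

print(with_attribute, without_attribute, attribute_or_none)
```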
AnnotationAllSourceFiles
¶
AnnotationAllSourceFiles(name='')
Regular annotation but source file must be specified for all actions.
Like Annotation, this class represents a regular annotation, but is used as input to an annotator to require the specified annotation for every source file in the corpus. By calling an instance of this class with a source file name as an argument, you can get an instance of Annotation for that source file.
Note
All methods of this class are deprecated and will be removed in a future version of Sparv. Instead, create an instance of Annotation by passing a source file name as an argument, and use the methods of the Annotation class.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
__call__
¶
__call__(source_file)
Get an Annotation instance for the specified source file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Annotation
|
An Annotation instance for the specified source file. |
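The calling pattern described here can be mimicked with two small stand-in classes. `AllFilesAnnotation` and `PerFileAnnotation` are hypothetical names used only to show the idea of a corpus-wide object that hands out per-file objects; they are not the Sparv classes themselves.

```python
class PerFileAnnotation:
    """Stand-in for an annotation bound to one source file."""

    def __init__(self, name, source_file):
        self.name = name
        self.source_file = source_file


class AllFilesAnnotation:
    """Stand-in for an annotation that is not bound to a source file."""

    def __init__(self, name):
        self.name = name

    def __call__(self, source_file):
        # Calling the instance yields a per-file object with the same name.
        return PerFileAnnotation(self.name, source_file)


all_tokens = AllFilesAnnotation("segment.token")
per_file = [all_tokens(source_file) for source_file in ["doc1", "doc2"]]
print([(a.name, a.source_file) for a in per_file])
```

An exporter that combines several source files could loop over the file names in this way and work with one per-file object at a time.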
create_empty_attribute
¶
create_empty_attribute(source_file)
Return a list filled with None of the same size as this annotation.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
get_child_values
¶
get_child_values(
source_file,
child,
append_orphans=False,
orphan_alert=False,
)
Get values of children of this annotation.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
child
|
Child annotation.
TYPE:
|
append_orphans
|
If True, append orphans to the end.
TYPE:
|
orphan_alert
|
If True, log a warning when a child has no parent.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Iterator[Iterator] | An iterator with one element for each parent. Each element is an iterator of values in the child annotation. If append_orphans is True, the last element is an iterator of orphans. |
get_children
¶
get_children(source_file, child, orphan_alert=False)
Get children of this annotation.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
child
|
Child annotation.
TYPE:
|
orphan_alert
|
If True, log a warning when a child has no parent.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
tuple[list, list] | A tuple of two lists. The first one is a list with n (= total number of parents) elements where every element is a list of indices in the child annotation. The second one is a list of orphans, i.e. containing indices in the child annotation that have no parent. Both parents and children are sorted according to their position in the source file. |
get_parents
¶
get_parents(source_file, parent, orphan_alert=False)
Get parents of this annotation.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
parent
|
Parent annotation.
TYPE:
|
orphan_alert
|
If True, log a warning when a child has no parent.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
list | A list with n (= total number of children) elements where every element is an index in the parent annotation, or None when no parent is found. |
get_size
¶
get_size(source_file)
Get number of values.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
int
|
The number of values in the annotation. |
read
¶
read(source_file)
Get an iterator of values from the annotation.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Iterator[str]
|
An iterator of values from the annotation. |
read_attributes
¶
read_attributes(
source_file, annotations, with_annotation_name=False
)
Return an iterator of tuples of multiple attributes on the same annotation.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
annotations
|
List of annotations to read attributes from.
TYPE:
|
with_annotation_name
|
If True, return attributes with annotation name.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Iterator
|
An iterator of tuples of attributes. |
read_spans
¶
read_spans(
source_file, decimals=False, with_annotation_name=False
)
Get an iterator of spans from the annotation.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
decimals
|
If True, return spans with decimals.
TYPE:
|
with_annotation_name
|
If True, return spans with annotation name.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Iterator
|
An iterator of spans from the annotation. |
read_text
¶
read_text(source_file)
Get the source text of the annotation.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Iterator[str]
|
An iterator of the source text of the annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and attribute name. |
AnnotationCommonData
¶
AnnotationCommonData(name='')
Data annotation for the whole corpus.
Like AnnotationData, this class represents an annotation with arbitrary data when used as input to an annotator. However, AnnotationCommonData is used for data that applies to the entire corpus, not tied to a specific source file.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
read
¶
read()
Read arbitrary corpus-level data from the annotation file.
RETURNS | DESCRIPTION |
---|---|
Any
|
The data of the annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
AnnotationData
¶
AnnotationData(name='', source_file=None)
Annotation of the data type, for one source file, not tied to spans in the corpus text.
This class represents an annotation holding arbitrary data, i.e., data that is not tied to spans in the corpus text. It is used as input to an annotator.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
source_file
|
The name of the source file.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
read
¶
read()
Read arbitrary data from the annotation file.
RETURNS | DESCRIPTION |
---|---|
Any
|
The data of the annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
AnnotationDataAllSourceFiles
¶
AnnotationDataAllSourceFiles(name='')
Data annotation but source file must be specified for all actions.
Similar to AnnotationData, this class is used for annotations holding arbitrary data, but it is used as input to an annotator to require the specified annotation for every source file in the corpus. By calling an instance of this class with a source file name as an argument, you can get an instance of AnnotationData for that source file.
Note
All methods of this class are deprecated and will be removed in a future version of Sparv. Instead, create an instance of AnnotationData by passing a source file name as an argument, and use the methods of the AnnotationData class.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
__call__
¶
__call__(source_file)
Get an AnnotationData instance for the specified source file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnnotationData
|
An AnnotationData instance for the specified source file. |
read
¶
read(source_file)
Read arbitrary data from annotation file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Any
|
The data of the annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
AnnotationName
¶
AnnotationName(name='', source_file=None, is_input=None)
Class representing an Annotation name.
Use this class when only the name of an annotation is needed, not the actual data. The annotation will not be added as a prerequisite for the annotator, meaning that using AnnotationName will not automatically trigger the creation of the referenced annotation.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
source_file
|
The name of the source file.
TYPE:
|
is_input
|
Deprecated, use AnnotationName instead of setting this to False.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and attribute name. |
Binary
¶
Path to binary executable.
This class holds the path to a binary executable. The path can be either the name of a binary in the system's PATH, a full path to a binary, or a path relative to the Sparv data directory. It is often used to define a prerequisite for an annotator function.
PARAMETER | DESCRIPTION |
---|---|
object
|
Path to the binary executable.
|
BinaryDir
¶
Path to directory containing executable binaries.
This class holds the path to a directory containing executable binaries. The path can be either an absolute path or a path relative to the Sparv data directory.
PARAMETER | DESCRIPTION |
---|---|
object
|
Path to the directory containing the executable binaries.
|
Config
¶
Config(
name,
default=None,
description=None,
datatype=None,
choices=None,
pattern=None,
min_len=None,
max_len=None,
min_value=None,
max_value=None,
const=None,
conditions=None,
)
Class holding configuration key names.
This class represents a configuration key and optionally its default value. You can specify the datatype and allowed values, which will be used for validating the config and generating the Sparv config JSON schema.
For further information on how to use this class, see the Config Parameters section.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the configuration key.
TYPE:
|
default
|
The optional default value of the configuration key.
TYPE:
|
description
|
A mandatory description of the configuration key.
TYPE:
|
datatype
|
A type specifying the allowed datatype(s). Supported types are
TYPE:
|
choices
|
An iterable of valid choices, or a function that returns such an iterable.
TYPE:
|
pattern
|
A regular expression matching valid values (only for the datatype
TYPE:
|
min_len
|
An
TYPE:
|
max_len
|
An
TYPE:
|
min_value
|
An
TYPE:
|
max_value
|
An
TYPE:
|
const
|
Restrict the value to a constant.
TYPE:
|
conditions
|
A list of
TYPE:
|
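To make the role of these constraints concrete, here is a stand-alone sketch of how a value could be checked against a datatype, a set of choices, a pattern, and a numeric range. `validate_value` is a hypothetical helper; it is not Sparv's validation code, and treating the pattern as a full match is an assumption.

```python
import re


def validate_value(value, datatype=None, choices=None, pattern=None,
                   min_value=None, max_value=None):
    """Return a list of problems; an empty list means the value is accepted."""
    problems = []
    if datatype is not None and not isinstance(value, datatype):
        # Stop early: the remaining checks assume the right type.
        return [f"expected {datatype.__name__}, got {type(value).__name__}"]
    if choices is not None and value not in choices:
        problems.append(f"{value!r} is not one of {sorted(choices)}")
    if pattern is not None and isinstance(value, str):
        # Assumption: the whole value must match the regular expression.
        if not re.fullmatch(pattern, value):
            problems.append(f"{value!r} does not match {pattern!r}")
    if min_value is not None and value < min_value:
        problems.append(f"{value!r} is below the minimum {min_value!r}")
    if max_value is not None and value > max_value:
        problems.append(f"{value!r} is above the maximum {max_value!r}")
    return problems


ok = validate_value(0.5, datatype=float, min_value=0, max_value=1)
too_large = validate_value(1.5, datatype=float, min_value=0, max_value=1)
wrong_type = validate_value("high", datatype=float)
bad_choice = validate_value("sv", datatype=str, choices={"swe", "eng"})
print(ok, too_large, wrong_type, bad_choice)
```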
__new__
¶
__new__(name, *args, **kwargs)
Create a new instance of the class.
PARAMETER | DESCRIPTION |
---|---|
cls
|
The class to create an instance of.
|
name
|
The name of the configuration key.
TYPE:
|
*args
|
Additional arguments.
TYPE:
|
**kwargs
|
Additional keyword arguments.
TYPE:
|
Corpus
¶
A string representing the name (ID) of a corpus.
Export
¶
A string containing the path to an export directory and filename.
Represents an export file, used to define the output of an exporter function.
PARAMETER | DESCRIPTION |
---|---|
object
|
The export directory and filename. The export directory must include the module name as a prefix or be
equal to the module name. The filename may include the wildcard
|
ExportAnnotationNames
¶
ExportAnnotationNames(config_name)
List of annotations to include in export.
An iterable containing annotations to be included in the export, as specified in the corpus configuration. Unlike ExportAnnotations, using this class will not add the annotations as dependencies. Use this class when you only need the annotation names, not the actual annotation files.
PARAMETER | DESCRIPTION |
---|---|
config_name
|
The configuration variable specifying which annotations to include.
TYPE:
|
ExportAnnotations
¶
ExportAnnotations(config_name, is_input=None)
Iterable with annotations to include in export.
An iterable containing annotations to be included in the export, as specified in the corpus configuration. When using this class, annotation files for the current source file are automatically added as dependencies.
PARAMETER | DESCRIPTION |
---|---|
config_name
|
The configuration variable specifying which annotations to include.
TYPE:
|
is_input
|
Deprecated, use
TYPE:
|
ExportAnnotationsAllSourceFiles
¶
ExportAnnotationsAllSourceFiles(config_name)
List of annotations to include in export.
An iterable containing annotations to be included in the export, as specified in the corpus configuration. When using this class, annotation files for all source files will automatically be added as dependencies.
PARAMETER | DESCRIPTION |
---|---|
config_name
|
The configuration variable specifying which annotations to include.
TYPE:
|
ExportInput
¶
ExportInput(val, all_files=False)
Export directory and filename pattern, used as input.
Represents the export directory and filename pattern used as input. Use this class when you need export files as input for another function.
PARAMETER | DESCRIPTION |
---|---|
val
|
The export directory and filename pattern (e.g.,
TYPE:
|
all_files
|
Set to
TYPE:
|
__new__
¶
__new__(val, *args, **kwargs)
Create a new instance of the class.
PARAMETER | DESCRIPTION |
---|---|
cls
|
The class to create an instance of.
|
val
|
The export directory and filename pattern (e.g.,
TYPE:
|
*args
|
Additional arguments.
TYPE:
|
**kwargs
|
Additional keyword arguments.
TYPE:
|
HeaderAnnotations
¶
HeaderAnnotations(config_name, source_file=None)
Header annotations to include in export.
An iterable containing header annotations from the source to be included in the export, as specified in the corpus configuration.
PARAMETER | DESCRIPTION |
---|---|
config_name
|
The configuration variable that specifies which header annotations to include.
TYPE:
|
source_file
|
Name of the source file.
TYPE:
|
HeaderAnnotationsAllSourceFiles
¶
HeaderAnnotationsAllSourceFiles(
config_name, source_files=()
)
Header annotations to include in export.
An iterable containing header annotations from all source files to be included in the export, as specified in the corpus configuration. Unlike HeaderAnnotations, this class ensures that the header annotations file (created using Headers) for every source file is added as a dependency.
PARAMETER | DESCRIPTION |
---|---|
config_name
|
The configuration variable specifying which source annotations to include.
TYPE:
|
source_files
|
List of source files (internal use only).
TYPE:
|
Headers
¶
Headers(source_file)
List of header annotation names.
Represents a list of header annotation names for a given source file, used as output for importers.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
The name of the source file.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
read
¶
read()
Read the headers file and return a list of header annotation names.
RETURNS | DESCRIPTION |
---|---|
list[str]
|
A list of header annotation names. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
write
¶
write(header_annotations)
Write the headers file with the provided list of header annotation names.
PARAMETER | DESCRIPTION |
---|---|
header_annotations
|
A list of header annotation names.
TYPE:
|
Language
¶
The language of the corpus.
An instance of this class contains information about the language of the corpus. This information is retrieved from the corpus configuration and is specified using an ISO 639-1 language code.
Marker
¶
Marker(name='')
A marker indicating that something has run.
Similar to AnnotationCommonData, but typically without any actual data. Used as input. Markers are used to make sure that something has been executed. Created using OutputMarker.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the marker.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
read
¶
read()
Read arbitrary corpus-level string data from the marker file.
RETURNS | DESCRIPTION |
---|---|
Iterator[str]
|
An iterator with the data of the marker. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
MarkerOptional
¶
MarkerOptional(name='')
Same as Marker, but if the marker file doesn't exist, it won't be created.
This is mainly used to get a reference to a marker that may or may not exist, to be able to remove markers from connected (un)installers without triggering the connected (un)installation. Otherwise, running an uninstaller without first having run the installer would needlessly trigger the installation first.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the marker.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
read
¶
read()
Read arbitrary corpus-level string data from the marker file.
RETURNS | DESCRIPTION |
---|---|
Iterator[str]
|
An iterator with the data of the marker. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
Model
¶
Model(name)
Path to a model file.
Represents a path to a model file. The path can be either an absolute path, a relative path, or a path relative to the Sparv model directory. Typically used as input to annotator functions.
PARAMETER | DESCRIPTION |
---|---|
name
|
The path of the model file.
TYPE:
|
path
property
¶
path
Get model path.
RETURNS | DESCRIPTION |
---|---|
Path | The path to the model file as a Path object. |
download
¶
download(url)
Download the file from the given URL and save to the model path.
PARAMETER | DESCRIPTION |
---|---|
url
|
URL to download from.
TYPE:
|
read
¶
read()
Read arbitrary string data from the model file.
RETURNS | DESCRIPTION |
---|---|
str
|
The data of the model. |
read_pickle
¶
read_pickle()
Read pickled data from model file.
RETURNS | DESCRIPTION |
---|---|
Any
|
The data of the model. |
remove
¶
remove(raise_errors=False)
Remove model file from disk.
PARAMETER | DESCRIPTION |
---|---|
raise_errors
|
If
TYPE:
|
ungzip
¶
ungzip(out)
Unzip gzip file in same directory as the model file.
PARAMETER | DESCRIPTION |
---|---|
out
|
Path to output file.
TYPE:
|
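Decompressing a gzip file next to its source can be sketched with the standard library alone. The helper name `ungzip_file` and the file names are hypothetical, and this is only an approximation of what such a method might do, not Sparv's implementation.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def ungzip_file(gz_path, out_path):
    """Decompress gz_path and write the result to out_path."""
    with gzip.open(gz_path, "rb") as compressed, open(out_path, "wb") as target:
        # Stream the data so large model files need not fit in memory.
        shutil.copyfileobj(compressed, target)


with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "model.txt.gz"
    out_path = Path(tmp) / "model.txt"
    # Create a small gzip file to decompress (hypothetical content).
    with gzip.open(gz_path, "wb") as f:
        f.write(b"lemma\tpos\n")
    ungzip_file(gz_path, out_path)
    restored = out_path.read_bytes()

print(restored)
```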
write
¶
write(data)
Write arbitrary string data to the model file.
PARAMETER | DESCRIPTION |
---|---|
data
|
The data to write.
TYPE:
|
write_pickle
¶
write_pickle(data, protocol=-1)
Dump arbitrary data to the model file in pickle format.
PARAMETER | DESCRIPTION |
---|---|
data
|
The data to write.
TYPE:
|
protocol
|
Pickle protocol to use.
TYPE:
|
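The default `protocol=-1` follows the convention of Python's `pickle` module, where a negative protocol number selects the highest protocol available. A stand-alone round trip, using a temporary file in place of a model path (the file name and data are hypothetical), might look like this:

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical model data: a tiny lexicon.
lexicon = {"cat": ["NN"], "sat": ["VB"]}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "lexicon.pickle"
    # A negative protocol selects pickle.HIGHEST_PROTOCOL.
    with model_path.open("wb") as f:
        pickle.dump(lexicon, f, protocol=-1)
    with model_path.open("rb") as f:
        loaded = pickle.load(f)

print(loaded)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.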
ModelOutput
¶
ModelOutput(name, description=None)
Same as Model, but used as the output of a model builder.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the model file.
TYPE:
|
description
|
Description of the model.
TYPE:
|
path
property
¶
path
Get model path.
RETURNS | DESCRIPTION |
---|---|
Path | The path to the model file as a Path object. |
download
¶
download(url)
Download the file from the given URL and save to the model path.
PARAMETER | DESCRIPTION |
---|---|
url
|
URL to download from.
TYPE:
|
read
¶
read()
Read arbitrary string data from the model file.
RETURNS | DESCRIPTION |
---|---|
str
|
The data of the model. |
read_pickle
¶
read_pickle()
Read pickled data from model file.
RETURNS | DESCRIPTION |
---|---|
Any
|
The data of the model. |
remove
¶
remove(raise_errors=False)
Remove model file from disk.
PARAMETER | DESCRIPTION |
---|---|
raise_errors
|
If
TYPE:
|
ungzip
¶
ungzip(out)
Unzip gzip file in same directory as the model file.
PARAMETER | DESCRIPTION |
---|---|
out
|
Path to output file.
TYPE:
|
write
¶
write(data)
Write arbitrary string data to the model file.
PARAMETER | DESCRIPTION |
---|---|
data
|
The data to write.
TYPE:
|
write_pickle
¶
write_pickle(data, protocol=-1)
Dump arbitrary data to the model file in pickle format.
PARAMETER | DESCRIPTION |
---|---|
data
|
The data to write.
TYPE:
|
protocol
|
Pickle protocol to use.
TYPE:
|
Namespaces
¶
Namespaces(source_file)
Namespace mapping (URI to prefix) for a source file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
read
¶
read()
Read namespace file and parse it into a dict.
RETURNS | DESCRIPTION |
---|---|
dict[str, str]
|
A dict with prefixes as keys and URIs as values. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and attribute name. |
write
¶
write(namespaces)
Write namespace file.
PARAMETER | DESCRIPTION |
---|---|
namespaces
|
A dict with prefixes as keys and URIs as values.
TYPE:
|
Output
¶
Output(
name="", cls=None, description=None, source_file=None
)
Regular annotation or attribute used as output from an annotator function.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
cls
|
Optional annotation class of the output.
TYPE:
|
description
|
Description of the annotation.
TYPE:
|
source_file
|
The name of the source file.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and attribute name. |
write
¶
write(values)
Write the annotation to a file, overwriting any existing annotation.
All values will be converted to strings.
PARAMETER | DESCRIPTION |
---|---|
values
|
A list of values.
TYPE:
|
OutputAllSourceFiles
¶
OutputAllSourceFiles(name='', cls=None, description=None)
Regular annotation or attribute used as output, but not tied to a specific source file.
Similar to Output, this class represents a regular annotation or attribute used as output, but it is used when output should be produced for every source file in the corpus. By calling an instance of this class with a source file name as an argument, you can get an instance of Output for that source file.
Note
All methods of this class are deprecated and will be removed in a future version of Sparv. Instead, create an instance of Output by passing a source file name as an argument, and use the methods of the Output class.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
cls
|
Optional annotation class of the output.
TYPE:
|
description
|
Description of the annotation.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
__call__
¶
__call__(source_file)
Get an Output instance for the specified source file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Output
|
An Output instance for the specified source file. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and attribute name. |
write
¶
write(values, source_file)
Write the annotation to a file, overwriting any existing annotation.
PARAMETER | DESCRIPTION |
---|---|
values
|
A list of values.
TYPE:
|
source_file
|
Source file for the annotation.
TYPE:
|
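The calling pattern recommended in the note above (call the all-files object with a source file name, then work with the per-file object) can be sketched with two stand-in classes. FileOutput and AllFilesOutput are hypothetical and only mimic the shape of Output and OutputAllSourceFiles; they are not Sparv's implementation, and the annotation and file names are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileOutput:
    """Hypothetical stand-in for an output bound to one source file."""

    name: str
    source_file: Optional[str] = None


@dataclass
class AllFilesOutput:
    """Hypothetical stand-in for an output not tied to a source file."""

    name: str

    def __call__(self, source_file: str) -> FileOutput:
        # Calling the instance binds the same annotation name to one file.
        return FileOutput(self.name, source_file)


out = AllFilesOutput("segment.token:misc.upper")
# One per-file object for each source file in the (made-up) corpus.
per_file = [out(source_file) for source_file in ["doc1", "doc2"]]
```

Under this pattern the per-file object carries the source file, so its methods do not need a source_file argument.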
OutputCommonData
¶
OutputCommonData(name='', cls=None, description=None)
Data annotation for the whole corpus.
Similar to OutputData, but for a data annotation that applies to the entire corpus.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
cls
|
Optional annotation class of the output.
TYPE:
|
description
|
Description of the annotation.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
write
¶
write(value)
Write arbitrary corpus-level data to the annotation file.
PARAMETER | DESCRIPTION |
---|---|
value
|
The data to write.
TYPE:
|
OutputData
¶
OutputData(
name="", cls=None, description=None, source_file=None
)
An annotation holding arbitrary data that is used as output.
This data is not tied to spans in the corpus text.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
cls
|
Optional annotation class of the output.
TYPE:
|
description
|
Description of the annotation.
TYPE:
|
source_file
|
The name of the source file.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
write
¶
write(value)
Write arbitrary data to the annotation file.
PARAMETER | DESCRIPTION |
---|---|
value
|
The data to write.
TYPE:
|
OutputDataAllSourceFiles
¶
OutputDataAllSourceFiles(
name="", cls=None, description=None
)
Data annotation used as output, not tied to a specific source file.
Similar to OutputData, this class is used for output annotations holding arbitrary data, but it is used when output should be produced for every source file in the corpus. By calling an instance of this class with a source file name as an argument, you can get an instance of OutputData for that source file.
Note
All methods of this class are deprecated and will be removed in a future version of Sparv. Instead, get an instance of OutputData by calling an instance of this class with a source file name as the argument, and use the methods of the OutputData class.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the annotation.
TYPE:
|
cls
|
Optional annotation class of the output.
TYPE:
|
description
|
Description of the annotation.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
__call__
¶
__call__(source_file)
Get an OutputData instance for the specified source file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
OutputData
|
An OutputData instance for the specified source file. |
read
¶
read(source_file)
Read arbitrary string data from annotation file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Source file for the annotation.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Any
|
The data of the annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
write
¶
write(value, source_file)
Write arbitrary data to annotation file.
PARAMETER | DESCRIPTION |
---|---|
value
|
The data to write.
TYPE:
|
source_file
|
Source file for the annotation.
TYPE:
|
OutputMarker
¶
OutputMarker(name='', cls=None, description=None)
A class for creating a marker, indicating that something has run.
Similar to OutputCommonData, but typically without any actual data. Markers are used to indicate that something has been executed, often by functions that don't produce a natural output, such as installers and uninstallers.
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the marker.
TYPE:
|
cls
|
Optional annotation class of the output.
TYPE:
|
description
|
Description of the annotation.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and an empty string. |
write
¶
write(value='')
Create a marker, indicating that something has run.
This is used by functions that don't have any natural output, like installers and uninstallers.
PARAMETER | DESCRIPTION |
---|---|
value
|
The data to write. Usually this should be left out.
TYPE:
|
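The idea behind a marker can be sketched without Sparv: the mere existence of a file signals that a step has run, so its content is normally empty. The helpers write_marker and has_run and the marker location are hypothetical; where and how Sparv stores its markers is not described here.

```python
import tempfile
from pathlib import Path


def write_marker(path: Path, value: str = "") -> None:
    """Create the marker file; its existence is the signal, not its content."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(value, encoding="utf-8")


def has_run(path: Path) -> bool:
    """Report whether the marker file exists."""
    return path.is_file()


with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "markers" / "install_done"  # hypothetical location
    before = has_run(marker)
    write_marker(marker)  # value left out, as is usual for markers
    after = has_run(marker)
    content = marker.read_text(encoding="utf-8")
```

A build system can then treat the marker like any other output file when deciding whether an installer or uninstaller needs to run again.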
Source
¶
Source(source_dir='')
Path to the directory containing source files.
PARAMETER | DESCRIPTION |
---|---|
source_dir
|
Path to the directory containing source files. Should usually be left blank, for Sparv to automatically get the path.
TYPE:
|
get_path
¶
get_path(source_file, extension)
Get the path of a specific source file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
The name of the source file.
TYPE:
|
extension
|
File extension to append to the source file.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Path
|
The path to the source file. |
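As a rough sketch of what get_path computes, the function below joins a source directory, a source file name and an extension. It is a hypothetical stand-in, not Sparv's code, and it assumes that the extension is passed with its leading dot and is appended to the name rather than replacing an existing suffix.

```python
from pathlib import Path


def source_path(source_dir: str, source_file: str, extension: str) -> Path:
    """Build the path of a source file (hypothetical stand-in for get_path)."""
    # The extension is appended, so dots inside the file name are preserved.
    return Path(source_dir, source_file + extension)


simple = source_path("source", "doc1", ".xml")
dotted = source_path("source", "report.v2", ".xml")
```

Appending instead of using Path.with_suffix matters for names such as report.v2, where with_suffix would drop the ".v2" part.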
SourceAnnotations
¶
SourceAnnotations(
config_name, source_file=None, _headers=False
)
An iterable containing source annotations to include in the export, as specified in the corpus configuration.
PARAMETER | DESCRIPTION |
---|---|
config_name
|
The configuration variable specifying which source annotations to include.
TYPE:
|
source_file
|
The name of the source file.
TYPE:
|
_headers
|
If
TYPE:
|
SourceAnnotationsAllSourceFiles
¶
SourceAnnotationsAllSourceFiles(
config_name, source_files=(), headers=False
)
Iterable with source annotations to include in export.
An iterable containing source annotations to include in the export, as specified in the corpus configuration. Unlike SourceAnnotations, this class ensures that the source annotations structure file (created using SourceStructure) for every source file is added as a dependency.
PARAMETER | DESCRIPTION |
---|---|
config_name
|
The configuration variable specifying which source annotations to include.
TYPE:
|
source_files
|
List of source file names.
TYPE:
|
headers
|
If
TYPE:
|
SourceFilename
¶
A string representing the name of a source file.
SourceStructure
¶
SourceStructure(source_file)
Every annotation name available in a source file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
Name of the source file.
TYPE:
|
annotation_name
property
¶
annotation_name
Retrieve the plain annotation name (excluding name of any attribute).
RETURNS | DESCRIPTION |
---|---|
str
|
The plain annotation name without any attribute. |
attribute_name
property
¶
attribute_name
Retrieve the attribute name (excluding name of the span annotation).
RETURNS | DESCRIPTION |
---|---|
str | None
|
The attribute name without the name of the span annotation. |
read
¶
read()
Read structure file to get a list of names of annotations in the source file.
RETURNS | DESCRIPTION |
---|---|
list[str]
|
A list of annotation names. |
split
¶
split()
Split the name into plain annotation name and attribute.
RETURNS | DESCRIPTION |
---|---|
tuple[str, str]
|
A tuple with the plain annotation name and attribute name. |
write
¶
write(structure)
Sort the source file's structural elements and write structure file.
PARAMETER | DESCRIPTION |
---|---|
structure
|
A list of annotation names.
TYPE:
|
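The write and read pair described above can be sketched as a sort-then-store round trip. The helpers write_structure and read_structure, the file name, and the one-name-per-line layout are assumptions for illustration only; the real structure file format is internal to Sparv.

```python
import tempfile
from pathlib import Path


def write_structure(path: Path, structure: list[str]) -> None:
    """Sort the annotation names and store them, one per line."""
    path.write_text("\n".join(sorted(structure)), encoding="utf-8")


def read_structure(path: Path) -> list[str]:
    """Return the stored annotation names as a list."""
    return path.read_text(encoding="utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    structure_file = Path(tmp) / "structure"  # hypothetical file name
    write_structure(structure_file, ["s:id", "document:title", "p", "document"])
    names = read_structure(structure_file)
```

Because the names are sorted on write, the result of reading does not depend on the order in which an importer encountered the elements.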
SourceStructureParser
¶
SourceStructureParser(source_dir)
Abstract class that should be implemented by an importer's structure parser.
Note
This class is intended to be used by the wizard. The wizard functionality is going to be deprecated in a future version.
PARAMETER | DESCRIPTION |
---|---|
source_dir
|
Path to corpus source files.
TYPE:
|
get_annotations
abstractmethod
¶
get_annotations(corpus_config)
Return a list of annotations including attributes.
Each value has the format 'annotation:attribute' or 'annotation'. Plain versions of each annotation ('annotation' without attribute) must be included as well.
get_plain_annotations
¶
get_plain_annotations(corpus_config)
Return a list of plain annotations without attributes.
Each value has the format 'annotation'.
setup
staticmethod
¶
setup()
Return a list of wizard dictionaries with questions needed for setting up the class.
Answers to the questions will automatically be saved to self.answers.
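The relationship between the two list formats can be sketched with a small helper that derives plain annotation names from a list in the 'annotation:attribute' format. plain_annotations is a hypothetical function and the annotation names are invented; a real parser would obtain the list by inspecting the source files.

```python
def plain_annotations(annotations: list[str]) -> list[str]:
    """Strip attributes and drop duplicates, keeping first-seen order."""
    seen: dict[str, None] = {}
    for name in annotations:
        # Everything before the first colon is the plain annotation name.
        plain = name.partition(":")[0]
        seen.setdefault(plain, None)
    return list(seen)


full = ["text", "text:title", "s", "s:id", "s:n"]
plain = plain_annotations(full)
```

Note that full already contains the plain versions "text" and "s", as required of get_annotations.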
Text
¶
Text(source_file=None)
Represents the text content of a source file.
PARAMETER | DESCRIPTION |
---|---|
source_file
|
The name of the source file.
TYPE:
|
Wildcard
¶
Wildcard(name, type=OTHER, description=None)
Class holding wildcard information.
Typically used in the wildcards list passed as an argument to the @annotator decorator, e.g.:
@annotator("Number {annotation} by relative position within {parent}", wildcards=[
Wildcard("annotation", Wildcard.ANNOTATION),
Wildcard("parent", Wildcard.ANNOTATION)
])
PARAMETER | DESCRIPTION |
---|---|
name
|
The name of the wildcard.
TYPE:
|
type
|
The type of the wildcard, by reference to the constants defined in this class
(
TYPE:
|
description
|
The description of the wildcard.
TYPE:
|
__new__
¶
__new__(name, *args, **kwargs)
Create a new instance of the class.
PARAMETER | DESCRIPTION |
---|---|
cls
|
The class to create an instance of.
|
name
|
The name of the wildcard.
TYPE:
|
*args
|
Additional arguments.
TYPE:
|
**kwargs
|
Additional keyword arguments.
TYPE:
|
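A documented __new__(name, *args, **kwargs) is the usual signature for a class that derives from an immutable type such as str while accepting extra constructor arguments. The sketch below shows that general Python pattern with a hypothetical Placeholder class. That Wildcard itself is built this way is an assumption, and the constant values are illustrative, not Sparv's.

```python
class Placeholder(str):
    """Hypothetical str subclass carrying extra metadata."""

    OTHER = 0  # illustrative constant values, not Sparv's
    ANNOTATION = 1

    def __new__(cls, name, *args, **kwargs):
        # str is immutable, so its text must be fixed in __new__; the extra
        # arguments are accepted here only so that __init__ can receive them.
        return super().__new__(cls, name)

    def __init__(self, name, type=OTHER, description=None):
        self.name = name
        self.type = type
        self.description = description


w = Placeholder("parent", Placeholder.ANNOTATION)
# The instance behaves like its name wherever a string is expected.
text = "within {" + w + "}"
```

With this design the object can be used directly in string operations while still exposing its type and description as attributes.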