Utilities¶

Sparv provides a variety of utility functions, classes, and constants that are useful across different modules. These utilities are primarily imported from sparv.api.util and its submodules. For example:

from sparv.api.util.system import call_binary

Constants¶

The sparv.api.util.constants module includes several predefined constants that are used throughout the Sparv pipeline:

DELIM = "|": Delimiter character used to separate ambiguous results.
AFFIX = "|": Character used to enclose results, marking them as a set.
SCORESEP = ":": Character that separates an annotation from its score.
COMPSEP = "+": Character used to separate parts of a compound.
UNDEF = "__UNDEF__": Value representing undefined annotations.
OVERLAP_ATTR = "overlap": Name for automatically created overlap attributes.
SPARV_DEFAULT_NAMESPACE = "sparv": Default namespace used when annotation names collide and sparv_namespace is not set in the configuration.
UTF8 = "UTF-8": UTF-8 encoding.
LATIN1 = "ISO-8859-1": Latin-1 encoding.
HEADER_CONTENTS = "contents": Name of the annotation containing header contents.
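
As a rough illustration of how these separators fit together, the sketch below formats a set of scored annotations. The constants are redefined locally so that the example runs without Sparv installed, `format_scored_set` is a hypothetical helper that is not part of the Sparv API, and representing an empty set as a single affix character is an assumption.

```python
# Local copies of the documented values, so the sketch runs without Sparv.
DELIM = "|"
AFFIX = "|"
SCORESEP = ":"
COMPSEP = "+"


def format_scored_set(scored: dict) -> str:
    """Join annotation/score pairs into a set string such as '|a:0.9|b:0.1|'."""
    if not scored:
        # Assumption: an empty set is written as a single affix character.
        return AFFIX
    items = [f"{value}{SCORESEP}{score}" for value, score in scored.items()]
    return AFFIX + DELIM.join(items) + AFFIX


# A compound analysis ("stor" + "stad") and a non-compound alternative.
compound = COMPSEP.join(["stor", "stad"])
result = format_scored_set({compound: 0.9, "storstad": 0.1})
print(result)
```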

Export Utils¶

sparv.api.util.export provides utility functions for preparing data for export.

gather_annotations ¶

gather_annotations(
    annotations,
    export_names,
    header_annotations=None,
    source_file=None,
    flatten=True,
    split_overlaps=False,
)

Calculate the span hierarchy and the annotation_dict containing all annotation elements and attributes.

PARAMETER	DESCRIPTION
`annotations`	List of annotations to include. TYPE: `list[Annotation]`
`export_names`	Dictionary that maps from annotation names to export names. TYPE: `dict[str, str]`
`header_annotations`	List of header annotations. TYPE: `list[Annotation] \| None` DEFAULT: `None`
`source_file`	The source filename. TYPE: `str \| None` DEFAULT: `None`
`flatten`	Whether to return the spans as a flat list. TYPE: `bool` DEFAULT: `True`
`split_overlaps`	Whether to split up overlapping spans. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`list[tuple]`	The spans: `span_positions` (a flat list) if `flatten` is `True`, otherwise a `spans_dict`.
`dict[str, dict]`	The `annotation_dict`.

RAISES	DESCRIPTION
`SparvErrorMessage`	If the source file is not found for header annotations.

calculate_element_hierarchy ¶

calculate_element_hierarchy(source_file, spans_list)

Calculate the hierarchy for spans with identical start and end positions.

If two spans A and B have identical start and end positions, go through all occurrences of A and B and check which element is most often parent to the other.

PARAMETER	DESCRIPTION
`source_file`	The source filename. TYPE: `str`
`spans_list`	List of spans to check for hierarchy. TYPE: `list`

RETURNS	DESCRIPTION
`dict[str, dict[str, int]]`	A dictionary with the hierarchy of spans.

get_annotation_names ¶

get_annotation_names(
    annotations,
    source_annotations=None,
    source_file=None,
    token_name=None,
    remove_namespaces=False,
    keep_struct_names=False,
    sparv_namespace=None,
    source_namespace=None,
    xml_mode=False,
)

Get a list of annotations, token attributes, and a dictionary translating annotation names to export names.

PARAMETER	DESCRIPTION
`annotations`	List of elements:attributes (annotations) to include, with possible export names. TYPE: `ExportAnnotations \| ExportAnnotationsAllSourceFiles \| list[tuple[Annotation \| AnnotationAllSourceFiles, str \| None]]`
`source_annotations`	List of elements:attributes from the source file to include, with possible export names. If not specified, includes everything. TYPE: `SourceAnnotations \| SourceAnnotationsAllSourceFiles` DEFAULT: `None`
`source_file`	Name of the source file. TYPE: `str \| None` DEFAULT: `None`
`token_name`	Name of the token annotation. TYPE: `str \| None` DEFAULT: `None`
`remove_namespaces`	Set to `True` to remove all namespaces in `export_names` unless names are ambiguous. TYPE: `bool` DEFAULT: `False`
`keep_struct_names`	Set to `True` to include the annotation base name (everything before ":") in `export_names` for annotations that are not token attributes. TYPE: `bool` DEFAULT: `False`
`sparv_namespace`	Namespace to add to all Sparv annotations. TYPE: `str \| None` DEFAULT: `None`
`source_namespace`	Namespace to add to all annotations from the source file. TYPE: `str \| None` DEFAULT: `None`
`xml_mode`	Set to `True` to use XML namespaces in `export_names`. TYPE: `bool \| None` DEFAULT: `False`

RETURNS	DESCRIPTION
`list[Annotation \| AnnotationAllSourceFiles]`	A list of annotations.
`list[str]`	A list of token attribute names.
`dict[str, str]`	A dictionary translating annotation names to export names.

get_header_names ¶

get_header_names(header_annotations, xml_namespaces)

Get a list of header annotations and a dictionary for renamed annotations.

PARAMETER	DESCRIPTION
`header_annotations`	List of header annotations from the source file to include. If not specified, includes everything. TYPE: `HeaderAnnotations \| None`
`xml_namespaces`	XML namespaces to use for the header annotations. TYPE: `dict[str, str]`

RETURNS	DESCRIPTION
`tuple[list[Annotation], dict[str, str]]`	A list of header annotations and a dictionary with translation from annotation names to export names.

scramble_spans ¶

scramble_spans(span_positions, chunk_name, chunk_order)

Reorder spans based on chunk_order and ensure tags are opened and closed correctly.

PARAMETER	DESCRIPTION
`span_positions`	Original span positions, typically obtained from `gather_annotations()`. TYPE: `list[tuple]`
`chunk_name`	Name of the annotation to reorder. TYPE: `str`
`chunk_order`	Annotation specifying the new order of the chunks. TYPE: `Annotation`

RETURNS	DESCRIPTION
`list[tuple]`	List of tuples with the new span positions and instructions.

Install/Uninstall Utils¶

sparv.api.util.install provides functions for installing and uninstalling corpora, either locally or remotely.

install_path ¶

install_path(source_path, host, target_path)

Transfer a file or the contents of a directory to a target destination, optionally on a different host.

PARAMETER	DESCRIPTION
`source_path`	Path to the local file or directory to sync. If a directory is specified, its contents are synced, not the directory itself, and any extraneous files in destination directories are deleted. TYPE: `str \| Path`
`host`	Remote host to install to. Set to `None` to install locally. TYPE: `str \| None`
`target_path`	Path to the target file or directory. TYPE: `str \| Path`

uninstall_path ¶

uninstall_path(path, host=None)

Remove a file or directory, optionally on a different host.

PARAMETER	DESCRIPTION
`path`	Path to the file or directory to remove. TYPE: `str \| Path`
`host`	Remote host where the file or directory is located. Set to `None` to remove locally. TYPE: `str \| None` DEFAULT: `None`

install_mysql ¶

install_mysql(host, db_name, sqlfile)

Insert tables and data from one or more SQL files into a local or remote MySQL database.

PARAMETER	DESCRIPTION
`host`	The remote host to install to. Set to `None` to install locally. TYPE: `str \| None`
`db_name`	The name of the database. TYPE: `str`
`sqlfile`	The path to a SQL file, or a list of paths to multiple SQL files. TYPE: `Path \| str \| list[Path \| str]`

install_mysql_dump ¶

install_mysql_dump(host, db_name, tables)

Copy selected tables, including their data, from a local MySQL database to a remote one.

PARAMETER	DESCRIPTION
`host`	The remote host to install to. TYPE: `str`
`db_name`	The name of the remote database. TYPE: `str`
`tables`	A table name or a list of table names. If a list is provided, the tables are separated by spaces. TYPE: `str \| Iterable[str]`

install_svn ¶

install_svn(source_file, svn_url, remove_existing=False)

Check in a file to an SVN repository.

If the file is already in the repository and remove_existing is True, it will be deleted and added again.

PARAMETER	DESCRIPTION
`source_file`	The file to check in. TYPE: `str \| Path`
`svn_url`	The URL to the SVN repository, including the path to the file. TYPE: `str`
`remove_existing`	If False, this function can only be used to add new files to the repository. If True, existing files will be deleted before the import. TYPE: `bool` DEFAULT: `False`

RAISES	DESCRIPTION
`SparvErrorMessage`	If the source_file does not exist, if svn_url is not set, if it is not possible to list or delete the file in the SVN repository, if remove_existing is set to False and source_file already exists in the repository, or if it is not possible to import the file to the SVN repository.

uninstall_svn ¶

uninstall_svn(svn_url)

Delete a file from an SVN repository.

PARAMETER	DESCRIPTION
`svn_url`	The URL to the SVN repository including the name of the file to remove. TYPE: `str`

RAISES	DESCRIPTION
`SparvErrorMessage`	If svn_url is not set, or if deletion fails.

install_git ¶

install_git(source_file, repo_path, commit_message=None)

Copy a file to a local Git repository and make a commit.

PARAMETER	DESCRIPTION
`source_file`	The file to copy. TYPE: `str \| Path`
`repo_path`	The path to the local Git repository. TYPE: `str \| Path`
`commit_message`	The commit message. If not set, a default message will be used. TYPE: `str \| None` DEFAULT: `None`

RAISES	DESCRIPTION
`SparvErrorMessage`	If the source file does not exist, if repo_path is not set, or if it is not possible to add the file to the Git repository.

uninstall_git ¶

uninstall_git(file_path, commit_message=None)

Remove a file from a local Git repository and make a commit.

PARAMETER	DESCRIPTION
`file_path`	The path to the file to remove. TYPE: `str \| Path`
`commit_message`	The commit message. If not set, a default message will be used. TYPE: `str \| None` DEFAULT: `None`

RAISES	DESCRIPTION
`SparvErrorMessage`	If file_path is not set, if the file does not exist in the Git repository, or if it is not possible to remove the file from the Git repository.

System Utils¶

sparv.api.util.system provides functions for managing processes, creating directories, and more.

kill_process ¶

kill_process(process)

Terminate a process, ignoring any errors if the process is already terminated.

PARAMETER	DESCRIPTION
`process`	The process to be terminated. TYPE: `Popen`

RAISES	DESCRIPTION
`OSError`	If an error occurs while killing the process.

clear_directory ¶

clear_directory(path)

Create an empty directory at the given path. If the directory already exists, its contents are removed.

PARAMETER	DESCRIPTION
`path`	The path where the directory should be created. TYPE: `str \| Path`

call_java ¶

call_java(
    jar,
    arguments,
    options=(),
    stdin="",
    search_paths=(),
    encoding=None,
    verbose=False,
    return_command=False,
)

Execute a Java program using a specified jar file, command line arguments, and stdin input.

PARAMETER	DESCRIPTION
`jar`	The name of the jar file to execute. TYPE: `str`
`arguments`	A list of arguments to pass to the Java program. TYPE: `list \| tuple`
`options`	A list of Java options to include in the call. TYPE: `list \| tuple` DEFAULT: `()`
`stdin`	Input to pass to the program's `stdin`. TYPE: `str` DEFAULT: `''`
`search_paths`	Additional paths to search for the Java binary, in addition to the environment variable PATH. TYPE: `list \| tuple` DEFAULT: `()`
`encoding`	The encoding to use for `stdin` and `stdout`. TYPE: `str \| None` DEFAULT: `None`
`verbose`	If `True`, pipe `stderr` to `stderr` in the terminal, instead of returning it. TYPE: `bool` DEFAULT: `False`
`return_command`	If `True`, return the process instead of `stdout` and `stderr`. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`tuple[str, str] \| Popen`	A tuple with `stdout` and `stderr`, or the process if `return_command` is `True`. If `verbose` is `True`, `stderr` is an empty string.

call_binary ¶

call_binary(
    name,
    arguments=(),
    stdin="",
    raw_command=None,
    search_paths=(),
    encoding=None,
    verbose=False,
    use_shell=False,
    allow_error=False,
    return_command=False,
)

Call a binary with specified arguments and stdin.

PARAMETER	DESCRIPTION
`name`	The binary to execute (can include absolute or relative path). Accepts a string, a Path, or an iterable of strings or Paths, using the first found binary. TYPE: `str \| Path \| Iterable[str \| Path]`
`arguments`	List of arguments to pass to the binary. TYPE: `list \| tuple` DEFAULT: `()`
`stdin`	Input to pass to the process's `stdin`. TYPE: `str \| list \| tuple` DEFAULT: `''`
`raw_command`	A raw command to execute through the shell (implies `use_shell=True`). TYPE: `str \| None` DEFAULT: `None`
`search_paths`	Additional paths to search for the binary, besides the environment variable `PATH`. TYPE: `list \| tuple` DEFAULT: `()`
`encoding`	Encoding to use for `stdin` and `stdout`. TYPE: `str \| None` DEFAULT: `None`
`verbose`	If `True`, pipe `stderr` to `stderr` in the terminal, instead of returning it. TYPE: `bool` DEFAULT: `False`
`use_shell`	If `True`, executes the command through the shell. Automatically set to `True` if `raw_command` is provided. TYPE: `bool` DEFAULT: `False`
`allow_error`	If `False` (default), raises an error if the binary returns a non-zero exit code, and logs both `stdout` and `stderr`. TYPE: `bool` DEFAULT: `False`
`return_command`	If `True`, returns the process instead of `stdout` and `stderr`. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`tuple[str, str] \| Popen`	A tuple with `stdout` and `stderr`, or the process if `return_command` is `True`. If `verbose` is `True`, `stderr` is an empty string.

RAISES	DESCRIPTION
`OSError`	If an error occurs while calling the binary.
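
The general pattern described above (run a binary, feed it `stdin`, get `stdout` and `stderr` back as strings) can be sketched with the standard library alone. This is not Sparv's implementation: `run_binary` is a hypothetical stand-in built on `subprocess.run`, it covers only a few of the documented parameters, and the UTF-8 default is an assumption.

```python
import subprocess
import sys


def run_binary(name, arguments=(), stdin="", encoding="UTF-8", allow_error=False):
    """Run a binary, feed it stdin, and return (stdout, stderr) as strings."""
    process = subprocess.run(
        [str(name), *map(str, arguments)],
        input=stdin.encode(encoding),
        capture_output=True,
        check=False,
    )
    # Mirror the documented default: a non-zero exit code is an error.
    if process.returncode != 0 and not allow_error:
        raise OSError(f"{name} returned exit code {process.returncode}")
    return process.stdout.decode(encoding), process.stderr.decode(encoding)


# Use the running Python interpreter as a binary that is known to exist.
stdout, stderr = run_binary(
    sys.executable,
    ["-c", "import sys; print(sys.stdin.read().upper())"],
    stdin="hej",
)
print(stdout.strip())
```

In a real Sparv module you would call `call_binary` itself rather than a helper like this; the sketch only shows the shape of the call and of the return value.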

find_binary ¶

find_binary(
    name,
    search_paths=(),
    executable=True,
    allow_dir=False,
    raise_error=False,
)

Locate the binary for a given program.

PARAMETER	DESCRIPTION
`name`	The name of the binary, either as a string or Path, or an iterable of strings or Paths with alternative names. TYPE: `str \| Path \| Iterable[str \| Path]`
`search_paths`	A list of additional paths to search, besides those in the environment variable `PATH`. TYPE: `list \| tuple` DEFAULT: `()`
`executable`	If `False`, does not fail when the binary is not executable. TYPE: `bool` DEFAULT: `True`
`allow_dir`	If `True`, allows the target to be a directory instead of a file. TYPE: `bool` DEFAULT: `False`
`raise_error`	If `True`, raises an error if the binary could not be found. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`str \| None`	The path to the binary, or `None` if not found.

RAISES	DESCRIPTION
`SparvErrorMessage`	If `raise_error` is `True` and the binary could not be found.
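
To make the lookup behaviour concrete, here is a minimal sketch of a similar search using `shutil.which`. `locate_binary` is a hypothetical helper, not Sparv code; searching the extra directories before `PATH` is an assumption, and `FileNotFoundError` stands in for `SparvErrorMessage` so that the example runs without Sparv.

```python
import shutil
import sys
from pathlib import Path


def locate_binary(name, search_paths=(), raise_error=False):
    """Return the path to the first matching binary, or None if nothing is found."""
    names = [name] if isinstance(name, (str, Path)) else list(name)
    for candidate in names:
        # Assumption: look in the extra directories first, then fall back to PATH.
        for directory in search_paths:
            found = shutil.which(str(candidate), path=str(directory))
            if found:
                return found
        found = shutil.which(str(candidate))
        if found:
            return found
    if raise_error:
        raise FileNotFoundError(f"Could not find any of: {names}")
    return None


# Alternative names are tried in order; the first one does not exist.
python_dir = Path(sys.executable).parent
python_path = locate_binary(
    ["no-such-binary-xyz", Path(sys.executable).name],
    search_paths=[python_dir],
)
print(python_path)
```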

rsync ¶

rsync(local, host, remote)

Transfer files and directories using rsync.

When syncing a directory, extraneous files in the destination directory are deleted, and it is always the contents of the source directory that are synced, not the directory itself (i.e. the rsync source directory is always suffixed with a slash).

PARAMETER	DESCRIPTION
`local`	The file or directory to transfer. TYPE: `str \| Path`
`host`	The remote host to transfer to. Set to `None` to transfer locally. TYPE: `str \| None`
`remote`	The path to the target file or directory. TYPE: `str \| Path`
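
The directory handling described above can be illustrated by building an rsync argument list without running it. This is a sketch under stated assumptions: `build_rsync_args` is a hypothetical helper, the exact flags Sparv passes are not documented here (`-a` and `--delete` are ordinary rsync options chosen for illustration), and the host name and target paths are made up.

```python
import tempfile
from pathlib import Path


def build_rsync_args(local, host, remote):
    """Build an rsync argument list without executing anything."""
    source = str(local)
    args = ["rsync", "-a"]
    if Path(local).is_dir():
        # A trailing slash makes rsync sync the contents, not the directory itself.
        source = source.rstrip("/") + "/"
        # Remove extraneous files from the destination directory.
        args.append("--delete")
    target = f"{host}:{remote}" if host else str(remote)
    return [*args, source, target]


with tempfile.TemporaryDirectory() as tmp:
    dir_args = build_rsync_args(tmp, "example.org", "/srv/corpora")
    file_path = Path(tmp) / "corpus.xml"
    file_path.write_text("<corpus/>", encoding="utf-8")
    file_args = build_rsync_args(file_path, None, "/srv/corpora/corpus.xml")

print(dir_args)
print(file_args)
```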

remove_path ¶

remove_path(path, host=None)

Remove a file or directory, either locally or remotely.

PARAMETER	DESCRIPTION
`path`	The file or directory to remove. TYPE: `str \| Path`
`host`	The remote host to remove from. Leave as `None` to remove locally. TYPE: `str \| None` DEFAULT: `None`

gpus ¶

gpus(reorder=True)

Get a list of available GPUs, sorted by free memory in descending order.

Only works for NVIDIA GPUs, and requires the nvidia-smi utility to be installed.

If reorder is True (default), the GPUs are renumbered according to the order specified in the environment variable CUDA_VISIBLE_DEVICES. For example, if CUDA_VISIBLE_DEVICES=1,0, and the GPUs with most free memory are 0, 1, the function will return [1, 0].

This is needed for PyTorch, which uses the GPU indices as specified in CUDA_VISIBLE_DEVICES, not the actual GPU indices. In the example above, PyTorch would consider GPU 1 as GPU 0 and GPU 0 as GPU 1.

PARAMETER	DESCRIPTION
`reorder`	Whether to renumber the GPUs according to the order in the environment variable `CUDA_VISIBLE_DEVICES`. TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`list[int] \| None`	A list of GPU indices, or None if no GPUs are available or if the nvidia-smi command failed.
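
The renumbering step can be sketched in isolation. `renumber_gpus` is a hypothetical helper rather than Sparv's code: it takes the physical GPU indices already sorted by free memory and the value of `CUDA_VISIBLE_DEVICES`, and skipping GPUs that are not listed in that variable is an assumption.

```python
def renumber_gpus(gpus_by_free_memory, cuda_visible_devices):
    """Translate physical GPU indices into the indices seen through CUDA_VISIBLE_DEVICES."""
    visible = [int(i) for i in cuda_visible_devices.split(",") if i.strip()]
    # The position of a physical index in CUDA_VISIBLE_DEVICES is the index
    # that a framework such as PyTorch would use for that GPU.
    return [visible.index(gpu) for gpu in gpus_by_free_memory if gpu in visible]


# The example from the text: CUDA_VISIBLE_DEVICES=1,0 and most free memory on GPU 0, then GPU 1.
renumbered = renumber_gpus([0, 1], "1,0")
print(renumbered)  # [1, 0]
```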

call_svn ¶

call_svn(command, *args)

Call an SVN command.

Will try to authenticate with SVN_USERNAME and SVN_PASSWORD environment variables if set.

PARAMETER	DESCRIPTION
`command`	The SVN command to call. TYPE: `str`
`*args`	Additional arguments for the SVN command. TYPE: `str` DEFAULT: `()`

RETURNS	DESCRIPTION
`int`	The return code from the command.

RAISES	DESCRIPTION
`SparvErrorMessage`	If the SVN command fails.
`FileNotFoundError`	If the file is not found in SVN.

Tag Sets¶

The sparv.api.util.tagsets subpackage includes modules with functions and objects for tag set conversions.

tagmappings.join_tag()¶

Convert a complex SUC or SALDO tag record into a string.

The tag can be a dict {"pos": pos, "msd": msd} or a tuple (pos, msd).

PARAMETER	DESCRIPTION
`tag`	The tag to join. TYPE: `dict \| tuple`
`sep`	The separator placed between the parts of the tag in the result. TYPE: `str` DEFAULT: `TAGSEP` (`"."`)

RETURNS	DESCRIPTION
`str`	The joined tag.

tagmappings.mappings¶

Mappings of part-of-speech tags between different tag sets.

pos_to_upos()¶

Map part-of-speech tags to Universal Dependencies (UD) part-of-speech tags. This function only works if there is a conversion function in util.tagsets.pos_to_upos for the specified language and tag set.

Parameters:

pos: The part-of-speech tag to convert.
lang: The language code.
tagset: The name of the tag set to which pos belongs.

tagmappings.split_tag()¶

Split a SUC or SALDO tag string ('X.Y.Z') into a tuple ('X', 'Y.Z'), where 'X' is the part of speech and 'Y', 'Z', etc., are morphological features (i.e., MSD tags).

Parameters:

tag: The tag string to split into a tuple.
sep: The separator to split on. Default: "."
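
The splitting rule, together with the joining described for join_tag(), can be sketched as follows. The two functions are hypothetical stand-ins with a `_sketch` suffix, not the Sparv implementations, and the handling of a tag without any MSD part is an assumption.

```python
def split_tag_sketch(tag, sep="."):
    """Split 'X.Y.Z' into ('X', 'Y.Z'); only the first separator is used."""
    pos, _, msd = tag.partition(sep)
    return pos, msd


def join_tag_sketch(tag, sep="."):
    """Join a {'pos': ..., 'msd': ...} dict or a (pos, msd) tuple into one string."""
    pos, msd = (tag["pos"], tag["msd"]) if isinstance(tag, dict) else tag
    # Assumption: an empty MSD yields just the part of speech.
    return sep.join([pos, msd]) if msd else pos


pos, msd = split_tag_sketch("NN.UTR.SIN.IND.NOM")
print(pos, msd)
print(join_tag_sketch({"pos": pos, "msd": msd}))
```

Splitting and then joining with the same separator gives back the original tag string.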

suc_to_feats()¶

Convert SUC MSD tags into a CoNLL-U feature list (universal morphological features). Returns a list of universal features.

Parameters:

pos: The SUC part-of-speech tag.
msd: The SUC MSD tag.
delim: The delimiter separating the features in msd. Default: "."

tagmappings.tags¶

Different sets of part-of-speech tags.

Miscellaneous Utils¶

sparv.api.util.misc provides miscellaneous utility functions.

PickledLexicon ¶

PickledLexicon(picklefile, verbose=True)

A class for reading a basic pickled lexicon and looking up keys.

PARAMETER	DESCRIPTION
`picklefile`	A `pathlib.Path` or `Model` object pointing to the pickled lexicon. TYPE: `Path \| Model`
`verbose`	Whether to log status updates while reading the lexicon. TYPE: `bool` DEFAULT: `True`

lookup ¶

lookup(key, default=None)

Look up a key in the lexicon.

PARAMETER	DESCRIPTION
`key`	The key to look up. TYPE: `Any`
`default`	The default value to return if the key is not found. TYPE: `Any` DEFAULT: `None`

RETURNS	DESCRIPTION
`Any`	The value for the key, or the default value if the key is not found.
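
A minimal stand-in shows the idea of loading a pickled dictionary once and then looking up keys with a default. `TinyPickledLexicon` is a hypothetical class written for this example; it accepts only a plain path and does no logging, and the lexicon contents are made up.

```python
import pickle
import tempfile
from pathlib import Path


class TinyPickledLexicon:
    """Load a pickled dict once and answer lookups with an optional default."""

    def __init__(self, picklefile):
        with Path(picklefile).open("rb") as f:
            self.lexicon = pickle.load(f)

    def lookup(self, key, default=None):
        """Return the value for key, or default if the key is missing."""
        return self.lexicon.get(key, default)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lexicon.pickle"
    # Create a tiny pickled lexicon to read back.
    path.write_bytes(pickle.dumps({"katt": ["NN"], "springa": ["VB"]}))
    lexicon = TinyPickledLexicon(path)

found = lexicon.lookup("katt")
missing = lexicon.lookup("hund", default=[])
print(found, missing)
```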

dump_yaml ¶

dump_yaml(
    data, resolve_alias=False, sort_keys=False, indent=2
)

Convert a dictionary to a YAML formatted string.

PARAMETER	DESCRIPTION
`data`	The dictionary to be converted. TYPE: `dict`
`resolve_alias`	Whether to replace aliases with their anchor's content. TYPE: `bool` DEFAULT: `False`
`sort_keys`	Whether to sort the keys alphabetically. TYPE: `bool` DEFAULT: `False`
`indent`	The number of spaces to use for indentation. TYPE: `int` DEFAULT: `2`

RETURNS	DESCRIPTION
`str`	The YAML document as a string.

cwbset ¶

cwbset(
    values,
    delimiter="|",
    affix="|",
    sort=False,
    maxlength=4095,
    encoding="UTF-8",
)

Take an iterable with strings and return a set in the format used by Corpus Workbench.

PARAMETER	DESCRIPTION
`values`	An iterable containing string values. TYPE: `Iterable[str]`
`delimiter`	The delimiter to be used between the values. TYPE: `str` DEFAULT: `'\|'`
`affix`	The affix enclosing the resulting string. TYPE: `str` DEFAULT: `'\|'`
`sort`	Whether to sort the values before joining them. TYPE: `bool` DEFAULT: `False`
`maxlength`	Maximum length of the resulting string. TYPE: `int` DEFAULT: `4095`
`encoding`	Encoding to use when calculating the length of the string. TYPE: `str` DEFAULT: `'UTF-8'`

RETURNS	DESCRIPTION
`str`	The joined string.

set_to_list ¶

set_to_list(setstring, delimiter='|', affix='|')

Convert a set-formatted string into a list.

PARAMETER	DESCRIPTION
`setstring`	The string to convert into a list. The string should be enclosed with `affix` characters and have elements separated by `delimiter`. TYPE: `str`
`delimiter`	The character used to separate elements in `setstring`. TYPE: `str` DEFAULT: `'\|'`
`affix`	The character that encloses `setstring`. TYPE: `str` DEFAULT: `'\|'`

RETURNS	DESCRIPTION
`list[str]`	A list of strings.
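
The set format used by cwbset and set_to_list can be illustrated with a simplified round trip. Both functions below are hypothetical sketches, not the Sparv implementations: the `maxlength` and `encoding` handling is left out, and writing an empty set as a single affix character is an assumption.

```python
def to_set_string(values, delimiter="|", affix="|", sort=False):
    """Join strings into a set string such as '|a|b|'."""
    values = sorted(values) if sort else list(values)
    if not values:
        # Assumption: an empty set is written as a single affix character.
        return affix
    return affix + delimiter.join(values) + affix


def from_set_string(setstring, delimiter="|", affix="|"):
    """Turn a set string such as '|a|b|' back into a list of strings."""
    if setstring == affix:
        return []
    inner = setstring
    # Remove one enclosing affix from each end before splitting.
    if affix and inner.startswith(affix):
        inner = inner[len(affix):]
    if affix and inner.endswith(affix):
        inner = inner[:-len(affix)]
    return inner.split(delimiter)


set_string = to_set_string(["b", "a"], sort=True)
print(set_string)
print(from_set_string(set_string))
```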

remove_control_characters ¶

remove_control_characters(text, keep=('\n', '\t', '\r'))

Remove control characters from the given text, except for those specified in keep.

The characters removed are those with the Unicode category "Cc" (control characters). https://www.unicode.org/reports/tr44/#GC_Values_Table

PARAMETER	DESCRIPTION
`text`	The string from which to remove control characters. TYPE: `str`
`keep`	An iterable of characters to keep. Default is newline, tab, and carriage return. TYPE: `Iterable[str]` DEFAULT: `('\n', '\t', '\r')`

RETURNS	DESCRIPTION
`str`	The text with control characters removed.
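
Filtering on a Unicode general category can be sketched with the standard `unicodedata` module. `strip_category` is a hypothetical helper, not the Sparv function; with `category="Cc"` it follows the description above, and the same idea applies to other categories such as "Cf".

```python
import unicodedata


def strip_category(text, category="Cc", keep=("\n", "\t", "\r")):
    """Remove characters of one Unicode general category, except those in keep."""
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != category
    )


# NUL and BEL are control characters; tab and newline are kept by default.
cleaned = strip_category("a\x00b\tc\x07\n")
print(repr(cleaned))
```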

remove_formatting_characters ¶

remove_formatting_characters(text, keep=())

Remove formatting characters from the given text, except for those specified in keep.

The characters removed are those with the Unicode category "Cf" (formatting characters). https://www.unicode.org/reports/tr44/#GC_Values_Table

PARAMETER	DESCRIPTION
`text`	The text from which to remove formatting characters. TYPE: `str`
`keep`	An iterable of characters to keep. TYPE: `Iterable[str]` DEFAULT: `()`

RETURNS	DESCRIPTION
`str`	The text with formatting characters removed.

remove_unassigned_characters ¶

remove_unassigned_characters(text, keep=())

Remove unassigned characters from the given text, except for those specified in keep.

The characters removed are those with the Unicode category "Cn" (unassigned characters). https://www.unicode.org/reports/tr44/#GC_Values_Table

PARAMETER	DESCRIPTION
`text`	The text from which to remove unassigned characters. TYPE: `str`
`keep`	An iterable of characters to keep. TYPE: `Iterable[str]` DEFAULT: `()`

RETURNS	DESCRIPTION
`str`	The text with unassigned characters removed.

test_lexicon ¶

test_lexicon(lexicon, testwords)

Test the validity of a lexicon by checking if specific test words are present as keys.

This function takes a dictionary (lexicon) and a list of test words, printing the value associated with each test word.

PARAMETER	DESCRIPTION
`lexicon`	A dictionary representing the lexicon. TYPE: `dict`
`testwords`	An iterable of strings, each expected to be a key in the lexicon. TYPE: `Iterable[str]`

get_language_name_by_part3 ¶

get_language_name_by_part3(part3)

Return language name in English given an ISO 639-3 code.

PARAMETER	DESCRIPTION
`part3`	ISO 639-3 code. TYPE: `str`

RETURNS	DESCRIPTION
`str \| None`	Language name in English.

get_language_part1_by_part3 ¶

get_language_part1_by_part3(part3)

Return ISO 639-1 code given an ISO 639-3 code.

PARAMETER	DESCRIPTION
`part3`	ISO 639-3 code. TYPE: `str`

RETURNS	DESCRIPTION
`str \| None`	ISO 639-1 code.

parse_annotation_list ¶

parse_annotation_list(
    annotation_names,
    all_annotations=None,
    add_plain_annotations=True,
)

Take a list of annotation names and possible export names, and return a list of tuples.

Each item in the list is split into a tuple by the string ' as '. Each tuple will contain two elements. If ' as ' is not present in the string, the second element will be None.

If the list of annotation names includes the element '...', all annotations from all_annotations will be included in the result, except those explicitly excluded in the list of annotations by being prefixed with 'not '.

If an annotation occurs more than once in the list, only the last occurrence will be kept. Similarly, if an annotation is first included and then excluded (using 'not') it will be excluded from the result.

If a plain annotation (without attributes) is excluded, all its attributes will be excluded as well.

Plain annotations (without attributes) will be added if needed, unless add_plain_annotations is set to False. Make sure to disable add_plain_annotations if the annotation names may include classes or config variables.

PARAMETER	DESCRIPTION
`annotation_names`	A list of annotation names. TYPE: `Iterable[str] \| None`
`all_annotations`	A list of all possible annotations. TYPE: `Iterable[str] \| None` DEFAULT: `None`
`add_plain_annotations`	If `True`, plain annotations (without attributes) will be added if needed. Set to `False` if annotation names may include classes or config variables. TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`list[tuple[str, str \| None]]`	A list of tuples with annotation names and export names.
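
The parsing rules above can be sketched in a simplified form. `parse_annotations` is a hypothetical function, not Sparv's implementation: it does not add plain annotations, it lets an exclusion win regardless of where it appears in the list (a simplification), and the annotation names are made up for illustration.

```python
def parse_annotations(annotation_names, all_annotations=()):
    """Return (annotation, export_name) tuples following the rules described above."""
    included = {}  # annotation name -> export name or None, in insertion order
    excluded = set()
    names = list(annotation_names or [])

    # '...' pulls in every known annotation.
    if "..." in names:
        for name in all_annotations:
            included[name] = None

    for item in names:
        if item == "...":
            continue
        if item.startswith("not "):
            excluded.add(item[len("not "):].strip())
            continue
        name, _, export_name = item.partition(" as ")
        name = name.strip()
        # Keep only the last occurrence of an annotation.
        included.pop(name, None)
        included[name] = export_name.strip() or None

    def is_excluded(name):
        # Excluding a plain annotation also excludes all of its attributes.
        return name in excluded or name.split(":")[0] in excluded

    return [(name, export) for name, export in included.items() if not is_excluded(name)]


parsed = parse_annotations(
    ["token:baseform as lemma", "...", "not text:date", "token:baseform as baseform"],
    all_annotations=["text", "text:date", "text:author"],
)
print(parsed)
```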

Error Messages and Logging¶

The SparvErrorMessage exception and get_logger function are essential components of the Sparv pipeline. Unlike other utilities mentioned on this page, they are located directly under sparv.api.

SparvErrorMessage¶

This exception class is used to halt the pipeline, while notifying users of errors in a user-friendly manner without displaying a traceback. Its usage is detailed in the Writing Sparv Plugins section.

Note

When raising this exception in a Sparv module, only the message argument should be used.

PARAMETER	DESCRIPTION
`message`	User-friendly error message to display. TYPE: `str`
`module`	The name of the module where the error occurred (optional, not used in Sparv modules). TYPE: `str` DEFAULT: `''`
`function`	The name of the function where the error occurred (optional, not used in Sparv modules). TYPE: `str` DEFAULT: `''`

get_logger¶

This function retrieves a logger that is a child of sparv.modules. Its usage is explained in the Writing Sparv Plugins section.

PARAMETER	DESCRIPTION
`name`	The name of the current module (usually `__name__`). TYPE: `str`

RETURNS	DESCRIPTION
`Logger`	Logger object.
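
The parent and child relationship between loggers can be shown with the standard `logging` module. This is a sketch only: `get_child_logger` is a hypothetical helper, the module name "my_plugin" is made up, and how Sparv derives the final logger name from `__name__` is not shown here.

```python
import logging


def get_child_logger(name):
    """Return a logger placed under the 'sparv.modules' logger."""
    return logging.getLogger("sparv.modules").getChild(name)


logger = get_child_logger("my_plugin")
print(logger.name)
# Messages propagate up to 'sparv.modules', where handlers can be attached once.
logger.info("Loading model")
```

In a Sparv module you would normally call get_logger(__name__) from sparv.api instead of building the logger by hand.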