
Use Sparv as a Library

Sparv is normally used as a command-line tool, but it can also be used as a Python library. This is useful if you want to call Sparv from your own Python code or from a Jupyter notebook.

Note

This feature is still experimental and the API may change in future versions.

Usage

The API is very simple and consists of a single function, call. It takes one argument: a list of strings containing the same arguments you would pass to the Sparv command-line tool, for example ["run", "xml_export:pretty", "--log"]. It returns a context manager that can be used either to iterate over the output messages from Sparv or simply to wait for the pipeline to finish.
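Because the argument list mirrors the command line, one convenient way to build it is to split an existing command string. The sketch below uses the standard library's shlex.split; the helper name cli_to_call_args is hypothetical and not part of Sparv.

```python
import shlex


def cli_to_call_args(command: str) -> list[str]:
    """Convert a Sparv command line into an argument list for sparv.call.

    Hypothetical helper (not part of Sparv): splits the string using POSIX
    shell quoting rules and drops a leading "sparv" program name if present.
    """
    args = shlex.split(command)
    if args and args[0] == "sparv":
        args = args[1:]
    return args


print(cli_to_call_args("sparv run xml_export:pretty --log"))
# ['run', 'xml_export:pretty', '--log']
```

Using shell quoting rules means arguments containing spaces, such as a quoted directory name, stay intact as a single list element.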

Just like when using Sparv from the command line, you need to have a corpus configuration file in the current working directory, or use the --dir argument to specify a directory containing a corpus configuration file.
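A minimal sketch of building a run command for a corpus located in another directory. It assumes the corpus configuration file has Sparv's default name, config.yaml, and that --dir can be combined with run targets as shown; run_args_for_corpus is a hypothetical helper, not part of Sparv.

```python
from pathlib import Path


def run_args_for_corpus(corpus_dir, targets, log=True):
    """Build a "run" argument list that points Sparv at corpus_dir.

    Hypothetical helper (not part of Sparv). Assumes the corpus
    configuration file is named config.yaml and raises early if it is
    missing, before Sparv is started.
    """
    corpus_dir = Path(corpus_dir)
    if not (corpus_dir / "config.yaml").is_file():
        raise FileNotFoundError(f"No config.yaml found in {corpus_dir}")
    args = ["run", *targets, "--dir", str(corpus_dir)]
    if log:
        args.append("--log")
    return args
```

The resulting list could then be passed on as sparv.call(run_args_for_corpus("path/to/corpus", ["xml_export:pretty"])). Checking for the file up front surfaces a missing configuration as an ordinary Python exception.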

Example:

```python
import sparv

# Option 1: Iterate over log messages
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    for log_message in sparv_call:
        print(log_message)

# Option 2: Just wait for the pipeline to finish
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    sparv_call.wait()
```

Output

The output messages generated by Sparv when used as a library are the same as the messages displayed in the terminal when running Sparv as a CLI tool, except that log messages are provided in JSON format. Other outputs, such as the lists generated by various --list commands, are provided as plain text.
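Since log messages arrive as JSON while other output is plain text, a consumer may want to tell the two apart. This sketch assumes each message is a string and treats anything that parses as a JSON object as a log record; it makes no assumptions about the fields inside those records. classify_message is a hypothetical name, not part of Sparv.

```python
import json


def classify_message(message):
    """Return ("log", parsed_dict) for JSON log records, else ("text", message).

    Assumption: every message is a string. Output that is not a JSON object
    (for example the plain-text lists from --list commands) is returned
    unchanged. No particular JSON field names are assumed.
    """
    try:
        parsed = json.loads(message)
    except (json.JSONDecodeError, TypeError):
        return "text", message
    # A bare number or string is valid JSON but is not a log record.
    if isinstance(parsed, dict):
        return "log", parsed
    return "text", message
```

Requiring a JSON object rather than any valid JSON avoids misclassifying plain-text lines that happen to look like a number.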

Waiting for the Pipeline to Finish

The context manager returned by call always waits for the pipeline to finish before exiting, even if you do not iterate over the output messages. You can also call the wait method explicitly to block until the pipeline has finished without iterating over the messages, which makes the intent of the code clearer.

Checking for Success

To check whether the pipeline finished successfully, call the get_return_value method on the sparv_call object provided by the context manager. It returns True if the pipeline finished successfully and False if it failed. It should only be called after the pipeline has finished.

In addition, the wait method returns the same value, so you can wait for the pipeline and check for success in a single step.
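To unit-test code that consumes sparv.call without running a real pipeline, you could substitute a test double that mimics the behavior described above: iterating yields messages, wait drains the output and returns the success value, and leaving the with block always waits. FakeSparvCall is hypothetical and not part of Sparv; raising on a premature get_return_value call is this stub's own choice, not documented Sparv behavior.

```python
class FakeSparvCall:
    """Hypothetical test double for the object returned by sparv.call.

    Not part of Sparv: it replays canned messages and a canned result so
    that code consuming sparv.call can be tested without a corpus.
    """

    def __init__(self, messages, success=True):
        self._messages = iter(messages)
        self._success = success
        self._finished = False

    def __enter__(self):
        return self

    def __iter__(self):
        # Yield remaining messages; mark the "pipeline" finished once exhausted.
        for message in self._messages:
            yield message
        self._finished = True

    def wait(self):
        # Drain any unread output, then report success like wait() is described to.
        for _ in self:
            pass
        return self.get_return_value()

    def get_return_value(self):
        # Stub's own choice: refuse to answer before the output is exhausted.
        if not self._finished:
            raise RuntimeError("Pipeline has not finished yet")
        return self._success

    def __exit__(self, exc_type, exc, tb):
        # Mirror the documented behavior: always wait for completion on exit.
        self.wait()
        return False
```

In a test it could be injected with unittest.mock.patch("sparv.call", return_value=FakeSparvCall(["message"], success=False)), which still requires Sparv to be importable.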

Example with success check:

```python
import sparv

# Option 1: Iterate over log messages and check for success
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    for log_message in sparv_call:
        print(log_message)
    success = sparv_call.get_return_value()
    if success:
        print("Pipeline finished successfully")
    else:
        print("Pipeline failed")

# Option 2: Wait for the pipeline to finish and check for success
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    success = sparv_call.wait()
    if success:
        print("Pipeline finished successfully")
    else:
        print("Pipeline failed")
```
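The success-check pattern in the examples above can be wrapped in a small helper that raises an exception on failure, which suits scripts and notebooks that should stop as soon as a step fails. This is a sketch: run_or_raise is hypothetical, and its call parameter exists only so that a stub can replace sparv.call in tests.

```python
def run_or_raise(args, call=None, on_message=print):
    """Run a Sparv command and raise RuntimeError if the pipeline fails.

    Hypothetical helper (not part of Sparv). call defaults to sparv.call;
    every output message is passed to on_message.
    """
    if call is None:
        import sparv  # Imported lazily so the helper can be tested without Sparv.
        call = sparv.call
    with call(args) as sparv_call:
        for message in sparv_call:
            on_message(message)
        success = sparv_call.get_return_value()
    if not success:
        raise RuntimeError(f"Sparv failed: {' '.join(args)}")
```

For example, run_or_raise(["run", "xml_export:pretty", "--log"]) prints each message and raises if the pipeline does not finish successfully.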