Use Sparv as a Library
Sparv is normally used as a command-line tool, but it can also be utilized as a library in Python. This is useful if you want to use Sparv's functionality in your own Python code, or if you want to use Sparv in a Jupyter notebook.
Note
This feature is still experimental and the API may change in future versions.
Usage
The API is very simple, with a single function: call. It takes one argument, a list of strings, which are the same as
the arguments you would use for the Sparv CLI, e.g. ["run", "xml_export:pretty", "--log"]. The function returns a
context manager that can be used either to iterate over the output messages from Sparv, or simply to wait for the
pipeline to finish.
Just like when using Sparv from the command line, you need to have a corpus configuration file in the current working
directory, or use the --dir argument to specify a directory containing a corpus configuration file.
Example:
import sparv

# Option 1: Iterate over log messages
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    for log_message in sparv_call:
        print(log_message)

# Option 2: Just wait for the pipeline to finish
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    sparv_call.wait()
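Because Sparv looks for the corpus configuration in the current working directory, a script located elsewhere must either pass --dir or change directory before calling Sparv. The following is a minimal sketch of the second approach. The helper in_directory is hypothetical and not part of Sparv, and "my_corpus" in the commented usage is a placeholder path. On Python 3.11 and later, contextlib.chdir from the standard library does the same job.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def in_directory(path):
    """Temporarily switch the current working directory (hypothetical helper)."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


# Usage sketch (needs Sparv installed and a corpus directory; "my_corpus" is a placeholder):
# import sparv
# with in_directory("my_corpus"):
#     with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
#         sparv_call.wait()
```

Changing directory affects the whole Python process, so this approach is best kept to scripts and notebooks that run one corpus at a time.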
Output
The output messages generated by Sparv when used as a library are the same as the messages displayed in the terminal
when running Sparv as a CLI tool, except that log messages are provided in JSON format. Other outputs, such as the lists
generated by various --list commands, are provided as plain text.
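Since a single call can yield both JSON log messages and plain-text output, the consuming code may need to tell them apart. The sketch below shows one way to do that. It assumes each message arrives as a string and that a JSON log message decodes to an object. The helper parse_message is hypothetical, and no particular field names are assumed because they are not specified here.

```python
import json


def parse_message(message):
    """Classify one output message (hypothetical helper, not part of Sparv).

    Returns ("log", dict) if the message is a JSON object,
    otherwise ("text", message) with the message left untouched.
    """
    try:
        parsed = json.loads(message)
    except (TypeError, ValueError):
        # Not valid JSON (or not a string at all): treat as plain text.
        return "text", message
    if isinstance(parsed, dict):
        return "log", parsed
    # Valid JSON but not an object (e.g. a bare number): keep the original text.
    return "text", message
```

Inside the iteration loop, this could be used as kind, payload = parse_message(log_message), after which only payloads of kind "log" are inspected as dictionaries.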
Waiting for the Pipeline to Finish
The context manager returned by call will always wait for the pipeline to finish before exiting. However, you can
also use the wait method to wait for the pipeline to finish, without iterating over the output messages. This can be
useful for clarity.
Checking for Success
To check whether the pipeline finished successfully, you can use the get_return_value method on the context manager.
The return value is True if the pipeline finished successfully, and False if it failed. This may only be called
after the pipeline has finished.
Additionally, the wait method also returns the return value of the Sparv process, providing a convenient way to check
for success after waiting for the pipeline to finish.
Example with success check:
import sparv

# Option 1: Iterate over log messages and check for success
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    for log_message in sparv_call:
        print(log_message)

# The context manager has waited for the pipeline on exit, so the result is available
success = sparv_call.get_return_value()
if success:
    print("Pipeline finished successfully")
else:
    print("Pipeline failed")

# Option 2: Wait for the pipeline to finish and check for success
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    success = sparv_call.wait()

if success:
    print("Pipeline finished successfully")
else:
    print("Pipeline failed")
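When both the messages and the outcome are needed, the two patterns above can be combined into a small wrapper. This is only a sketch: run_and_collect is a hypothetical helper, not part of Sparv. It takes the call function as a parameter (in practice sparv.call) so that it can be exercised without running a real pipeline, and it relies only on the behaviour described above, namely that the context manager is iterable, waits for the pipeline on exit, and exposes get_return_value.

```python
def run_and_collect(args, call):
    """Run a Sparv command and return (success, messages).

    Hypothetical helper: `call` is expected to behave like sparv.call,
    i.e. return an iterable context manager with get_return_value().
    """
    messages = []
    with call(args) as sparv_call:
        for message in sparv_call:
            messages.append(message)
    # Read the result only after the with block, when the pipeline has finished.
    success = sparv_call.get_return_value()
    return success, messages


# Usage sketch (needs Sparv installed and a corpus configuration in place):
# import sparv
# success, messages = run_and_collect(["run", "xml_export:pretty", "--log"], sparv.call)
```

Collecting every message in a list is convenient for short runs. For long pipelines it may be preferable to process each message as it arrives instead.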