Use Sparv as a Library
Sparv is normally used as a command-line tool, but it can also be used as a library in Python. This is useful if you want to use Sparv's functionality from your own Python code or from a Jupyter notebook.
Note
This feature is still experimental and the API may change in future versions.
Usage
The API is very simple and consists of a single function: call. It takes one argument, a list of strings, which are
the same arguments you would pass to the Sparv command-line tool, e.g. ["run", "xml_export:pretty", "--log"]. It
returns a context manager that can be used either to iterate over the output messages from Sparv or simply to wait for
the pipeline to finish.
Just like when using Sparv from the command line, you need to have a corpus configuration file in the current working
directory, or use the --dir argument to specify a directory containing a corpus configuration file.
Example:
import sparv

# Option 1: Iterate over log messages
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    for log_message in sparv_call:
        print(log_message)

# Option 2: Just wait for the pipeline to finish
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    sparv_call.wait()
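Because call needs a corpus configuration file in the current working directory, or a --dir argument pointing to one, it can help to check for the file before starting a run. The following is a minimal sketch of such a check. build_sparv_args is a hypothetical helper, not part of Sparv, and it assumes that the configuration file is named config.yaml and that --dir is placed before the subcommand, as on the command line.

```python
from __future__ import annotations

from pathlib import Path


def build_sparv_args(command: list[str], corpus_dir: str | Path | None = None) -> list[str]:
    """Return an argument list for sparv.call after checking for a corpus config.

    Hypothetical helper, not part of Sparv. Without corpus_dir it checks the
    current working directory; with corpus_dir it checks that directory and
    prepends --dir <corpus_dir> to the arguments.
    """
    directory = Path.cwd() if corpus_dir is None else Path(corpus_dir)
    # Assumption: the corpus configuration file is named config.yaml.
    if not (directory / "config.yaml").is_file():
        raise FileNotFoundError(f"No config.yaml found in {directory}")
    if corpus_dir is None:
        return list(command)
    # Assumption: as on the command line, --dir is given before the subcommand.
    return ["--dir", str(directory), *command]
```

With this sketch, the first example would call sparv.call(build_sparv_args(["run", "xml_export:pretty", "--log"], "my_corpus")), where my_corpus is a placeholder for your corpus directory. A missing configuration file then fails fast with a FileNotFoundError instead of inside the pipeline.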
Output
The output messages generated by Sparv when used as a library are the same as the messages displayed in the terminal
when running Sparv as a CLI tool, except that log messages are provided in JSON format. Other outputs, such as the lists
generated by various --list commands, are provided as plain text.
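Since log messages arrive as JSON while other output is plain text, code that consumes the messages may want to tell the two apart. The following is a minimal sketch that assumes each message is a string (or an already-decoded dict). classify_message is a hypothetical helper, and it deliberately makes no assumption about which fields a JSON log record contains.

```python
from __future__ import annotations

import json
from typing import Any


def classify_message(message: Any) -> tuple[str, Any]:
    """Sort one output message into ("log", record) or ("text", message).

    Hypothetical helper. It assumes log messages arrive as JSON strings (or
    already-decoded dicts) and makes no assumption about which fields a log
    record contains. Everything else, such as --list output, is plain text.
    """
    if isinstance(message, dict):
        return "log", message
    try:
        parsed = json.loads(message)
    except (json.JSONDecodeError, TypeError):
        return "text", message
    # A bare number or quoted string is valid JSON too, so require an object.
    if isinstance(parsed, dict):
        return "log", parsed
    return "text", message
```

Inside a loop over sparv_call, you could write kind, content = classify_message(log_message) and handle log records and plain-text output separately.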
Waiting for the Pipeline to Finish
The context manager returned by call always waits for the pipeline to finish before exiting, so you do not have to
iterate over the output messages. You can also call its wait method explicitly to wait for the pipeline to finish
without iterating over the messages, which makes that intent clearer in your code.
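To illustrate why leaving the with block is enough, here is a minimal sketch of the general context-manager pattern. WaitingCall is a hypothetical class, not Sparv's implementation: it simply runs a command as a subprocess, lets you iterate over its output lines, and waits in __exit__ regardless of what the block did. Its wait method returning True for exit status 0 is likewise an assumption of this sketch.

```python
from __future__ import annotations

import subprocess
import sys
from collections.abc import Iterator


class WaitingCall:
    """Hypothetical stand-in illustrating the context-manager pattern.

    This is not Sparv's implementation; it just runs a command in a
    subprocess so the waiting behaviour can be observed.
    """

    def __init__(self, args: list[str]) -> None:
        self.args = args
        self.process: subprocess.Popen[str] | None = None
        self.success: bool | None = None

    def __enter__(self) -> WaitingCall:
        # Start the command with its standard output captured as text.
        self.process = subprocess.Popen(self.args, stdout=subprocess.PIPE, text=True)
        return self

    def __iter__(self) -> Iterator[str]:
        # Yield output lines as they are produced, without trailing newlines.
        assert self.process is not None and self.process.stdout is not None
        for line in self.process.stdout:
            yield line.rstrip("\n")

    def wait(self) -> bool:
        # communicate() drains any unread output (so a full pipe cannot block
        # the child) and then waits for the process to exit.
        assert self.process is not None
        if self.success is None:
            self.process.communicate()
            self.success = self.process.returncode == 0
        return self.success

    def __exit__(self, exc_type, exc, tb) -> None:
        # Always wait on exit, whether or not the block called wait() itself.
        self.wait()


if __name__ == "__main__":
    # The block body does nothing, yet leaving it still waits for the child.
    with WaitingCall([sys.executable, "-c", "print('done')"]) as call:
        pass
    print(call.success)
```

Draining the output in wait matters in this design. A child process that writes more than the pipe buffer can hold would otherwise block forever while the parent waits for it to exit.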
Checking for Success
To check whether the pipeline finished successfully, use the get_return_value method on the context manager. It
returns True if the pipeline finished successfully and False if it failed, and it may only be called after the
pipeline has finished.
The wait method also returns this value, which makes it a convenient way to check for success after waiting for the
pipeline to finish.
Example with success check:
import sparv

# Option 1: Iterate over log messages and check for success
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    for log_message in sparv_call:
        print(log_message)
# The pipeline has finished once the with block has exited
success = sparv_call.get_return_value()
if success:
    print("Pipeline finished successfully")
else:
    print("Pipeline failed")

# Option 2: Wait for the pipeline to finish and check for success
with sparv.call(["run", "xml_export:pretty", "--log"]) as sparv_call:
    success = sparv_call.wait()
if success:
    print("Pipeline finished successfully")
else:
    print("Pipeline failed")