Skip to content

Installation and Setup

This section walks you through setting up Sparv on your computer, including any additional software you may need to fully utilize all of Sparv's analysis features.

Prerequisites

To install Sparv, you'll need a Unix-like environment (e.g. Linux, macOS or Windows Subsystem for Linux) with Python 3.10 or later.

Note

While most Sparv features may work in a Windows environment, Sparv is not regularly tested on Windows, so compatibility is not guaranteed. Feel free to report any issues you encounter.

Installing Sparv

Sparv is available on PyPI and can be installed via pip or pipx. We recommend using pipx, as it installs Sparv in an isolated environment but allows it to be run from any location.

Begin by installing pipx:

python3 -m pip install --user pipx
python3 -m pipx ensurepath

Then, install Sparv:

pipx install sparv

To verify that Sparv was installed successfully, run the command sparv. You should see the Sparv help information displayed.

Note

If pipx stops working after a Python upgrade, try running pipx reinstall-all. If that fails, you may need to manually delete pipx's local environment directory (usually ~/.local/pipx) and reinstall Sparv.

Setting Up Sparv

Sparv Data Directory

Sparv requires a dedicated directory to store language models and configuration files. This is called the Sparv data directory. Run sparv setup to choose this directory, which will also populate it with default configurations and presets.

Tip

For a non-interactive setup (e.g. in a Docker container), you can use the --dir flag to specify the data directory path and perform the setup in one command:

sparv setup --dir /path/to/sparv-data

Tip

Instead of (or in addition to) setting the data directory path using sparv setup, you can use the environment variable SPARV_DATADIR. This overrides any path you may have previously configured using the setup process. This is useful if you want to have multiple Sparv installations with different data directories on the same machine. Note that you still have to run the setup command at least once to populate the selected directory, even when using the environment variable.

Optional: Pre-build Models

Sparv will automatically download and build the models needed for the analyses you want to perform. Optionally, you can also pre-build the models to speed up the annotation of your first corpus. This step is not required, and unless you have a specific reason to do so, we recommend skipping it, as it may download models that you won't use.

To pre-build the models, use the following command:

sparv build-models --all

If you run this command in a directory without a corpus config, you need to specify the language for which the models should be built. Use the --language flag followed by the three-letter language code (you can use the sparv languages command to see a list of available languages and their codes). For example, to build all Swedish models, run:

sparv build-models --all --language swe

Installing Additional Third-party Software

Sparv can be used together with several plugins and third-party software. The installation of the software listed below is optional and depends on the analyses you wish to perform with Sparv. Please note that different licenses may apply to different software.

Unless otherwise specified in the instructions, you won’t need to download any additional language models. If the software is installed correctly, Sparv will automatically download and install the necessary model files for you.

Sparv wsd

Purpose Swedish word-sense disambiguation. Recommended for standard Swedish annotations.
Download Sparv wsd
License MIT
Dependencies Java

Sparv wsd is developed by Språkbanken Text and is licensed under the same terms as Sparv. To use it within Sparv, simply download the saldowsd.jar file from the provided GitHub link and place it in the bin/wsd directory inside your Sparv data directory.

hfst-SweNER

Purpose Swedish named-entity recognition. Recommended for standard Swedish annotations.
Download hfst-SweNER
Version compatible with Sparv 0.9.3

Note

hfst-SweNER requires a Unix-like environment.

The current version of hfst-SweNER is written for Python 2, while Sparv uses Python 3. Therefore, it needs to be patched before installation. After extracting the archive, navigate to the hfst-swener-0.9.3/scripts directory and create a file named swener.patch with the following contents:

--- convert-namex-tags.py
+++ convert-namex-tags.py
@@ -1 +1 @@
-#! /usr/bin/env python
+#! /usr/bin/env python3
@@ -34 +34 @@
-        elif isinstance(files, basestring):
+        elif isinstance(files, str):
@@ -73 +73 @@
-        return [s[start:start+partlen] for start in xrange(0, len(s), partlen)]
+        return [s[start:start+partlen] for start in range(0, len(s), partlen)]
@@ -132,3 +131,0 @@
-    sys.stdin = codecs.getreader('utf-8')(sys.stdin)
-    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
-    sys.stderr = codecs.getwriter('utf-8')(sys.stderr)

Run the following command to apply the patch:

patch < swener.patch

After applying the patch, follow the installation instructions provided by hfst-SweNER.

Hunpos

Purpose Alternative Swedish part-of-speech tagger (if you prefer not to use Stanza)
Download Hunpos on Google Code
License BSD-3-Clause
Version compatible with Sparv latest (1.0)

To install Hunpos, unpack the downloaded files and add the executables to your system path (you will need at least hunpos-tag). Alternatively, you can place the binaries inside the bin directory of your Sparv data directory.

If you are using a 64-bit operating system, you might need to install 32-bit compatibility libraries if Hunpos does not run:

sudo apt install lib32z1

For newer macOS versions, you may need to compile Hunpos from source. Instructions can be found in this GitHub repository.

When using Sparv with Hunpos on Windows, set the configuration variable hunpos.binary: hunpos-tag.exe in your corpus configuration. Additionally, ensure the cygwin1.dll file that comes with Hunpos is in your system path or copied into your bin directory within the Sparv data directory along with the Hunpos binaries.

MaltParser

Purpose Alternative Swedish dependency parser (if you prefer not to use Stanza)
Download MaltParser webpage
License MaltParser license (open source)
Version compatible with Sparv 1.7.2
Dependencies Java

Download and unpack the zip file from the MaltParser webpage and place the maltparser-1.7.2 directory inside the bin directory of your Sparv data directory.

Corpus Workbench

Purpose Creating Corpus Workbench binary files. Required if you want to search corpora using this tool.
Download Corpus Workbench on SourceForge
License GPL-3.0
Version compatible with Sparv beta 3.4.21 (likely compatible with newer versions)

Refer to the INSTALL text file for detailed instructions on how to build and install Corpus Workbench on your system.

Analyzing Languages Other Than Swedish

Sparv supports the analysis of corpora in multiple languages using various third-party tools. Below is a list of supported languages, their ISO 639-3 codes, and the tools Sparv can use for their analysis:

Language ISO 639-3 Code Analysis Tool
Asturian ast FreeLing
Bulgarian bul TreeTagger
Catalan cat FreeLing
Dutch nld TreeTagger
Estonian est TreeTagger
English eng FreeLing, Stanford Parser, TreeTagger
French fra FreeLing, TreeTagger
Finnish fin TreeTagger
Galician glg FreeLing
German deu FreeLing, TreeTagger
Italian ita FreeLing, TreeTagger
Latin lat TreeTagger
Norwegian Bokmål nob FreeLing
Polish pol TreeTagger
Portuguese por FreeLing
Romanian ron TreeTagger
Russian rus FreeLing, TreeTagger
Slovak slk TreeTagger
Slovenian slv FreeLing
Spanish spa FreeLing, TreeTagger
Swedish swe Sparv

TreeTagger

Purpose POS-tagging and lemmatization for various languages
Download TreeTagger webpage
License TreeTagger license (freely available for research, education, and evaluation)
Version compatible with Sparv 3.2.3 (may work with newer versions)

After downloading TreeTagger, ensure the tree-tagger binary is in your system path. Alternatively, you can place the tree-tagger binary in the bin directory within your Sparv data directory.

Stanford Parser

Purpose Various analyses for English
Download Stanford CoreNLP webpage
License GPL-2.0
Version compatible with Sparv 4.0.0 (may work with newer versions)
Dependencies Java

To use the Stanford Parser with Sparv, download and unzip the package from the Stanford CoreNLP webpage. Place the contents in the bin/stanford_parser directory within your Sparv data directory.

FreeLing

Purpose Tokenization, POS-tagging, lemmatization and named entity recognition for various languages
Download FreeLing on GitHub
License AGPL-3.0
Version compatible with Sparv 4.2

To install FreeLing, follow the instructions provided on their website. Ensure you download both the source and language data files and uncompress them in the same directory before compiling. Additionally, you will need to install the sparv-sbx-freeling plugin. Follow the setup instructions on the plugin's GitHub page to correctly configure it for use with Sparv.

Installing and Uninstalling Plugins

Sparv plugins are managed using the sparv plugins command. This command allows you to install, uninstall, and list plugins. Under the hood, plugins are standard Python packages, so Sparv relies on pip to handle installations. This means you can install plugins from any source supported by pip, such as PyPI, remote repositories, or local directories.

Installing Plugins

To install a plugin, use the following command:

sparv plugins install [plugin-source]

The [plugin-source] can refer to different locations:

Install from PyPI

To install a plugin published on the Python Package Index (PyPI), use its name, e.g.:

sparv plugins install sparv-sbx-uppercase

Install from a Remote Repository

To install a plugin from a remote repository (e.g., GitHub), provide the repository URL or archive link:

sparv plugins install https://github.com/spraakbanken/sparv-plugin-template/archive/main.zip

Install from a Local Directory

To install a plugin from a local directory, use the path to the directory:

sparv plugins install ./sparv-sbx-uppercase

Using the -e flag when installing from a local directory will install the plugin in editable mode, meaning that changes to the plugin code will immediately be available to Sparv without having to reinstall the plugin:

sparv plugins install -e ./sparv-sbx-uppercase

Listing Installed Plugins

To view all installed plugins, run:

sparv plugins list

Uninstalling Plugins

To uninstall a plugin, use the following command:

sparv plugins uninstall [plugin-name]

The [plugin-name] can be either the distribution name (e.g. sparv-sbx-uppercase), or the plugin name used within Sparv (e.g. sbx_uppercase).

For example:

sparv plugins uninstall sbx_uppercase

Uninstalling Sparv

To uninstall Sparv completely, follow these steps:

  1. Run sparv setup --reset to unset Sparv's data directory. The directory itself will not be removed, but its location (if available) will be printed.
  2. Manually delete the data directory.
  3. Run one of the following commands, depending on whether you installed Sparv using pipx or pip:

    pipx uninstall sparv
    
    pip uninstall sparv