New Mink features: sharing, analyses, audio and CoNLL-U

Posted by Arild Matsson 2025-12-15

Last week, we released a few new features for Mink, Språkbanken's data platform!

With mink-frontend 1.15.0 and mink-backend 2.1.0, we have added multi-player support and granular analysis selection, and you can now use audio files as well as CoNLL-U files as sources. Read on to learn more about these new features.

Sharing

You can now share a Mink corpus with your teammates. You can choose whether they should be able to edit and process the corpus (WRITE), or just inspect it and explore the result (READ).

1. Log into Mink and create or select a corpus.
2. Find the new Sharing panel on the corpus overview page.
3. Click the Manage access button to open the resource in the SB Auth authentication tool.
4. Click Add user and enter the email of the person you want to invite, as well as the access level you want to grant them.
5. Click Send invite to create the invitation. We cannot currently send automated emails, so the resulting page will show you a generated invitation URL that you are asked to copy and send yourself (in private!).
6. The recipient can visit the URL to activate the invitation and then find the shared resource in Mink.

Note that the READ permission is also enough to view the corpus in Korp or Strix, if you install it there.

Figure: The resource invite page of SB Auth

Individual analyses

In the corpus configuration, you can choose which analyses Mink should apply when processing the corpus. You could previously only select from a few pre-defined groupings of annotations: Morphology, Readability, etc. Now, you can instead select and deselect specific analyses.

Figure: The new analysis selection page in Mink

If you are unsure about what to select, just select all of them, or all except the slowest ones, Geotagging and Named entity recognition, which are deselected by default.

Analysis or annotation? In our terminology, an analysis is a programmed functionality that processes the corpus data in some way – typically by reading the text and perhaps some annotations, to produce new annotations and add them to the result.
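To make the distinction concrete, here is a minimal Python sketch of the idea. It is not how Mink or Sparv is implemented: the names `tokenize`, `token_length` and `run` are hypothetical, and the "analyses" are toy functions. Each analysis reads the text and possibly earlier annotations, and returns new annotations that are added to the result.

```python
from typing import Callable, Dict, List

# Hypothetical model: annotations are named layers of values.
Annotations = Dict[str, List[str]]
# An analysis reads the text and existing annotations, and returns new ones.
Analysis = Callable[[str, Annotations], Annotations]


def tokenize(text: str, annotations: Annotations) -> Annotations:
    """Toy analysis: reads only the text, produces a 'token' annotation."""
    return {"token": text.split()}


def token_length(text: str, annotations: Annotations) -> Annotations:
    """Toy analysis: reads the 'token' annotation, produces 'length'."""
    return {"length": [str(len(t)) for t in annotations["token"]]}


def run(text: str, analyses: List[Analysis]) -> Annotations:
    """Apply the selected analyses in order, collecting their annotations."""
    result: Annotations = {}
    for analysis in analyses:
        result.update(analysis(text, result))
    return result


result = run("Mink processes corpora", [tokenize, token_length])
print(result)
```

In this sketch, deselecting an analysis simply means leaving it out of the list passed to `run`, with the caveat that an analysis depending on another's annotations (here `token_length` on `tokenize`) needs that one to run first.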

Audio source

Utilizing the KB-Whisper model, Mink now accepts audio files as corpus sources. When processing an audio corpus, the Whisper model is first applied to generate a text transcription, and then further analysis is performed on that text.

The Sparv plugin used for this is sbx-swe-speech2text-transformers-kb_whisper_wav (or mp3 or ogg, depending on the audio format).

To see the generated plain-text representation in Mink, make sure you have first run the annotation processing, and then click the filename in the Source files panel.

Figure: After processing, the plain text can be viewed and downloaded on the source file page in Mink

CoNLL-U source

The CoNLL-U file format, used primarily in the Universal Dependencies project, has one token per line and morphosyntactic attributes in columns. A new importer plugin, sbx-mul-import-sparv-conllu, allows us to use CoNLL-U files as sources for Mink corpora.
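As an illustration of the format, the following Python sketch reads a small CoNLL-U snippet into token dictionaries. It is independent of the importer plugin and does not show how the plugin works. The sample sentence and its annotations are made up for illustration, and the function name `parse_conllu` is hypothetical. The ten column names follow the Universal Dependencies specification; multiword token ranges and empty nodes are not treated specially here.

```python
from typing import Dict, List

# The ten tab-separated CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Illustrative sample: comment line, one token per line, blank line ends the sentence.
SAMPLE = (
    "# text = Hunden sover.\n"
    "1\tHunden\thund\tNOUN\t_\tDefinite=Def|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tsover\tsova\tVERB\t_\tTense=Pres\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line marks the end of a sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped in this sketch.
            continue
        else:
            values = line.split("\t")
            if len(values) != len(FIELDS):
                raise ValueError(f"Expected {len(FIELDS)} columns, got {len(values)}")
            current.append(dict(zip(FIELDS, values)))
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["upos"], token["feats"])
```

An underscore in a column means that the value is unspecified, which is why most real files leave several columns as `_`.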

Note that some of the analyses available in Mink produce annotations similar to those that can be present in CoNLL-U source files. See the Configuration section in the plugin README for details.
