Mink

What is Mink?

Språkbanken Text is a research infrastructure for language data. We provide digital text data suitable for research, and we develop analysis tools based on language technology.

With Mink, you can submit your own text data directly into our tools.

Use Mink at spraakbanken.gu.se/mink

Who can use Mink?

Anyone with an eduGAIN account can use it to log in to Mink. This includes most people associated with a university or other academic institution.

Other users can create an account at eduID, which is itself connected to eduGAIN.

We are working toward offering a demo version of Mink that will have some limitations but will be available without logging in.

What can Mink do?

This first version of Mink targets a specific workflow:

  1. Create a corpus of uploaded text or speech files
  2. Run automatic annotation
  3. Use the results in Korp or in Strix, or as XML/CSV files
  4. Optionally share the corpus with peers

Supported input text formats are plain text, XML, Microsoft Word (.docx), OpenDocument (.odt), PDF and CoNLL-U.

Additionally, you can upload audio files (WAV, MP3 or Ogg) and have them transcribed with automatic speech recognition (ASR).
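
As a rough illustration of the formats listed above, the following Python sketch sorts local files into text, audio, or unsupported before you upload them. It is not part of Mink. The function name classify_upload is hypothetical, and the extensions .txt, .xml, .pdf, .conllu, .wav, .mp3 and .ogg are assumed conventional file endings; only .docx and .odt are named explicitly above.

```python
from pathlib import Path

# Assumed extension sets. Only .docx and .odt are stated in the text;
# the others are conventional endings for the listed formats.
TEXT_EXTENSIONS = {".txt", ".xml", ".docx", ".odt", ".pdf", ".conllu"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def classify_upload(filename: str) -> str:
    """Return "text", "audio", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive comparison
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unsupported"


if __name__ == "__main__":
    for name in ["thesis.docx", "interview.MP3", "scan.tiff"]:
        print(name, classify_upload(name))
```

A check like this only looks at file names, not file contents, so Mink may still reject a file whose content does not match its extension.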

The annotation pipeline includes:

  • Part-of-speech tags (POS)
  • Base form (lemma)
  • Morphosyntactic tags (MSD)
  • Dependencies
  • Sentiment labels
  • Named entity recognition

Upcoming features

Some future development goals for Mink are extended annotation settings, publishing, and workflows for other types of language data, such as lexicons.