Mink

What is Mink?

Språkbanken Text is a research infrastructure for language data. We provide digital text data suitable for research, and we develop analysis tools based on language technology.

With Mink, you can submit your own text data directly into our tools.

Use Mink at spraakbanken.gu.se/mink

Who can use Mink?

Anyone with an eduGAIN account can use it to log in to Mink. This includes most people associated with a university or other academic institution.

Other users can create an account at eduID, which is itself connected to eduGAIN.

We are working toward a demo version of Mink that will have some limitations but will be available without logging in.

What can Mink do?

This first version of Mink targets a specific workflow:

  1. Create a corpus of uploaded text (or speech audio) files
  2. Run automatic annotation
  3. Use the results in Korp or in Strix, or as XML/CSV files

Supported formats for text files are:

  • plain text (.txt)
  • XML
  • Microsoft Word (.docx)
  • Open Document (.odt)
  • PDF

Additionally, you can upload audio files and have them transcribed with automatic speech recognition (ASR):

  • WAV
  • MP3
  • Ogg
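
Before uploading, it can help to check locally which files fall under the text formats and which under the audio formats listed above. The following is a minimal sketch of such a check in Python; it is not part of Mink. The function names `classify` and `group_files` are hypothetical, and the extensions `.xml`, `.pdf`, `.wav`, `.mp3` and `.ogg` are assumed spellings, since the lists above name those formats without giving file extensions. Mink itself may identify file types differently.

```python
from pathlib import Path

# Extensions based on the format lists above. Only .txt, .docx and .odt are
# spelled out on the page; the remaining extensions are assumptions.
TEXT_EXTENSIONS = {".txt", ".xml", ".docx", ".odt", ".pdf"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def classify(filename: str) -> str:
    """Return 'text', 'audio', or 'unsupported' based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unsupported"


def group_files(filenames):
    """Group filenames by the category returned by classify()."""
    groups = {"text": [], "audio": [], "unsupported": []}
    for name in filenames:
        groups[classify(name)].append(name)
    return groups
```

Calling `group_files` on a list of local filenames returns a dictionary with the keys `text`, `audio` and `unsupported`, so files in the last group can be converted or set aside before creating a corpus.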

The annotation pipeline includes:

  • Part-of-speech tags (POS)
  • Base form (lemma)
  • Morphosyntactic tags (MSD)
  • Dependencies
  • Sentiment labels
  • Named entity recognition

Upcoming features

Some future development goals for Mink are extended annotation settings, sharing and publishing, and workflows for other types of language data, such as lexicons.