What is Mink?
Språkbanken Text is a research infrastructure for language data. We provide digital text data suitable for research, and we develop analysis tools based on language technology.
With Mink, you can submit your own text data directly into our tools.
Use Mink at spraakbanken.gu.se/mink
Who can use Mink?
Anyone with an eduGAIN account can use it to log in to Mink. This includes most people associated with a university or other academic institution.
Other users can create an account at eduID, which is itself connected to eduGAIN.
We are working toward offering a demo version of Mink that will have some limitations but will be available without logging in.
What can Mink do?
This first version of Mink targets a specific workflow:
- Create a corpus of uploaded text (or speech audio) files
- Run automatic annotation
- Use the results in Korp or in Strix, or as XML/CSV files
Supported formats for text files are:
- plain text (.txt)
- XML
- Microsoft Word (.docx)
- Open Document (.odt)
Additionally, you can upload audio files and have them transcribed with automatic speech recognition (ASR):
- WAV
- MP3
- Ogg
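If you are preparing many files, it can help to sort them by type before uploading. The following is a minimal sketch of such a local check, not Mink's own validation. The function names `classify_upload` and `group_uploads` are hypothetical, and the extensions `.xml`, `.wav`, `.mp3` and `.ogg` are assumptions inferred from the format names listed above; Mink itself may identify files differently.

```python
from pathlib import Path

# Assumed extension sets, based on the supported formats listed above.
# Only .txt, .docx and .odt are stated explicitly; the rest are inferred.
TEXT_EXTENSIONS = {".txt", ".xml", ".docx", ".odt"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def classify_upload(filename: str) -> str:
    """Return 'text', 'audio', or 'unsupported' based on the file extension."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unsupported"


def group_uploads(filenames):
    """Group file names into text, audio, and unsupported lists."""
    groups = {"text": [], "audio": [], "unsupported": []}
    for name in filenames:
        groups[classify_upload(name)].append(name)
    return groups
```

For example, `group_uploads(["chapter1.docx", "interview.mp3", "scan.pdf"])` places the first file under text, the second under audio, and the third under unsupported, so you can convert or set aside files before creating a corpus.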
The annotation pipeline includes:
- Part-of-speech tags (POS)
- Base form (lemma)
- Morphosyntactic tags (MSD)
- Dependencies
- Sentiment labels
- Named entity recognition
Upcoming features
Some future development goals for Mink are extended annotation settings, sharing and publishing, and workflows for other types of language data, such as lexicons.