
I have always loved occupational novels, that is, books where a specific job or workplace is central to the plot and themes. In a good occupational novel, the author has deep inside knowledge of the profession and generously shares it with the readers. Arthur Hailey, for instance, was not the greatest literary genius of all time, but I still read everything he wrote.
Nobody has yet written a novel about Språkbanken, but I believe this day will come. For future authors, here is some highly secret inside information about what we actually do, and how we do it, to build and maintain the infrastructure supporting research based on language data.
Since 2025, our infrastructure work has been organized into three groups: the platform group (responsible for developing our research platforms), the outreach-and-collaboration group (responsible for our contact with the outside world) and the analysis group (responsible for the reference data and the analyses used in our infrastructure; the name was supposed to be "Data and analysis", but it got irreversibly shortened at some point). Put very roughly, the platform group creates the machinery (e.g. Korp), the analysis group creates the contents (e.g. the annotated corpora in Korp) and the outreach group tells the world about both.
The analysis group's results are perhaps the least visible to the general public, which is why this chapter of our novel is about the analysis group and what it did in spring 2025.
So, in spring 2025 we...
- ...developed a pseudonymization tool for Swedish
- ...developed a frame-semantic parser for Swedish
- ...developed an aspect-based stance analyzer for Swedish
- ...partly annotated a gold-standard treebank of learner Swedish (to be released soon)
- ...created a dataset of Swedish given names where different variants of the same name are linked to each other (to be released soon)
- ...partly converted the Eukalyptus treebank to Universal Dependencies
- ...created a corpus of texts about segregation (from the Riksdag and the Gothenburg City Council)
- ...and did some more work that is still at too preliminary a stage to report.
All of the tools we developed are available as plugins for our analysis platform Sparv. A plugin can be used to perform a specific analysis (on top of the standard set of Sparv analyses or independently of them).
Page-turner already, isn't it? Stay tuned!