
Blog posts

Cassandra: a toolset for analyzing and visualizing language change

Within the Cassandra project we are using Korp to analyze numerous instances of language change: not one, not two, but dozens (and in the future, potentially hundreds). At this scale, it is impossible to run the searches and process their results manually. Fortunately, Korp has an API that makes it possible to automate this process.

Documentation: a (fictional) sad story with a (real) happy ending

This post is based on joint work with Gerlof Bouma. Illustrations by Jan and Julija.

Here's a sad story (it's fictional, but sad nonetheless).

How native and non-native speakers talk to each other

We at Språkbanken Text have just released a new corpus of native (L1) and non-native (L2) speech in four languages: English, Spanish, French and Italian. The corpus contains more than 170 million words produced by more than 97 thousand speakers (size varies a lot across the four languages, though).

The five lives of Talbanken

This post is about Talbanken, one of the most important and widely used Swedish corpora. At least five versions of this treebank exist, and the name "Talbanken" is ambiguous between them, which sometimes leads to confusion. The purpose of this post is to reduce that ambiguity: I list the five versions, explain the basic differences between them and suggest unambiguous version names.

Awesome (grym) and cool (häftig) word change

Words can change their meanings. You don't need a doctorate in linguistics to notice that grym in (1) does not mean the same thing as in (2).