How native and non-native speakers talk to each other

We at Språkbanken Text have just released a new corpus of native (L1) and non-native (L2) speech in four languages: English, Spanish, French and Italian. The corpus contains more than 170 million words produced by more than 97 thousand speakers (though its size varies considerably across the four languages). The corpus was created by scraping the WordReference forums, where users discuss various questions about languages. Importantly, every user has to state their native language, and this information, along with the nickname, is …

Argumentation Mining

What if you could find all the arguments in a text without having to read it? Or what if you could search a database for a controversial topic and immediately get arguments for and against it, gathered from texts all around the internet? Or imagine that, while writing an essay, you automatically got an estimate of how persuasive your arguments are. Scenarios such as these could become possible with techniques developed in the field of argumentation mining. The aim of this relatively new …

The Kubhist corpus of Swedish newspapers

Among Språkbanken’s many historical resources we find the Kubhist corpus – a diachronic collection of historical newspaper texts – in two versions: Kubhist 1, spanning 1750–1950, and Kubhist 2, spanning 1645–1926. Historical corpora of this kind, especially when available in a searchable format, are valuable sources of information for learning about our history, language and culture. They are especially appealing to researchers from the digital humanities who study history, literature, linguistics, sociology …