All news

In the previous blog post we read about crosswords, a popular pastime during the summer months.

A related hobby, if you want to be a bit more social, is of course playing Scrabble, also known under the Swedish name Alfapet and in various digital versions, including Wordfeud. For anyone who, against all odds, is not familiar with the game, it involves laying out letters to form words on a board in a crossword-like grid. Different letters give different points, and certain squares …
Continue reading “Vilket ord är bäst?” (“Which word is best?”)

To the Språkbanken blog

SuperLim

2021-06-28
Språkbanken Text is now releasing SuperLim 1.0, a benchmark suite that can be used to evaluate and analyse Swedish language models. The release is part of the SuperLim project, a collaboration with various stakeholders in the fields of language technology and artificial intelligence.

Large-scale Swedish language models have long been missing. Recently, several such models have been developed (in particular, by KBLab, the datalab at the National Library of Sweden), and more are on the way. Language models are being trained on enormous quantities of text in order to learn to understand language and analyze texts. The models can perform various tasks, for instance, summarise texts, measure how similar texts are, or analyse the attitudes expressed in parts of the text. Language models can improve all applications of language technology to Swedish texts, making them useful in research and in the private and public sectors.

“While it’s great that we finally have Swedish language models, it is difficult to evaluate them. We have therefore now developed SuperLim, a suite of 13 resources that can be used as benchmarks for various tasks. Anybody can use the suite to test models and assess how well they understand the language,” explains Aleksandrs Berdicevskis, one of the project members.

The SuperLim project – a collaboration between Språkbanken Text, KBLab, RISE Research Institutes of Sweden and AI Sweden – partly follows the approach of the English benchmark SuperGLUE.

One important element of evaluation is to identify whether the models have any biases and, if so, to deal with them appropriately.

“Previous studies demonstrate that language models are sensitive to the data on which they are trained and often reflect human prejudices included in the training data. For example, they may pick up racism or assume that a doctor is a man and a nurse a woman. It is important to evaluate and improve language models so that we can counter the prejudices that get encoded into the models,” says project team member Yvonne Adesam.

Learn more on the project website >>

SuperLim 1.0 >>

The holidays are coming up, and crosswords are a holiday classic.

Especially now, in times of isolation when we should not be socialising anyway, what could be better than sitting in the garden swing with a well-sharpened pencil, a good eraser, the SAOL app and a crossword? There are many magazines to buy with crosswords of varying difficulty, for those who enjoy solving them. But it is a bit harder if you want to make your very own crossword. Until now: as part of Språkbanken's service …
Continue reading “Gör ditt eget korsord!” (“Make your own crossword!”)

To the Språkbanken blog
Human languages are constantly evolving, but what drives these changes? July will see the launch of the project Cassandra, in which a group of researchers from Språkbanken Text will attempt to predict language change.

Language and how it changes is a matter of interest to both researchers and the public. One recent example of how language changes is the Swedish word grym, defined by the Swedish Academy Dictionary as meaning 'ruthless' or 'brutal', which has taken on a new, almost more dominant, positive meaning ('awesome') that is far removed from its original meaning. Another example in Swedish is the increasing use of the object form of personal pronouns in comparative constructions (starkare än dig 'stronger than you' instead of starkare än du, with the subject form).

“We know that languages are constantly changing, whether because we simplify the language in everyday use or because it evolves as it comes into contact with other languages. Our aim is to explain changes that have already occurred and at the same time see if it is possible to predict what will happen to the language in future,” explains Cassandra's principal investigator Aleksandrs Berdicevskis, who goes on:

“This question has received little attention in the field of linguistics, but we need to try and answer it in order to assess how plausible the existing explanations actually are. In Cassandra, we will also study what has gone wrong.”

The group will use large collections of text known as corpora, both in its quantitative study and to evaluate existing explanatory models.

“We will be using the data contained in our corpora, which consist of 20 years of social media posts from platforms such as Flashback, Familjeliv and Twitter,” says Aleksandrs Berdicevskis, who explains that when studying material from the first 15 years the researchers will pretend that they have no knowledge of what happened in the following five years.

“We will gather all of the data and all of the theories and then predict what will happen to the language over the coming five years. The results will not be one hundred per cent accurate; not all changes are dependent on the language, they may relate to things happening in society that we now have many words for, such as the coronavirus pandemic. That is difficult to predict.

“Our hope and objective is to be able to predict some changes and that we can formalise those predictions,” says Aleksandrs Berdicevskis, who adds that the group will also be looking at interaction on social media and which social factors influence language change.

The goal of the Cassandra project is to provide both theoretical results – new language resources in which corpora are enriched with information on both language changes and social networks and their structures – and methods.

“Hopefully these methods will be useful to all researchers with an interest in societal changes,” says Aleksandrs Berdicevskis.

Learn more on the project website >>

Aleksandrs Berdicevskis, researcher at Språkbanken Text
Aleksandrs (Sasha) Berdicevskis, principal investigator 
Photo: Sven Lindström
 
About Cassandra
Ongoing
July 1 2021 – June 30 2024
Project members
Aleksandrs Berdicevskis (PI)
Evie Coussé 
Yvonne Adesam
Nina Tahmasebi
Funding
Marcus and Amalia Wallenberg Foundation (grant ref. no. MAW 2020.0060)

 

Nationella språkbanken invites you to its autumn workshop on Monday 18 October 2021 in Stockholm. The theme of this year's workshop is history, in a broad sense. We will look at how language and speech technology is used to shed light on historical perspectives, whether in more traditional historical research questions, diachronic linguistics, ethology and the development of speech, or societal development.

Read more at språkbanken.se

Språkbanken Text is organizing two of the workshops at NoDaLiDa 2021: Sustainable language representations for a changing world and NLP4CALL.

NLP4CALL
Over the past 10 years, the NLP4CALL workshop has been a meeting place for researchers and company representatives working on automatic solutions for language learning and on language learning research. This year, the workshop attracted more than 200 registrations. We enjoyed an invited talk by Cambridge Assessment researchers Mark Brenchley and Kevin Cheung, and another invited talk by Professor Johanna Monti. We celebrated the 10th anniversary by introducing a new session on Research Notes for those who want to discuss their projects or ideas without a publication, a format that turned out to be a success.

For further information, see the workshop website:
<https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-series/nlp4call2021>
 

Sustainable language representations for a changing world
In this workshop we discussed how language representations or language models can be built to be sustainable, in a very general sense. The topics ranged from how to adapt to minority languages and language varieties, to ethical and legal concerns about privacy, copyright and questions of liability. We had 75 participants who followed and actively took part in the discussions, as well as invited talks by Linda Mannila (Digismart, Finland), Elisabet Lobo (Chalmers University) and Stanley Greenstein and Peter Wahlgren (Stockholm University).

For further information, see the workshop website:
<https://spraakbanken.gu.se/aktuellt/konferenser-och-workshopar/sustainable-language-representations>

This post is based on joint work with Gerlof Bouma. Illustrations by Jan and Julija.

Here’s a sad story (it’s fictional, but sad nonetheless). Matthias, Pernilla and Ingvar were working as computational linguists, and within a certain project painstakingly created an ingenious dataset. The community, however, did not show much interest in the dataset and it was largely forgotten. Years went by. Matthias died. Pernilla invented a clever algorithm and became a multi-billionaire. Ingvar moved to the USA, happened to see a crime and …
Continue reading “Documentation: a (fictional) sad story with a (real) happy ending”

To the Språkbanken blog
Soon you will be able to search a total of 36 different Bible translations at once. On Friday 28 May, a digital text collection will be made public in which researchers from the University of Gothenburg, among others, have collected and digitised Bible texts from the 14th century to the present day. The collection is unique of its kind and can be used to make automated comparisons between different eras and languages.

“The idea of linking Bible texts together is not new, it has been done before, but most collections are not publicly available. We hope that the work we have put into copyright issues will allow more people to use this resource,” says Evie Coussé, a researcher at the Department of Languages and Literatures at the University of Gothenburg, who leads the work.

The Bible texts have been collected within the research project Uppkomsten av komplexa verbkonstruktioner i germanska språk ('The emergence of complex verb constructions in Germanic languages') and cover four languages in total: English, Dutch, Swedish and German. Gerlof Bouma at Språkbanken Text, University of Gothenburg, helped build the digital text collection.

Read the full news item at gu.se >>

May 26, 14:00–15:30 Magnus Sahlgren at RISE; Aleksandrs Berdicevskis, Yvonne Adesam, Gerlof Bouma, Dana Dannélls at Språkbanken Text, Gothenburg University. This seminar presents the outcome of the SuperLim project, which provides the first General Language Understanding Evaluation (GLUE) benchmark for Swedish.

More information and registration >>

On 28 May at 14:00–15:00, a brand-new corpus based on historical Bible translations in English, Dutch, German and Swedish, from the fourteenth century to the present day, will be made public.

Gerlof Bouma at Språkbanken Text has, together with Evie Coussé and Nicoline van der Sijs, compiled the EDGeS Diachronic Bible Corpus within the research project Uppkomsten av komplexa verbkonstruktioner i germanska språk ('The emergence of complex verb constructions in Germanic languages'). During the presentation you will learn more about how the corpus can be used and what the next steps will be.

More information and programme for the webinar >>
