The corpus is also available for download as scrambled sentence sets from Språkbanken Text's language data page.
Access to research data is critical in several research domains, but personal content often prevents data from being reused. The General Data Protection Regulation, GDPR (EU Commission, 2016), proposes pseudonymisation as a solution for securing open access to research data. The main challenge is how to pseudonymise data effectively so that individuals cannot be identified, while keeping the data useful for research in fields such as computational linguistics, linguistics and natural language processing.
During the workshop, several challenges in pseudonymisation are discussed.
RaPID-5@LREC-COLING2024: Resources and ProcessIng of linguistic, para-linguistic and extralinguistic Data from people with various forms of cognitive/psychiatric/developmental
Full day event: May 2024 (exact date TBA)
Location: Lingotto Conference Centre - Turin, Italy
More information: https://spraakbanken.gu.se/en/rapid-2024
The 5th RaPID Workshop (RaPID-5) is an interdisciplinary forum for researchers to share information, findings, methods, models and experience of the collection and processing of data produced by individuals with various forms of mental, cognitive, neuropsychiatric or neurodegenerative disabilities, such as aphasia, dementia, autism, Parkinson's disease or schizophrenia. Data includes spontaneous [continuous] speech and transcriptions, eye movement measurements, and various types of digital and multimodal biomarkers such as sensor data from mobile phones, smart watches, wearable devices, and the like.
RaPID-5 is particularly interested in studies of the relationship between different linguistic, paralinguistic and extralinguistic observations that can aid the identification, extraction, correlation, evaluation and modelling of different linguistic and/or multimodal phenotypes and measurements, which can be used to facilitate diagnosis, monitor development or identify individuals at risk of developing neurodegenerative or neuropsychiatric diseases.
RaPID-5 particularly welcomes contributions on multidisciplinary aspects of processing data from the aforementioned populations, with a focus on the interaction between clinical/medical science/informatics, language technology, and computer science.
– I am involved in running the data infrastructure at Språkbanken, and I look after resources that help researchers do their work.
– It is important to be able to communicate, and it is fascinating that communication works even though there are challenges such as ambiguity.
Right now Herbert is making sure that Språkbanken Text's federated login works on more of the web services, so that a single login is enough for the different systems.
What you might not know about Herbert is that they also studied Middle High German at university.
– A word I like is "ieman", which depending on context can mean either "someone" or "no one".
Born and raised in Wrocław, Poland, she took her Master’s degree in linguistics in Heidelberg, Germany. As she started dating a Swede she planned a move to Sweden. At the same time, she started to look for something more practical to do with her linguistic knowledge.
– My friends from college became copywriters, translators and teachers. I started thinking about doing computational linguistics.
– I enjoy the academic stuff; it is kind of a family tradition. Many of my family members were teachers or worked in academia, so it is familiar to me.
After graduating, she started to look for work and was made aware of a PhD position at Språkbanken Text. It fitted what she had worked on before: corpus linguistics. She is working with Elena Volodina on her project Mormor Karl. One goal is to create algorithms for automatic pseudonymization of research data. This has the benefit of increasing the accessibility of data that contains sensitive information.
– Hopefully my work will give students an easier situation working with contemporary data than I had.
In her spare time, Maria likes to play games, everything from computer games to role-playing games. She also enjoys going out to take pictures of Swedish wildlife.
– Now we play the new Swedish edition of Drakar & Demoner. It trains my Swedish, even though I mostly get better at the names of medieval arms!
Two months ago Emilie Francis arrived in Sweden. She is one of the newest PhD students at Språkbanken Text, Gothenburg University. Originally from Victoria, Canada, she likes Gothenburg, which she thinks is very similar to her home town.
“Emi”, as she likes to be called, was looking for a job related to NLP and data.
– I was not specifically looking for a PhD position this time, but I have been before. After another job opportunity fell through due to the COVID pandemic, I came across this position and applied.
What research are you going to do?
– I am going to study language in the media and bias, misinformation and the impact on politics and society. Being younger, I hear people talk about social media and all the scandals the algorithms are promoting. But have the radicals become more vocal? And has the activation of people and their participation in social media made people become more divisive on a lot of topics? My theory is that social media has made the gap wider.
Right now Emilie Francis is looking into current research on bias and factuality in media, and how certain organizations judge different publications.
– My current objective is to study the frameworks they use and see whether they are applicable at a document level.
And on your computer screen right now?
– I am designing a statistics course for PhD students at Språkbanken Text.
She also has a dog called Nagi.
– He is an Akita Inu. He is 2 years old and has a lot of energy!