We at Språkbanken Text have just released a new corpus of native (L1) and non-native (L2) speech in four languages: English, Spanish, French and Italian.
The corpus contains more than 170 million words produced by more than 97 thousand speakers, although its size varies considerably across the four languages. It was created by scraping the WordReference forums, where users discuss various questions about languages. Importantly, every user has to provide their native language, and this information, along with the nickname, is …
Continue reading "How native and non-native speakers talk to each other"
9 December 2020