Språkbanken (the Swedish Language Bank) was established in 1975 in recognition of the groundbreaking corpus linguistic work initiated by Sture Allén at the University of Gothenburg in the 1960s. That work had resulted in Press-65, a one-million-word corpus of newspaper text and one of the first large electronic text corpora in a language other than English. Språkbanken was set up as a national center, funded at the national level, with a remit to collect, process and store Swedish text corpora, and to make linguistic data extracted from the corpora available to researchers and to the public. Since the 1970s, these users have been able to access linguistic and statistical data about a diverse range of Swedish text through Språkbanken.
Språkbanken is widely used by scholars in Sweden and internationally (particularly in Finland, where Swedish is an official language) for empirical research on various aspects of the Swedish language. It is also used as a resource in the teaching of Swedish linguistics at a number of Swedish and Finnish universities.
Since its beginnings in the 1970s, Språkbanken has developed into a nationally and internationally acknowledged research unit. Its work focuses on the development of linguistic resources and tools, and on methodologies for using these resources in research, both in language technology and in a number of other disciplines. Today, Språkbanken possesses a unique combination of competences in Swedish linguistic resources and in language technology tools for working with such resources in an integrated research infrastructure.
Part of Språkbanken's activity is aimed at the collection, processing and presentation of text corpora. The bulk of the corpora represent modern Swedish newspaper text and fiction, but Swedish texts in other genres and from other time periods (in fact, most periods of written Swedish) are increasingly being incorporated into Språkbanken.
Språkbanken presents corpora and linguistic data primarily in the form of concordances, accessed through a search interface. This is the presentation mode of choice for traditional linguistic research, whereas language technology researchers often need whole corpora, e.g. in order to apply machine learning algorithms. Such resources are increasingly available for download from Språkbanken.
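For readers unfamiliar with the concordance format, the following is a minimal sketch of a keyword-in-context listing in Python. It is an illustration only and does not reflect how Språkbanken's search interface is implemented; the function name `concordance`, the `width` parameter and the sample sentences are all assumptions made for this example.

```python
import re


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper for illustration: matching is case-insensitive and
    whole-word, and each hit is shown with up to `width` characters of
    context on either side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Made-up sample text, not taken from any Språkbanken corpus.
sample = "Språket förändras hela tiden. Det svenska språket har lånat många ord."

for line in concordance(sample, "språket", width=20):
    print(line)
```

Each output line shows one occurrence of the search word with its immediate context. This is convenient for inspecting usage by eye, but it does not give access to whole texts, which is why machine learning work typically relies on downloadable corpora instead.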
Lars Borin
sb-info at svenska dot gu dot se