	abstract     = {We present two simple adaptation methods to train a dependency parser in the situation when there are multiple treebanks available, and these treebanks are annotated according to different linguistic conventions. To test the methods, we train parsers on the Talbanken and Syntag treebanks of Swedish. The results show that the methods are effective for low-to-medium training set sizes.},
	The open lexical infrastructure of Språkbanken
	We present our ongoing work on Karp, Språkbanken's (the Swedish Language Bank) open lexical infrastructure, which has two main functions: (1) to support the work on creating, curating, and integrating our various lexical resources; and (2) to publish daily versions of the resources, making them searchable and downloadable. An important requirement on the lexical infrastructure is also that we maintain a strong bidirectional connection to our corpus infrastructure. At the heart of the infrastructure is the SweFN++ project with the goal to create free Swedish lexical resources geared towards language technology applications. The infrastructure currently hosts 15 Swedish lexical resources, including historical ones, some of which have been created from scratch using existing free resources, both external and in-house. The resources are integrated through links to a pivot lexical resource, SALDO, a large morphological and lexical-semantic resource for modern Swedish. SALDO has been selected as the pivot partly because of its size and quality, but also because its form and sense units have been assigned persistent identifiers (PIDs) to which the lexical information in other lexical resources and in corpora is linked.
	Ny studie visar hur information till patienter med kolorektal cancer kan förbättras
	Written information material is often written at too high a level and places high demands on the intended reader (the patient). Besides readability, there are further factors to evaluate to see whether the material is suitable. Content, structure, layout and typeface, illustrations, and learning and motivation should all be taken into account. More suitable, better-adapted material can help people with an illness ask better questions when they talk with healthcare staff, and it can make the person less uncertain and anxious about the unknown that lies ahead. A new study, part of the research project PINCORE (personcentred information and communication in colorectal cancer care), aims to improve information and communication in colorectal cancer care.
	Cloud Logic Programming for Integrating Language Technology Resources
	The main goal of the CLT Cloud project is to equip lexica, morphological processors, parsers and other software components developed within CLT (Centre of Language Technology) with so-called web APIs, thus making them available on the Internet in the form of web services. We present a proof-of-concept implementation of the CLT Cloud server where we use the logic programming language Prolog for composing and aggregating existing web services into new web services in a way that encourages creative exploration and rapid prototyping of LT applications.
	Growing a Swedish constructicon in lexical soil
	Transferring Frames: Utilization of Linked Lexical Resources
	In our experiment, we evaluate the transferability of frames from Swedish to Finnish in parallel corpora. We evaluate both the theoretical possibility of transferring frames and the possibility of performing it using available lexical resources. We add the frame information to an extract of the Swedish side of the Kotus and JRC-Acquis corpora using an automatic frame labeler and copy it to the Finnish side. We focus on evaluating the results to get an estimate of how often the parallel sentences can be said to express the same frame. This sheds light on the questions: Are the same situations in the two languages expressed using different frames, i.e. are the frames transferable even in theory? How well can the frame information of running text be transferred from one language to another?
	Adding a constructicon to the Swedish resource network of Språkbanken
	This paper presents the integrated Swedish resource network of Språkbanken in general, and its latest addition – a constructicon – in particular. The constructicon, which is still in its early stages, is a collection of (partially) schematic multi-word units, constructions, developed as an addition to the Swedish FrameNet (SweFN). SweFN and the constructicon are integrated with other parts of Språkbanken, both lexical resources and corpora, through the lexical resource SALDO. In most respects, the constructicon is modeled on its English counterpart in Berkeley and thus follows the FrameNet format. The most striking differences are the inclusion of so-called collostructional elements and the treatment of semantic roles, which are defined globally instead of locally as in FrameNet. Incorporating subprojects such as developing methods for automatic identification of constructions in authentic text on the one hand, and accounting for constructions problematic for L2 acquisition on the other, the approach is highly cross-disciplinary in nature, combining various theoretical linguistic perspectives on construction grammar with language technology, lexicography, and L2 research.
	Search Result Diversification Methods to Assist Lexicographers
	We show how the lexicographic task of finding informative and diverse example sentences can be cast as a search result diversification problem, where an objective based on relevance and diversity is maximized. This problem has been studied intensively in the information retrieval community during recent years, and efficient algorithms have been devised.

We finally show how the approach has been implemented in a
lexicographic project, and describe the relevance and diversity
functions used in that context.
	Ett svenskt konstruktikon. Utgångspunkter och preliminära ramar
	Linking and validating Nordic and Baltic wordnets
	Connecting European Women Writers. The Selma Lagerlöf Archive and Women Writers Database
	A best-first anagram hashing filter for approximate string matching with generalized edit distance
	This paper presents an efficient method for approximate string matching against a lexicon. We
define a filter that for each source word selects a small set of target lexical entries, from which
the best match is then selected using generalized edit distance, where edit operations can be
assigned an arbitrary weight. The filter combines a specialized hash function with best-first
search. Our work extends and improves upon a previously proposed hash-based filter, developed
for matching with uniform-weight edit distance. We evaluate an approximate matching system
implemented with the new best-first filter, by conducting several experiments on a historical
corpus and a set of weighted rules taken from the literature. We present running times and
discuss how performance varies using different stopping criteria and target lexica. The results
show that the filter is suitable for large rule sets and million word corpora, and encourage
further development.
	Global Features for Shallow Discourse Parsing
	A coherently related group of sentences may be referred to as a discourse. In this paper we address the problem of parsing coherence relations as defined in the Penn Discourse Tree Bank (PDTB). A good model for discourse structure analysis needs to account both for local dependencies at the token level and for global dependencies and statistics. We present techniques for using inter-sentential or sentence-level (global), data-driven, non-grammatical features in the task of parsing discourse. The parser model follows up on a previous approach based on using token-level (local) features with conditional random fields for shallow discourse parsing, which lacks structural knowledge of discourse. The parser adopts a two-stage approach where first the local constraints are applied and then global constraints are used on a reduced weighted search space (n-best). In the latter stage we experiment with different rerankers
trained on the first stage n-
	title        = {Two for the price of one: an LFG treatment of sentence initial object es in German.},
	abstract     = {We present an analysis of sentence initial object es ‘it’ in German. The
weak pronoun es may only realize such an object under specific information
structural conditions. We follow recent work suggesting these conditions are
exactly those that license the use of the presentational construction, marked
by a sentence initial dummy es. We propose that the initial objects are an
example of function amalgamation, show that only objects that may also
appear in the clause-internal postverbal domain can participate in this fusion
and make this precise in LFG. We end the paper with a contrastive discussion.
	title        = {Hur kan vi förbättra skriftligt informations- och utbildningsmaterial för patienter som opereras elektivt för kolorektal cancer?},
	abstract     = {Colorectal cancer (CRC) is the third most common cancer diagnosis in Sweden, with just over 5,500 people affected annually. Primary treatment is surgery complemented by pre- and postoperative oncological treatment. Standardized concepts for accelerated care pathways with shorter hospital stays place great focus on physical rehabilitation, but less on the psychological strain of undergoing surgery for a cancer diagnosis. Patients are expected to take great responsibility for their rehabilitation, both in hospital and at home. To be prepared, they need both written and oral information.
The aim of the study was to map and characterize the written information and education material (IOU, after the Swedish abbreviation) used for patients undergoing elective surgery for CRC. A further aim was to describe patients' perceptions of the structure and content of the IOU.
IOU from 28 clinics that operate on patients with CRC were collected (220 items in total). To provide a measure of the texts' level of difficulty, a language technology analysis was performed on all IOU, measuring, among other things, word length, sentence structure, and comparison with other types of literature. A suitability analysis was performed on 117 items using the SAM+CAM instrument, in which domains such as content, readability, images, layout, and stimulation and motivation for learning were assessed. Five focus groups with patients were conducted, in which the patients were asked to describe what they think characterizes good and poor IOU, what content they find missing, and when and how they want the material handed out.
The results of the language technology and suitability analyses show that most IOU were rated as ”adequate”, but the spread was large. The patients wished for material more clearly divided into levels, where one can choose how much information one wants at a given time. Several topics were missing, or were described too unclearly for the patients to feel secure at discharge.
The results of the three analysis methods should be usable for developing a ”toolbox” for designing better-targeted IOU for this patient group in the future.
	title        = {Implementing Programming Languages},
	title        = {Creation of an Open Shared Language Resource Repository in the Nordic and Baltic Countries},
	abstract     = {The META-NORD project has contributed to an open infrastructure for language resources (data and tools) under the META-NET umbrella. This paper presents the key objectives of META-NORD and reports on the results achieved in the first year of the project. META-NORD has mapped and described the national language technology landscape in the Nordic and Baltic countries in terms of language use, language technology and resources, main actors in the academy, industry, government and society; identified and collected the first batch of language resources in the Nordic and Baltic countries; documented, processed, linked, and upgraded the identified language resources to agreed standards and guidelines. The three horizontal multilingual actions in META-NORD are overviewed in this paper: linking and validating Nordic and Baltic wordnets, the harmonisation of multilingual Nordic and Baltic treebanks, and consolidating multilingual terminology resources across European countries. This paper also touches upon intellectual property rights for the sharing of language resources.
	title        = {Non-atomic Classification to Improve a Semantic Role Labeler for a Low-resource Language},
	abstract     = {Semantic role classification accuracy for most languages other than English is constrained by the small amount of annotated data. In this paper, we demonstrate how the frame-to-frame relations described in the FrameNet ontology can be used to improve the performance of a FrameNet-based semantic role classifier for Swedish, a low-resource language. In order to make use of the FrameNet relations, we cast the semantic
role classification task as a non-atomic label prediction task.

The experiments show that the cross-frame generalization methods lead to a 27% reduction in the number of errors made by the classifier. For previously unseen frames, the reduction is even more significant: 50%.
	title        = {Sibirientyska},
	abstract     = {German in Siberia is a collection of transcriptions of German spoken in the region of Krasnoyarsk (Russia). The corpus contains about 34 000 running words. Code-switching to Russian and verb forms are annotated (Russian word forms in brackets like [vot], finite verb forms (FINIT), non-finite verb forms (INFIN)). The transcription and annotation of the corpus were carried out in collaboration with the Astafyev University Krasnoyarsk. The corpus is part of a research project at the University of Gothenburg, see
The database is currently in the test phase.

	title        = {A Type-Theoretical Wide-Coverage Computational Grammar for Swedish},
	title        = {Literacy Demands and Information to Cancer Patients},
	abstract     = {This study examines language complexity of written health information materials for patients undergoing colorectal cancer surgery. Written and printed patient information from 28 Swedish clinics is automatically analyzed by means of language technology. The analysis reveals different problematic issues that might have an impact on readability. The study is a first step, and part of a larger project about patients’ health information seeking behavior in relation to written information material. Our study aims to provide support for producing more individualized, person-centered information materials according to preferences for complex and detailed or legible texts and thus enhance a movement from receiving information and instructions to participating in knowing. In the near future the study will continue by integrating focus groups with patients that may provide valuable feedback and enhance our knowledge about patients’ use and preferences of different information material.},
	title        = {SCXML for Building Conversational Agents in the Dialog Web Lab},
	abstract     = {The W3C has selected Harel Statecharts, under the name of State Chart XML (SCXML), as the basis for future standards in the area of (multimodal) dialog systems (Barnett et al. 2012). In an effort to educate people about SCXML we are building a web-based development environment where the dialogs of embodied, spoken conversational agents can be managed and controlled using SCXML, in a playful and interesting manner.},
	title        = {Machine Learning for Emergent Middleware},
	abstract     = {Highly dynamic and heterogeneous distributed systems are challenging today's middleware technologies. Existing middleware paradigms are unable to deliver on their most central promise, which is offering interoperability. In this paper, we argue for the need to dynamically synthesise distributed system infrastructures according to the current operating environment, thereby generating "Emergent Middleware" to mediate interactions among heterogeneous networked systems that interact in an ad hoc way. The paper outlines the overall architecture of Enablers underlying Emergent Middleware, and in particular focuses on the key role of learning in supporting such a process, spanning statistical learning to infer the semantics of networked system functions and automata learning to extract the related behaviours of networked systems.},
	title        = {Modeling Topic Dependencies in Hierarchical Text Categorization},
	abstract     = {In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. Additionally, to better investigate the role of category relationships, we consider two interesting cases: (i) traditional schemes in which node-fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers. The extensive experimentation on Reuters Corpus Volume 1 shows that our rerankers inject effective structural semantic dependencies in multi-classifiers and significantly outperform the state of the art.},
	title        = {Korp – the corpus infrastructure of Språkbanken},
	abstract     = {We present Korp, the corpus infrastructure of Språkbanken (the Swedish Language Bank). The infrastructure consists of three main components: the Korp corpus pipeline, the Korp backend, and the Korp frontend. The Korp corpus pipeline is used for importing corpora, annotating them, and then exporting the annotated corpora into different formats. An essential feature of the pipeline is the ability to leave existing annotations untouched, both structural and word level annotations, and to use the existing annotations as the foundation of other annotations. The Korp backend consists of a set of REST-based web services for searching in and retrieving information about the corpora. Finally, the Korp frontend is a graphical search interface that interacts with the Korp backend. The interface has been inspired by corpus search interfaces such as SketchEngine, Glossa, and DeepDict, and it uses State Chart XML (SCXML) in order to enable users to bookmark interaction states. We give a functional and technical overview of the three components, followed by a discussion of planned future work.
	title        = {Improving the Recall of a Discourse Parser by Constraint-based Postprocessing},
	abstract     = {We describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking. These methods use a set of natural structural constraints as well as others that follow from the annotation guidelines of the Penn Discourse Treebank. We evaluated the resulting systems on the standard test set of the PDTB and achieved a rebalancing of precision and recall with improved F-measures across the board. This was especially notable when we used evaluation metrics taking partial matches into account; for these measures, we achieved F-measure improvements of several points.},
	title        = {Real-Time Persistent Queues and Deques with Logic Variables (Declarative Pearl)},
	abstract     = {We present a Prolog implementation of real-time persistent
queues and double-ended queues. Our implementation is inspired by
Okasaki’s lazy-functional approach, but relies only on standard Prolog,
comprising the pure subset plus if-then-else constructs to efficiently
implement guards and meta-calls for convenience. The resulting data
structure is a nice demonstration of the fact that the use of logic variables
to hold the outcome of an unfinished computation can sometimes give
the same kind of elegant and compact solutions as lazy evaluation.
	title        = {Toward language independent methodology for generating artwork descriptions – Exploring FrameNet information},
	abstract     = {Today museums and other cultural heritage institutions are increasingly storing object descriptions using semantic web domain ontologies. To make this content accessible in a multilingual world, it will need to be conveyed in many languages, a language generation task which is domain specific and language dependent. This paper describes how semantic and syntactic information such as that provided in a framenet can contribute to solving this task. It is argued that the kind of information offered by such lexical resources enhances the output quality of a multilingual language generation application, in particular when generating domain specific content.
	title        = {Swedish Kelly: Technical Report.},
	title        = {Properties of phoneme N-grams across the world’s language families},
	abstract     = {In this article, we investigate the properties of phoneme N-grams across half of the world’s languages. The sizes of three different N-gram distributions of the world’s language families obey a power law. Further, the N-gram distributions of language families parallel the sizes of the families, which also follow a power law distribution. The correlation between N-gram distributions and language family sizes improves with increasing values of N. The study also raises some new questions about the use of N-gram distributions in linguistic research, which we hope to be able to investigate in the future.},
	title        = {Towards a system architecture for ICALL},
	abstract     = {In this paper, we present an ongoing project whose overall aim is to develop an open-source system architecture for supporting ICALL systems that will facilitate re-use of existing NLP tools and resources on a plug-and-play basis. We introduce the project, describe the approaches adopted by the two language teams, and present two applications being developed using the proposed architecture.},
	title        = {From Quantification to Conversation},
	title        = {Proceedings of the SLTC 2012 workshop on NLP for CALL},
	title        = {Contextualisation of functional symptoms in primary health care},
	abstract     = {Background: a number of patients consulting primary health care have physical symptoms that may be labeled “medically unexplained”, i.e. lacking a demonstrable organic etiology. Common functional somatic symptoms (FSS) are irritable bowel, tension headache and chronic fatigue. FSS patients are generally frustrated with the inability of health care to alleviate their illness. Health care staff often also feel frustration. The communication between patient and care giver is the key to coming to terms with the problem. Objective: to investigate how complex, vague and long-standing symptoms with no identified organic cause are put into context, interpreted and acted upon in primary health-care interactions. Two types of interventions are envisaged: (i) methods for early identification of patients at risk of entering a vicious circle of functional symptoms and (ii) methods for re-interpreting symptoms in alternative and more purposeful ways. Methods: the project studies interactions between patients and nurses giving advice over the telephone, consultations between patients and physicians, interviews, and patients' medical case notes. Eligible patients (18-65 years old) are those who contact their primary health care centre by telephone, have had at least eight physical consultations with nurses or physicians in the last 12 months, and for whom a majority of the symptoms within this time span had no clear organic or psychiatric cause. The project contains a number of subprojects, according to the type of data collected. Several methods of analysis will be used, mainly critical discourse analysis, phenomenological-hermeneutic and computational linguistic analyses. (Expected) Results: using the collected data, we describe characteristics of the communication that takes place in these settings and the way symptoms and diseases are represented.
This will facilitate the development of future interventions aimed at decreasing the morbidity due to FSS and give further insights into the problem.
	title        = {Core vocabulary: A useful but mystical concept in some kinds of linguistics},
	title        = {Introducing Swedish Kelly-list, a new free e-resource for Swedish},
	abstract     = {Frequency lists and/or lexicons contain information about the words and their statistics. They tend to find their “readers” among linguists, lexicographers, language teachers. Making them available in electronic format helps to expand the target group to cover language engineers, computer programmers and other specialists working in such areas as information retrieval, spam filtering, text readability analysis, test generation, etc. 
This article describes a new freely available electronic frequency list of modern Swedish that was created in the EU project KELLY. We describe the state of affairs for Swedish frequency lexicons, provide a short description of the KELLY project, and mention the corpus from which the list has been derived. Further, we describe the type of information the list contains, briefly describe the steps for list generation, and provide information on the coverage and some other statistics on the items in the list. Finally, some practical information on the license for the Swedish Kelly-list distribution is given; potential application areas are suggested; and future plans for its expansion are mentioned. We hope that with some publicity we can help this list find its users.
	title        = {Svenska språket i den digitala tidsåldern},
	title        = {Developing an Open-Source Web-Based Exercise Generator for Swedish},
	abstract     = {This paper reports on the ongoing international project System architecture for
ICALL and the progress made by the Swedish partner. The Swedish team is developing a
web-based exercise generator reusing available annotated corpora and lexical resources.
Apart from the technical issues like implementation of the user interface and the
underlying processing machinery, a number of interesting pedagogical questions need
to be solved, e.g., adapting learner-oriented exercises to proficiency levels; selecting authentic examples of an appropriate difficulty level; automatically ranking corpus examples by their quality; providing feedback to the learner, and selecting vocabulary for training domain-specific, academic or general-purpose vocabulary. In this paper we describe what has been done so far, mention the exercise types that can be generated at
the moment as well as describe the tasks left for the future.
	title        = {Processing spelling variation in historical text},
	title        = {Semi-automatic selection of best corpus examples for Swedish: Initial algorithm evaluation.},
	abstract     = {The study presented here describes the results of the initial evaluation of two sorting approaches to automatic ranking of corpus examples for Swedish. Representatives from two potential target user groups were asked to rate the top three hits per approach for sixty search items from the point of view of the needs of their professional target groups, namely second/foreign language (L2) teachers and lexicographers. This evaluation has shown, on the one hand, which of the two approaches to example rating (called algorithms #1 and #2 in the text below) performs better in terms of finding better examples for each target user group; and, on the other hand, which features evaluators associate with good examples. It has also facilitated statistical analysis of the “good” versus “bad” examples with reference to measurable features, such as sentence length, word length, lexical frequency profiles, PoS constitution, dependency structure, etc., with the potential to identify new reliable classifiers.},
	title        = {bokstaffua, bokstaffwa, bokstafwa, bokstaua, bokstawa... Towards lexical link-up for a corpus of Old Swedish},
	title        = {Building Corpus-Informed Word Lists for L2 Vocabulary Learning in Nine Languages},
	abstract     = {Lexical competence constitutes a crucial aspect in L2 learning, since building a rich repository of words is considered indispensable for successful communication. CALL practitioners have experimented with various kinds of computer-mediated glosses to facilitate L2 vocabulary building in the context of incidental vocabulary learning. Intentional learning, on the other hand, is generally underestimated, since it is considered out of fashion and not in line with the communicative L2 learning paradigm. Yet, work is still being done in this area and a substantial body of research indicates that the usefulness of incidental vocabulary learning does not exclude the use of dedicated vocabulary study and that by using aids explicitly geared to building vocabularies (such as word lists and word cards) L2 learners exhibit good retention rates and faster learning gains. Intentional vocabulary study should, therefore, have its place in the instructional and learning context. Regardless of the approach, incidental or intentional, the crucial question with respect to vocabulary teaching/learning remains: which and how many words should we teach/learn at different language levels? An attempt to answer the above question was made within the framework of the EU-funded project titled “KELLY” (Keywords for Language Learning for Young and Adults Alike) presented here. The project aimed at building corpus-informed vocabulary lists for L2 learners ranging from A1 to C2 levels for nine languages: Arabic, Chinese, English, Greek, Italian, Norwegian, Polish, Russian and Swedish.},
	title        = {Drug interests revealed by a public health portal},
	abstract     = {Online health information seeking has become an important part of people's everyday lives. However, studies have shown that many people have problems forming effective queries. In order to develop better support and tools for assisting people in health-related query formation we have to gain a deeper understanding of their information seeking behaviour in relation to key issues, such as medication and drugs. The present study attempts to understand the semantics of the users' information needs with respect to medication-related information. Search log queries from the Swedish health portal were automatically annotated and categorized according to relevant background knowledge sources. Understanding the semantics of information needs can enable optimization and tailoring of (official) health related information presented to the online consumer, provide better terminology support and thematic coding of the queries and in the long run better models of consumers’ information needs.
	title        = {Visual Analytics and the Language of Web Query Logs - A Terminology Perspective},
	abstract     = {This paper explores means to integrate natural language processing methods for terminology and entity identification in medical web session logs with visual analytics techniques. The aim of the study is to examine whether the vocabulary used in queries posted to a Swedish regional health web site can be assessed in a way that will enable a terminologist or medical data analyst to instantly identify new term candidates and their relations based on significant co-occurrence patterns. We provide an example application in order to illustrate how co-occurrence relationships between medical and general entities occurring in such logs can be visualized, accessed and explored. To enable a visual exploration of the generated co-occurrence graphs, we employ a general purpose social network analysis tool, Visone, which makes it possible to visualize and analyze various types of graph structures. Our examples show that visual analytics based on co-occurrence analysis provides insights into the use of layman language in relation to established (professional) terminologies, which may help terminologists decide which terms to include in future terminologies. Increased understanding of the querying language used is also of interest in the context of public health web sites. The query results should reflect the intentions of the information seekers, who may express themselves in layman language that differs from the one used on the available web sites provided by medical professionals.},
	title        = {The Journal of the Swedish Medical Association - a Corpus Resource for Biomedical Text Mining in Swedish.},
	abstract     = {Biomedical text mining applications are largely dependent on high quality knowledge resources. Traditionally, these include lexical databases, terminologies, nomenclatures and ontologies and, during the last decade, also corpora of various sizes, variety and diversity. Some of these corpora are annotated with an expanding range of information types and metadata while others become available with a minimal set of annotations. At the same time, it is of great importance that biomedical corpora for lesser-spoken languages also get developed in order to support and facilitate the implementation of practical applications for such languages and to stimulate the development of language technology research and innovation infrastructures in the domain. This paper provides a detailed description of a Swedish biomedical corpus based on the electronic editions of the Journal of the Swedish Medical Association "Läkartidningen" of the years 1996-2010. The corpus consists of a variety of documents that can be related to different medical domains, and was developed as a response to the increasing need for large and reliable medical information for Swedish biomedical NLP. The corpus has been structurally annotated with a minimal set of meta information and automatically indexed with the largest systematically organised, computer-processable collection of medical terminology, the Swedish SNOMED CT (Systematized Nomenclature of Medicine -- Clinical Terms). This way, topic-focused subcorpora, e.g. with diabetes-related content, can be easily developed.},
	title        = {Men, Women and Gods: Distant Reading in Literary Collections - Combining Visual Analytics with Language Technology},
	abstract     = {The volumes of digitized literary collections in various languages increase at a rapid pace, and so does the need to computationally support the analysis of such data. Literature can be studied in a number of different ways and from many different perspectives, and text analysis makes up a central component of literature studies. If such analysis can be integrated with advanced visual methods and fed back to the daily work of the literature researcher, then it is likely to reveal useful and nuanced insights into the complex daily lives, ideas and beliefs of the main characters found in many of the literary works. In this paper we describe the combination of robust text analysis with visual analytics and bring a new set of tools to literary analysis. As a showcase, we analyzed a small subset (13 novels of a single author) taken from a large literary collection, the Swedish Literature Bank. The analysis is based upon two levels of inquiry, namely by focusing on mentions of theistic beings (e.g. Gods' names) as well as mentions of persons' names, including their gender and their normalized, linked variant forms, and examining their appearance in sentences, paragraphs and chapters. The case study shows several successful applications of visual analytics methods to various literature problems and demonstrates the advantages of the implementation of visual literature fingerprinting. Our work is inspired by the notion of distant reading or macroanalysis for the analyses of literature collections. We start by recognizing all characters in the novels using a mature language technology (named entity recognition) which can be turned into a tool in aid of text analysis in this field. 
We apply context cues and lists of animacy and gender markers, inspired by the document-centered approach and the labelled consistency principle, a form of on-line learning from the documents under processing that looks at unambiguous usages of words or names in order to assign annotations to ambiguous words or names. For instance, if in an unambiguous context where there is a strong gender indicator, such as 'Mrs Alexander', the name 'Alexander' is assigned a feminine gender, then subsequent mentions of the same name in the same discourse will be assigned the feminine gender as well, unless there is a conflict with another person with the same name. We argue that the integration of text analysis, such as the one briefly outlined, and visualization techniques, such as higher resolution pixel-based fingerprinting, could be put to effective use also in literature studies. We also see an opportunity to devise new ways of exploring the large volumes of literary texts being made available through national cultural heritage digitization projects, for instance by exploring the possibility to show several literary texts (novels) at once. We will illustrate some of the applied techniques using several examples from our case study, such as summary plots based on all the characters in these novels as well as fingerprints based on the distribution of characters across the novels.},
	title        = {Semantic Role Labeling with the Swedish FrameNet},
	abstract     = {We present the first results on semantic role labeling using the Swedish FrameNet, which is a lexical resource currently in development. Several aspects of the task are investigated, including the selection of machine learning features, the effect of choice of syntactic parser, and the ability of the system to generalize to new frames and new genres.
In addition, we evaluate two methods to make the role label classifier more robust: cross-frame generalization and cluster-based features.
Although the small amount of training data limits the performance achievable at the moment, we reach promising results. In particular, the classifier that extracts the boundaries of arguments works well for new frames, which suggests that it can already be useful at this stage in a semi-automatic setting.},
	title        = {Initial Experiments of Medication Event Extraction Using Frame Semantics},
	abstract     = {Semantic annotation of text corpora for mining complex relations and events has gained considerable and growing attention in the medical domain. The goal of this paper is to present a snapshot of ongoing work that aims to develop and apply an appropriate infrastructure for automatic event labelling and extraction in the Swedish medical domain. Annotated text samples, appropriate lexical resources (e.g. term lists and the Swedish FrameNet++) and hybrid techniques are currently being developed in order to alleviate some of the difficulties of the task. As a case study, this paper presents a pilot approach based on the application of the theory of frame semantics to automatically identify and extract detailed medication information from medical texts. Medication information is often written in narrative form (e.g. in clinical records) and is therefore difficult to acquire and use in computerized systems (e.g. decision support). Currently our approach uses a combination of generic entity and terminology taggers, specifically designed medical frames and various frame-related patterns. Future work intends to improve and enhance current results by using more annotated samples, more medically-relevant frames and a combination of supervised learning techniques with the regular expression patterns.},
	title        = {Waste not, want not: Towards a system architecture for ICALL based on NLP component re-use},
	title        = {Advanced Visual Analytics Methods for Literature Analysis},
	abstract     = {The volumes of digitized literary collections in various languages increase at a rapid pace, which also results in a growing demand for computational support to analyze such linguistic data. This paper combines robust text analysis with advanced visual analytics and brings a new set of tools to literature analysis. Visual analytics techniques can offer new and unexpected insights and knowledge to the literary scholar. We analyzed a small subset of a large literary collection, the Swedish Literature Bank, by focusing on the extraction of persons’ names, their gender and their normalized, linked form, including mentions of theistic beings (e.g., Gods’ names and mythological figures), and examined their appearance over the course of the novel. A case study based on 13 novels from the aforementioned collection shows a number of interesting applications of visual analytics methods to literature problems, where named entities can play a prominent role, demonstrating the advantage of visual literature analysis. Our work is inspired by the notion of distant reading or macroanalysis for the analyses of large literature collections.},
	title        = {Automatic question generation for Swedish: The current state},
	abstract     = {The research area of question generation (QG), in its current form, has a relatively brief history within NLP. A description of the current question generation implementation for Swedish text, built on schema parsing, is presented and exemplified here. Underlying the current approach is the view of ‘all textual information as answers to questions.’ This paper discusses strategies for enhanced functionality for arbitrary Swedish text through extended question generation. It also brings up some theoretical issues regarding the nature of the task, and addresses practical considerations in an area such as Intelligent CALL (ICALL) where this type of application has been considered for English.

ISSN (print): 1650-3686, ISSN (online): 1650-3740},
	title        = {operAVoX - On PErson RApid VOice eXaminer},
	abstract     = {At present, objective analysis of voice quality using acoustic parameters is only possible within a voice laboratory using specialist hardware and software. We have developed an easy-to-use portable voice analysis and feedback application running on the Apple iPhone, iPad, or iPod Touch. OperaVOX™ combines the signal processing power, easy connectivity, user-friendly interface, high-quality microphones and portability of these handheld devices with novel acoustic voice analysis algorithms to provide a powerful voice quality measurement tool that you can carry in your pocket. OperaVOX™ is designed for anyone who is interested in measuring the quality of their voice, such as a patient recovering following a stroke, a professional voice user such as a singer, or an aspiring actor. Built into OperaVOX™ are the validated Voice Handicap Index questionnaires and the ability for the user to record their voice for acoustic and perceptual analysis both on board the device and externally in the voice laboratory. Furthermore, the user can instruct OperaVOX™ to automatically and confidentially send these data via email to their speech therapist, voice coach or research team. OperaVOX™ makes it easy for everyone to accurately measure changes in the quality of their voice every hour, day, or week without having to travel to the hospital. Two versions of OperaVOX™ will soon be available on the Apple App Store, one for the general public and another for professionals such as speech and language therapists. We have also worked with world-leading University research teams both in the UK and North America to develop bespoke versions of OperaVOX™ specifically tailored for their research and clinical requirements.},
	title        = {Calculating the reliability of a likelihood ratio from a disputed utterance},
	title        = {Calculating the reliability of likelihood ratios: Addressing modelling problems related to small n and tails},
	abstract     = {In forensic speech science we are often faced with the problem of having a relatively small amount of data which is also
multivariate and distributionally complex. This results in a serious problem exactly in the scenario where potentially large
strengths of evidence could be obtained, i.e., when the trace data are on a tail of the distribution which models either the
prosecution or defence hypothesis and a large magnitude log likelihood ratio is calculated. By definition the sampling of
a distribution is sparse on its tails and this problem is compounded if the model is trained on a small amount of data – small
fluctuations in the training data can lead to large changes in the calculated likelihoods on the tails and thus large changes
in the calculated likelihood ratios for trace data on the tails. Large-magnitude calculated log likelihood ratios are therefore
inherently unreliable.},
	title        = {Maskininlärningsbaserad indexering av digitaliserade museiartefakter - projektrapport},
	abstract     = {The project has carried out experiments with machine-based analysis and machine learning for automatic indexing and analysis of images, in support of the registration of objects in museum collections. The results show that this is possible for delimited subsets in combination with machine learning as a support for, but not as a replacement for, manual analysis. The project has also identified a need to develop a user interface for both text and image search, and has developed a prototype solution for this, which is documented in this report and in a separate appendix to the report. The material provides a basis for implementations that bring extended search capabilities, more efficient registration and a user-friendly interface. The work is at the forefront of the research area's results and established methods, and combines statistical, linguistic and computer science methods.

See the link to the report and the link to the appendix further down.},
	title        = {Acoustic and perceptual characteristics of speech in 22q11 deletion syndrome: Measures of voice onset time and syllable durations related to articulation and prosody.},
	abstract     = {Without abstract},
	title        = {Voice Onset Time before and after STN-surgery in patients with Parkinson’s disease},
	abstract     = {Without abstract},
	title        = {Swedish Test of Intelligibility (STI) – Development of computerized assessment of word and sentence intelligibility and the performance of adult control speakers},
	abstract     = {Without abstract},
	title        = {Neural processing of familiar and unfamiliar voices},
	title        = {Adverbialkarakteristik för praktisk informationsextraktion i svensk text - Projektrapport},
	abstract     = {The present report describes a project that has primarily involved practical work aimed at creating an automated process that returns question phrases, e.g. varifrån ('from where'), for adverbial phrases, e.g. inifrån rummet ('from inside the room'), in Swedish digital text. It is a substitution process that is needed for purely practical reasons in the task of question generation, in which a set of questions that a text answers is generated quickly and automatically. This process finds its place in programs that in various ways aim to provide information access in arbitrary, unknown Swedish text. In this application, the point is to open up, in some way, the large amount of information that from a computer science perspective lies 'unstructured', i.e. in natural language form.

The aim of determining suitable question phrases (often a wh-form, Swedish hv-form) for constituents occurring in text probably has a more general relevance, however, than use in the type of program mentioned. Besides also being needed in other similar computational linguistic applications, the research question itself can fall within the scope of basic research. The common semantically grounded adverbial categories (which differ between grammars) tend to be defined precisely by describing what kinds of questions they answer. Aiming, as here, to determine the question phrase for an adverbial is a more detailed task than determining the adverbial category.

The practical method implemented in the project can be broken down into a number of steps that are assumed to be generally valid and hard to avoid given the present aim. The input to the program is an in principle arbitrary adverbial phrase that the user can type into the prototype program. The steps that follow are these. 1) First, each running word is tagged with part-of-speech and other grammatical information. This is done with a statistical trigram-based Hidden Markov model. 2/3) The structural type of the phrase (subordinate clause, PP, etc.) is determined from the running words together with the information from the previous step. Intimately connected with this task is the identification of the head word, and for several phrase types also the identification of other significant components such as the head word of the prepositional complement. The solution to this sub-step is called rank-based chunking. 4) The steps that follow differ greatly depending on the structural type at hand. For prepositional phrases, for example, the preposition is examined and, depending on which preposition it is, the head word of the complement, its base form and other constituent text segments. In this work, SweFN (Borin, Dannélls, Forsberg, Toporowska Gronostaj, & Kokkinakis, 2010), for instance, has been partly investigated as a possible way to improve the determination of noun semantics, which often becomes relevant for PP adverbials.

The report shows how the task in practice varies greatly in difficulty, from the cases where the adverbial consists of e.g. participial phrases, adverb phrases or subordinate clauses, where a mapping to the corresponding question phrase can often be made directly from the head word, to the most complicated cases of PPs and so-called som-phrases, where combinations of head word, head word of the complement, its base form and other syntactic and semantic information are required to distinguish the particular question counterparts of individual occurrences. A recurring theme in the practical work is exceptions that need to be recognized. For example, the category of sentence adverbials, which can take many different structural forms but still most often yields the result 'no question counterpart', must be recognized explicitly (possibly together with others that have the same question phrase result). The process as a whole, however, is also built on base cases and exceptions in programming terms. In many cases, as for i-PPs, there are a number of different counterparts, and what is to serve as the base case in the program becomes an empirical/heuristic question as rules are written against actual occurrences of adverbials in the Stockholm Umeå Corpus (hereafter SUC). That i ('in'), like other prepositions, can be said to have a prototypical directional meaning does not mean that var ('where') must necessarily serve as the default case. There are 'layers' of exceptions within different structure types in the program, but also externally motivated ones based on the main verb, which through valency matching can establish that an adverbial is a 'prepositional object' and therefore has different question-forming properties. The user interfaces created and used for rule writing on the basis of actual examples have allowed a degree of immediate rule updating and re-checking when incorrect results are seen. It is also through the addition of new exception rules, in some sense, that the program should reasonably be improvable in the future from its current level of quality.
The accuracy achieved so far is not quantitatively convincing, but this work, which has no predecessor, enables continuous improvement through the program.

The project shows that the mapping task…},
	title        = {The Rocky Road towards a Swedish FrameNet – Creating SweFN},
	abstract     = {The Swedish FrameNet project, SweFN, is a lexical resource under development, designed to support both humans and different
applications within language technology, such as text generation, text understanding and information extraction. SweFN is constructed
in line with the Berkeley FrameNet and the project is aiming to make it a free, full-scale, multi-functional lexical resource covering
morphological, syntactic, and semantic descriptions of 50,000 entries. Frames populated by lexical units belonging to the general
vocabulary dominate in SweFN, but there are also frames from the medical and the art domain. As Swedish is a language with very
productive compounding, special attention is paid to semantic relations within the one word compounds which populate the frames.
This is of relevance for understanding the meaning of the compounds and for capturing the semantic and syntactic alternations which
are brought about in the course of compounding. SweFN is a component within a complex of modern and historical lexicon resources
named SweFN++, which is available online.},
