

	title        = {Resolving power of search keys in MedEval, a Swedish medical test collection with user groups: doctors and patients},
	abstract     = {This thesis describes the making of a Swedish medical text collection, unique of its kind in providing a possibility to choose user group: doctors or patients. The thesis also describes a series of pilot studies which demonstrate what kind of studies can be performed with such a collection. The pilot studies are focused on search key effectiveness: What makes a search key good, and what makes a search key bad? The need to bring linguistics and consideration of terminology into the information retrieval research field is demonstrated. Most information retrieval is about finding free text documents. Documents are built of terms, as are topics and search queries. It is important to understand the functions and features of these terms and not treat them like featureless objects. The thesis concludes that terms are not equal, but show very different behavior. The thesis addresses the problem of compounds, which, if used as search keys, will not match corresponding simplex words in the documents, while simplex words as search keys will not match corresponding compounds in the documents. The thesis discusses how compounds can be split to obtain more matches, without lowering the quality of a search. Another important aspect of the thesis is that it considers how different language registers, in this case those of doctors and patients, can be utilized to find documents written with one of the groups in mind. As the test collection contains a large set of documents marked for intended target group, doctors or patients, the language differences can be and are studied. The author offers suggestions on how to choose search keys if documents from one category or the other are desired. Information retrieval is a multi-disciplinary research field. It involves computer science, information science, and natural language processing.
There is a substantial amount of research behind the algorithms of modern search engines, but even with the best possible search algorithm the result of a search will not be successful without an effective query constructed with effective search keys.},
	author       = {Friberg Heppin, Karin},
	year         = {2010},
	publisher    = {University of Gothenburg},
	address      = {Göteborg},
	ISBN         = {978-91-87850-41-7},

	title        = {MedEval — A Swedish medical test collection with doctors and patients user groups},
	abstract     = {Background

Test collections for information retrieval are scarce. Domain-specific test collections are even scarcer, and medical test collections in Swedish were non-existent prior to the making of the MedEval test collection. Most research in information retrieval has been performed in the English language, thus most test collections contain English documents. However, English is morphologically poor compared to many other European languages, and a number of interesting and important aspects have not been investigated. Building a medical test collection in Swedish opens new research opportunities.


This article describes the making of and potential uses of MedEval, a Swedish medical test collection with assessments, not only for topical relevance, but also for target reader group: Doctors or Patients. A user of the test collection may choose if she wishes to search in the Doctors or the Patients scenario where the topical relevance assessments have been adjusted with consideration to user group, or to search in a scenario which regards only topical relevance.

In addition to having three user groups, MedEval, in its present form, has two indexes, one where the terms are lemmatized and one where the terms are lemmatized and the compounds split and the constituents indexed together with the whole compound.


Differences discovered between the documents written for medical professionals and documents written for laypersons are presented. These differences may be utilized in further studies of retrieval of documents aimed at certain groups of readers. Differences between the groups of documents are, for example, that professional documents have a higher ratio of compounds, have a greater average word length and contain more multi-word expressions.

An experiment is described where the user scenarios have been utilized, searching with expert terms and lay terms, separately and in combination, in the different scenarios. The tendency discovered is that the medical expert gets the best results using expert terms, while the lay person gets the best results using lay terms but also quite good results using expert terms or lay and expert terms in combination.


The many features of MedEval give a variety of research possibilities, such as comparing the effectiveness of search terms when it comes to retrieving documents aimed at the different user groups, or studying the effect of compound decomposition in retrieval of documents. As Swedish, the language of MedEval, is a morphologically more complex language than English, it is possible to study additional aspects of the effect of natural language processing in information retrieval, for example utilizing different inflectional word forms in the retrieval of expert vs lay documents. MedEval is the first Swedish test collection of the medical domain.


The Department of Swedish at the University of Gothenburg is in the process of making the MedEval test collection available to academic researchers.},
	journal      = {Journal of Biomedical Semantics},
	author       = {Friberg Heppin, Karin},
	year         = {2011},
	volume       = {2},
	number       = {3},

	title        = {Towards Improving Search Results for Medical Experts and Laypersons},
	abstract     = {In a domain such as medicine, it is important that individuals’ information needs are met with information on a suitable level of difficulty and expertise. This paper focuses on facilitating medical information access through reformulating queries and re-ranking result lists utilizing features typical for the language written for professionals or for laypersons. The aim is to produce result lists where the ranking is better suited for the expertise level of the user. We will explore the possibility of using features such as trigger phrases for query reformulation and document length, average word length or compound ratio for re-ranking.
The Swedish medical IR test collection, MedEval, from Språkbanken, University of Gothenburg, will be used to find features specific for professional language and lay language and to study the effectiveness of these features in reformulating queries and re-ranking search results based on the target group. The test collection contains 42,250 documents from the medical corpus MedLex, collected from all types of written medical information found in electronic format, except patient records. The collection contains 62 topics. In total, 7,044 documents have been assessed both for relevance to these topics and for target group.
Our experiments will be based on earlier explorative studies on medical expert and lay language where some features were identified. It was found that documents written for professionals tended to have more tokens per document, longer words, and more compounds than lay documents.},
	booktitle    = {Proceedings of CLEFeHealth 2012},
	author       = {Friberg Heppin, Karin and Järvelin, Anni},
	year         = {2012},

	title        = {Assessors assessing assessments},
	abstract     = {This paper summarizes a questionnaire put to assessors of a medical test collection adjusted to user groups: health care professionals and lay persons. Three assessors were medical students, while two had no medical training. The study shows how persons with different levels of expertise may reason when assigning relevance and target groups. A clear bias was found toward assigning documents and topics to the assessors' own target group.},
	booktitle    = {Proceedings of the 4th International Louhi Workshop on Health Document Text Mining and Information Analysis (Louhi 2013)},
	author       = {Friberg Heppin, Karin and Järvelin, Anni},
	year         = {2013},