A comment often received from reviewers of manuscripts submitted to scientific conferences and journals concerns the representativeness of the sample under scrutiny: are there solid arguments for accepting that the population characteristics, and particularly the features extracted from the empirical data acquired from such a population (e.g. from speech production), provide sufficient or accurate enough information to use in various algorithmic approaches (e.g. in machine learning)? State-of-the-art studies on computational methods to identify signs of cognitive deterioration in language or speech are usually based on a limited amount of data, which makes the generalizability and validity of the results questionable. Moreover, large-scale datasets, e.g. collections of spontaneous speech recordings in the domain of mental and neurological health, are scarce, with few exceptions, and very expensive to produce, with respect to both time and the extremely high operational costs.
One possible way to remedy these limitations is to participate in new or on-going population-based cohort studies, and thus reduce the cost of collecting suitable datasets. Such a study is the latest in a series conducted in Gothenburg, Sweden: a longitudinal population-based cohort study named H70-1944, in which a defined population is followed up and observed over time. This new study, like the previous H70 cohort studies, provides empirical, high-quality data that are critical to our understanding of the aging process as well as the risk factors involved in the development and progression of mental disorders and neurodegenerative diseases such as dementia. The new aspect of the latest H70 cohort is the digital assessment of the neuropsychological tests (see ’Digital recordings in H70-1944’ below).
To improve health care for older persons, we need to learn more about the aging process, e.g. identify protective factors and early markers for mental, cognitive and somatic diseases. The Gothenburg H70 Birth Cohort Studies (the H70-studies) are multidisciplinary epidemiological studies examining representative birth cohorts of older populations in Gothenburg, Sweden. The objectives of the H70-studies are, among other things, to make a longitudinal survey of the social and medical conditions of this population and to contribute to the knowledge of normal aging processes. The studies comprise a comprehensive range of tests, assessments and procedures, such as a psychiatric interview, physical examination, blood samples, brain imaging, spinal puncture, physical health, and neuropsychological tests. A large set of population characteristics for the participants, such as marital status, housing situation, family history and education, is also included. So far, six birth cohorts of 70-year-olds have been examined over time. The birth cohort 1944 includes men and women born in 1944 on specific dates and registered as residents in Gothenburg. A total of 1203 subjects (559 men and 644 women) agreed to participate in the study. The studies have significant clinical relevance, since the outcomes are related to prevention, early diagnosis, clinical course, experience of illness, understanding of pathogenesis, and prognosis for various diseases.
Language technology and the value of digital assessment
In the previous H70-studies, the performance on all cognitive and neuropsychological tests was noted on paper, since the value and impact of digital assessment using recordings were not foreseen. However, recent international research has emphasized and demonstrated that the information contained in the speech signal and in the person’s linguistic timing can be very useful for separating people with various cognitive or mental impairments from healthy subjects, and at a very early stage. Language technology methods and tools seem to be a promising approach for the identification of preclinical stages of various neurodegenerative and mental diseases.
Several studies have shown that the automatic extraction of various (acoustic, lexical, semantic, syntactic and pragmatic) variables from the speech signal, such as prosodic features and sound properties (e.g. voice quality, pauses and articulation rate), or from the (orthographic) transcription of the spoken signal (e.g. grammatical errors), can provide important information that can improve the sensitivity and specificity of existing test instruments and screening tools for e.g. cognitive functioning, such as the Mini Mental State Examination (MMSE). The ”pen-and-paper” test protocols of all previous studies cannot capture, for example, hesitations and other disfluency features, timing, restarts, pauses or unfinished sentences. As a result, valuable pieces of information that are extremely useful and complementary to the neuropsychological investigation are lost.
Digital recordings in H70-1944
During the latest H70 study (H70-1944) the neuropsychological assessment is digitally recorded using an iPad platform, and some language-based tests are additionally recorded in high-quality audio with a digital voice recorder. These tests are described below.
The Boston Naming Test (BNT) was introduced in 1983 by E. Kaplan, H. Goodglass and S. Weintraub. The BNT is a widely used neuropsychological assessment tool to measure confrontational word retrieval in individuals with dementing disorders such as Alzheimer’s disease. The BNT contains 30-60 line drawings graded in difficulty. Participants with naming difficulties (a symptom that makes it hard to recall the appropriate word to identify an object or a person’s name) often have greater difficulties with the naming of not only difficult and low-frequency objects but also easy and high-frequency ones. The examiner begins with item 1 and continues through the rest, unless the participant is in distress or refuses to continue. The participant is asked to tell the examiner the name of each image and is given about 20 seconds to respond on each trial. If the participant fails to give the correct response, the examiner may give the participant a phonemic cue, which is the initial sound of the target word (Wikipedia).
The Verbal fluency tests (VF) are psychological tests in which participants have to produce as many words as possible from a category in a given time (usually 60-120 seconds). This category can be semantic, including objects such as ’animals’ or ’fruits’, or phonemic, including words beginning with a specified letter, such as ’f’. The semantic fluency test is sometimes described as the category fluency test (Wikipedia). Verbal fluency is a cognitive function that facilitates information retrieval from memory. Successful retrieval requires executive control over cognitive processes such as selective attention, mental set shifting, internal response generation, and self-monitoring. Two VF tests are used in H70-1944. For phonemic VF, the F-A-S test is used, while for semantic VF, the category ’animals’ is used. In the F-A-S test an individual is requested to orally produce as many words as possible that begin with the letters ’F’, ’A’, and ’S’ within a prescribed time frame (60 sec). In the semantic VF test, an individual is requested to orally produce as many names as possible that belong to the category ’animals’, also within a prescribed time frame (60 sec).
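As an illustration of how such timed responses could be scored once they are available as time-stamped transcriptions, the following minimal Python sketch counts the unique valid words produced within the time limit. The `TimedWord` structure, the function names, the exclusion of repetitions and the small animal lexicon are assumptions made for this example only; they do not describe the scoring rules actually applied in H70-1944.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str     # transcribed word (hypothetical transcription output)
    onset: float  # seconds from the start of the trial


def score_phonemic_fluency(words, letter, time_limit=60.0):
    """Count unique words that start with `letter` and begin before `time_limit`."""
    seen = set()
    for w in words:
        token = w.text.strip().lower()
        if w.onset < time_limit and token.startswith(letter.lower()):
            seen.add(token)  # a set ignores repetitions (assumed scoring rule)
    return len(seen)


def score_semantic_fluency(words, category_lexicon, time_limit=60.0):
    """Count unique words found in `category_lexicon` that begin before `time_limit`."""
    lexicon = {entry.lower() for entry in category_lexicon}
    seen = set()
    for w in words:
        token = w.text.strip().lower()
        if w.onset < time_limit and token in lexicon:
            seen.add(token)
    return len(seen)


# Illustrative responses: one repetition, one wrong letter, one word after 60 s.
f_trial = [
    TimedWord("fish", 1.2),
    TimedWord("foot", 4.0),
    TimedWord("fish", 9.5),
    TimedWord("sun", 12.0),
    TimedWord("frog", 61.3),
]
animal_trial = [
    TimedWord("dog", 2.0),
    TimedWord("cat", 5.0),
    TimedWord("dog", 8.0),
    TimedWord("table", 10.0),
]

print(score_phonemic_fluency(f_trial, "f"))
print(score_semantic_fluency(animal_trial, {"dog", "cat", "horse"}))
```

In practice the validity of a response (e.g. proper names or inflected variants of the same word) is judged by test-specific rules, which this sketch leaves out.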
The Rey Auditory Verbal Learning Test (RAVLT), originally developed in 1964 by André Rey, evaluates a wide diversity of functions: short-term auditory-verbal memory, rate of learning, learning strategies, retroactive and proactive interference, presence of confabulation or confusion in memory processes, retention of information, and differences between learning and retrieval. Five presentation trials of a 15-word list are given to the participants, each followed by attempted recall.
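The trial structure described above lends itself to simple summary scores. The sketch below, written in Python under stated assumptions, counts the correctly recalled list words per trial, the words that were not on the list, and the gain from the first to the last trial. The function name, the returned keys and the short placeholder word list are hypothetical and serve only as an example; an actual list contains 15 words and clinical scoring follows the test manual.

```python
def score_recall_trials(word_list, recalls):
    """Summarize repeated recall trials of one word list.

    word_list: the presented target words.
    recalls:   one list of recalled words per trial, in trial order.
    """
    if not recalls:
        raise ValueError("at least one recall trial is required")
    targets = {w.strip().lower() for w in word_list}
    per_trial = []
    off_list = []
    for recall in recalls:
        said = {w.strip().lower() for w in recall}   # repetitions count once
        per_trial.append(len(said & targets))        # correctly recalled targets
        off_list.append(len(said - targets))         # words not on the list
    return {
        "per_trial": per_trial,
        "total": sum(per_trial),
        "gain": per_trial[-1] - per_trial[0],  # crude learning index (assumption)
        "off_list": off_list,
    }


# Placeholder list with 4 words instead of 15, and three trials instead of five.
placeholder_list = ["drum", "bell", "school", "garden"]
placeholder_recalls = [
    ["drum", "bell"],
    ["drum", "bell", "school", "river"],
    ["garden", "drum", "bell", "school", "drum"],
]
print(score_recall_trials(placeholder_list, placeholder_recalls))
```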
The Stroop test, developed in the 1930s by John Ridley Stroop, evaluates the resolution of cognitive interference and assesses the influence of selective attention (i.e. which information will be granted access to further processing and awareness and which will be ignored) on information processing speed: a delay in reaction time occurs when the stimuli are mismatched (the color and the meaning of the word are incongruent). The task requires the participants to read the written color names independently of the color of the ink (for example, they have to read ”purple” or ”red” no matter what the color of the font) (Wikipedia). This creates a conflict that the brain has to resolve. The response takes longer because the brain has to suppress the wrong answer, which interferes with the right one, before the right answer comes through.
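The delay caused by incongruent stimuli can be expressed as a single number. A minimal Python sketch follows, assuming that each trial is logged with its condition, reaction time and correctness; this record layout, the function name and the choice to use only correct trials are assumptions for illustration, not a description of the H70-1944 protocol.

```python
from statistics import mean


def stroop_interference(trials):
    """Mean reaction-time difference (incongruent minus congruent), in seconds.

    trials: iterable of (condition, reaction_time_s, correct) tuples, where
            condition is "congruent" or "incongruent" (hypothetical log format).
    Only correct trials are used, which is one common but not universal choice.
    """
    trials = list(trials)
    congruent = [rt for cond, rt, ok in trials if cond == "congruent" and ok]
    incongruent = [rt for cond, rt, ok in trials if cond == "incongruent" and ok]
    if not congruent or not incongruent:
        raise ValueError("both conditions need at least one correct trial")
    return mean(incongruent) - mean(congruent)


# Invented example data; the incorrect trial is excluded from the means.
example_trials = [
    ("congruent", 0.6, True),
    ("congruent", 0.8, True),
    ("incongruent", 1.0, True),
    ("incongruent", 1.2, True),
    ("incongruent", 3.0, False),
]
print(round(stroop_interference(example_trials), 3))
```

A positive value indicates slower responses when the color and the meaning of the word conflict.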
The Trail Making Test A and B, developed in the 1940s as part of the ”Army Individual Test Battery”, consists of two parts in which the subject is instructed to connect a set of 25 dots as quickly as possible while still maintaining accuracy. The test can provide information about visual search speed, scanning, speed of processing, mental flexibility, and executive functioning. In Part A, the circles on the screen are numbered 1 to 25, and the participant should draw lines to connect the numbers in ascending order. In Part B, the circles include both numbers (1 to 13) and letters (A to L); as in Part A, the participant draws lines to connect the circles in an ascending pattern, but with the added task of alternating between the numbers and letters (i.e., 1-A-2-B-3-C, etc.). The participant should be instructed to connect the circles as quickly as possible, without lifting the pen or pencil from the iPad (Wikipedia).
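The alternating pattern of Part B can be made concrete with a short Python sketch that generates the expected order of the 25 circles and compares it with the order in which circles were touched. The function names and the error rule (a wrong touch is counted and the participant must continue from the last correct circle) are assumptions for this example; how the app used in H70-1944 registers errors is not specified here.

```python
from string import ascii_uppercase


def part_a_sequence(n_circles=25):
    """Expected order for Part A: '1', '2', ..., up to n_circles."""
    return [str(i) for i in range(1, n_circles + 1)]


def part_b_sequence(n_numbers=13):
    """Expected order for Part B: 1-A-2-B-...; ends on the last number."""
    sequence = []
    for i in range(n_numbers):
        sequence.append(str(i + 1))
        if i < n_numbers - 1:          # 13 numbers are paired with 12 letters
            sequence.append(ascii_uppercase[i])
    return sequence


def check_touches(expected, touched):
    """Return (error_count, completed) for a list of touched circle labels.

    The position advances only on a correct touch (assumed rule).
    """
    position = 0
    errors = 0
    for label in touched:
        if position < len(expected) and label == expected[position]:
            position += 1
        else:
            errors += 1
    return errors, position == len(expected)


expected_b = part_b_sequence()
print(len(expected_b), expected_b[:6], expected_b[-2:])
print(check_touches(part_b_sequence(3), ["1", "2", "A", "2", "B"]))
```

Combined with the time stamps of the pen strokes, the same comparison could also yield completion times per segment.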
In H70-1944 the BNT, VF, RAVLT and Stroop tests, as well as the rest of the assessment, are administered digitally using the ’Δelta iPad app’ by ki elements UG, Germany. This means that not only the spoken signal is registered but also the movements of the participant’s digital pen across the iPad’s screen, which is used in some of the tests, such as the ”Trail Making Test A and B”.
Future directions: linguistic and digital biomarkers
There is a growing interest in, and need for, digitally assessed linguistic biomarkers for the assessment of disease severity, complications and prognosis of brain and mental disorders. Compared to classical pen-and-paper tests, linguistically based and digital biomarkers have many advantages: they provide non-invasive, unobtrusive measurements of cognitive health, are time- and cost-effective to utilize, and even offer the possibility of continuous data gathering, for instance if parts of the assessment could be made available in a mobile phone or iPad app.
Recent evidence suggests that combining results from different technologies may improve the accuracy of both the prediction of cognitive decline and the detection of e.g. dementia. Language technology and Artificial Intelligence tools use large volumes of multi-feature data to determine potential predictors of normal versus pathological changes in cognitive functioning. Digitally captured features are also less prone to human bias. For instance, acoustic features extracted from the recordings, e.g. salient features such as the frequency and duration of pauses and voice quality features such as fundamental frequency (F0), can unlock previously unobtainable sources of behavioral, social, and physiological variation with minimal effort required from the participants and the medical experts.
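To make the pause features mentioned above concrete, the following Python sketch derives the number and duration of silent pauses from word-level time stamps. It assumes that such time stamps are available (e.g. from a forced alignment of the recording with its transcription) and that a gap of at least 0.25 seconds counts as a pause; both the input format and the threshold are assumptions for illustration and would need to be set according to the study design.

```python
def pause_features(word_intervals, min_pause=0.25):
    """Compute simple pause statistics from word-level time stamps.

    word_intervals: list of (start_s, end_s) tuples, one per spoken word,
                    sorted by start time (hypothetical aligner output).
    min_pause:      shortest gap, in seconds, counted as a pause (assumed).
    """
    # Gap between the end of each word and the start of the next one.
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(word_intervals, word_intervals[1:])
    ]
    pauses = [gap for gap in gaps if gap >= min_pause]
    total = sum(pauses)
    return {
        "pause_count": len(pauses),
        "total_pause_s": total,
        "mean_pause_s": total / len(pauses) if pauses else 0.0,
    }


# Invented time stamps for four words; gaps are about 0.1 s, 1.0 s and 0.6 s.
example_intervals = [(0.0, 0.5), (0.6, 1.0), (2.0, 2.4), (3.0, 3.5)]
print(pause_features(example_intervals))
```

Features such as F0 require signal-level analysis of the audio itself and are not covered by this sketch.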
Previous research has shown that learning linguistic biomarkers from the utterances of individuals could provide important complementary information to help the clinical diagnosis of various early-onset neurodegenerative diseases. However, there is a need to train the predictive and diagnostic models on larger datasets. In the majority of previous studies the sample sizes were too small to draw safe conclusions about the general population. Experiments conducted on the population-based H70-1944 cohort can pave the way for promising future explorations and advances in the understanding of early signs of cognitive impairment with greater accuracy.