This methodology produces decision trees that target population groups more prone to suffering from mild cognitive impairment and are useful for cost-effective selective screening of the disease.
Mild cognitive impairment (MCI) is the first sign of dementia among elderly populations, and its early detection is crucial in our aging societies. Common MCI tests are time-consuming, so indiscriminate mass screening would not be cost-effective. Here, we describe a protocol that uses machine-learning techniques to rapidly select candidates for further screening with a question-based MCI test. This minimizes the resources required for screening because only patients who are potentially MCI-positive are tested further.
This methodology was applied in an initial MCI research study that formed the starting point for the design of a selective screening decision tree. The initial study collected many demographic and lifestyle variables as well as details about patient medications. The Short Portable Mental Status Questionnaire (SPMSQ) and the Mini-Mental State Examination (MMSE) were used to detect possible cases of MCI. Finally, we used this method to design an efficient process for classifying individuals at risk of MCI. This work also provides insights into lifestyle-related factors associated with MCI that could be leveraged in the prevention and early detection of MCI among elderly populations.
Population aging is increasing the prevalence of chronic and degenerative diseases, especially degenerative dementias, which, by 2050, are expected to affect more than 131 million people worldwide1. Among all the degenerative dementias, Alzheimer's disease (AD) is the most common, with an overall prevalence of 6.88% in Europe2. Because AD patients progressively lose their independence, they should start receiving support as soon as the disease manifests. Therefore, the early detection of prodromal signs of AD, such as mild cognitive impairment (MCI), is essential.
MCI is defined as an intermediate stage of cognitive decline between normal aging and the severe deterioration caused by dementia3. According to estimates by Petersen et al.4, the prevalence of MCI is 8.4% among people aged 65-69 years and reaches 25.2% among those aged over 80 years. Individuals with MCI experience more difficulty than expected in executing low-level cognitive skills, especially those related to memory and language, but the impairment does not interfere with their activities of daily living.
Screening is not synonymous with diagnosis. Diagnosing MCI will always be a clinical task, whereas screening methods can only indicate that a patient has a higher probability of having the condition and that there is a well-founded suspicion of MCI that should be confirmed clinically. Hence, primary healthcare workers (doctors, pharmacists, nurses, etc.) could benefit from simple screening methods (brief cognitive tests) that can be applied in minutes. Ideally, these methods would objectively identify patients with a high probability of having MCI so that they can then be clinically assessed by general or specialized physicians.
Given that the early detection of MCI is becoming an essential public-health task, this work aimed to identify which characteristics are useful for selecting the elderly individuals who should undergo MCI screening tests. The groups identified in this way would then be tested more thoroughly for MCI by primary healthcare providers. The methodology yields a decision tree, built with appropriate algorithms, that identifies the population groups to target.
Among these characteristics, age is one of the factors most consistently associated with the development of MCI. Other relevant characteristics relate to demographics or lifestyle5. Among the latter, some studies have identified the duration of daytime or nighttime sleep as a risk factor associated with MCI5,6,7,8,9. The prolonged use of medications such as benzodiazepines, taken by an estimated 20%-25% of older adults10,11, can also influence sleep duration and the development of MCI12,13. Prolonged treatments for chronic diseases may therefore be useful features for pre-selecting individuals at high risk of MCI.
Here, we developed data-driven models that use machine-learning algorithms to build a decision tree and a predictive tool, making MCI detection more efficient by identifying which characteristics play an important role in its early detection. The decision tree presented here was produced from a specific cohort of Spanish community-pharmacy users; however, the method could also be applied to other populations with different characteristics.
This work was completed in collaboration with primary healthcare and specialized physicians. Community pharmacies were ideal settings for testing this algorithm because they are close to patients, have long opening hours, and are frequently visited and consulted. Degenerative dementias are complex conditions that are not always well understood by primary healthcare providers14; involving these providers in the process should therefore raise their awareness of MCI and dementia.
The methodology applied in this study has been previously published5 in work carried out at the Universidad CEU Cardenal Herrera together with community pharmacies in the region of Valencia (Spain) associated with the Spanish Society of Family and Community Pharmacy (SEFAC). The present study was reviewed and approved by the Research Ethics Committee at the Universidad CEU Cardenal Herrera (approval no. CEI11/001) in March 2011. All individuals involved in the study gave written informed consent to participate in accordance with the Declaration of Helsinki.
1. Selection of factors associated with mild cognitive impairment
2. Design of the questionnaires
3. Selection of tests for MCI screening
4. Subject recruitment
Figure 1: Flowchart of the research study and the proposed selective screening. The left side represents the initial study whose data were analyzed with machine-learning techniques to propose the selective screening for early detection of MCI shown in the right panel. This figure was modified from Climent34.
5. Pharmacist researcher training
6. Study design
7. Interdisciplinary communication network, pharmacists, primary healthcare physicians, and specialists
Figure 2: Protocol for primary healthcare action. An example of primary healthcare actions that should be considered for early MCI detection before the patient is referred for a medical diagnosis by specialists.
8. Statistical analysis and preprocessing
NOTE: Before machine-learning techniques are applied, a preparatory step is required to transform the original data into a new data set suited to the final study objective and to the procedures to be applied. This transformation should take several factors into account, including the characteristics of the algorithms: some are sensitive to a lack of variability or to redundant information shared across columns, although the algorithms used to generate decision trees are particularly robust against these problems. This initial phase aims to categorize qualitative variables and to group their values so that each category contains enough cases. For efficient screening, it is important to choose variables that have been proven easy and accurate to acquire. Because participants are selected through a short interview, the algorithms were constrained to a white-box model, which makes it easy to check the criteria used to decide whether an individual should take the test. We suggest using the rpart29 package in R, which implements recursive partitioning, for these algorithms.
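As an illustration of this preprocessing step, the Python sketch below pools sparse categories of a qualitative variable and bins a quantitative one (for example, daily hours of sleep) into ordered categories. It is not the study's code (the study used R); the helper names `collapse_rare_levels` and `bin_numeric`, the minimum count, and the cut-points are hypothetical choices made only for illustration.

```python
from collections import Counter


def collapse_rare_levels(values, min_count, other_label="Other"):
    """Replace categories seen fewer than `min_count` times with a pooled label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def bin_numeric(value, edges, labels):
    """Map a number to an ordered category.

    `edges` are increasing upper bounds (exclusive) for every bin except the
    last, so `labels` must have exactly one more element than `edges`.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("labels must have len(edges) + 1 elements")
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


# Hypothetical data: education level with sparse categories.
education = ["none", "primary", "primary", "secondary", "university", "primary", "secondary"]
print(collapse_rare_levels(education, min_count=2))

# Hypothetical cut-points for daily hours of sleep.
print([bin_numeric(h, edges=[6, 9], labels=["<6 h", "6-9 h", ">=9 h"]) for h in (5.5, 7, 9.5)])
```

Pooling rare levels into a single category is only one option; for ordinal variables such as education level, merging a sparse level with its neighbor preserves the ordering.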
9. Algorithms to create a decision tree
NOTE: Machine-learning algorithms must be properly parameterized to predict which individuals are likely to have a positive MCI test result. One of the main problems when screening for a condition is that the original data are expected to be imbalanced (i.e., there are few positive cases compared with negative ones). To obtain models from balanced data, we used down-sampling (random sampling of the majority class) to reduce the frequency of each class to that of the least frequent class31. Efficient screening also requires reducing the number of false negatives as much as possible (i.e., increasing the sensitivity with which participants with MCI are selected). One technique for achieving greater sensitivity is to introduce penalties into the calculation of the Gini impurity index (i.e., the index used by the algorithm to select the best split for the decision tree)32.
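The two techniques in the note above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the caret or rpart implementation: `down_sample` randomly keeps as many cases of each class as there are in the smallest class, and `weighted_gini` scales each class count by a penalty weight before computing the Gini impurity 1 - Σ p_k². In rpart, misclassification penalties are supplied through the `loss` element of `parms`; the class-weight formulation here is a simplified, hypothetical stand-in, and the weight values are illustrative.

```python
import random
from collections import defaultdict


def down_sample(rows, labels, seed=0):
    """Randomly under-sample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw n_min cases without replacement from this class.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def weighted_gini(labels, class_weights):
    """Gini impurity 1 - sum(p_k^2) with each class count scaled by a weight.

    A weight > 1 for the positive (MCI) class counts every positive as if it
    were several cases; classes missing from `class_weights` get weight 1.
    """
    totals = defaultdict(float)
    for label in labels:
        totals[label] += class_weights.get(label, 1.0)
    n = sum(totals.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((w / n) ** 2 for w in totals.values())


# Hypothetical node with 9 negatives and 1 positive.
node = ["neg"] * 9 + ["pos"]
print(weighted_gini(node, {}))            # unweighted impurity
print(weighted_gini(node, {"pos": 3.0}))  # positives penalized 3x
```

In this hypothetical node, penalizing positives raises the impurity from 0.18 to 0.375, so splits that separate the few positives from the negatives become more attractive to the tree.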
The participating pharmacies gathered data from 728 users, collecting demographic variables as well as the drugs prescribed to the participants. A univariate logistic regression was performed for each variable34; the error-bar graphs in Figure 3 and Figure 4 represent the confidence interval of the odds ratio (for qualitative variables) and the confidence interval of the logistic regression coefficient (for quantitative variables). Variables with p-values below 0.01 (sex, age, education level, reading habit, time spent sleeping, depression, and memory complaints) were selected and used to generate a white-box model based on a decision tree. This decision tree was generated using a training set of 583 individuals and was validated with a test set of 145 participants.
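For a binary (qualitative) predictor, the odds ratio and its Wald confidence interval from a univariate logistic regression can be computed directly from the 2×2 table of predictor value versus MCI test result: in that case exp(coefficient) equals the cross-product ratio ad/bc, and the Wald standard error of log(OR) is √(1/a + 1/b + 1/c + 1/d). The Python sketch below assumes this setting and uses made-up counts; the function name `odds_ratio_ci` is hypothetical. It also returns the Wald p-value used for a p < 0.01 selection rule. Variables with more than two levels, or quantitative variables, would need a full logistic-regression fit instead.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.99):
    """Odds ratio, Wald CI and two-sided Wald p-value for a 2x2 table.

    a: value of interest & MCI-positive   b: value of interest & MCI-negative
    c: base value & MCI-positive          d: base value & MCI-negative
    For one binary predictor this equals exp(coefficient) of a univariate
    logistic regression, with the same Wald standard error.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for 99%
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    p_value = 2 * (1 - NormalDist().cdf(abs(log_or) / se))
    return math.exp(log_or), ci, p_value


# Hypothetical counts, not study data.
odds_ratio, (lower, upper), p = odds_ratio_ci(20, 40, 10, 80)
selected = p < 0.01  # selection rule described in the text
print(f"OR = {odds_ratio:.2f}, 99% CI = ({lower:.2f}, {upper:.2f}), p = {p:.4f}, selected = {selected}")
```

With these hypothetical counts the odds ratio is 4.0 and the 99% interval (roughly 1.3 to 12.2) excludes 1; a Wald interval at level 1 - α excludes 1 exactly when the Wald p-value is below α, which is why the 99% error bars and the p < 0.01 rule agree.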
Using the caret33 package in R, the resulting tree assigned each individual a probability of having MCI according to the final node they reached in the tree (depicted in Figure 5), which is determined by their answers to a few questions. To evaluate the predictive capability of these probabilities, a ROC analysis was performed on the test set (Figure 6); the AUC was 0.763, with a 95% confidence interval of (0.6624, 0.8632). Beyond the probabilities, the tree shown in Figure 5 uses very simple questions, such as how long the person sleeps and how often they read, to recommend (with a sensitivity of 0.76 and a specificity of 0.70) whether patients should take the MCI tests.
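The AUC reported above has a simple interpretation: it is the probability that a randomly chosen MCI-positive individual receives a higher predicted probability than a randomly chosen MCI-negative one. The Python sketch below computes it directly from that definition (the Mann–Whitney formulation), counting ties as one half. It is an illustration only; the function name `roc_auc` and the example scores are hypothetical, and the confidence interval reported above would require an additional method (for example, DeLong's) that is not shown.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counting 1/2.

    `labels` are 1 for MCI-positive and 0 for MCI-negative test results.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: tree probabilities are shared by everyone in a leaf.
probs = [0.8, 0.8, 0.3, 0.8, 0.3, 0.1, 0.1]
truth = [1, 1, 1, 0, 0, 0, 0]
print(roc_auc(probs, truth))
```

Tie handling matters here because a partition tree assigns the same probability to every individual in a leaf, so many scores are tied and the ROC curve has only a few distinct steps.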
Using this decision tree and a short interview to select users at risk of MCI, we were able to substantially reduce the number of patients who require the MCI tests, whose administration is quite time-consuming. This reduction can be estimated from the test-set data by interpreting the confusion matrix of observed and predicted classes shown in Table 1. In the test set, the decision tree identified 55 of the 145 participants for further MCI testing (a 62% reduction in the number of users taking the tests) while still selecting most of the individuals who were positive for MCI (19 of 25).
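The figures quoted in this paragraph, as well as the 17% prevalence and the more than 30% positive rate among selected users mentioned in the conclusions, follow directly from Table 1. The short Python sketch below recomputes them; `screening_summary` is a hypothetical helper, and the counts are those of Table 1 (19 true positives, 6 false negatives, 36 false positives, and 84 true negatives).

```python
def screening_summary(tp, fn, fp, tn):
    """Summarize a screening confusion matrix (positive = MCI test positive)."""
    total = tp + fn + fp + tn
    selected = tp + fp  # individuals the tree sends on to the MCI tests
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / total,
        "positive_rate_among_selected": tp / selected,
        "selected": selected,
        "reduction_in_tests": 1 - selected / total,
    }


# Counts from Table 1 (test set, n = 145).
summary = screening_summary(tp=19, fn=6, fp=36, tn=84)
for name, value in summary.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

It reports a sensitivity of 0.76, a specificity of 0.70, 55 selected participants (about 62% fewer tests), a prevalence of about 0.17, and a positive rate of about 0.35 among the selected users.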
Figure 3: Example of the variables selected during preprocessing. The 99% confidence interval of the odds ratio is represented as an error bar. The base value for the logistic regression is indicated below the name of the variable at the top of each panel. For each value of the variable, the error bar represents the confidence interval of the odds ratio for taking that value versus taking the base value. These variables were selected to generate the tree because, for some of their values, the confidence interval does not include the null value (an odds ratio of 1), indicating a significant difference. The vertical axis is on a logarithmic scale to facilitate comparisons across groups.
Figure 4: Example of variables not selected during preprocessing. The 99% confidence interval of the odds ratio is represented as an error bar. The base value for the logistic regression is indicated below the name of the variable at the top of each panel. For each value of the variable, the error bar represents the confidence interval of the odds ratio for taking that value versus taking the base value. In contrast with Figure 3, all the confidence intervals of these non-selected variables include the null value (an odds ratio of 1); because no significant differences were found, these variables were not used to generate the tree. The vertical axis is on a logarithmic scale to facilitate comparisons across groups.
Figure 5: Proposed partition tree for the selection of pharmacy users. The tree shows the algorithm for selecting individuals aged over 65 years for the MCI tests. The text at the top of each box is the recommendation on whether to take the MCI screening tests; the two numbers below it are the estimated probabilities of a negative and a positive MCI test outcome, respectively. The value at the bottom of the box is the percentage of individuals in the training set with these characteristics. The warmer the color of the box, the more likely the MCI tests were positive. The top node corresponds to the question of whether the participant has a memory complaint. If the individual does not have a memory complaint, the tree follows the left branch and the ensuing questions ask about the individual's sex; patients with a memory complaint are asked how much time they sleep per day. This figure was modified from Climent34.
Figure 6: Receiver operating characteristic (ROC) curve for the partition tree, and sensitivity and specificity of the final decision in the test set. The graph shows the ROC curve of the probabilities assigned by the partition-tree algorithm in the test set. The red shaded area corresponds to the AUC, and the blue point on the curve shows the sensitivity and specificity of the final recommendation made by the tree.
Prediction / Reference | No | Yes |
No | 84 | 6 |
Yes | 36 | 19 |
Table 1: Confusion matrix. Predicted versus observed (reference) MCI test results in the test set (n = 145) used to validate the proposed model.
After searching the PubMed database for Cochrane studies on terms associated with MCI, we created a specific questionnaire for this study that included the variables with the clearest proven association with MCI. Demographic, lifestyle, and social factors, as well as the patient's pharmacotherapy and some relevant pathologies, were also recorded. The SPMSQ and MMSE were selected as the MCI tests. Importantly, the SPMSQ was not affected by participants' level of schooling. Pharmacists were trained to conduct the study, and communication with primary and specialized care was ensured through letters informing them of this work. Only specialized healthcare providers could make a definitive diagnosis when MCI was suspected on the basis of these tests.
In conclusion, in this study we screened for MCI in a population with a low prevalence of the condition (17%). Using machine-learning techniques, we designed a set of selection criteria that increased the percentage of MCI-positive individuals among the selected users to more than 30%. Consequently, these tools increase screening efficiency and substantially reduce its cost, because only the population group selected by the decision tree needs to be tested.
A limitation of this method is that the decision tree is specific to this cohort and may lose validity as the population changes; it will therefore likely require periodic updates. For instance, many individuals in this population were illiterate, but the number of illiterate individuals aged over 65 years will decrease in the future. Such demographic changes will affect the variables related to reading and will require the decision tree to be recalibrated.
Remarkably, this data-driven model identified the most important variables (from among hundreds) and combined them into a concise yet informative and efficient model. Constructing a decision tree provides insight into which variables to focus on; it is a cost-effective way to help select people for whom further MCI testing is recommended, and it furthers our knowledge of these populations in this context.
To increase MCI detection rates, new techniques that are both cost-effective and more effective will be required. This protocol is time-consuming and is difficult for pharmacists to integrate into their daily work. Thus, other tests such as the MoCA22 or SLUMS23 (both with adequate sensitivity and specificity) could be considered for the faster detection of MCI in the future.
A systematic evaluation of the trade-off between specificity and test duration should improve the effectiveness of the set of MCI tests used for screening. Moreover, the relevant quantitative variables included in the study should cover a wide range so that an efficient cut-off can be selected for them; a narrow range would exclude a large portion of the population from early detection. For instance, age (which is always considered an important criterion in MCI diagnosis) was not found to be relevant in this decision tree because the recruitment criterion (age over 65 years) restricted its range too much; including younger individuals in a future study would allow the optimal age for starting MCI screening to be calculated.
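The effect of a narrow range can be seen in how decision trees choose cut-offs: CART-style algorithms evaluate thresholds only between consecutive observed values of a variable. The Python sketch below, with hypothetical helper names and made-up data, lists those candidate cut-offs and picks the one that minimizes the size-weighted Gini impurity of the two child nodes. With a sample recruited only above 65 years, no cut-off below 65 can ever be proposed.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def candidate_cutoffs(values):
    """Midpoints between consecutive distinct observed values."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def best_cutoff(values, labels):
    """Return (cut-off, impurity) minimizing the size-weighted Gini of the children."""
    best = None
    for c in candidate_cutoffs(values):
        left = [y for x, y in zip(values, labels) if x < c]
        right = [y for x, y in zip(values, labels) if x >= c]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if best is None or score < best[1]:
            best = (c, score)
    return best


# Hypothetical sample recruited only above 65 years (1 = MCI-positive test).
ages = [66, 68, 71, 75, 80, 84]
mci = [0, 0, 0, 1, 1, 1]
print(candidate_cutoffs(ages))  # every candidate lies inside the observed range
print(best_cutoff(ages, mci))
```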
The authors have nothing to disclose.
This work was made possible by the support of the Know Alzheimer Foundation and help from the multimedia production service at the Universidad CEU Cardenal Herrera, especially Enrique Giner. We would like to recognize the work of all the participating pharmacies (SEFAC), and the collaborating doctors from the Society of Primary Care Doctors (SEMERGEN) and Neurology Society (SVN) who helped with the MCI diagnoses, especially Vicente Gassull, Rafael Sánchez, and Jordi Pérez. Finally, we thank all those who agreed to take part in this study.
caret | Max Kuhn | R package | |
rpart | Terry Therneau, Beth Atkinson, Brian Ripley | R package | |
SPMSQ in Spanish | Farmaceuticoscomunitarios.org | http://farmaceuticoscomunitarios.org/anexos/vol11_n1/ANEXO1.pdf | |
SPMSQ in English | geriatrics.stanford.edu | https://geriatrics.stanford.edu/culturemed/overview/assessment/assessment_toolkit/spmsq.html | |
MMSE in Spanish | Farmaceuticoscomunitarios.org | http://farmaceuticoscomunitarios.org/anexos/vol11_n1/ANEXO2.pdf | |
MMSE in English | oxfordmedicaleducation.com | http://www.oxfordmedicaleducation.com/geriatrics/mini-mental-state-examination-mmse/ |