Among neurological diseases, Parkinson’s disease (PD) has shown a particularly significant increase in incidence. PD is typically diagnosed on the basis of motor symptoms such as resting tremor, rigidity, and bradykinesia. However, detecting non-motor symptoms, such as constipation, apathy, loss of smell, and sleep disorders, could allow PD to be diagnosed several years to decades earlier.
In a recent ACS Central Science study, scientists from the University of New South Wales (UNSW) describe a machine learning (ML)-based tool that can detect PD years before symptoms first appear.
Study: Interpretable Machine Learning on Metabolomics Data Reveals Biomarkers for Parkinson’s Disease.
At present, the overall accuracy of PD diagnosis based on motor symptoms is 80%. This accuracy could be improved if PD were diagnosed on the basis of biomarkers rather than relying primarily on physical symptoms.
Many diseases are detected using biomarkers associated with metabolic processes. Metabolites in blood plasma or serum samples are measured with analytical tools such as mass spectrometry (MS).
Non-invasive diagnostic methods based on skin sebum and breath have recently gained popularity. Previous studies have shown that MS can reveal differences in metabolite profiles between people who later develop PD (pre-PD individuals) and healthy individuals.
These differences in metabolite profiles were observed up to 15 years before a clinical diagnosis of PD. Metabolite biomarkers could therefore be used to detect PD much earlier than current approaches allow.
ML approaches are widely used to build accurate diagnostic prediction models from large metabolomics data sets. However, building prediction models on entire metabolomics data sets has several disadvantages, including overtraining (overfitting), which can reduce diagnostic performance. Most models are therefore built on a smaller subset of features that has been preselected using traditional statistical methods.
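To make the conventional preselection step concrete, here is a minimal Python sketch that scores each feature with Welch’s t-statistic between cases and controls and keeps the k features with the largest absolute scores. The choice of Welch’s t-test, the function names, and the toy data are illustrative assumptions, not the procedure used in any particular study.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic comparing two independent samples (unequal variances)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        # No within-group spread: the statistic is either 0 or unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def preselect_features(cases, controls, k):
    """Return indices of the k features with the largest |t| between groups.

    cases, controls: lists of samples, each a list of feature intensities
    (one value per metabolite feature, same order in every sample).
    """
    n_features = len(cases[0])
    abs_t = []
    for j in range(n_features):
        a = [sample[j] for sample in cases]
        b = [sample[j] for sample in controls]
        abs_t.append(abs(welch_t(a, b)))
    # Stable sort: ties keep the lower feature index first.
    ranked = sorted(range(n_features), key=lambda j: abs_t[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 3 cases and 3 controls, 3 features each.
cases = [[5.0, 1.0, 3.0], [6.0, 1.2, 2.9], [5.5, 0.9, 3.1]]
controls = [[1.0, 1.1, 3.0], [1.5, 1.0, 3.2], [1.2, 0.8, 2.8]]
print(preselect_features(cases, controls, k=2))  # [0, 1]
```

Because features are chosen one at a time before any model is trained, a feature that matters only in combination with others can be discarded at this step and never reach the classifier.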
Some ML approaches, such as linear support vector machines (SVMs) and partial least-squares discriminant analysis (PLSDA), can fail to account for key features in metabolomics data sets. More advanced ML methods, such as neural networks (NNs), which are well suited to processing large data sets, can overcome this limitation.
NNs can model non-linear relationships. However, a key disadvantage of NN-based predictive models is that they are difficult to interpret and offer little mechanistic insight.
Shapley additive explanations (SHAP) is a recently developed technique for interpreting ML models. However, it had not yet been applied to metabolomics data sets.
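The idea behind SHAP can be illustrated by computing exact Shapley values for a single prediction. In this minimal Python sketch, a feature that is “absent” from a coalition is replaced by a fixed baseline value; that is a simplifying assumption, since SHAP implementations estimate these contributions more efficiently and typically average over a background data set. The toy model and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction of `predict` at input `x`.

    Assumption: a feature outside the coalition is set to its `baseline`
    value. Cost grows as 2**len(x), so this only suits a few features.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical model: an additive term plus a non-linear interaction."""
    return 2 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # roughly [2, 3, 3]; sums to toy_model(x) - toy_model(baseline) = 8
```

For this toy model, the additive term is credited entirely to feature 0, while the interaction term is split equally between features 1 and 2. The attributions sum to the difference between the prediction and the baseline prediction (the efficiency property of Shapley values), which is what makes a non-linear model’s output decomposable into per-feature contributions.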
About the study
In the current study, researchers evaluated blood samples from the Spanish cohort of the European Prospective Investigation into Cancer and Nutrition (EPIC) using several analytical techniques: gas chromatography-MS (GC-MS), capillary electrophoresis-MS (CE-MS), and liquid chromatography-MS (LC-MS).
The EPIC study provided metabolomics data from the blood plasma of both healthy participants and participants who developed PD up to 15 years after their samples were collected.
Diane Zhang, a researcher at UNSW, developed an ML tool called Classification and Ranking Analysis using Neural Networks generates Knowledge from MS (CRANK-MS). The tool provides an interpretable NN-based framework for analyzing the metabolomics data generated by these analytical techniques.
CRANK-MS has several features, including integrated model parameters that allow high-dimensional metabolomics data sets to be analyzed without preselecting chemical features.
CRANK-MS also uses SHAP to retrospectively explore and identify the key chemical features that drive accurate model predictions. In addition, it supports benchmark testing against five well-known ML methods to compare diagnostic performance and validate the chemical features identified.
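Once SHAP values have been computed for every sample, one common way to turn them into a global feature ranking is to average their absolute values per feature. The sketch below assumes a precomputed matrix of SHAP values (rows are samples, columns are features); the values and metabolite names are made up, and this is not necessarily how CRANK-MS aggregates them.

```python
def rank_features_by_mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean |SHAP value| across samples (a global importance score).

    shap_matrix: one row per sample, one column per feature, in the
    same order as feature_names.
    Returns (name, importance) pairs, most important first.
    """
    n_samples = len(shap_matrix)
    importances = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_matrix) / n_samples
        importances.append((name, mean_abs))
    # sorted() is stable, so ties keep their original column order.
    return sorted(importances, key=lambda pair: pair[1], reverse=True)


# Made-up SHAP values for 3 samples and 3 hypothetical metabolite features.
shap_matrix = [
    [0.5, -0.1, 0.00],
    [-0.7, 0.2, 0.05],
    [0.6, -0.3, -0.05],
]
names = ["metabolite_A", "metabolite_B", "metabolite_C"]
for name, importance in rank_features_by_mean_abs_shap(shap_matrix, names):
    print(f"{name}: {importance:.3f}")
```

Taking the absolute value matters here: metabolite_A pushes predictions up for some samples and down for others, so a plain mean would partly cancel its effect and understate its importance.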
The researchers applied the new ML-based tool to metabolomics data from 39 patients who developed PD up to 15 years later. Comparing the metabolite profiles of these 39 pre-PD patients with those of 39 matched controls revealed a unique combination of metabolites that could serve as an early warning sign of PD. Notably, this ML approach predicted PD with high accuracy well in advance of clinical diagnosis.
Five metabolites scored consistently high across all six ML models (CRANK-MS and the five benchmark methods), indicating their potential utility for predicting the future development of PD. These metabolites belonged to the following classes: polyfluorinated alkyl substance (PFAS), triterpenoid, diacylglycerol, steroid, and cholestane steroid.
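Identifying features that rank highly in every model can be sketched as an intersection of each model’s top-k list. The model names, rankings, and cut-off k below are hypothetical, and the study’s exact consensus criterion may differ.

```python
def consensus_top_features(rankings, k):
    """Return features that appear in the top k of every model's ranking.

    rankings: dict mapping a model name to its feature names, most
    important first. The result follows the first model's order.
    """
    if not rankings:
        return []
    top_sets = [set(ranking[:k]) for ranking in rankings.values()]
    shared = set.intersection(*top_sets)
    first_ranking = next(iter(rankings.values()))
    return [feature for feature in first_ranking[:k] if feature in shared]


# Hypothetical rankings from three illustrative models.
rankings = {
    "neural_network": ["m1", "m2", "m3", "m4"],
    "random_forest": ["m2", "m1", "m4", "m3"],
    "linear_svm": ["m1", "m3", "m2", "m5"],
}
print(consensus_top_features(rankings, k=3))  # ['m1', 'm2']
```

Requiring agreement across models with different inductive biases makes it less likely that a highlighted metabolite is an artifact of one model’s quirks.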
Isomers of the detected diacylglycerol metabolite, 1,2-diacylglycerol (34:2), are found in certain vegetable oils, such as olive oil, which is frequently consumed in the Mediterranean diet. PFAS are environmental neurotoxins that can alter neuronal cell processing, signaling, and function. Thus, both dietary and environmental factors may contribute to the development of PD.
CRANK-MS is publicly available to researchers interested in using ML on metabolomics data for disease diagnosis.
“The application of CRANK-MS to detect Parkinson’s disease is just one example of how AI can improve the way we diagnose and monitor diseases,” said Zhang. “What’s exciting is that CRANK-MS can be readily applied to other diseases to identify new biomarkers of interest.” She added that the tool is user-friendly and can generate results “in less than 10 minutes on a conventional laptop.”
- Zhang, D. J., Xue, C., Kolachalama, V. B., & Donald, W. A. (2023) Interpretable Machine Learning on Metabolomics Data Reveals Biomarkers for Parkinson’s Disease. ACS Central Science. doi:10.1021/acscentsci.2c01468