In a recent study posted to Research Square*, researchers developed and tested a machine learning (ML)-based clinical decision support system (CDSS) to predict antibiotic resistance.
Background
Antibiotic resistance is a significant threat to global public health, exacerbated by the misuse of antibiotics. Treatment is initially prescribed empirically, before bacterial culture and antibiogram results are available. This is challenging for practitioners, who must weigh an antibiotic's spectrum against the probability that the suspected microbes are susceptible to it, taking into account illness severity and the risk of treatment failure. CDSSs may help practitioners select appropriate antimicrobials. However, only a few ML-based CDSSs (ML-CDSSs) have been developed to predict antibiotic resistance and inform drug choices.
The study and findings
The present study developed and evaluated an ML-CDSS using historical data from a French hospital. The researchers analyzed 30,975 antibiograms from over 13,000 patients, collected between January 2014 and December 2020. Most isolates were gram-negative rods and came from urine, respiratory tract, blood, or abscess samples.
The major species isolated were Escherichia coli, Klebsiella pneumoniae, Staphylococcus aureus, Pseudomonas aeruginosa, and Enterococcus faecalis. The antibiogram data were divided into a 2014-19 dataset and a 2020 dataset. Susceptibility to single antibiotics varied across culture types and species. It also differed by sample type. For example, susceptibility to amoxicillin-clavulanate was 56% in urine samples but only 35% in lower respiratory tract samples.
Susceptibility to cefotaxime/ceftriaxone was 49% for isolates from critical care patients but 77% for those from emergency room patients. It was 27% for isolates from patients with documented carriage of multidrug-resistant (MDR) bacteria in the past three months, 54% for known patients without documented MDR carriage, and 72% for isolates from previously unknown patients.
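Each stratified susceptibility rate reported above is a simple proportion: the number of susceptible isolates divided by the number tested within a stratum. Below is a minimal Python sketch. It assumes a hypothetical record layout (a stratum field plus a per-antibiotic result coded "S" or "R"), and its toy records are invented for illustration rather than taken from the study.

```python
from collections import defaultdict

def susceptibility_rates(records, antibiotic, stratum_key):
    """Return {stratum: fraction susceptible} for one antibiotic.

    Hypothetical layout: each record has a stratum field and a "results"
    dict mapping antibiotic codes to "S" (susceptible) or "R" (resistant).
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [susceptible, tested]
    for rec in records:
        result = rec["results"].get(antibiotic)
        if result is None:
            continue  # this antibiotic was not tested on this isolate
        tally = counts[rec[stratum_key]]
        tally[0] += result == "S"  # bool adds as 0 or 1
        tally[1] += 1
    return {stratum: s / n for stratum, (s, n) in counts.items()}

# Made-up toy records, not study data.
records = [
    {"sample": "urine", "results": {"AMC": "S"}},
    {"sample": "urine", "results": {"AMC": "R"}},
    {"sample": "lower_resp", "results": {"AMC": "R"}},
    {"sample": "lower_resp", "results": {"AMC": "S"}},
    {"sample": "lower_resp", "results": {"AMC": "R"}},
    {"sample": "lower_resp", "results": {"AMC": "R"}},
]
print(susceptibility_rates(records, "AMC", "sample"))
# {'urine': 0.5, 'lower_resp': 0.25}
```

The same function works for other stratifications, such as ward type or prior MDR carriage, if the records carry those fields.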
Susceptibility to single antibiotics was stable over time, except for amikacin, ertapenem, cefotaxime/ceftriaxone, and trimethoprim-sulfamethoxazole. Next, the team trained Bayesian (BAY) and frequentist (FRQ) inference models, as well as ML algorithms, on the 2014-19 data (80% for training and 20% for testing). Covariates included sample origin and date, previous carriage of MDR bacteria, and hospital ward type. The models were designed to predict antibiotic susceptibility before antibiogram results become available.
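The summary does not describe exactly how the 80/20 split was drawn. The sketch below assumes a random split at the isolate level for the pre-2020 records, with 2020 held out for validation. The `year` field and the `temporal_split` function are hypothetical.

```python
import random

def temporal_split(records, cutoff_year=2020, test_frac=0.2, seed=0):
    """Split into train/test (years before cutoff) and validation (cutoff onward).

    Assumption: records are shuffled and split at the isolate level.
    """
    development = [r for r in records if r["year"] < cutoff_year]
    validation = [r for r in records if r["year"] >= cutoff_year]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = development[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test_set, train = shuffled[:n_test], shuffled[n_test:]
    return train, test_set, validation

# Toy records: ten per year from 2014 to 2020.
records = [{"id": i, "year": 2014 + i % 7} for i in range(70)]
train, test_set, validation = temporal_split(records)
print(len(train), len(test_set), len(validation))  # 48 12 10
```

Grouping the split by patient instead of by isolate would keep repeat isolates from the same patient out of both the training and test sets. The summary does not say whether the authors did this.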
The ML algorithms included AdaBoost (ADA), gradient boosting (GBS), random forest (RF), bagging (BAG), extreme gradient boosting (XGB), neural networks (NN), and logistic regression (LR). Four stages of the microbiological workflow were defined: 1) sampling, 2) direct examination, 3) culture, and 4) species identification. The models were then used to predict the probability of susceptibility to 22 single antibiotics and 25 antibiotic combinations for isolates in the 2020 validation dataset.
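Each stage adds information to what was already known at the previous stage, so the covariates available to a model can be treated as cumulative. The sketch below illustrates this idea. All feature names, including `gram_stain` for direct examination, are hypothetical placeholders and are not the study's actual variables.

```python
# Hypothetical stage order and covariate names, for illustration only.
STAGE_ORDER = ["sampling", "direct_examination", "culture", "species_identification"]
NEW_FEATURES = {
    "sampling": ["sample_type", "sample_date", "ward_type", "prior_mdr_carriage"],
    "direct_examination": ["gram_stain"],
    "culture": ["culture_type"],
    "species_identification": ["species"],
}

def available_features(stage):
    """Covariates known at `stage`: those added at this stage and all earlier ones."""
    idx = STAGE_ORDER.index(stage)
    features = []
    for earlier in STAGE_ORDER[: idx + 1]:
        features.extend(NEW_FEATURES[earlier])
    return features

def features_at_stage(record, stage):
    """Keep only the fields of `record` that would be known at `stage`."""
    return {name: record[name] for name in available_features(stage)}

record = {
    "sample_type": "urine", "sample_date": "2020-03-01", "ward_type": "emergency",
    "prior_mdr_carriage": False, "gram_stain": "gram_negative_rod",
    "culture_type": "monomicrobial", "species": "Escherichia coli",
}
print(sorted(features_at_stage(record, "culture")))
```

With this layout, each stage's model sees only what a clinician would know at that point, so a sampling-stage prediction cannot use the species.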
Performance was assessed by plotting receiver operating characteristic (ROC) curves and estimating the area under the curve (AUC). The mean AUC across the FRQ, ML, and BAY models increased from 0.594 at the sampling stage to 0.847 at the species identification stage. The BAY, NN, XGB, and FRQ models outperformed the other models at all stages.
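The AUC equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. This is the Mann-Whitney formulation. Below is a minimal pure-Python sketch. The labels (1 for susceptible, 0 for resistant) and the scores are toy values.

```python
def roc_auc(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # compare every positive/negative pair
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels (1 = susceptible) and predicted susceptibility probabilities.
y = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.6, 0.2, 0.4, 0.5]
print(round(roc_auc(y, scores), 3))  # 0.722
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. A "mean AUC" such as those quoted above would average per-antibiotic values like this one.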
Of note, the NN models had the highest mean AUC at every stage except species identification, where the BAY model achieved the highest AUC (0.918). Predictability varied between antibiotics. Failure rates were zero for the BAY and ML models, and the NN models also had the lowest standard deviations of AUC across antibiotics.
The team then examined model performance in rare situations by restricting the analysis to the least frequent situations. The aggregate AUC decreased from 0.73 to 0.65 in the rarest situations. LR performed the worst at all stages, while BAY models held up well, reaching high average AUC values at the culture (0.824) and species identification (0.917) stages.
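The summary does not specify the form of the BAY models. As an illustration only: one common Bayesian approach to estimating a susceptibility probability is a beta-binomial model. A prior, for example one centered on the rate in a broader group, is updated with the counts observed in a stratum. The sketch below uses a hypothetical function `posterior_susceptibility` and toy counts. It shows how such an estimate stays close to the prior when a stratum has very few isolates. It is not presented as the authors' method or as their explanation for the BAY models' stability.

```python
def posterior_susceptibility(susceptible, tested, prior_mean, prior_strength):
    """Posterior mean of a beta-binomial model of susceptibility.

    Prior: Beta(a, b) with a = prior_mean * prior_strength and
    b = (1 - prior_mean) * prior_strength, so prior_strength acts like
    a number of pseudo-isolates. Hypothetical helper for illustration.
    """
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (a + susceptible) / (a + b + tested)

# Rare stratum: 2 of 2 isolates susceptible (raw frequency 1.0).
print(round(posterior_susceptibility(2, 2, prior_mean=0.6, prior_strength=10), 3))  # 0.667
# Common stratum: 300 of 500 susceptible; the data dominate the prior.
print(round(posterior_susceptibility(300, 500, prior_mean=0.6, prior_strength=10), 3))  # 0.6
```

With no isolates at all, the estimate equals the prior mean. As counts grow, the estimate approaches the observed frequency.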
The mean AUC of the NN models was less well preserved in rare situations, particularly at the first two stages (sampling and direct examination). Next, the Shapley additive explanations (SHAP) approach was applied to interpret the NN models' predictions. Sampling date and type had relatively little impact on predictions, whereas culture type contributed to performance at the culture and species identification stages.
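SHAP approximates Shapley values. A Shapley value splits the difference between a prediction and a baseline prediction among the input features, by averaging each feature's marginal contribution over all coalitions of the other features. For a model with only a few inputs, these values can be computed exactly. The minimal sketch below assumes that features absent from a coalition are replaced by baseline values. It uses a toy scoring function in place of a trained NN, and its three features and weights are invented.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value
    (an assumption; SHAP explainers approximate this in various ways).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy linear score over three made-up binary features.
def toy_model(z):
    return 0.1 * z[0] + 0.5 * z[1] + 0.3 * z[2]

phi = shapley_values(toy_model, x=[1, 1, 0], baseline=[0, 0, 0])
print([round(v, 3) for v in phi])  # [0.1, 0.5, 0.0]
```

The values always sum to model(x) minus model(baseline). Enumerating every coalition grows exponentially with the number of features, which is why SHAP relies on approximations for real models.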
Conclusions
In summary, the models predicted antibiotic susceptibility well, even at the early stages (sampling and direct examination), and performance improved further at the later stages (culture and species identification). BAY and NN models performed best, with the highest mean AUC, even in rare situations.
BAY and NN models also had the lowest standard deviations in AUC across antibiotics, and BAY models in particular had stable AUCs in rare situations. FRQ model performance was impaired at later stages, LR models performed poorly, and ADA, BAG, RF, and GBS models performed markedly worse than NN and BAY models.
*Important notice
Research Square publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.