This review evaluates AI’s ability to assess embryo health by analyzing images to predict chromosome conditions without invasive methods, offering potential advancements in non-invasive IVF screening.
Study: Non-invasive prediction of human embryonic ploidy using artificial intelligence: a systematic review and meta-analysis.
In a recent study published in eClinicalMedicine, researchers evaluate the effectiveness of artificial intelligence (AI) algorithms in non-invasively predicting embryonic ploidy from embryonic images.
How is embryo aneuploidy detected?
Embryo aneuploidy, defined as an abnormal chromosome count, is a leading cause of implantation failure, pregnancy loss, and congenital abnormalities.
In in vitro fertilization (IVF), aneuploidy rates range from 25% to 40% in early-stage embryos, and prevalence increases with maternal age. Although preimplantation genetic testing for aneuploidy (PGT-A), a biopsy-based technique, improves IVF outcomes by determining embryo ploidy, it is costly, invasive, and subject to ethical and legal restrictions, all of which limit its accessibility.
AI, through machine learning and deep learning models, has shown potential in accurately predicting embryo ploidy. However, further research is needed to enhance the predictive reliability and clinical applicability of these methods.
About the study
The current study was registered with the International Prospective Register of Systematic Reviews (PROSPERO) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) reporting guidelines.
Comprehensive literature searches were conducted across Publisher Medline (PubMed), Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica Database (Embase), Institute of Electrical and Electronics Engineers (IEEE), SCOPUS, Web of Science, and the Cochrane Central Register databases. This search identified studies on AI algorithms developed to assess human embryonic ploidy from medical imaging.
The search strategy included terms for AI, genetic testing, and chromosomal abnormalities. Studies published until August 10, 2024, were eligible if they reported diagnostic outcomes such as sensitivity, specificity, and predictive values or contained relevant 2×2 contingency data.
Articles were screened by two independent reviewers, with full-text retrieval and consultation with a third reviewer in the event of a discrepancy. Duplicates, studies that lacked AI models or used non-human samples, and non-research publication types, such as editorials, were excluded from the analysis.
Two reviewers systematically extracted data using a standardized form to ensure accuracy. Diagnostic metrics like sensitivity and specificity were calculated from contingency tables when available.
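To illustrate how these metrics follow from a 2×2 contingency table, here is a minimal Python sketch. The function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the review. Treating euploid embryos as the positive class is also an assumption made for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Which class counts as "positive" (here assumed
    to be euploid) must be fixed before the table is filled in.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, for illustration only (not data from the review)
metrics = diagnostic_metrics(tp=70, fp=25, fn=30, tn=75)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that sensitivity and specificity depend only on the table's columns for each true class, whereas the predictive values also depend on how common the positive class is in the sample.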
Quality assessment was conducted using the Quality Assessment of Diagnostic Accuracy Studies for Artificial Intelligence (QUADAS-AI) criteria, which cover risk of bias and applicability, with any disagreements resolved by a third reviewer. Primary outcome measures, including sensitivity (Se), specificity (Sp), and the area under the curve (AUC), were analyzed through hierarchical summary receiver-operating characteristic curves and a bivariate random-effects model.
Heterogeneity was explored through meta-regression, which evaluated factors such as algorithm type and geographical location. Deeks' funnel plot was used to assess publication bias, whereas subgroup analyses identified additional sources of heterogeneity, such as AI model type, annotation method, and risk of bias.
Study findings
The initial search yielded 4,774 records, from which 1,543 duplicates were removed. Screening titles and abstracts excluded 2,837 studies, leaving 65 studies for full-text review.
Ultimately, 20 studies met the inclusion criteria, 12 of which provided sufficient data for the meta-analysis. Of the 20 included studies, 16 were retrospective, two were prospective with double-blind AI model evaluation, and two did not specify their research design. None of the studies used open-access images. Eight studies excluded low-quality images, whereas the remaining 12 did not address this factor.
External validation with independent, out-of-sample datasets was performed in seven studies. Ten studies used deep learning (DL), five used machine learning (ML), and five employed both methods.
AI-driven decision support systems (DSSs) were classified into black-, matte-, and glass-box categories in four, five, and five studies, respectively. Four studies used either black- or matte-box models, whereas two used either matte-box or glass-box.
The pooled diagnostic performance of AI algorithms showed a Se of 0.67, Sp of 0.58, and AUC of 0.67. Selecting the highest-accuracy contingency tables across studies improved Se and Sp to 0.71 and 0.75, respectively, with an AUC of 0.80. Clinical utility analysis through a Fagan nomogram determined a 71% positive predictive value and 75% negative predictive value, assuming a 46% prevalence of euploid embryos.
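The predictive values quoted above can be reproduced from the reported Se, Sp, and prevalence using likelihood ratios, which is the calculation a Fagan nomogram performs graphically. The following Python sketch is illustrative only. The function name `predictive_values` is hypothetical, and the code assumes that a positive test means "predicted euploid".

```python
def predictive_values(se, sp, prevalence):
    """Convert Se, Sp, and pre-test probability into PPV and NPV."""
    lr_pos = se / (1 - sp)        # positive likelihood ratio
    lr_neg = (1 - se) / sp        # negative likelihood ratio
    pre_odds = prevalence / (1 - prevalence)

    # Post-test odds of being positive after a positive or negative result
    post_odds_pos = pre_odds * lr_pos
    post_odds_neg = pre_odds * lr_neg

    ppv = post_odds_pos / (1 + post_odds_pos)
    # NPV is the probability of being truly negative after a negative result
    npv = 1 - post_odds_neg / (1 + post_odds_neg)
    return ppv, npv


# Values reported in the article: Se 0.71, Sp 0.75, 46% euploid prevalence
ppv, npv = predictive_values(se=0.71, sp=0.75, prevalence=0.46)
print(f"PPV: {ppv:.0%}, NPV: {npv:.0%}")  # PPV: 71%, NPV: 75%
```

Because both predictive values depend on prevalence, they would shift in a clinic whose patient population has a different euploidy rate, even if Se and Sp stayed the same.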
Study quality was assessed using the QUADAS-AI tool, which indicated a high or unclear risk of bias in patient selection for 19 studies, primarily due to limited open-source data and lack of rigorous external validation. Heterogeneity analysis revealed significant variability, with an inconsistency index (I²) of 97.7% for Se and 92.2% for Sp. A threshold effect contributed to this heterogeneity, with variations in diagnostic cutoff values for euploid embryos.
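For readers unfamiliar with the inconsistency index, the sketch below shows the standard calculation of I² from Cochran's Q, applied to logit-transformed sensitivities with inverse-variance weights. The per-study counts are hypothetical, the function names are illustrative, and this simplified univariate calculation is not necessarily the exact procedure used in the review.

```python
import math


def i_squared(estimates, variances):
    """Return I² (as a percentage) from study estimates and variances."""
    weights = [1 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    # Share of total variation attributable to between-study differences
    return max(0.0, (q - df) / q) * 100


def logit_sensitivity(tp, fn):
    """Return logit(Se) and its approximate variance for one study."""
    return math.log(tp / fn), 1 / tp + 1 / fn


# Hypothetical (tp, fn) counts for four studies, for illustration only
studies = [(90, 10), (50, 50), (70, 30), (30, 70)]
logits, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in studies))
i2 = i_squared(logits, variances)
print(f"I² for sensitivity: {i2:.1f}%")
```

Values above roughly 75% are conventionally read as high heterogeneity, which is why the reported figures of 97.7% and 92.2% call for caution when interpreting the pooled estimates.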
Meta-regression identified factors influencing heterogeneity, including AI algorithm type, DSS category, annotation method, external validation, bias risk, maternal age, sample size, and publication year. Se and Sp were negatively correlated, a pattern frequently observed in diagnostic accuracy studies. Deeks' funnel plot showed no evidence of publication bias.
Subgroup analyses indicated that DL models had a higher AUC than ML models, at 0.71 and 0.63, respectively. Studies incorporating both image and clinical data showed enhanced performance, with an AUC of 0.71 compared with 0.62 for those that did not.
External validation, lower risk of bias, inclusion of maternal age, and larger sample sizes positively affected model outcomes. Newer studies were also associated with higher specificity and AUC, thus demonstrating improvements in AI model accuracy over time.
Conclusions
Although PGT-A is widely used to improve pregnancy outcomes by detecting chromosomal abnormalities, its invasiveness increases the risk of certain complications, including preeclampsia and placenta previa, while offering limited benefit for pregnancy or live birth rates. Thus, it is crucial to develop reliable, non-invasive ploidy prediction methods.
AI, which is already applied in various clinical fields, has the potential to support embryo assessments in assisted reproduction. However, existing AI models for ploidy prediction lack the accuracy required to replace PGT-A and should serve as support tools for embryo selection.
Journal reference:
- Xin, X., Wu, S., Xu, H., et al. (2024). Non-invasive prediction of human embryonic ploidy using artificial intelligence: a systematic review and meta-analysis. eClinicalMedicine. doi:10.1016/j.eclinm.2024.102897