Endoscopy 2022; 54(08): 785-786
DOI: 10.1055/a-1784-1772
Editorial

Artificial intelligence in endoscopy: transition from academic development to integration into daily endoscopic practice

Referring to Niikura R et al. p. 780–784
A. Jeroen de Groof
1   Department of Gastroenterology and Hepatology, Amsterdam UMC, Amsterdam, The Netherlands

Over the past decade we have seen a rapid increase in reports of artificial intelligence (AI) systems in the endoscopic literature, most of them focused on colonic applications. The field is now moving forward rapidly, with the first clinical studies under way and the research focus expanding to a variety of endoscopic applications. Given its subtle endoscopic appearance and the corresponding diagnostic challenge, early gastric cancer is a logical application for AI. Several systems have recently been described, mainly focusing on primary detection [1] [2] [3].

In this issue of Endoscopy, Niikura and colleagues present the results of a retrospective study in which they evaluated a previously trained computer-aided detection system for early gastric cancer [4]. For this evaluation, they applied an interesting experimental set-up, using stratified matching of retrospectively collected imagery in a noninferiority comparison of AI versus expert assessment. The AI system was designed to first detect and then localize lesions on white-light endoscopic images by displaying a bounding box. The per-patient detection rate of 100 % achieved by the AI system is impressive, as is its higher per-image detection rate compared with that of the expert endoscopists.

Whereas most studies aim to show AI performance superior to that of general endoscopists, Niikura and colleagues chose a noninferiority design comparing AI performance with that of expert endoscopists, which is an interesting approach. The argument could be made that AI performance that is noninferior to that of experts should suffice. AI systems are, however, usually designed to assist general endoscopists as a second reader. From a clinical perspective, stand-alone AI performance is therefore of lesser importance. In that respect, a more obvious comparison would be to evaluate the performance of general endoscopists with AI assistance against that of expert endoscopists without AI assistance.
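
To make the distinction explicit in generic notation (the symbols and margin below are illustrative and not taken from the study): with p_AI and p_expert denoting the respective detection rates and δ a prespecified noninferiority margin,

noninferiority is concluded when the lower confidence limit of (p_AI - p_expert) lies above -δ,
whereas superiority requires that lower confidence limit to lie above 0.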

“Now that the field of AI in endoscopy is moving towards clinical testing, domain gaps between academic development and clinical application will become more apparent and need to be addressed”

The results appear promising, but we should consider important limitations of this study that bear directly on successful clinical integration. Although the sensitivity of the AI system is high, there is no mention of specificity or of the number of false-positive detections. Most AI studies focus primarily on sensitivity as the most important performance metric, given the direct and obvious clinical consequences of a missed lesion. It has been widely established that AI systems have the potential to outperform endoscopists in detecting additional pathology. In most studies, however, high sensitivity comes at the cost of a significant false-positive rate. This is a serious threat to the widespread acceptance and integration of AI systems into daily practice. False-positive detections may distract the endoscopist, leading to “alert fatigue” or to unnecessary biopsies and resection of benign lesions, thereby increasing medical costs and patient burden. Given the selection bias toward high-quality imagery reported in nearly all studies, and the corresponding lack of generalizability of reported performance, we can assume that in a real-world setting the number of false-positive detections will be even higher, further limiting successful integration of AI in the endoscopy suite.
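
For reference, the standard definitions, with TP, FP, TN, and FN denoting true-positive, false-positive, true-negative, and false-negative classifications, respectively:

sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
false-positive rate = FP / (FP + TN) = 1 - specificity

In detection tasks without well-defined negative regions, false positives are therefore often reported instead as the number of false detections per image or per procedure.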

Lack of generalizability of AI performance is a widespread problem. Virtually all endoscopic AI studies, including the paper by Niikura and colleagues [4], rely on heavily curated, high-quality imagery collected at high-volume academic centers by expert endoscopists. However, the quality of imaging in community hospitals, where the bulk of surveillance endoscopies are performed and where, paradoxically, most AI systems will therefore be employed, is considerably lower. This results in input data that are more heterogeneous than those used for development and training, and it creates hazardous blind spots, since AI systems cannot detect what is not properly imaged. Such systems cannot be expected to be robust to the data heterogeneity encountered in clinical practice. This lack of generalizability is often referred to as a domain gap. Now that the field of AI in endoscopy is moving towards clinical testing, domain gaps between academic development and clinical application will become more apparent.

Although challenging, domain gaps can be addressed. The robustness of AI systems may be enhanced by more heterogeneous training input from community centers or by domain-specific pretraining. Quality assurance algorithms could enforce quality thresholds on incoming images and thereby limit the heterogeneity of the input data, as sketched below. Self-critical AI, in which systems assess the quality of their own predictions and can thereby learn from interobserver variability, has also been receiving increasing attention. In the coming years we will likely see an increasing number of studies addressing domain gaps.
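
As a minimal illustration of such a quality gate, the Python sketch below scores each frame with a crude sharpness proxy and only passes frames above a threshold to the detection model; the metric, threshold, and function names are hypothetical and are not taken from the study or from any specific commercial system.

import numpy as np
from typing import List, Optional, Tuple

def sharpness_score(gray_frame: np.ndarray) -> float:
    # Crude image-quality proxy: variance of a discrete Laplacian.
    # Blurry, poorly insufflated, or motion-degraded frames contain little
    # high-frequency content and therefore score low. Illustrative only.
    f = gray_frame.astype(float)
    lap = np.diff(f, n=2, axis=0)[:, 1:-1] + np.diff(f, n=2, axis=1)[1:-1, :]
    return float(lap.var())

def detect_lesions(frame: np.ndarray) -> List[Tuple[int, int, int, int]]:
    # Placeholder for a computer-aided detection model returning bounding
    # boxes; a hypothetical stand-in, not the authors' system.
    return []

def analyse_frame(frame: np.ndarray,
                  quality_threshold: float = 50.0) -> Optional[List[Tuple[int, int, int, int]]]:
    # Gate the frame on quality before running detection. Returning None
    # signals "not assessable" (prompting re-imaging) rather than
    # "no lesion", which keeps blind spots explicit instead of silent.
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame
    if sharpness_score(gray) < quality_threshold:
        return None
    return detect_lesions(frame)

A gate of this kind degrades gracefully: rather than letting the model produce unreliable predictions on out-of-domain input, it makes the limitation visible to the endoscopist at the point of care.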

The feasibility of AI in endoscopy has been well established. The transition from academic development to clinical testing and subsequent integration into daily endoscopic practice, however, still holds many challenges. In the next few years we will see an increasing number of large-scale randomized clinical studies. Our critical appraisal should focus on operability within the endoscopic workflow and on the generalizability of results to routine endoscopic practice, to ensure outcomes that are truly beneficial for our patients.



Publication History

Article published online:
04 May 2022

© 2022. Thieme. All rights reserved.


 
References

1 Gong D, Wu L, Zhang J et al. Detection of colorectal adenomas with a real-time computer-aided system (ENDOANGEL): a randomised controlled study. Lancet Gastroenterol Hepatol 2020; 5: 352-361
2 He X, Wu L, Dong Z et al. Real-time use of artificial intelligence for diagnosing early gastric cancer by magnifying image-enhanced endoscopy: a multicenter, diagnostic study (with videos). Gastrointest Endosc 2021; DOI: 10.1016/j.gie.2021.11.040
3 Jin P, Ji X, Kang W et al. Artificial intelligence in gastric cancer: a systematic review. J Cancer Res Clin Oncol 2020; 146: 2339-2350
4 Niikura R, Aoki T, Shichijo S et al. Artificial intelligence versus expert endoscopists for diagnosis of gastric cancer in patients who have undergone upper gastrointestinal endoscopy. Endoscopy 2022; 54: 780-784