Endoscopy 2021; 53(09): 902-904
DOI: 10.1055/a-1384-0485

Defining the next steps for artificial intelligence in colonoscopy

Referring to Ahmad OF et al. p. 893–901
1  Department of Gastroenterology and Hepatology, Westmead Hospital, Sydney, Australia
2  Westmead Clinical School, University of Sydney, Sydney, Australia

Artificial intelligence (AI) in health care is rapidly advancing, driven by new developments in artificial or convolutional neural networks, which have allowed for an explosion in discriminative prediction complexity [1]. The use of AI in colonoscopy to improve patient outcomes makes intuitive sense. Diagnostic colonoscopy is performed in a standardized manner, images and videos are collected electronically, and the targets for detection and diagnosis are generally well established. More importantly, there is considerable variation in human performance, which has the potential to be improved by incorporating AI technologies [2].

Defining research questions and priorities is important in order to harness the international research energy being committed to AI and to translate this into clinically relevant outcomes sooner. In this issue of Endoscopy, Ahmad et al. present a consensus view from an international panel of experts on the key priorities for implementation of AI into clinical practice [3]. The group was made up of 12 endoscopists and 3 translational computer scientists/engineers, selected on the basis of involvement in collaborative AI research and publication history. Research questions were generated independently by group members. The anonymous questions were then reviewed and consolidated by a steering committee into a refined list comprising nine themes. These questions were then ranked using a modified Delphi process to arrive at the final prioritization list.

All of the questions developed by this group are fascinating, and they cover a huge range of areas to be addressed in the clinical implementation of AI, from technical to ethical and legal. Owing to the composition of the expert panel and the focus of the group on implementation, the top-ranked questions are skewed toward short-term technical barriers. A strength of the Delphi process is that it produces a coherent consensus that reflects the views of the surveyed group. An inherent weakness, however, is that this consensus is heavily influenced by the selection process and composition of the group of experts. The authors point out that a wider group including endoscopists not involved in AI development, patients, public health researchers, and ethicists could have resulted in a different ranking of top priorities.

So, what is required to bring AI into routine clinical colonoscopy practice? As with any new technology, there are a host of barriers, but the critical question is clinical effectiveness: does it directly improve patient outcomes in the real world? Clinicians, patients, and health care organizations want to know that a new technology has an impact on defined clinical end points. This then allows judgments to be made on cost-effectiveness, prioritization against other resources, and balance against real or potential harms. Four of the top 10 clinical questions in the study by Ahmad et al. relate to defining and demonstrating clinical efficacy. AI in colonoscopy is at an early stage of this process, but there is already evidence that computer-aided polyp detection (CADe) improves adenoma detection rate (ADR) [4], a surrogate end point for colorectal cancer mortality [2]. An AI intervention with a proven clinical benefit is likely to quickly gain traction and become a standard of care. It is then likely that the adoption of additional AI interventions with indirect clinical benefits (workflow improvements, automatic quality measures) will swiftly follow. Cost-effectiveness was ranked just outside the top 10; however, this is an important consideration for implementation, as expensive systems with unproven clinical benefit will not be widely adopted.

Following efficacy, real or potential harms must be considered. Currently available AI systems have been developed by training on expert-annotated images. This training may be affected by bias and, once embedded within an algorithm and disseminated, that bias is perpetuated. A system trained entirely on adenomas will be biased against sessile serrated lesions, for example. Large flat or minimally elevated laterally spreading lesions are under-represented in existing training datasets but have a higher rate of covert malignancy or progression to cancer than diminutive polyps. CADe has been shown to increase the detection of diminutive polyps; however, detection of the inconspicuous laterally spreading lesions and advanced adenomas may have a greater influence on cancer risk reduction [5]. Endoscopists with high ADRs detect high rates of both advanced adenomas and diminutive polyps [6], but a recent meta-analysis of AI CADe showed only a trend toward increased detection of advanced adenomas [4]. This may simply reflect early data, but does it mean that there is a difference between an ADR and an AI-ADR? AI will hopefully increase detection of all lesions but, just like humans, it may develop “blind spots” so it needs large, independent, and clinically broad training datasets. Future systems may be independently trained on prospectively collected “in vivo” data. This raises privacy and consent issues for patients and may create a risk that an algorithm independently generates a bias. AI systems should be developed with knowledge of these risks in mind, and new ways of acquiring patient consent for the use of their data and of monitoring algorithm outputs must be considered for implementation [1].

Practicing endoscopists, who may be unfamiliar with these new technologies, have to be reassured that harms have been addressed, that the systems are reliable, that patient benefits are proven, and that they will not be drowned in alerts and false positives that compromise rather than improve clinical care.

Regulatory approval pathways remain a barrier to implementation; however, once again, these rely on assessments of efficacy, usually with a more stringent focus on avoidance of harm. Fortunately, recent updates to the CONSORT and SPIRIT reporting guidelines to incorporate AI technologies (CONSORT-AI and SPIRIT-AI) provide a framework for robust studies that meet many of the requirements of regulatory bodies [7] [8]. One major challenge for regulatory authorities is how to deal with AI systems that are regularly updated, or deep learning systems that independently refine algorithms [9]. The United States Food and Drug Administration is piloting new approaches to deal with this, where in addition to appraisal of the software, the company that produces the technology is evaluated in areas including clinical responsibility, cybersecurity, and track record [10]. This novel approach ensures responsibility for the appropriate governance of a complex and evolving system.

The potential of AI in colonoscopy is enormous, and once implemented in routine clinical practice, it is likely to become the standard of care. Widespread clinical implementation depends on research that focuses on delivering clinically relevant outcomes, using transparent methodology on broad and representative datasets. Although it is tempting to race ahead, researchers also need to consider the questions that were outside the top 10 in this study and consider the longer-term ethical, regulatory, and safety aspects of AI. This study raises several questions and defines the many challenges ahead, but with expert groups like this one leading the way, AI in clinical colonoscopy practice will be here sooner than we think.

Publication History

Publication Date: 26 August 2021 (online)

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany