Endoscopy 2022; 54(12): 1211-1231
DOI: 10.1055/a-1950-5694
Position Statement

Expected value of artificial intelligence in gastrointestinal endoscopy: European Society of Gastrointestinal Endoscopy (ESGE) Position Statement

Helmut Messmann
 1   III Medizinische Klinik, Universitätsklinikum Augsburg, Augsburg, Germany
,
Raf Bisschops
 2   Department of Gastroenterology and Hepatology, Catholic University of Leuven (KUL), TARGID, University Hospital Leuven, Leuven, Belgium
,
Giulio Antonelli
 3   Gastroenterology and Digestive Endoscopy Unit, Ospedale dei Castelli Hospital, Ariccia, Rome, Italy
 4   Department of Anatomical, Histological, Forensic Medicine and Orthopedics Sciences, Sapienza University of Rome, Italy
,
 5   Department of Gastroenterology, Porto Comprehensive Cancer Center, and RISE@CI-IPOP (Health Research Network), Porto, Portugal
 6   MEDCIDS, Faculty of Medicine, University of Porto, Porto, Portugal
,
Pieter Sinonquel
 2   Department of Gastroenterology and Hepatology, Catholic University of Leuven (KUL), TARGID, University Hospital Leuven, Leuven, Belgium
,
Mohamed Abdelrahim
 7   Endoscopy Department, Portsmouth Hospitals University NHS Trust, Portsmouth, UK
,
 8   Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London Hospital, London, UK
 9   Division of Surgery and Interventional Sciences, University College London Hospital, London, UK
10   Gastrointestinal Services, University College London Hospital, London, UK
,
11   Gastroenterology Department, Portuguese Oncology Institute of Coimbra, Coimbra, Portugal
,
Jacques J. G. H. M. Bergman
12   Department of Gastroenterology and Hepatology, Amsterdam UMC, Amsterdam, The Netherlands
,
Pradeep Bhandari
 7   Endoscopy Department, Portsmouth Hospitals University NHS Trust, Portsmouth, UK
,
13   Digestive Endoscopy Unit, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
,
12   Department of Gastroenterology and Hepatology, Amsterdam UMC, Amsterdam, The Netherlands
,
Dirk Domagk
14   Department of Medicine I, Josephs-Hospital Warendorf, Academic Teaching Hospital, University of Muenster, Warendorf, Germany
,
Alanna Ebigbo
 1   III Medizinische Klinik, Universitätsklinikum Augsburg, Augsburg, Germany
,
Tom Eelbode
15   Department of Electrical Engineering (ESAT/PSI), Medical Imaging Research Center, KU Leuven, Leuven, Belgium
,
Rami Eliakim
16   Department of Gastroenterology, Sheba Medical Center Tel Hashomer & Sackler School of Medicine, Tel-Aviv University, Ramat Gan, Israel
,
Michael Häfner
17   2nd Medical Department, Barmherzige Schwestern Krankenhaus, Vienna, Austria
,
Rehan J. Haidry
 8   Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London Hospital, London, UK
 9   Division of Surgery and Interventional Sciences, University College London Hospital, London, UK
,
Rodrigo Jover
18   Servicio de Gastroenterología, Hospital General Universitario Dr. Balmis, Instituto de Investigación Biomédica de Alicante ISABIAL, Departamento de Medicina Clínica, Universidad Miguel Hernández, Alicante, Spain
,
Michal F. Kaminski
19   Clinical Effectiveness Research Group, University of Oslo, Oslo, Norway
20   Department of Gastroenterology, Hepatology and Clinical Oncology, Centre of Postgraduate Medical Education, Warsaw, Poland
21   Department of Oncological Gastroenterology and Department of Cancer Prevention, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
,
Roman Kuvaev **
22   Endoscopy Department, Yaroslavl Regional Cancer Hospital, Yaroslavl, Russian Federation
23   Department of Gastroenterology, Faculty of Additional Professional Education, N.A. Pirogov Russian National Research Medical University, Moscow, Russian Federation
,
19   Clinical Effectiveness Research Group, University of Oslo, Oslo, Norway
24   Digestive Disease Center, Showa University Northern Yokohama Hospital, Yokohama, Japan
,
Maxime Palazzo
25   European Hospital, Marseille, France
,
Alessandro Repici
26   Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy
27   IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
,
28   Gastroenterology Unit, Valduce Hospital, Como, Italy
,
29   North Tees and Hartlepool NHS Foundation Trust, Stockton-on-Tees, UK
30   Population Health Sciences Institute, Newcastle University, Newcastle, UK
,
Yutaka Saito
31   Endoscopy Division, National Cancer Center Hospital, Tokyo, Japan
,
Prateek Sharma
32   Gastroenterology and Hepatology Division, University of Kansas School of Medicine, Kansas, USA
33   Kansas City VA Medical Center, Kansas City, USA
,
Cristiano Spada
13   Digestive Endoscopy Unit, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
34   Digestive Endoscopy, Fondazione Poliambulanza Istituto Ospedaliero, Brescia, Italy
,
Marco Spadaccini
26   Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy
27   IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
,
Andrew Veitch
35   Department of Gastroenterology, Royal Wolverhampton Hospitals NHS Trust, Wolverhampton, UK
,
Ian M. Gralnek
36   Ellen and Pinchas Mamber Institute of Gastroenterology and Hepatology, Emek Medical Center, Afula, Israel
37   Rappaport Faculty of Medicine, Technion Israel Institute of Technology, Haifa, Israel
,
Cesare Hassan***
26   Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy
27   IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
,
Mario Dinis-Ribeiro***
 5   Department of Gastroenterology, Porto Comprehensive Cancer Center, and RISE@CI-IPOP (Health Research Network), Porto, Portugal
 

Abstract

This ESGE Position Statement defines the expected value of artificial intelligence (AI) for the diagnosis and management of gastrointestinal neoplasia within the framework of the performance measures already defined by ESGE. This is based on the clinical relevance of the expected task and the preliminary evidence regarding artificial intelligence in artificial or clinical settings.

Main recommendations: (1) For acceptance of AI in assessment of completeness of upper GI endoscopy, the adequate level of mucosal inspection with AI should be comparable to that assessed by experienced endoscopists. (2) For acceptance of AI in assessment of completeness of upper GI endoscopy, automated recognition and photodocumentation of relevant anatomical landmarks should be obtained in ≥90% of the procedures. (3) For acceptance of AI in the detection of Barrett’s high grade intraepithelial neoplasia or cancer, the AI-assisted detection rate for suspicious lesions for targeted biopsies should be comparable to that of experienced endoscopists with or without advanced imaging techniques. (4) For acceptance of AI in the management of Barrett’s neoplasia, AI-assisted selection of lesions amenable to endoscopic resection should be comparable to that of experienced endoscopists. (5) For acceptance of AI in the diagnosis of gastric precancerous conditions, AI-assisted diagnosis of atrophy and intestinal metaplasia should be comparable to that provided by the established biopsy protocol, including the estimation of extent, and consequent allocation to the correct endoscopic surveillance interval. (6) For acceptance of artificial intelligence for automated lesion detection in small-bowel capsule endoscopy (SBCE), the performance of AI-assisted reading should be comparable to that of experienced endoscopists for lesion detection, without increasing but possibly reducing the reading time of the operator. (7) For acceptance of AI in the detection of colorectal polyps, the AI-assisted adenoma detection rate should be comparable to that of experienced endoscopists. (8) For acceptance of AI optical diagnosis (computer-aided diagnosis [CADx]) of diminutive polyps (≤5 mm), AI-assisted characterization should match performance standards for implementing resect-and-discard and diagnose-and-leave strategies. (9) For acceptance of AI in the management of polyps ≥ 6 mm, AI-assisted characterization should be comparable to that of experienced endoscopists in selecting lesions amenable to endoscopic resection.



Abbreviations

ADR: adenoma detection rate
AI: artificial intelligence
BERN: Barrett’s esophagus-related neoplasia
BLI: blue-light imaging
CADe: computer-aided detection
CADx: computer-aided diagnosis
EAC: esophageal adenocarcinoma
EMR: endoscopic mucosal resection
ESCN: esophageal squamous cell neoplasia
ESD: endoscopic submucosal dissection
ESGE: European Society of Gastrointestinal Endoscopy
GI: gastrointestinal
IPCLs: intrapapillary capillary loops
LDR: lesion detection rate
LGD: low grade dysplasia
LGIN: low grade intraepithelial neoplasia
NBI: narrow-band imaging
PIVI: Preservation and Incorporation of Valuable Endoscopic Innovations
RCT: randomized controlled trial
SODA: Simple Optical Diagnosis Accuracy

Source and scope

This Position Statement from the European Society of Gastrointestinal Endoscopy (ESGE) defines the main outcomes of AI tasks in the setting of ESGE performance measures, anticipating minimum and desirable values that should be expected when implementing AI in our practice. It primarily focuses on the diagnosis and management of GI neoplasia, and on the possible impact on the already existing or new quality performance measures that have been defined by ESGE.

1 Introduction

Artificial intelligence (AI) represents a radical breakthrough in the performance of diagnostic endoscopy by assisting the endoscopist in well-defined narrow tasks, such as detection and characterization of GI neoplasia [1] [2] [3] [4] [5] [6]. This is based on outputs from appropriately trained software – mainly built on deep learning architectures – that can recognize different endoscopic patterns of GI diseases in real time [7].

Because of the rapid development and supply of AI systems by technology manufacturers, we may expect immediate implementation of AI by the endoscopy community before conclusive scientific evidence on its impact is available. However, AI benefits, as well as harms, can be predicted on the basis of the clinical relevance of the expected task and the preliminary evidence in artificial or clinical settings. In detail, the “expected value” of AI – that is, the value we can anticipate before well-designed clinical trials – depends on the clinical implications of pitfalls in endoscopic performance on the one hand, and the plausibility of AI’s compensating for such pitfalls on the other. This expected value is also affected by the possible harms of AI, such as the consequences of false-positive results, or AI-related deskilling of endoscopists. The value of AI is further affected by the training and level of expertise of the endoscopist, the prevalence and severity of a disease, and the interaction between AI and the human endoscopist. In addition, clear definitions of reference standards are needed in order to define the value of AI in the management of GI neoplasia.

The expected value of AI-related tasks is most naturally assessed within the framework of the quality performance measures already defined by the European Society of Gastrointestinal Endoscopy (ESGE) for specific techniques [8] [9] [10]. The main advantages are the availability of a clear definition and measurement method for each indicator, as well as a well-defined clinically relevant cutoff where appropriate. This setting is also likely to facilitate a more transparent assessment of the benefits and harms of AI systems in clinical practice, preventing pointless duplication of performance measures. However, due to the novelty of AI, we may expect that at least some AI-related tasks are not covered by the available performance measures, prompting the definition of new AI-orientated performance measures. This may be considered to be an additional benefit of AI for the quality of diagnostic endoscopy.

The aim of this ESGE Position Statement is to define the main outcomes of AI tasks in the setting of ESGE performance measures, anticipating minimum and desirable values that should be expected when implementing AI in our practice. The general assumption is that AI implementation may standardize the quality metrics in community endoscopy, and thus, clinical rather than technical validation is preferentially addressed. Similarly to the performance measures documents, the main focus of this Position Statement is the detection and characterization of GI neoplasia.



2 Methodology

ESGE established a task force of experts to define the expected values of AI performance measures in endoscopy within the framework of the previously defined technique-based performance measures, and with a special focus on GI neoplasia. In addition, the clinical and technical information required in general to assess the value for any AI system was included. In order to match the AI tasks with ESGE performance measures, and to define their expected values, we adopted the following methodology:

(i) Task definition. For each procedure, we identified all the main specific and operational tasks performed by AI systems (e. g. assessment of the level of bowel preparation, recognition and localization of colorectal lesions for colonoscopy). This was done by technique-based literature searches (i. e., for upper or lower GI endoscopy), including both artificial and clinical studies. When meta-analyses or systematic reviews were available, these were prioritized; if they had not been updated, additional searches were performed. When literature data for a technique were too preliminary or absent (e. g. for hepatobiliary endoscopic procedures), we chose not to include that technique yet in the document, with the possibility of updating the Position Statement in the near future.

(ii) Integration of AI tasks with ESGE performance measures. Each AI task was placed into the corresponding domain among the seven in the ESGE quality documents [11], namely pre-procedure, completeness of procedure, identification of pathology, management of pathology, complications, patient experience, and post-procedure. For each task, a description of the main outcomes in terms of benefit and harms was provided. When possible, such outcomes were directly referred to the same methodology as that already adopted for the corresponding performance measure, such as rate of adequate bowel preparation, adenoma detection rate, or advanced image assessment. However, a characteristic of AI data is that most systems have been assessed only in an artificial setting. When such AI data could be clearly related to an ESGE performance measure, these were considered for the purpose of our document.

In the few cases when an AI task did not correspond to any of the quality performance measures for each domain, we devised new performance measures in order to assess the impact of AI in clinical practice, underpinned by a clear rationale.

(iii) Reference standard. For each outcome we defined one or more clinically relevant reference standards against which to assess the possible impact of the AI task. As AI is expected to standardize clinical practice, we generally adopted the term “comparable with the reference standard” (i. e., derived from experienced endoscopists or from pathology findings) to express equivalent performance. In detail, “comparable” was used to indicate a statistical noninferiority to the reference standard with a clinically appropriate noninferiority margin for the specific scenario.
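As a minimal illustration of this criterion, the sketch below tests noninferiority of an AI-assisted detection rate against a reference rate using a normal-approximation confidence interval; the 5-percentage-point margin, the function name, and the example counts are assumptions chosen only for illustration and are not prescribed by this Statement.

```python
# Illustrative sketch (not part of the Statement): one way to operationalize
# "comparable to the reference standard" as statistical noninferiority.
# The 5-percentage-point margin and the example counts are assumptions.
from math import sqrt

def noninferior(events_ai, n_ai, events_ref, n_ref, margin=0.05, z=1.96):
    """Return True if the AI-assisted rate is noninferior to the reference rate.

    Noninferiority is declared when the lower bound of the confidence interval
    for the difference (AI minus reference) lies above -margin.
    """
    p_ai, p_ref = events_ai / n_ai, events_ref / n_ref
    se = sqrt(p_ai * (1 - p_ai) / n_ai + p_ref * (1 - p_ref) / n_ref)
    lower_bound = (p_ai - p_ref) - z * se
    return lower_bound > -margin

# Example: AI-assisted detection in 285/300 procedures vs. 280/300 for the reference
print(noninferior(285, 300, 280, 300))  # True: the lower CI bound stays above -5 points
```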

Similarly, we decided to use the terms “experienced” and “less experienced” endoscopists throughout the text, discarding similar terms, such as “expert” and “non-expert.” When we use the term “experienced endoscopists” in describing a reference standard, it generally refers to a consensus of a panel of experienced endoscopists, possibly based on a validated consensus score.

In the case of assessment of resectability of neoplastic lesions, we generally preferred consensus by experienced endoscopists rather than post-resection pathology findings, as AI is primarily expected to drive the real-time endoscopic management of patients rather than predicting the post-resection pathology. On the other hand, in the areas where experts perform similarly to histopathological investigation (e. g. preneoplastic conditions in the stomach), histology was chosen to be the reference standard.

(iv) Expected value assessment . For each outcome, we defined an expected value for the AI performance measure according to two main factors, namely the clinical relevance of the performance measure related to the AI task (i. e., as a key or minor performance measure) and the expected benefit/harm of AI implementation. If this was different from the one proposed by the original performance measure, a clear methodology for measuring these values has been provided. In general, throughout the paper we have chosen to emphasize the role of AI in preventing overtreatment, usually unnecessary surgical referral, by prioritizing specificity rather than sensitivity values.

(v) Delphi agreement. For each statement, at least 80 % agreement was required for consensus to be reached. Where consensus was not reached, measures were reviewed in light of comments made and any additional evidence identified, and they were adjusted if required, followed by further voting rounds. If 80 % agreement was not reached after a maximum of three rounds of voting, consensus was considered to have been reached if > 50 % of participants voted in favor and < 20 % voted against the measure. Failure to meet this criterion resulted in the measure being discarded.
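The acceptance rules above can be summarized in a short sketch; the function name and the way voting rounds are encoded are illustrative assumptions, but the thresholds follow the text.

```python
# Illustrative sketch of the Delphi acceptance rules described above.
def delphi_outcome(percent_for: float, percent_against: float, voting_round: int) -> str:
    """Apply the consensus rules: >=80% agreement accepts a statement; after the
    third round, >50% in favor with <20% against still counts as consensus;
    otherwise the measure is revised and re-voted, or discarded."""
    if percent_for >= 80:
        return "accepted"
    if voting_round < 3:
        return "revise and re-vote"
    if percent_for > 50 and percent_against < 20:
        return "accepted"
    return "discarded"

print(delphi_outcome(87, 5, 1))   # accepted (>=80% agreement)
print(delphi_outcome(60, 15, 3))  # accepted by the fallback rule after three rounds
print(delphi_outcome(60, 30, 3))  # discarded
```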



3 Domains and performance measures for AI in GI endoscopy

The domains and AI tasks are shown in [Fig. 1].

Fig. 1 Incorporation of performance measures into the use of artificial intelligence in gastrointestinal endoscopy. GI, gastrointestinal; BERN, Barrett’s esophagus-related neoplasia; PIVI, Preservation and Incorporation of Valuable Endoscopic Innovations; SODA, Simple Optical Diagnosis Accuracy.


4 Upper GI tract

4.1 Completeness of procedure and quality control

Table 1

Upper gastrointestinal (GI) tract: Completeness of procedure.

Technique

Upper GI endoscopy

Domain

Completeness of procedure

AI task

Quality control of endoscopic inspection of upper GI mucosa

Description

  1. Real-time AI-based scanning of the mucosa to identify blind spots during upper GI mucosal inspection

  2. Automated recognition and photodocumentation of relevant anatomical landmarks

Performance measure

  1. Rate of complete inspection of the upper GI mucosa

  2. Percentage of reports with adequate photodocumentation

Rationale

Blind spots during mucosal inspection may result from rapid insertion or withdrawal of the scope, or from the presence of bubbles and saliva (poor mucosal cleansing); these increase the miss rate for neoplastic lesions. By alerting the endoscopist to blind spots, AI helps to ensure complete inspection of the entire mucosa.

Photodocumentation of anatomical landmarks is critical for an adequate report of the completeness of the procedure, and it may be supported by automatic recognition.

Reference standard

  1. Assessment of completeness of mucosal inspection by experienced endoscopists with standard scores

  2. Visualization of predefined anatomical landmarks

Expected value

  1. AI-assisted inspection of upper GI mucosa is comparable to completeness of procedure as defined by the reference standard

  2. Automated recognition and photodocumentation of relevant anatomical landmarks in ≥ 90 % of procedures

Recommendation

For acceptance of AI in assessment of completeness of upper GI endoscopy, the adequate level of mucosal inspection with AI should be comparable to that assessed by experienced endoscopists.

Agreement: 100 %

Recommendation

For acceptance of AI in assessment of completeness of upper GI endoscopy, automated recognition and photodocumentation of relevant anatomical landmarks should be obtained in ≥ 90 % of the procedures.

Agreement: 100 %

There is significant interobserver variability in the quality of mucosal assessment between endoscopists, and this is likely to contribute to neoplasia miss rates. Several studies have shown that in upper GI endoscopy the rate of missed cancers is substantial, ranging from 9.4 % to 11.3 % [12] [13] [14]. Completeness of an endoscopic procedure implies complete visualization of all normal structures and that any abnormal condition or lesion present is detected and described [8]. AI can support quality control during upper GI endoscopy through multiple tasks, including alerting for blind spots, automated identification and photodocumentation of anatomical landmarks, recording of inspection time, and classification of mucosal visualization (visibility score) [15] [16] [17]. AI may alert the endoscopist if any of these factors are suboptimal, allowing subsequent correction of any modifiable factors. Evidence supports that AI accurately identifies anatomical landmarks and significantly decreases the blind-spot rate compared with control groups [16] [17] [18] [19] [20]. By standardizing the adequate inspection of the upper GI mucosa, including reporting, AI can decrease the risk of missing neoplasia. A recent randomized controlled trial (RCT) showed a significant reduction in blind-spot rates (3.42 % vs. 22.46 %) when sedated conventional upper GI endoscopy was assisted by AI [16]. A similar significant reduction was also seen when nonsedated upper GI endoscopy procedures were analyzed.



4.2 Detection of esophageal squamous cell neoplasia

Table 2

Detection of esophageal squamous cell neoplasia (ESCN).

Technique

Upper GI endoscopy

Domain

Identification of pathology

AI task

Detection of esophageal squamous cell neoplasia (ESCN)

Description

Real-time AI-assisted detection and localization of esophageal squamous neoplasia

Performance measure

Detection of esophageal squamous neoplasia

Rationale

AI-assisted detection of ESCN would reduce the neoplasia miss rate, especially for early lesions, and, subsequently, post-endoscopy cancer, especially in high risk patients.

It would also simplify the procedure as compared with Lugol’s chromoendoscopy, bearing in mind that widespread use of that technique is lacking in community endoscopy.

Reference standard

Detection of ESCN by experienced endoscopists with or without advanced imaging techniques

Expected value

AI-assisted endoscopist detection rate of ESCN comparable to the reference standard

Recommendation

For acceptance of AI in detection of esophageal squamous neoplasia, the AI-assisted detection rate should be comparable to that of experienced endoscopists with or without advanced imaging techniques.

Agreement: 100 %

With recent advances in endoscopic resection techniques, curative endoscopic resection can be performed for early ESCN. However, early detection of ESCN is challenging, with a significant miss rate [21], due to inadequate exposure of the mucosa and/or failure to detect subtle and flat lesions. Indeed, there is evidence that some missed lesions have actually been present in the visual field but not detected, and that the presence of a second observer could significantly improve detection rates [22] [23] [24]. Assessment with Lugol’s iodine is widely considered to be the current gold standard for ESCN detection, given its high sensitivity (albeit at the cost of lower specificity and some adverse events, namely severe esophageal spasm, chest pain, and caustic damage). Assessment using virtual chromoendoscopy (i. e., NBI) has been reported to have high sensitivity but again its specificity was low [25], though noninferior to Lugol assessment. However, advanced imaging is operator-dependent, with variation in detection rates.

The application of AI to ESCN detection is at a relatively early stage compared with that for colorectal neoplasia. However, several studies have demonstrated its applicability and shown encouraging results. This task entails real-time AI-facilitated detection and localization of ESCN during endoscopy. AI algorithms can be trained to detect neoplasia with white-light or advanced imaging. Earlier studies used conventional machine-learning approaches and focused on retrospectively collected images [26] [27]. More recent reports have applied deep neural networks to white-light images as well as to enhanced imaging including narrow-band imaging (NBI) and blue-light imaging (BLI) [28] [29] [30] [31] [32], with reported sensitivities as high as 100 % on nonmagnified enhanced images [33], and up to 96 % on magnification videos [29] [34]. However, data from large prospective and randomized studies are limited and further studies are urgently needed in this area. It would be desirable for AI-assisted endoscopy to be compared with unassisted endoscopy using the same or similar modalities (i. e., white light, advanced imaging). The expected value of AI assistance is related to a decrease in the miss rate among less experienced endoscopists for those suspicious areas that would be detected by experienced endoscopists, with performance of targeted biopsy or appropriate treatment.



4.3 Management of esophageal squamous cell neoplasia

Table 3

Management of esophageal squamous cell neoplasia (ESCN).

Technique

Upper GI endoscopy

Domain

Management of pathology

AI task

Management of ESCN: selection of lesions amenable to endoscopic resection

Description

Real-time AI-assisted estimation of invasion depth of ESCN

Performance measure

Endoscopic prediction of lesion resectability

Rationale

Risk stratification and planning of therapy for an ESCN is crucial. It is usually done by evaluating a combination of lesion morphology and estimation of invasion depth, assessed by chromoendoscopy, with and without magnification, and/or expert-based evaluation of intrapapillary capillary loops (IPCLs). Certain characteristics are highly predictive of a high risk of lymph node metastasis and prompt an immediate surgical referral. However, many early lesions can be treated with curative intent by endoscopic resection performed by experienced endoscopists. Primarily, a false-positive diagnosis (leading directly to surgical resection) is to be avoided as it would result in surgical overtreatment.

Reference standard

Performance of experienced endoscopists in the selection of lesions that are amenable to endoscopic resection

Expected value

AI-assisted endoscopist selection for endoscopic resection of ESCN comparable to the reference standard

Recommendation

For acceptance of AI in the management of esophageal squamous neoplasia, AI-assisted selection of lesions amenable to endoscopic resection should be comparable to that of experienced endoscopists.

Agreement: 100 %

Accurate staging of ESCN is critical for risk stratification and planning of effective early therapeutic intervention. Recent advances in endoscopic resection techniques have enabled curative resection of early mucosal lesions with very low risk for lymph node metastasis. Current methods for staging of ESCN rely on enhanced imaging techniques such as blue-light endoscopy and magnification endoscopy performed by experienced endoscopists. A recent systematic review reported that magnification endoscopy with NBI had a sensitivity and specificity of 0.83 and 0.85, respectively [35]. However, these techniques need extensive training and have a steep learning curve. It may be anticipated that AI can fill this gap and support endoscopists, especially those with less experience and in low volume centers, to improve their staging of ESCN, with potential benefits in triage and referral practices as well as better risk stratification and selection of patients for endoscopic therapy.

The AI task involves the real-time AI-assisted staging of ESCN through determination of invasion depth, with white-light imaging or virtual chromoendoscopy with or without magnifying endoscopy, through the assessment of microvascular features such as intrapapillary capillary loop (IPCL) patterns [36]. Several studies have demonstrated the feasibility of AI-assisted staging of ESCN on endoscopic images annotated by experts. The standalone performance of deep learning systems for ESCN staging has been high, with sensitivities ranging from 88 % to 90 % for deep submucosal invasion [28] [37]. Furthermore, recent in vivo studies have shown similar performances in real-time videos with nonmagnifying endoscopy [38]. The AI value depends on its expected efficacy in assisting less experienced endoscopists, with no or little experience in using advanced endoscopic imaging modalities, to achieve a rate of referral for ESCN endoscopic resection comparable to the reference standard. This, in turn, is expected to have a positive impact in reducing unnecessary referrals and inappropriate endoscopic resections.



4.4 Detection of Barrett’s esophagus-related neoplasia

Table 4

Detection of Barrett’s esophagus-related neoplasia (BERN).

Technique

Upper GI endoscopy

Domain

Identification of pathology

AI task

Detection of BERN

Description

Real-time AI-assisted detection and localization of Barrett’s esophagus neoplastic lesions

Performance measure

Detection rate of BERN

Rationale

Expert identification of Barrett’s neoplasia depends on targeted biopsies after the use of advanced imaging techniques, while random biopsies according to the Seattle protocol are usually taken in the community setting. However, the former requires appropriate training, while the latter is time-consuming, has a low diagnostic yield, and is associated with poor adherence in the community setting. During routine examinations, AI assistance may be expected to reduce the miss rate for visible neoplastic lesions, especially for small lesions, by targeting biopsies on suspected areas, and overcoming the poor adherence to the Seattle protocol and the high risk of missing BERN.

Reference standard

  1. Detection of Barrett’s high grade intraepithelial neoplasia (HGIN) or cancer by experienced endoscopists using advanced imaging techniques

  2. Detection of Barrett’s low grade dysplasia (LGD) when no lesions are visible, using the Seattle protocol

Expected value

  1. AI-assisted endoscopist detection rate of Barrett’s HGIN or cancer comparable to reference standard

  2. AI-assisted endoscopist detection rate of Barrett’s LGD with targeted biopsies, comparable to that from biopsies performed according to the Seattle protocol

Recommendation

For acceptance of AI in the detection of Barrett’s high grade intraepithelial neoplasia or cancer, the AI-assisted detection rate for suspicious lesions for targeted biopsies should be comparable to that of experienced endoscopists with or without advanced imaging techniques.

Agreement: 100 %

Recommendation

For acceptance of AI in the detection of Barrett’s non-visible neoplasia, the AI-assisted detection rate based on targeted biopsies should be comparable to that obtained with Seattle protocol biopsies.

Agreement: > 90 %

With the increasing incidence of Barrett’s esophagus (BE) and Barrett’s esophagus-related neoplasia (BERN), early diagnosis is critical for the prognosis and justifies efficient detection, characterization, and surveillance. The Seattle protocol is time-consuming and is limited in sensitivity because of sampling errors.

Since early BERN is often flat and difficult to recognize in the surrounding nondysplastic tissue, the aid of (virtual) chromoendoscopy remains limited in nonexpert hands. A recent meta-analysis showed high miss rates of approximately 25 % for high grade dysplasia [39]. AI could help endoscopists, especially nonexperts, to reduce these high miss rates and improve detection of small focal lesions of BERN, thereby facilitating therapeutic and surveillance strategies and improving overall prognosis.

There has recently been growing interest in the AI-assisted detection of BERN. AI systems can be trained to detect small or suspect lesions, based on white-light or image-enhancement modes. Different research groups have recently shown high sensitivity of AI systems for detecting BERN during real-time endoscopy, ranging from 83.7 % to 95.4 % [40] [41] [42]. Two systematic reviews and meta-analyses, pooling both real-time in vivo studies and standalone performance studies, have also shown high detection performances of between 88 % and 96 % [34] [43].

Regarding the second statement on LGD, it primarily addresses the detection of neoplasia in BE when no lesions are visible (i. e., where current guidelines recommend use of the Seattle biopsy protocol). It is known that a high proportion of community endoscopists do not follow the protocol. In addition, the risk of missing LGD is significant even with adherence to the protocol. If no lesions are visible, the replacement of the Seattle protocol with targeted biopsies would be an attractive application for AI.



4.5 Management of Barrett’s esophagus-related neoplasia

Table 5

Management of Barrett’s esophagus-related neoplasia (BERN).

Technique

Upper GI endoscopy

Domain

Management of pathology

AI task

Management of BERN: selection of lesions amenable to endoscopic resection

Description

Real-time AI-assisted estimation of invasion depth of BERN

Performance measure

Endoscopic prediction of lesion resectability

Rationale

Risk stratification and planning of therapy for BERN is crucial, and is usually done by assessment of lesion morphology (Paris classification, relation to the esophageal wall) and pit pattern analysis by chromoendoscopy with or without magnification. Lesions with deep submucosal invasion should be directly referred to surgery, while those with mucosal or sm1 invasion should be treated endoscopically. Primarily, a false-positive diagnosis (leading directly to surgical resection) is to be avoided as it would result in surgical overtreatment.

Reference standard

Performance of experienced endoscopists in the selection of lesions that are amenable to endoscopic resection

Expected value

AI-assisted endoscopist referral for endoscopic resection of BERN is comparable to reference standard

Recommendation

For acceptance of AI in the management of Barrett’s neoplasia, AI-assisted selection of lesions amenable to endoscopic resection should be comparable to that of experienced endoscopists.

Agreement: 100 %

Barrett’s esophagus (BE) is the precursor of esophageal adenocarcinoma (EAC), a cancer with increasing incidence, and with a poor prognosis if diagnosed at a late stage. Endoscopic surveillance of known BE in order to screen for EAC is done to identify patients earlier in the metaplasia–dysplasia–carcinoma sequence to enable endoscopic therapy.

Estimation of the depth of submucosal invasion is crucial for risk stratification, and for correct planning of treatment, for example endoscopic mucosal resection (EMR) versus endoscopic submucosal dissection (ESD), or endoscopic versus surgical therapy. Lesions with deep invasion should be referred for surgery, whereas those with T1a invasion should be endoscopically treated [44].

Estimation of submucosal invasion is usually performed with (virtual) chromoendoscopy with or without magnification and remains problematic in less experienced hands with sensitivities of 90 % [45]. AI-assisted estimation of invasion depth can potentially avoid the learning curves that are required by experienced endoscopists both in order to recognize suspect lesions and to use chromoendoscopy for invasion depth estimation. AI-assisted BERN characterization involves the assessment of mucosal and vascular characteristics to discriminate between dysplasia and nondysplasia and, more importantly, to estimate the depth of invasion.

When compared against experienced endoscopists, AI demonstrated a higher accuracy in classifying dysplastic and nondysplastic BE lesions on still images [46]. The estimation of invasion depth has recently been investigated. The sensitivity, specificity, and accuracy of the AI system in discriminating T1a versus T1b were 77 %, 64 %, and 71 %, respectively; not significantly different from those of experienced endoscopists [47].

We preferred an endoscopic to a pathology reference standard, as the real-time clinical decision-making may be more relevant than the correct prediction of the exact invasion depth. The expected value depends on the possibility of standardizing the treatment of BERN in community endoscopy, preferentially avoiding overtreatment (unnecessary surgery) of these lesions.



4.6 Detection of gastric neoplasia

Table 6

Detection of gastric neoplasia.

Technique

Upper GI endoscopy

Domain

Identification of pathology

AI task

Detection of gastric neoplastic lesions

Description

Real-time AI-assisted detection and localization of gastric neoplastic lesions

Performance measure

Detection of gastric neoplasia

Rationale

Given the low prevalence of such disease, which reduces the possibilities for training in recognition of subtle lesions and mucosal patterns, AI is expected to aid less experienced endoscopists to increase their detection of gastric neoplasia. AI-assisted detection of gastric neoplasia would reduce the miss rate, especially in high risk patients.

Reference standard

Detection rate for gastric neoplasia by experienced endoscopists with or without advanced imaging techniques

Expected value

AI-assisted endoscopist detection rate of gastric neoplasia comparable to the reference standard

Recommendation

For acceptance of AI in detection of neoplastic lesions in the stomach, the AI-assisted detection rate should be comparable to that of experienced endoscopists with or without advanced imaging techniques.

Agreement: 100 %

Miss rates for gastric neoplasia are approximately 10 % [13] [14]. The low incidence of gastric cancer in most Caucasian populations reduces the likelihood of proper training in the recognition of subtle lesions and of mucosal patterns [48] [49]. Incomplete examination and failure to spot flat visible lesions are the main reasons for this miss rate.

AI can detect and localize gastric neoplastic lesions. In detail, detection assisted by AI (mainly based on deep-learning architecture and convolutional neural networks [CNNs]) has been shown to be feasible in offline studies, both in endoscopic videos and still images, with 88 % sensitivity and 89 % specificity [34].

Given the low prevalence of gastric cancer in most European countries and the current high negative predictive value (NPV) of upper GI endoscopy, it is expected that large cohorts will be needed to show a benefit of AI-assisted detection in clinical studies. Possibly, AI-assisted detection should be applied to high risk patients such as those receiving surveillance for precancerous conditions or hereditary syndromes.
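A simple worked example illustrates why large cohorts will be needed: taking the roughly 88 % sensitivity and 89 % specificity reported in the offline studies cited above [34] and an assumed, purely hypothetical neoplasia prevalence of 0.5 %, the negative predictive value is very high while the positive predictive value remains low.

```python
# Worked example (assumptions: the 88%/89% sensitivity/specificity quoted above and a
# hypothetical 0.5% neoplasia prevalence) showing why NPV is high and PPV low when
# the target condition is rare, and hence why very large cohorts are needed.
def predictive_values(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)  # (PPV, NPV)

ppv, npv = predictive_values(0.88, 0.89, 0.005)
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}")  # roughly PPV ~ 3.9%, NPV ~ 99.93%
```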

The reference standard is based on the availability of advanced imaging, such as virtual or dye-spray chromoendoscopy with magnification, and the level of experience of the operator.



4.7 Optical diagnosis of gastric precancerous conditions

Table 7

Optical diagnosis of gastric precancerous conditions.

Technique

Upper GI endoscopy

Domain

Management of pathology

AI task

Optical diagnosis of precancerous conditions

Description

Real-time AI-assisted diagnosis of gastric preneoplastic conditions

Performance measure

Accuracy of optical diagnosis of gastric preneoplastic conditions

Rationale

The presence and extent of preneoplastic conditions (as categorized by both histological and endoscopic staging systems) is associated with gastric cancer risk, and patients at higher risk are recommended to undergo endoscopic surveillance. AI-assisted optical diagnosis may help in accurate endoscopic classification, possibly avoiding the need for biopsies (reducing costs and workload) and allowing immediate follow-up recommendations.

Reference standard

Correct allocation of patients to endoscopic surveillance/no surveillance, based on histology

Expected value

AI-assisted endoscopic allocation of patients to endoscopic surveillance/no surveillance is comparable to reference standard

Recommendation

For acceptance of AI in the diagnosis of gastric precancerous conditions, AI-assisted diagnosis of atrophy and intestinal metaplasia should be comparable to that provided by the established biopsy protocol, including the estimation of extent, and consequent allocation to the correct endoscopic surveillance interval.

Agreement: 87 %

Searching for gastric precancerous conditions, together with appropriate surveillance, is the only preventive strategy applicable in populations at low–intermediate risk of gastric cancer [50] [51]. The most effective protocol currently available is the performance of five standard biopsies for assessing the extent of atrophy (operative link on gastritis assessment [OLGA] system) or intestinal metaplasia (operative link on gastric intestinal metaplasia [OLGIM] system) in the gastric mucosa, or the use of advanced imaging for intestinal metaplasia (endoscopic grading of gastric intestinal metaplasia [EGGIM] system) [52] [53] [54]. However, the standard biopsy protocol is time-consuming and costly, and is often not applied in routine practice.

Some data suggest that AI has a high accuracy for atrophy detection in the stomach, with 100 % sensitivity and 87.5 % specificity [55]. By scanning the entire gastric mucosa, AI may allow both detection and grading of extent of precancerous lesions, identifying those patients for whom scheduled follow-up is warranted. This application of AI would also save biopsy-related time and costs.
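As a minimal sketch of the reference standard above (correct allocation to surveillance based on histology), the example below maps OLGA/OLGIM stages to a surveillance decision. The cutoff used here, that advanced stages (III–IV) warrant surveillance, is an assumption in line with MAPS-style guidance and is included only for illustration; it is not a recommendation of this Statement.

```python
# Minimal sketch of the "correct allocation to surveillance" reference standard.
# Assumption (not stated in this document): advanced histological stages
# (OLGA/OLGIM III-IV) are treated as high risk and allocated to surveillance.
def allocate_surveillance(olga_stage=None, olgim_stage=None):
    """Return 'surveillance' or 'no surveillance' from histological staging (0-4)."""
    stages = [s for s in (olga_stage, olgim_stage) if s is not None]
    if not stages:
        raise ValueError("at least one staging result is required")
    return "surveillance" if max(stages) >= 3 else "no surveillance"

# An AI-assisted optical diagnosis would be judged by whether its allocation
# matches the histology-based allocation for the same patient.
print(allocate_surveillance(olga_stage=1, olgim_stage=3))  # surveillance
```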



4.8 Management of gastric neoplasia

Table 8

Management of gastric neoplasia.

Technique

Upper GI endoscopy

Domain

Management of pathology

AI task

Selection of gastric neoplasia lesions amenable to endoscopic resection

Description

Real time AI-assisted estimation of resectability of gastric neoplastic lesions

Performance measure

Endoscopic assessment of lesion resectability

Rationale

Risk stratification and planning of therapy for a gastric neoplastic lesion are crucial. They are usually done on the basis of a combination of lesion morphology and estimation of invasion depth using advanced endoscopic imaging. Certain characteristics are highly predictive of a high risk of lymph node metastasis and prompt immediate surgical referral. However, many early lesions can be treated with curative intent by endoscopic resection performed by experienced endoscopists. Primarily, false-positive diagnoses (leading directly to surgical resection) are to be avoided as they would result in surgical overtreatment.

Reference standard

Performance by experienced endoscopists in the selection of lesions that are amenable to endoscopic resection

Expected value

AI-assisted endoscopist referral for endoscopic resection of gastric neoplasia is comparable to reference standard

Recommendation

For acceptance of AI in the management of gastric neoplastic lesions, AI-assisted selection of lesions amenable to endoscopic resection should be comparable to that of experienced endoscopists.

Agreement: 100 %

Risk stratification and planning of therapy for a gastric neoplastic lesion are crucial, and are usually based on a combination of lesion morphology and estimation of invasion depth using advanced endoscopic imaging [1] [56]. Certain characteristics are highly predictive of a high risk of lymph node metastasis and prompt immediate surgical referral [57]. However, many early lesions can be treated with curative intent by endoscopic resection performed by experienced endoscopists. It has been estimated that approximately 20 % of ESD resections of such lesions are noncurative [58]. Some endoscopic features associated with noncurative resection (such as color changes/redness, nodularity, interruption/convergence of gastric folds, and friability) have been identified [58], but these features are somewhat subjective, their observation is prone to interobserver variability, and it is difficult to estimate the probability of curative resection based on these factors. This is especially true in a nonexpert setting where the decision must be made between referral to an expert center or to surgery. In this regard, AI optical diagnosis may assist the endoscopist in making a confident endoscopic diagnosis of gastric neoplastic lesions. In detail, these systems may constitute a valid alternative to preoperative histology for selection of lesions amenable to complete removal by ESD, provided that they demonstrate high accuracy in the near future.



5 Small bowel

5.1 Quality of bowel cleansing in small-bowel capsule endoscopy (SBCE)

Table 9

Quality of bowel cleansing in small-bowel capsule endoscopy (SBCE).

Technique

SBCE

Domain

Pre-procedure

AI task

Scoring of the level of cleansing for full-length SBCE video

Description

AI-assisted endoscopist scoring according to validated scales

Performance measure

Rate of adequate preparation of the small bowel

Rationale

Mucosal visualization in SBCE should be adequate in more than 95 % of cases (a key performance indicator), since a suboptimally prepared small bowel is more likely to harbor unrecognized disease and lead to greater costs and patient discomfort because of exam repetition.

Reference standard

Scoring of preparation as adequate/inadequate by experienced endoscopists

Expected value

AI-assisted endoscopist scoring of bowel preparation is comparable with reference standard

Recommendation

For acceptance of AI in defining the quality of cleansing for small-bowel capsule endoscopy (SBCE), AI-assisted scoring should be comparable to the scoring by experienced endoscopists of bowel preparation for the full-length SBCE video.

Agreement: 100 %

A clean view of the small-bowel mucosa plays a crucial role in SBCE since capsule endoscopy is a completely passive examination and suboptimal cleansing cannot be improved through suction or irrigation. To ensure reliable small-bowel exploration, mucosal visualization in SBCE should be adequate in more than 95 % of cases (a key performance indicator). Unlike the situation with colonoscopy, although several scales have been developed [59], only a few have been fully validated, and they are not systematically implemented in everyday clinical practice. For these reasons, small-bowel cleansing still requires subjective validation by an expert reviewer.

Automated AI classification of bowel cleansing could help in standardization of bowel cleansing scales and decrease interobserver variability, providing endoscopists with an objective assessment of the level of cleansing according to a quantitative or semiquantitative scale.

Computer-aided scoring of the cleansing level after bowel preparation has been shown to be accurate in rating small-bowel cleansing as compared with expert SBCE readers [59] [60] [61] [62] [63] [64] [65]. However, computer-aided scoring of cleansing is mostly based on still single-frame analysis, and comprehensive automatic full-video rating, although theoretically feasible [64], is still unavailable. Finally, the implementation of automatic cleanliness assessment systems must address technological differences between SBCE platforms.
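A minimal sketch of how per-frame predictions might one day be aggregated into a full-video adequacy rating is shown below; the per-frame "clean" probability, the 0.5 frame threshold, and the 75 % adequacy fraction are illustrative assumptions, since no validated full-video rating is yet available.

```python
# Sketch of aggregating per-frame cleanliness predictions into a single
# adequate/inadequate rating for the full-length SBCE video.
# The per-frame score, frame threshold, and adequacy fraction are assumptions.
def video_cleansing_adequate(frame_scores, frame_threshold=0.5, adequate_fraction=0.75):
    """frame_scores: per-frame probabilities that the mucosa is adequately visualized."""
    scores = list(frame_scores)
    clean_frames = sum(score >= frame_threshold for score in scores)
    return clean_frames / len(scores) >= adequate_fraction

print(video_cleansing_adequate([0.9, 0.8, 0.3, 0.95, 0.7]))  # True: 4/5 frames rated clean
```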



5.2 Completeness of SBCE procedure

Table 10

Completeness of small-bowel capsule endoscopy (SBCE) procedure.

Technique

SBCE

Domain

Completeness of procedure

AI task

Identification of the cecum/colon during SBCE

Description

AI-assisted endoscopist identification of the cecum/colon landmarks

Performance measure

Assessment of the completeness of SBCE examination

Rationale

Identification of the cecum/colon is required for the certification of complete small-bowel visualization (ESGE key performance indicator). This also affects the measurement of completion rate. Potentially, AI-trained software will be able to automatically identify the landmarks of cecum/colon or stoma anatomy, assisting the endoscopist in the proper definition of completeness of the procedure.

Reference standard

Definition of completeness of SBCE based on cecum/colon or stoma landmark identification by experienced endoscopists

Expected value

AI-assisted definition should be comparable to reference standard

ESGE, European Society of Gastrointestinal Endoscopy

Recommendation

For acceptance of AI in evaluating completeness of SBCE investigation, the AI-assisted definition of completeness should be comparable to identification of the cecum or the colon or stoma by experienced endoscopists.

Agreement: 100 %

Accurate tracking of the passage of the capsule through the GI lumen, with identification of cecum/colon or stoma landmarks, is required for certification of completeness of SBCE. The percentage of complete examinations reported in previous guidelines ranges from 64 % to 96 %, with a median of 80 % [66]. A complete examination rate of less than 80 % may be a risk factor for missing significant disease, although these data are still unclear. Misinterpretation or incomplete visualization of the landmarks may result in an inappropriate definition of the completeness of the procedure, resulting in the risk of missed lesions in the nonvisualized tracts. Currently, the assessment of completeness of SBCE is based on image evaluation by expert readers who identify typical features of the cecum/colon or stoma landmarks. Because of the subjective nature of such assessment, a system that can quickly, automatically, and reliably perform this task would be highly desirable. By identifying the main landmarks and precisely locating the capsule within the GI tract, the software could potentially help the endoscopist to correctly assess the completeness of the exam.

In addition, the identification of anatomic landmarks in SBCE is crucial for calculating the SBCE time-based transit indexes that have a major role in the planning of further small-bowel endoscopic examinations (i. e., deciding on a peroral or a peranal approach for device-assisted enteroscopy).

Computer-aided recognition of the cecum/colon or stoma, and consequently precise localization of the capsule as it passes through the GI tract, is still a challenge. Few studies have reported on the potential application of AI regarding automatic localization of the capsule [67] [68]. These studies are based on visual odometry, which is the process of determining the position and orientation of a device by analyzing the associated camera images.



5.3 SBCE reading and lesion detection

Table 11

SBCE reading and lesion detection.

Technique

SBCE

Domain

Identification of pathology

AI task

Automated reading and lesion detection

Description

AI-assisted reading of SBCE and AI-assisted lesion detection

Performance measure

  1. Reading time

  2. Detection of clinically significant small-bowel lesions

Rationale

The diagnostic yield of SBCE is strictly related to the adequate visualization of small-bowel mucosa, reflecting also the quality of the exam. The reading time for the operator is the major burden of SBCE. By selecting frames with suspected lesions, AI may increase the lesion detection rate (LDR) of individual endoscopists to above the recommended levels, while potentially reducing the operator’s reading time.

Reference standard

  1. Proportion of patients with a diagnosis or a finding considered significant and related to the indication, as identified by experienced endoscopists

  2. Reading time

Expected value

AI-assisted LDR comparable to reference standard

Recommendation

For acceptance of artificial intelligence for automated lesion detection in SBCE, the performance of AI-assisted reading should be comparable to that of experienced endoscopists for lesion detection, without increasing and possibly reducing the reading time of the operator.

Agreement: > 95 %

The lesion detection rate (LDR) reflects the quality of SBCE, although a high variability, mainly according to indications, has been reported. The diagnostic yield reported in ESGE and American Gastroenterological Association (AGA) guidelines [66] [69] ranges from 47 % to 71 % for patients examined for suspected or definite Crohn’s disease and from 30 % to 73 % in the case of obscure gastrointestinal bleeding (OGIB). Moreover, diagnostic yield has been shown to be significantly affected by expertise, varying between experts (> 500 SBCEs) and trainees [70]. The use of a computerized system to increase LDR, integrated into the SBCE reading platform, and allowing even the less experienced to achieve expert performance, is highly desirable.

Current evidence has shown that several algorithms are effective and reliable in identifying small-bowel mucosal abnormalities, with accuracy parameters similar to those of experts. However, most studies on this topic have evaluated the performance of computerized systems focusing only on one class of lesions at a time (e. g., blood, vascular lesions, ulcers, or protruding lesions) [71] [72] [73]. Yet in daily clinical practice, except for selected cases, the nature of possible lesions located in the small intestine is poorly predictable before ingestion of the capsule. Recently, multiclass detection algorithms have been shown to be capable not only of identifying different abnormalities, but also of selecting those with clinically relevant potential [74] [75] [76] [77], and some capsule manufacturers have already integrated the software into their platform.

Nevertheless, data from everyday clinical practice are still lacking at present. This gap must be addressed before automated systems are used to help analyze SBCE videos: the assessment of clinical relevance cannot be based solely on morphological appearance but must integrate those findings with clinical, anamnestic, and laboratory data.

Analysis of SBCE videos is a long, time-consuming process that requires prolonged and focused attention. One of the major benefits expected from the application of artificial intelligence in SBCE evaluation is a reduction in the endoscopist’s reading time. Available studies have consistently shown that CNN-based systems are much faster at image processing and analysis than human readers [71] [72] [73] [74] [75] [76] [77]. However, whether this will result in a shorter SBCE reading process is still under discussion. Current studies have mostly focused on the sensitivity of SBCE automated detection systems, emphasizing the low miss rate observed with their use. Nevertheless, a high sensitivity could increase the number of selected areas of interest that need to be checked (or double-checked) by the reader, thus increasing the reading time. Furthermore, there is an ongoing debate on how these systems should be integrated into routine SBCE analysis: whether the AI support should be applied before the operator’s reading (by pre-selecting only the possible areas of interest), during the operator’s reading (as happens with AI systems for colonoscopy), or after the operator’s reading (as a second review).
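The pre-selection workflow discussed above can be sketched as follows; the detection threshold and the per-frame review time are illustrative assumptions, and the point of the example is the trade-off between sensitivity (a lower threshold selects more frames) and reading time (more frames to review).

```python
# Sketch of the "pre-selection before reading" workflow: a detection model scores
# every frame and only frames above a threshold are queued for the human reader.
# The threshold value and the per-frame review time are illustrative assumptions.
def preselect_frames(frame_scores, threshold=0.8):
    """Return indices of frames the reader should review, sorted by descending score."""
    candidates = [(score, idx) for idx, score in enumerate(frame_scores) if score >= threshold]
    return [idx for score, idx in sorted(candidates, reverse=True)]

def estimated_reading_minutes(n_selected_frames, seconds_per_frame=2.0):
    """Rough reading-time estimate: lowering the threshold raises sensitivity but
    increases the number of frames to check, and hence the reading time."""
    return n_selected_frames * seconds_per_frame / 60

selected = preselect_frames([0.1, 0.95, 0.4, 0.85, 0.2, 0.99])
print(selected, estimated_reading_minutes(len(selected)))  # [5, 1, 3] 0.1
```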



6 Lower GI tract

6.1 Quality of bowel preparation (scoring of cleansing level)

Table 12

Quality of bowel preparation: scoring cleansing level.

Technique

Colonoscopy

Domain

Intraprocedure

AI task

Scoring the level of cleansing for colonoscopy

Description

AI-assisted endoscopist real-time scoring according to validated scales

Performance measure

Assessment of the level of bowel cleansing (NEW)

Rationale

Reliable measurement of the rate of adequate bowel preparation depends on a standardized scoring of the level of cleansing for each of the main colorectal segments. Despite the implementation of semiquantitative scales, bias related to interobserver agreement and suboptimal training remains. This may result in a miscategorization of the patient as having adequate/inadequate cleansing, with relevant clinical implications. AI real-time assessment may reduce interobserver variability, helping to standardize the scoring of mucosal cleansing.

Reference standard

Experience-based scoring of bowel preparation as adequate/inadequate

Expected value

AI-assisted endoscopist scoring of bowel preparation is comparable with reference standard

Recommendation

For acceptance of AI in categorizing the level of cleansing at colonoscopy, AI-assisted scoring should be comparable to that of experienced endoscopists.

Agreement: 100 %

Rating of the level of bowel preparation is a key performance measure as individuals with inadequate cleansing have a higher risk of missed colorectal neoplasia and should repeat the procedure within 1 year. We are here primarily referring to assessment after intraprocedural cleansing, although this process may be associated with substantial waste of time and endoscopy resources. Assessment of the cleansing level depends on the subjective interpretation of the validated scales, such as the Boston Bowel Preparation Scale (BBPS). Thus, suboptimal interendoscopist agreement as well as noncompliance with the use of validated scales may result in an inappropriate scoring of the level of cleansing [78]. In this regard, intra- and interobserver agreement for BBPS ranges between 0.74 and 0.91 after training [78]. Inappropriate scoring is clinically relevant as the final level of cleansing affects post-colonoscopy management, namely the assessment of the need for early repetition, the assigned surveillance intervals, and the audit of the endoscopy service.

Of note, scoring of the cleansing level should not be confused with the rate of adequate preparation. The latter is considered to be a pre-procedure key performance measure, while the former must be regarded as a new intraprocedure performance measure concerned solely with correct scoring of the cleansing level.

The main advantage of AI in this case is to provide a homogeneous and automatic feedback, assisting the endoscopist to score the cleansing level. Such computer-aided scoring of cleansing level after bowel preparation has been shown to be feasible with AI systems based on supervised deep learning. The task specifically consists of AI-based scoring of one or more consecutive frames of real-time colonoscopy. AI is expected to provide the endoscopist with an objective assessment of the level of cleansing according to a quantitative or semiquantitative scale. Using the BBPS, one AI system has been tested in an artificial setting against findings from human experts (providing the “ground truth” data); its average accuracy was 91.9 % [79] [80]. These AI systems should be evaluated in prospective studies in real-world clinical settings. A recent RCT showed that when an AI system developed for real-time withdrawal speed monitoring was added to an existing computer-aided detection (CADe) system, this improved ADR compared with CADe alone or with no AI assistance [81].
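
The frame-level scoring task described above can be sketched as follows. This is a minimal illustration, not the cited system: the classify_frame call (a model predicting a BBPS score of 0–3 for a single frame) and the majority-vote aggregation per segment are assumptions; the adequacy rule of a score of at least 2 in every segment reflects common BBPS usage.

from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

def score_bowel_prep(
    frames_by_segment: Dict[str, Iterable],   # e.g. {"right": [...], "transverse": [...], "left": [...]}
    classify_frame: Callable[[object], int],  # model predicting a BBPS score (0-3) for one frame
) -> Tuple[Dict[str, int], bool]:
    """Aggregate per-frame BBPS predictions into per-segment scores.

    The most frequent per-frame prediction is taken as the segment score
    (an assumption); preparation is flagged as adequate when every segment
    scores >= 2.
    """
    segment_scores = {}
    for segment, frames in frames_by_segment.items():
        votes = Counter(classify_frame(frame) for frame in frames)
        segment_scores[segment] = votes.most_common(1)[0][0]
    adequate = all(score >= 2 for score in segment_scores.values())
    return segment_scores, adequate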

Despite its subjectivity, scoring by experienced endoscopists using a validated scale appears to be the most clinically pertinent reference standard. As in previous studies on this topic, a consensus among multiple raters based on a centralized blinded reading seems a reasonable approach to reduce the weakness of this reference standard as a comparator.



6.2 Assessment of completeness of colonoscopy: identification of cecal landmarks by AI

Table 13 Completeness of colonoscopy.

Technique: Colonoscopy
Domain: Completeness of procedure
AI task: Identification of the cecal landmarks required for confirmation of completeness of colonoscopy
Description: Real-time AI-assisted endoscopist certification of complete colonoscopy (NEW)
Performance measure: Correct assessment of the completeness of colonoscopy
Rationale: Identification of the cecal landmarks is required for a certification of completeness of colonoscopy; this is also involved in the measurement of the cecal intubation rate (a key performance indicator) and its appropriate image-based documentation. AI-trained software can identify the landmarks of cecal anatomy, assisting the endoscopist in the certification of completeness of the procedure.
Reference standard: Experience-based certification of completeness of colonoscopy based on cecal landmark identification
Expected value: AI-assisted identification of cecal landmarks is comparable to reference standard

Recommendation: For acceptance of AI for certification of completeness of colonoscopy, AI-assisted certification should be comparable to that based on the identification of cecal landmarks by experienced endoscopists.

Agreement: > 95 %

Accurate identification of cecal landmarks is required for a certification of completeness of colonoscopy, including their photodocumentation. This should include possible anatomical variants, such as after appendicectomy or right-sided colectomy. Misinterpretation or incomplete visualization of such landmarks may result in an incorrect assertion of the completeness of the procedure, resulting in the risk of lesions in the nonvisualized part of the tract being missed.

This indicator should not be confused with the required rate of completeness of colonoscopy; it merely deals with how correct the assertion of completeness is.

By identifying the main landmarks of the cecum, the software can assist the endoscopist in an appropriate characterization of the completeness of the exam. An additional benefit may be that an automatic assessment of withdrawal time also requires a correct confirmation of cecal intubation.

Computer-aided identification of the cecal landmarks has been shown to be feasible with AI systems based on supervised deep learning. The task specifically consists of the AI-based identification of the ileocecal valve/appendiceal orifice, based on one or more frames in real-time colonoscopy. Computer-aided identification has been shown to correctly identify the cecum with high accuracy. In detail, one computer-aided system was trained to identify the cecum (and automatically record insertion and withdrawal time after cecal intubation); tested in real time it had an overall accuracy of 95 % [82]. It is expected that the reference standard would be defined by a consensus among multiple experienced raters, as this is an augmented standard of practice as compared with what happens in the clinical setting.
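
How such a system can derive insertion and withdrawal times from the cecal-landmark classifier is illustrated below. The sketch is hypothetical: it assumes per-frame timestamps and boolean cecum detections, and it triggers on the first positive frame, whereas a real system would require a run of consecutive positives to suppress spurious detections.

from typing import Optional, Sequence, Tuple

def insertion_and_withdrawal_times(
    timestamps_s: Sequence[float],  # frame timestamps in seconds from scope insertion
    cecum_flags: Sequence[bool],    # per-frame output of a cecal-landmark classifier
) -> Optional[Tuple[float, float]]:
    """Split the procedure at the first frame in which the cecum is recognized.

    Returns (insertion_time, withdrawal_time) in seconds, or None if the cecum
    was never recognized, in which case completeness cannot be certified.
    """
    for t, at_cecum in zip(timestamps_s, cecum_flags):
        if at_cecum:
            return t, timestamps_s[-1] - t
    return None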



6.3 Mucosal visualization in colonoscopy

Table 14 Mucosal visualization in colonoscopy.

Technique: Colonoscopy
Domain: Completeness of procedure
AI task: Completeness of mucosal exposure
Description: Real-time AI-assisted endoscopist assessment of completeness of mucosal visualization
Performance measure: Assessment of the completeness of mucosal visualization
Rationale: Failure to expose the entire surface of the colorectal mucosa because of its folds and angulations is one of the two main pitfalls leading to the missing of colorectal neoplasia, the other being recognition failure. Incomplete mucosal exposure is only partially compensated for by the wide angle of view and the tip maneuverability of the latest generations of scopes, and the problem has been addressed using a number of add-on devices, with varying efficacy. AI-assisted warning of uninspected areas and scope-slipping has been shown to increase mucosal exposure. In the future, AI could also quantify mucosal exposure as a percentage of the total area, and certify its completeness.
Reference standard: Assessment of completeness of mucosal visualization by experienced endoscopists with standard scores
Expected value: AI-assisted visualization of colorectal mucosa is comparable to completeness of procedure as defined by the reference standard

Recommendation: For acceptance of AI in evaluating the completeness of mucosal visualization, AI-assisted assessment should be comparable to that of experienced endoscopists.

Agreement: 100 %

Failure to expose the entire surface of the colorectal mucosa due to its folds and angulations is one of the two main pitfalls leading to colorectal neoplasia being missed, the other being recognition failure. Incomplete mucosal exposure is only partially compensated for by the wide-angle view and tip maneuverability of the latest scope generations, and the problem has been addressed by means of a number of add-on devices, with varying efficacy. The extent of exposure is also likely to be affected by the withdrawal time, itself in turn linked to the risk of post-colonoscopy colorectal cancer. The visualized mucosa percentage, evaluated using a validated score, has also been correlated with ADR [83].

AI-assisted warning of uninspected areas and scope-slipping has been shown to increase the extent of mucosal exposure. In the future, AI might also quantify the percentage of the total mucosa that has been exposed and certify the completeness of exposure. The most reliable reference standard is still the assessment of the quality of mucosal exposure by one or more experienced endoscopists using semiquantitative scales. In the future, it is plausible that AI assessment may become a more robust reference standard than subjective human assessment, and may replace it.
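
A future quantification step could, in principle, look like the following sketch. Everything here is an assumption made for illustration: current systems do not output validated per-segment coverage fractions, and the 90 % warning threshold is arbitrary rather than an established cutoff.

from typing import Dict, List, Tuple

def mucosal_exposure_summary(
    segment_coverage: Dict[str, float],  # hypothetical per-segment exposed fraction in [0, 1]
    warn_below: float = 0.90,            # arbitrary warning threshold for re-inspection
) -> Tuple[float, List[str]]:
    """Summarize a hypothetical AI estimate of mucosal exposure.

    Returns the overall exposed percentage (unweighted mean across segments)
    and the list of segments falling below the warning threshold, which a
    real-time system could flag for re-inspection before scope withdrawal.
    """
    overall_percent = 100 * sum(segment_coverage.values()) / len(segment_coverage)
    under_inspected = [seg for seg, frac in segment_coverage.items() if frac < warn_below]
    return round(overall_percent, 1), under_inspected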



6.4 Detection of colorectal neoplasia

Table 15 Detection of colorectal neoplasia.

Technique: Colonoscopy
Domain: Identification of pathology
AI task: Detection of colorectal lesions
Description: Real-time AI-assisted detection and localization of colorectal lesions
Performance measure: Adenoma detection rate (ADR)
Rationale: Failure to identify colorectal neoplasia on the endoscopic screen is a plausible reason for low ADR that in turn has been associated with a higher risk of post-colonoscopy colorectal cancer (CRC). By flagging areas with suspected lesions, AI may increase the ADR of individual endoscopists to above the recommended levels.
Reference standard: ADR (proportion of patients with pathologically verified adenomatous lesions) of experienced endoscopists, considering the specific clinical scenario
Expected value: AI-assisted detection of colorectal polyps is comparable to reference standard

Recommendation: For acceptance of AI in the detection of colorectal polyps, the AI-assisted adenoma detection rate should be comparable to that of experienced endoscopists.

Agreement: > 95 %

The importance of this task is underpinned by the evidence that failure to identify neoplastic lesions that are present on the endoscopic screen accounts for a substantial proportion of the miss rate for colorectal neoplasia. This recognition failure is likely to be associated with factors related to the endoscopist (fatigue, distraction, or suboptimal training), or to the lesion (subtle appearance).

CADe may be expected to reduce this miss rate as the system alerts the endoscopist when a suspicious lesion is present and shows its location, so that the endoscopist needs only to accept or reject the presented lesion. By reducing the miss rate due to recognition failure, CADe may be expected to improve the detection of colorectal neoplasia by each individual endoscopist. The ADR has been identified as the key performance measure in the domain of identification of pathology, in other words, of the level of inspection of the mucosa, with a minimum target of 25 %, as this threshold has been robustly related to a low incidence of post-colonoscopy colorectal cancer [84] [85] [86]. Unfortunately, ADR consistently shows a high variability among endoscopists, with a non-negligible proportion scoring below the recommended threshold.

CADe of colorectal lesions in real-time colonoscopy has been shown to be feasible with AI systems primarily based on supervised deep learning. The task specifically consists of the AI-based identification of one or more suspected lesions in the endoscopy image that are, in real time, flagged (detection) and localized (segmentation) on the same or a different screen. Different CADe systems have been tested in artificial settings against human “ground truth” data, showing a sensitivity of up to 99.7 % for detection of colorectal polyps [87] [88] [89]. False-positive rates of 2.4 (SD 1.2) per minute of colonoscope withdrawal time have been found, resulting in a mean increase in withdrawal time of 16 seconds [90]. Furthermore, several CADe systems have been tested in real-life randomized controlled trials with real-time colonoscopy [6] [82] [91] [92] [93] [94] [95] [96] [97] [98] [99] [100]. Meta-analyses showed a significant mean increase in ADR (36.6 % vs. 25.2 %; risk ratio [RR] 1.44, 95 %CI 1.27–1.62), which was consistent across adenomas of all sizes [101] [102]. In addition, CADe efficacy was shown across all endoscopist levels of expertise [100]. More recently, tandem RCTs have also analyzed the impact of CADe systems in decreasing the adenoma miss rate, showing a reduction in the miss rate of up to 50 % [96] [97] [98] [99].
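
For clarity, the relationship between ADR and the quoted risk ratio can be made explicit with a small worked example. The patient counts below are hypothetical and chosen only so that the two arms reproduce the pooled percentages; the crude ratio they give (about 1.45) is not the meta-analytic RR of 1.44, which is pooled across studies.

def adenoma_detection_rate(patients_with_adenoma: int, patients_examined: int) -> float:
    """ADR: proportion of patients with at least one histologically confirmed adenoma."""
    return patients_with_adenoma / patients_examined

# Hypothetical two-arm trial, 500 patients per arm, used only to illustrate the calculation.
adr_cade = adenoma_detection_rate(183, 500)      # 36.6 % with CADe
adr_control = adenoma_detection_rate(126, 500)   # 25.2 % without CADe
crude_risk_ratio = adr_cade / adr_control        # about 1.45 in this illustrative example

print(f"ADR with CADe {adr_cade:.1%}, without {adr_control:.1%}, crude RR {crude_risk_ratio:.2f}")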

We preferred to select the ADR of experienced endoscopists as the reference standard, rather than an ADR cutoff value, as the task of AI is to lift less experienced endoscopists to the level of experienced ones rather than to further increase the already high ADR values of the high-detecting experts.



6.5 False positives

Table 16 False positives in the detection of colorectal lesions.

Technique: Colonoscopy
Domain: Identification of pathology
AI task: Detection of colorectal lesions: false-positive rate
Description: Real-time flagging of false-positive areas
Performance measure: False-positive rate
Rationale: A possible drawback of CADe is the potentially large number of false-positive results. The endoscopist might spend an excessive amount of time in discarding a false-positive. Furthermore, it is possible that a false-positive alert may result in unnecessary polypectomy with related avoidable adverse events.
Reference standard: Mean withdrawal time without AI
Expected value: Clinically relevant false-positive rates do not significantly prolong withdrawal time

Recommendation: For acceptance of AI in the detection of colorectal polyps, AI-assisted detection should have an acceptable false-positive rate that does not significantly prolong withdrawal time.

Agreement: 87 %

A possible drawback of CADe is the potentially large number of false-positive alerts due to suboptimal specificity. Two main causes for false-positive activations have been proposed, namely artifacts from the bowel wall and artifacts from bowel content. It has been shown that most of these alerts arise from artifacts of the bowel wall rather than from luminal material [90]. A recent study also compared two different CADe systems with regard to false-positive activations, showing that the use of a standardized nomenclature provided comparable results with the two systems [103]. The main impact of false positives is the additional time needed to discard them. In theory, it is possible that a false-positive alert may result in unnecessary polypectomy with possible adverse events. On the other hand, it could be argued that most false-positive alerts may be promptly discarded by an experienced endoscopist, and so only marginally prolong the inspection time. Thus, when choosing between the risks of false-negative and false-positive results, the former is more relevant and should be preferentially decreased.
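
The time cost of false positives can be bounded with simple arithmetic. In the sketch below, only the 2.4 activations per minute comes from the cited study [90]; the 6-minute clean withdrawal time and the 1 second assumed for an experienced endoscopist to dismiss one alert are illustrative assumptions.

def added_inspection_time_s(
    fp_per_min: float = 2.4,         # false-positive activations per minute of withdrawal [90]
    withdrawal_min: float = 6.0,     # assumed clean withdrawal time
    seconds_per_alert: float = 1.0,  # assumed time to visually dismiss one alert
) -> float:
    """Rough estimate of the extra inspection time caused by false-positive alerts."""
    return fp_per_min * withdrawal_min * seconds_per_alert

# With these assumptions the added time is about 14 seconds per procedure,
# of the same order as the mean 16-second increase in withdrawal time reported with CADe [90].
print(added_inspection_time_s())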

At this stage, we selected the duration of clean withdrawal time as the reference standard, assuming a direct relationship between the lack of specificity and the additional time spent by endoscopists in analyzing false positives. However, we cannot exclude that, in the future, other comparators such as the rate of non-neoplastic resections may be considered.



6.6 Optical diagnosis of polyps ≤ 5 mm

Table 17 Optical diagnosis of polyps ≤ 5 mm in size.

Technique: Colonoscopy
Domain: Management of pathology
AI task: Characterization (computer-aided diagnosis [CADx]) of diminutive (≤ 5 mm) colorectal polyps
Description: Real-time AI-assisted optical diagnosis of diminutive colorectal polyps
Performance measure: PIVI/SODA criteria for leave-in-situ and resect-and-discard strategies
Rationale: Diminutive (≤ 5 mm) colorectal polyps with a negligible risk of harboring invasive neoplasia constitute up to 60 % of all colorectal polyps. Current management is to resect them all and submit them to histology, with a high burden and cost. Highly accurate optical diagnosis of diminutive polyps should permit a resect-and-discard strategy for diminutive adenomas and a diagnose-and-leave strategy for diminutive rectosigmoid hyperplastic polyps, resulting in a substantial reduction of related burdens. However, optical diagnosis needs competence training and maintenance, and is seldom implemented outside expert centers. AI-assisted diminutive polyp characterization with high accuracy could expand the uptake of optical diagnosis, increasing the cost–effectiveness of colonoscopy.
Reference standard: Minimum performance standards for the implementation of resect-and-discard and diagnose-and-leave strategies for diminutive polyps
Expected value: AI-assisted characterization of diminutive polyps comparable to reference standard

PIVI, Preservation and Incorporation of Valuable Endoscopic Innovations; SODA, Simple Optical Diagnosis Accuracy.

Recommendation: For acceptance of AI optical diagnosis (computer-aided diagnosis [CADx]) of diminutive polyps (≤ 5 mm), AI-assisted characterization should match performance standards for implementing resect-and-discard and diagnose-and-leave strategies.

Agreement: 100 %

Diminutive (≤ 5 mm) colorectal polyps constitute the vast majority of identified neoplasia and, despite a negligible invasive neoplasia risk [104], are currently all sent for histopathological examination, with a high burden in terms of resection time and costs, pathological examination handling and costs, and the environmental impact of required materials [105] [106] [107] [108]. The implementation of cost-saving strategies based on optical diagnosis of polyps, namely the “leave-in-situ” (“diagnose-and-leave”) strategy for diminutive rectosigmoid hyperplastic lesions and the “resect-and-discard” strategy for diminutive colorectal adenomas, could result in a huge reduction of this burden [109] [110]. However, outside referral centers, uptake has been poor [111]. Computer-aided characterization of colorectal neoplasia could standardize optical diagnosis performance and widen the uptake of cost-saving strategies.

The PIVI (Preservation and Incorporation of Valuable Endoscopic Innovations) criteria, proposed in 2011 by the American Society for Gastrointestinal Endoscopy [112], require an NPV of > 90 % for diminutive hyperplastic rectosigmoid lesions to implement the leave-in-situ strategy. For implementation of a resect-and-discard strategy for all diminutive colorectal adenomas, the PIVI requires a > 90 % agreement between the post-polypectomy surveillance intervals (according to established guidelines) assigned when optical diagnosis is used for diminutive polyps, combined with histology for larger polyps (≥ 6 mm), and those assigned when histology is used for all polyps. ESGE recently published a Position Statement on the criteria required for implementation of optical diagnosis in clinical practice, which also addressed the need to set criteria for AI-assisted optical diagnosis [113]. Briefly, this was based on a simulation approach, in which a virtual endoscopist or artificial intelligence system performed optical diagnosis of diminutive polyps, with given diagnostic performance levels, on two existing cohorts of patients who had undergone either primary screening colonoscopy or colonoscopy following a positive fecal immunochemical test. Finally, the ESGE panel concluded that an 80 % sensitivity and 80 % specificity would be required to implement the resect-and-discard strategy, and a 90 % sensitivity and 80 % specificity for the leave-in-situ approach.
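
The link between these sensitivity/specificity thresholds and the PIVI NPV criterion can be shown with a short worked calculation. The formula is standard; the 40 % adenoma prevalence among diminutive rectosigmoid polyps used in the example is an arbitrary illustrative value, not a figure from the Position Statement or the simulation study.

def negative_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV for ruling out adenomatous histology in diminutive rectosigmoid polyps.

    A "negative" optical diagnosis (non-adenomatous) is correct when the polyp is
    truly non-adenomatous, so the NPV depends on the local adenoma prevalence as
    well as on the diagnostic sensitivity and specificity.
    """
    true_negatives = specificity * (1 - prevalence)
    false_negatives = (1 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)

# With the 90 % sensitivity / 80 % specificity cited for the leave-in-situ approach and an
# assumed 40 % adenoma prevalence, the NPV is about 0.92, above the > 90 % PIVI requirement;
# at this performance level the requirement holds as long as the prevalence stays below ~47 %.
print(negative_predictive_value(0.90, 0.80, 0.40))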

Only four real-time in vivo clinical trials are available that test CADx systems. Two trials [114] [115] from the same Japanese group, using a CADx system employing endocytoscopy and NBI, showed conflicting results, the first concluding that the system could meet the required thresholds, while the second, in a multicenter setting with less experienced endoscopists, failed to reach acceptable levels. Another single-center study [116] involving 162 patients with 544 polyps showed that an approved CADx system exceeded PIVI thresholds for implementation in clinical practice, while a multicenter experience [117] with another approved CADx system fell short of the required thresholds.



6.7 Characterization of polyps ≥ 6 mm

Table 18 Characterization of polyps ≥ 6 mm.

Technique: Colonoscopy
Domain: Management of pathology
AI task: Management of colorectal neoplasia ≥ 6 mm: selection of lesions amenable to endoscopic resection
Description: Real-time AI-assisted estimation of deep submucosal invasion of polyps ≥ 6 mm
Performance measure: Endoscopic prediction of submucosal invasion
Rationale: Estimation of the depth of invasion is crucial for risk stratification and planning of therapy, and it is usually performed with advanced imaging techniques. Certain characteristics are highly predictive of a high risk of lymph node metastasis and prompt an immediate surgical referral. However, many early lesions can be treated with curative intent by endoscopic resection performed by experienced endoscopists. Primarily, a false-positive diagnosis (leading directly to surgical resection) is to be avoided as it would result in surgical overtreatment.
Reference standard: Performance by experienced endoscopists in selection of lesions that are amenable to endoscopic resection
Expected value: AI-assisted endoscopist referral for endoscopic resection of colorectal lesions ≥ 6 mm is comparable to reference standard

Recommendation: For acceptance of AI in the management of polyps ≥ 6 mm, AI-assisted characterization should be comparable to that of experienced endoscopists in selecting lesions amenable to endoscopic resection.

Estimation of the depth of invasion is crucial for risk stratification and planning of therapy, and it is usually performed with advanced imaging techniques. Certain characteristics [118] are highly predictive of a high risk of lymph node metastasis and prompt an immediate surgical referral. However, many early lesions can be treated with curative intent by endoscopic resection performed by experienced endoscopists. Primarily, a false-positive diagnosis is to be avoided as it would result in a surgical overtreatment.

This statement merges two earlier statements on 6–19 mm and ≥ 20 mm polyps, in line with the other organ sections, for consistency. However, it is likely that it will mainly be applied to large lesions. As with the other organs, this statement is mainly intended to underpin AI assistance in standardizing referral to an expert center for advanced resection, namely EMR or ESD. At many community centers, LGD and HGD polyps are still sent to surgery and, primarily, this is the outcome to be avoided.



7 Conclusions

According to ESGE, the main benefit to be expected from the implementation of AI in the clinical setting is a standardization of the detection and optical diagnosis of endoscopic lesions and conditions that will assure a uniformly high quality standard for the diagnosis and treatment of GI neoplasia. Regarding detection of disease, we stressed that the target of AI should not be to enhance the performance of experienced endoscopists with already high detection rates, but to bring the performance of less experienced endoscopists, such as those in the community setting, up to their level. This is especially relevant given the still suboptimal implementation of quality assurance programs. Anecdotally, most endoscopists still fail to measure their own colonoscopy performance, even though the high prevalence of disease would facilitate such a task. In addition, the very low prevalence of disease in the upper GI tract precludes the definition of clear real-life benchmarks, so that both training and quality improvement are largely suboptimal in the community setting. Thus, the expectation of AI is that it should instantly add an expert level of performance to that of less experienced endoscopists, resulting in a dramatic and universal improvement in the detection rate for GI neoplasia: in other words, AI will effectively reduce the miss rate for these subtle lesions.

Secondly, ESGE advises against unduly high expectations for AI regarding optical diagnosis. The task here of AI is not to be equivalent to pathology – the sole exception being cost-saving strategies for ≤ 5-mm colorectal polyps – but to reproduce the decision-making algorithm of experienced endoscopists for referring patients to endoscopic or surgical resection. The main target here is to avoid the overuse of surgery.

It could be argued that most research in the AI field is not aligned with these directions. For instance, randomized trials involving the same operators with and without CADe assistance, or direct comparisons between CADx and pathology results, do not address the value of AI as we define it. However, these studies are generating only preliminary evidence of the possible impact of AI on operator performance. We propose that the next step will be to compare the performance of less experienced endoscopists using AI assistance with that of experienced endoscopists, in order to ensure uniform quality standards for the detection and characterization of GI neoplasia wherever AI has been successfully implemented.



Competing interests

O.F. Ahmad has received speaker fees from Olympus (March 2022). G. Antonelli has received a consultancy fee from Medtronic. P. Bhandari’s department has received research grants from NEC Japan (June 2019, ongoing) and Fujifilm Europe (June 2020, ongoing). J.J.G.H.M. Bergman has received support for AI-related research in endoscopy from Olympus and Fujifilm; he has carried out sponsored AI-related research into Barrett’s esophagus for which the rights have been transferred to Olympus. R. Bisschops has received consultancy and speaker’s fees from Fujifilm, Pentax, and Medtronic (2015, ongoing), and provided consultancy to CDx Diagnostics (2017–2019); his department has received research grants from Pentax and Fujifilm (2015, ongoing), and from Medtronic (2018, ongoing). E. Dekker has received honoraria for consultancy from Fujifilm, Olympus, GI Supply, and PAION; she has received speakerʼs fees from Norgine, Ipsen, PAION, and Fujifilm. M. Dinis-Ribeiro has provided consultancy to Medtronic and Roche (2021–2022); his department has received support (loan for research) from Fujifilm (2021–2022); he is Co-Editor-in-Chief of Endoscopy journal. R. Eliakim has provided consultancy on various occasions to Medtronic (from 2019). I. Gralnek has provided consultancy to and been on the advisory board of Motus GI; he has provided consultancy to Boston Scientific, Clexio Biosciences, Medtronic, and Symbionix; he has received research support from Astra Zeneca and CheckCap; all during the last 3 years, 2019 to present. R.J. Haidry has received an educational grant to support research from Cook Endoscopy (2015, ongoing); his department has received an educational grant to support research from Medtronic (2018, ongoing). C. Hassan has received consulting fees and/or research grants from Alphasigma, Fujifilm, Medtronic, Norgine, Olympus, and Pentax. R. Joverʼs department has received a research grant from Medtronic (2021–2022). M.F. Kaminski has provided consultancy to Olympus (2016, ongoing) and Erbe (2021, ongoing), and has lectured for Boston Scientific (2016, ongoing) and Recordati (2020, ongoing). H. Messmann’s department has received financial and/or research support from Apollo Endosurgery, Biogen, Boston Scientific, CDx Diagnostics, Cook Medical, CSL Behring, Dr. Falk Pharma, Endo Tools Therapeutics, Erbe, Fujifilm, Hitachi, Janssen-Cilag, Medwork, Norgine, Nutricia, Olympus, Ovesco Endoscopy, Satisfai, Servier Deutschland, and US Endoscopy (in the past 3 years). Y. Mori has provided consultancy to and delivered talks for Olympus (2018–2022); he has ownership interest in Cybernet System Corporation (2020–2022). A. Repici has received research grants and speakerʼs fees from Boston Scientific (2020–2022), Fujifilm (2019–2022), and Norgine (2020–2022); he is also on an advisory board for Fujifilm (2019–2022). E. Rondonotti has been an Expert group member and speaker for Fujifilm (January–December 2021), and provided consultancy to Medtronic (July 2021–July 2022); his department received a research grant from Fujifilm (January–December 2021). Y. Saitoʼs department is conducting joint research with the NEC Corporation in developing AI for colonoscopy (April 2016–March 2023); he or his department will hold joint patents with the NEC Corporation, and five patents are pending.
P. Sharma is providing consultancy services to Bausch, Boston Scientific, CDx Labs, Covidien LP, Exact Sciences, Fujifilm Medical Systems, Lucid, Lumendi, Medtronic, Phathom, Olympus, Takeda, and Samsung Bioepis (ongoing); his department is receiving research support from Cosmo Pharmaceuticals, Covidien, Docbot, ERBE, Fujifilm, Ironwood Pharmaceuticals, Olympus, and Medtronic. C. Spada has provided consultancy to Medtronic (2018–2022) and AnX Robotics (2018–2022). M. Abdelrahim, M. Areia, I. Boskoski, D. Domagk, A. Ebigbo, T. Eelbode, M. Häfner, R. Kuvaev, D. Libânio, M. Palazzo, M. Rutter, P. Sinonquel, M. Spadaccini, and A. Veitch have no competing interests.

  * Helmut Messmann and Raf Bisschops, first authors, contributed equally to this manuscript.


 ** Although both ESGE and the Thieme Group adhere to a policy prohibiting publications by Russian authors in Endoscopy at this time, an exception has been made to include Dr. Kuvaev due to the fact that his significant contribution to this Position Statement was made before Russia’s invasion of Ukraine.


*** Cesare Hassan and Mario Dinis-Ribeiro, senior authors, contributed equally to this manuscript.



Corresponding author

Cesare Hassan, MD PhD
Humanitas University, Department of Biomedical Sciences
Via Rita Levi Montalcini 4
20072 Pieve Emanuele, Milan; IRCCS Humanitas Research Hospital, Rozzano
Milan
Italy   

Publication History

Article published online:
21 October 2022

© 2022. European Society of Gastrointestinal Endoscopy. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany


Fig. 1 Incorporation of performance measures into the use of artificial intelligence in gastrointestinal endoscopy. GI, gastrointestinal; BERN, Barrett’s esophagus-related neoplasia; PIVI, Preservation and Incorporation of Valuable Endoscopic Innovations; SODA, Simple Optical Diagnosis Accuracy.