CC BY-NC-ND 4.0 · Endoscopy 2022; 54(08): 780-784
DOI: 10.1055/a-1660-6500
Innovations and brief communications

Artificial intelligence versus expert endoscopists for diagnosis of gastric cancer in patients who have undergone upper gastrointestinal endoscopy

Ryota Niikura
1   Department of Gastroenterology, Graduate School of Medicine, The University of Tokyo, Japan
2   Gastroenterological Endoscopy, Tokyo Medical University, Tokyo, Japan
,
Tomonori Aoki
1   Department of Gastroenterology, Graduate School of Medicine, The University of Tokyo, Japan
,
Satoki Shichijo
3   Department of Gastrointestinal Oncology, Osaka International Cancer Institute, Osaka, Japan
,
1   Department of Gastroenterology, Graduate School of Medicine, The University of Tokyo, Japan
,
Takuya Kawahara
4   Clinical Research Promotion Center, The University of Tokyo Hospital, Tokyo, Japan
,
Yusuke Kato
5   AI Medical Service Inc., Tokyo, Japan
,
Yoshihiro Hirata
6   Division of Advanced Genome Medicine, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
,
Yoku Hayakawa
1   Department of Gastroenterology, Graduate School of Medicine, The University of Tokyo, Japan
,
Nobumi Suzuki
1   Department of Gastroenterology, Graduate School of Medicine, The University of Tokyo, Japan
,
Masanori Ochi
1   Department of Gastroenterology, Graduate School of Medicine, The University of Tokyo, Japan
,
Toshiaki Hirasawa
7   Department of Gastroenterology, Cancer Institute Hospital Ariake, Japanese Foundation for Cancer Research, Tokyo, Japan
,
Tomohiro Tada
5   AI Medical Service Inc., Tokyo, Japan
8   Department of Surgical Oncology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
9   Tada Tomohiro Institute of Gastroenterology and Proctology, Saitama, Japan
,
Takashi Kawai
2   Gastroenterological Endoscopy, Tokyo Medical University, Tokyo, Japan
,
Kazuhiko Koike
1   Department of Gastroenterology, Graduate School of Medicine, The University of Tokyo, Japan
Supported by: P-CREATE by AMED 21448169

Trial Registration: ClinicalTrials.gov Registration number (trial ID): NCT04040374 Type of study: Retrospective
 

Abstract

Aims To compare the rate of gastric cancer diagnosis from endoscopic images between artificial intelligence (AI) and expert endoscopists.

Patients and methods We used the retrospective data of 500 patients, including 100 with gastric cancer, matched 1:1 to diagnosis by AI or expert endoscopists. We retrospectively evaluated the noninferiority (prespecified margin 5 %) of the per-patient rate of gastric cancer diagnosis by AI and compared the per-image rate of gastric cancer diagnosis.

Results Gastric cancer was diagnosed in 49 of 49 patients (100 %) in the AI group and 48 of 51 patients (94.12 %) in the expert endoscopist group (difference 5.88, 95 % confidence interval: −0.58 to 12.3). The per-image rate of gastric cancer diagnosis was higher in the AI group (99.87 %, 747/748 images) than in the expert endoscopist group (88.17 %, 693/786 images) (difference 11.7 %).

Conclusions Noninferiority of the rate of gastric cancer diagnosis by AI was demonstrated but superiority was not demonstrated.



Introduction

Upper gastrointestinal endoscopy is the standard procedure for diagnosis of gastric cancer. However, gastric cancer may be diagnosed within a few years after endoscopy because of missed lesions. Artificial intelligence (AI)-aided methods are needed to reduce the rate of missed lesions by automatic detection of gastric cancer, which could reduce the mortality rate.

AI based on deep learning shows promise for gastric cancer surveillance. Use of convolutional neural networks (CNNs) for deep learning enables extraction of specific features from endoscopic images and endoscopic diagnosis. Twelve previous studies, including ours [1], have investigated the diagnosis of gastric cancer lesions using upper gastrointestinal endoscopy images [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]. The results were heterogeneous, but most models reached a sensitivity of over 80 %. However, these studies had technical limitations, including problems with patient-level comparison of the efficacy of gastric cancer diagnosis by AI and by expert endoscopists. In addition, to evaluate gastric cancer diagnosis it is important to reduce bias and the influence of confounding factors. For these reasons, we conducted a retrospective matching analysis to evaluate noninferiority of the detection rate of gastric cancer by AI compared with that of expert endoscopists. A STROBE checklist statement for items that should be included in reports of observational studies has been completed for this study (Table 1 s in the online-only supplementary material).



Methods

Patients

We retrospectively selected patients aged 20 years or over who had previously undergone upper gastrointestinal endoscopy at the University of Tokyo Hospital during 2018. All upper gastrointestinal endoscopies were performed using an electronic video endoscope (Olympus Medical Systems, Tokyo, Japan). Indications for endoscopy were gastric cancer surveillance or gastroesophageal symptoms. Biopsy specimens were obtained from gastric cancer lesions. Histological diagnosis of gastric cancer was performed and confirmed by experienced pathologists. The trial was approved by the institutional review board of the University of Tokyo Hospital. The study protocol and statistical analysis plan were published before initiation of the study.



Preparation of the endoscopic image dataset and AI algorithm

We collected 23 892 white-light upper gastrointestinal endoscopy images of 500 patients, including 985 invasive gastric cancer images from 51 patients and 549 early gastric cancer images from 49 patients confirmed histologically. Early gastric cancer was defined as T1a and invasive gastric cancer as T1b–T4 (Union for International Cancer Control tumor–node–metastasis classification, v. 8).

The images were collected and prepared in July 2019. The investigators (R.N. and T.A.) annotated gastric cancer lesions with their coordinates (X, Y) in the images; gold-standard bounding boxes were generated, and data concealment was carried out. The AI algorithm was based on the Single Shot MultiBox Detector [1].



Trial design and diagnosis

Patients were matched (1:1) to diagnosis by AI or expert endoscopists using a computer-based matching system. Stratified matching of early and invasive gastric cancer and Helicobacter pylori status was performed in accordance with the allocation sequence generated by the trial statistician at the University of Tokyo. H. pylori status was defined as positive, negative, or eradicated, based on the most recent serological, urea breath test, or stool antigen test results.
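The article does not describe how the computer-based matching system works internally. As a minimal sketch of one common approach, assuming patients are shuffled within each stratum (cancer type by H. pylori status) and then assigned alternately to the two arms, the allocation could look like the following. The function name, dictionary keys, and seed are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_allocate(patients, seed=0):
    """Allocate patients 1:1 to 'AI' or 'expert' within each stratum.

    `patients` is a list of dicts with hypothetical keys:
    'id', 'cancer' ('none', 'early', 'invasive') and
    'hp' ('negative', 'positive', 'eradicated').
    This is an illustrative scheme, not the study's actual system.
    """
    rng = random.Random(seed)

    # Group patient ids by stratum (cancer type, H. pylori status).
    strata = defaultdict(list)
    for p in patients:
        strata[(p["cancer"], p["hp"])].append(p["id"])

    allocation = {}
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        # Randomly choose which arm receives the extra patient
        # when the stratum has an odd number of patients.
        first = rng.choice(["AI", "expert"])
        second = "expert" if first == "AI" else "AI"
        for i, pid in enumerate(ids):
            allocation[pid] = first if i % 2 == 0 else second
    return allocation
```

Alternating within shuffled strata keeps the two arms within one patient of each other in every stratum, which is why group sizes such as 249 and 251 can arise from 500 patients.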

After matching, endoscopic image diagnosis was performed by both AI and expert endoscopists. The optimal diagnostic cut-off for AI diagnosis was taken from a prior report [1]. The AI reviewed endoscopy images and reported those in which gastric cancer was detected, together with the coordinates (X, Y) of the lesions. The expert endoscopists, two physicians with experience of more than 20 000 endoscopies, reviewed the endoscopy images of each patient for 5 minutes and reported endoscopic images in which gastric cancer was detected; they manually annotated the coordinates (X, Y) of the lesions in those images.



Outcomes

The main outcome was per-patient diagnosis of gastric cancer. A patient was considered diagnosed when the AI or the expert endoscopist detected gastric cancer on at least one gastric cancer endoscopic image. A detection was considered accurate when the predicted bounding box (AI-drawn with a probability score of 0.01 or greater, or expert endoscopist-drawn) overlapped the gold-standard box in a gastric cancer endoscopic image. If the AI drew multiple bounding boxes in the same gastric cancer lesion, we used the bounding box with the highest probability score.
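The outcome definition above can be sketched in code. This is an illustration only, assuming boxes are stored as (x1, y1, x2, y2) corner coordinates and that "overlap" means sharing a region of positive area; the function names and data layout are hypothetical and do not come from the study's software.

```python
def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes share a region of positive area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def best_hit(predictions, gold_box, threshold=0.01):
    """Return the highest-scoring predicted box that overlaps the gold box.

    `predictions` is a list of (box, score) pairs for one image.
    Boxes scoring below `threshold` are ignored. Returns None if no
    remaining box overlaps the gold-standard box.
    """
    hits = [(box, score) for box, score in predictions
            if score >= threshold and boxes_overlap(box, gold_box)]
    return max(hits, key=lambda h: h[1]) if hits else None


def patient_diagnosed(images, threshold=0.01):
    """A patient counts as diagnosed if at least one cancer image has a hit.

    `images` is a list of (predictions, gold_box) pairs for one patient.
    """
    return any(best_hit(preds, gold, threshold) is not None
               for preds, gold in images)
```

For an expert reader, each manually drawn box can be passed with a score of 1.0 so that the same rule applies to both arms.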

Other outcomes were per-patient diagnosis of invasive gastric cancer, per-patient diagnosis of early gastric cancer, per-image diagnosis of gastric cancer, and intersection over union (IOU) of gastric cancer. Per-image diagnosis of gastric cancer was evaluated as the number of gastric cancer images in which the lesion was detected out of all gastric cancer images analyzed. IOU was defined as the amount of overlap between the area of the predicted and the gold-standard bounding boxes; it ranged from 0 to 1 (see online-only supplementary material, Fig. 1 s).
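To make the IOU measure concrete, the sketch below computes it under the conventional definition (area of intersection divided by area of union), assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The coordinate convention and function name are assumptions for illustration; the article only states that lesions were annotated with (X, Y) coordinates.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

A value of 1 means the predicted box coincides exactly with the gold-standard box, and 0 means the boxes do not overlap at all. For example, two 2 × 2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, giving an IOU of about 0.14.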



Statistical analysis

Data regarding the per-patient rate of gastric cancer diagnosis, per-patient rate of invasive gastric cancer diagnosis, per-patient rate of early gastric cancer diagnosis, and per-image rate of gastric cancer diagnosis were compared by χ² test and risk difference assessment. IOU was compared by t-test and risk difference assessment. Analyses were performed using SAS software v. 9.4 (SAS Institute, Cary, North Carolina, USA).
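The analyses were run in SAS; the following Python sketch only illustrates the two calculations named above, assuming a Wald confidence interval for the risk difference and a Pearson χ² test without continuity correction (the article does not state which variants were used). Function names are hypothetical.

```python
import math


def risk_difference(x1, n1, x2, n2, z=1.96):
    """Risk difference p1 - p2 with a Wald confidence interval.

    Returns (difference, lower, upper) as proportions, not percentages.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square test (1 degree of freedom) for two proportions.

    Returns (statistic, p_value), or None when a row or column total
    is zero and the statistic is undefined.
    """
    a, b = x1, n1 - x1          # group 1: events, non-events
    c, d = x2, n2 - x2          # group 2: events, non-events
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return None
    chi2 = (n1 + n2) * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With 27 of 27 versus 23 of 26 detected, `risk_difference(27, 27, 23, 26)` gives a difference of about 0.115 with an interval of about −0.007 to 0.238, and `chi_square_2x2(27, 27, 23, 26)` gives a P value of about 0.069. When every patient in both groups is detected, the column of non-events is empty and the test is undefined.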



Results

Baseline characteristics

Of the 500 patients who underwent a matching analysis, 249 were allocated to the AI diagnosis group and 251 to the expert endoscopist diagnosis group (Fig. 1). Patient demographics were similar between the groups (Table 1).

Fig. 1 Study flow diagram.
Table 1 Baseline patient characteristics (n = 500).

| Variable | AI diagnosis, n = 249 | Expert endoscopist diagnosis, n = 251 | P value |
|---|---|---|---|
| Age, mean ± SD, years | 72.2 ± 9.54 | 72.0 ± 9.55 | 0.629 |
| Sex, male | 137 (55.02)¹ | 136 (54.18) | 0.851 |
| Endoscopic atrophy² | | | |
| • No atrophy | 88 (35.34) | 87 (34.66) | 0.873 |
| • C-1 | 7 (2.81) | 6 (2.39) | 0.768 |
| • C-2 | 29 (11.65) | 17 (6.77) | 0.059 |
| • C-3 | 22 (8.84) | 29 (11.55) | 0.315 |
| • O-1 | 30 (12.05) | 31 (12.35) | 0.918 |
| • O-2 | 38 (15.26) | 45 (17.93) | 0.423 |
| • O-3 | 36 (14.35) | 35 (14.05) | 0.927 |
| H. pylori status³ | | | |
| • Negative | 123 (49.40) | 123 (49.00) | 0.982 |
| • Positive | 12 (4.82) | 13 (5.18) | |
| • Eradicated | 114 (45.78) | 115 (45.82) | |
| Number of patients with gastric cancer | 49 (19.68) | 51 (20.32) | 0.858 |
| Early gastric cancer | 27 (10.84) | 26 (10.36) | 0.860 |
| Invasive gastric cancer | 22 (8.84) | 25 (9.96) | 0.667 |
| Number of gastric cancer images/nongastric cancer images | 748/11 185 (6.27) | 786/11 173 (6.57) | 0.338 |

Abbreviations: AI, artificial intelligence; SD, standard deviation.

¹ Figures given in parentheses are percentages.

² Endoscopic atrophy was evaluated according to the Kimura–Takemoto classification, which considers no atrophy to grade C-3 atrophy as closed type and grades O-1 to O-3 as open type; no atrophy was the mildest and O-3 was the most severe. Closed type was milder than open type.

³ H. pylori status was defined as: negative: H. pylori antibody, urea breath test (UBT), or H. pylori stool antigen test negative; positive: H. pylori antibody, UBT, or H. pylori stool antigen test positive; or eradicated: successful eradication confirmed by UBT or H. pylori stool antigen test after eradication therapy.




Outcomes

Gastric cancer was diagnosed in 49 of 49 patients (100 %) in the AI diagnosis group and 48 of 51 (94.12 %) in the expert endoscopist diagnosis group (difference 5.88, 95 % confidence interval [CI]: −0.58 to 12.3) (Table 2). Invasive gastric cancer was diagnosed in 22 of 22 patients (100 %) in the AI diagnosis group and 25 of 25 patients (100 %) in the expert endoscopist diagnosis group. Early gastric cancer was diagnosed in 27 of 27 patients (100 %) in the AI diagnosis group and 23 of 26 patients (88.46 %) in the expert endoscopist diagnosis group (difference 11.54, 95 %CI −0.74 to 23.82; P = 0.069).
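As a worked check of the main outcome, the reported interval can be reproduced under the assumption of a Wald interval for the difference of two proportions (the interval method is an assumption here; the article does not state it). The symbols are introduced for illustration only.

```latex
\begin{align*}
\hat{d} &= \hat{p}_{\mathrm{AI}} - \hat{p}_{\mathrm{expert}}
         = \frac{49}{49} - \frac{48}{51} \approx 0.0588 \\
\mathrm{SE}(\hat{d}) &= \sqrt{\frac{1\,(1-1)}{49} + \frac{0.9412 \times 0.0588}{51}}
         \approx 0.0329 \\
95\,\%\ \mathrm{CI} &= \hat{d} \pm 1.96\,\mathrm{SE}(\hat{d})
         \approx (-0.0058,\ 0.1234)
\end{align*}
```

The lower bound of −0.58 percentage points lies above the prespecified noninferiority margin of −5, which supports noninferiority; because the interval includes 0, superiority is not shown.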

Table 2 Main outcome and other outcomes.

| Outcome | AI diagnosis, 49 patients with gastric cancer with 748 images | Expert endoscopist diagnosis, 51 patients with gastric cancer with 786 images | Risk difference [95 % confidence interval] | P value |
|---|---|---|---|---|
| Main outcome | | | | |
| • Per-patient rate of gastric cancer diagnosis | 49/49 (100)¹ | 48/51 (94.12) | 5.88 [−0.58 to 12.3] | |
| Other outcomes | | | | |
| • Per-patient rate of invasive gastric cancer diagnosis | 22/22 (100) | 25/25 (100) | Not applicable | Not applicable |
| • Per-patient rate of early gastric cancer diagnosis | 27/27 (100) | 23/26 (88.46) | 11.54 [−0.74 to 23.82] | 0.069 |
| • Per-image rate of gastric cancer diagnosis | 747/748 (99.87) | 693/786 (88.17) | 11.7 [9.43 to 13.97] | < 0.001 |
| • IOU of gastric cancer*, mean ± SD | 0.842 ± 0.246 | 0.972 ± 0.079 | −0.13 [−0.15 to −0.11] | < 0.001 |

Abbreviations: AI, artificial intelligence; CNN, convolutional neural network; IOU, intersection over union; SD, standard deviation.

¹ Figures given in parentheses are percentages.

* IOU was evaluated as the area of overlap between the predicted bounding box and the gold-standard bounding box.


The per-image rate of gastric cancer diagnosis was significantly higher in the AI diagnosis group (747 of 748 images, 99.87 %) than in the expert endoscopist group (693 of 786 images, 88.17 %) (difference 11.7, 95 %CI 9.43 to 13.97; P < 0.001). The IOU of gastric cancer was significantly lower (0.842) in the AI diagnosis group than in the expert endoscopist diagnosis group (0.972) (difference −0.13, 95 %CI −0.15 to −0.11; P < 0.001) (Table 2, Table 2 s).



Discussion

The rate of gastric cancer detection by AI was not inferior to the rate of detection by expert endoscopists. To our knowledge, this study is the first to evaluate patient-level detection rates of early and invasive gastric cancer and to compare AI and expert endoscopists.

The detection rate of AI for gastric cancer was numerically higher than that of expert endoscopists. We suggest two reasons for this result. First, the per-image rate of gastric cancer diagnosis in the AI diagnosis group was 11.7 percentage points higher than that in the expert endoscopist group. A previous study reported a per-image detection rate of gastric cancer of over 96 % [5]; our per-image rate of gastric cancer diagnosis was 99.87 % (747 of 748 images). As the number of images analyzed increased, the likelihood of identifying a cancer increased; this may explain the high detection rate of gastric cancer by AI. Second, the high rate of gastric cancer detection in the AI diagnosis group may be due to the definition of the main outcome, per-patient diagnosis of gastric cancer, as “detected on at least one endoscopic image of gastric cancer.” This definition may favor AI diagnosis because AI could suggest many images that potentially include gastric cancer lesions. However, we consider our main outcome to be reasonable when using AI for gastric cancer screening examinations.

The IOU of gastric cancer was significantly lower in the AI diagnosis group (0.842) than in the expert endoscopist group (0.972), although the bounding boxes of gastric cancer detected in the AI diagnosis group did not affect the diagnosis of gastric cancer (Fig. 2). However, further studies are needed to improve the IOU of gastric cancer by our CNN-based AI diagnosis model.

Fig. 2 Images of gastric cancer used for diagnostic purposes by the artificial intelligence (AI) diagnosis group. Green boxes, gold-standard bounding boxes; red boxes, AI-detected bounding boxes. Source: Keita Otani.

Our AI model showed a performance in the detection of gastric cancer similar to that of expert endoscopists, even in patients in whom H. pylori had been eradicated, who were difficult to evaluate on the basis of endoscopic images [12]. Furthermore, the model was suitable for evaluation of both early and invasive gastric cancers. The AI diagnosis model was developed using 13 584 images of 2639 gastric cancer lesions taken during eight types of endoscopies over a 12-year period [1]. Therefore, our CNN-based AI diagnosis model has potential for use in various patient populations.

This study was the first direct comparison between AI and expert endoscopists of per-patient diagnosis of gastric cancer. However, the study had limitations. First, the study was a single-center retrospective work and potentially affected by selection and confounding bias. Future prospective randomized controlled studies are required. Second, the environment in which images were diagnosed differed from that in which upper endoscopy was performed in practice; this may have compromised the diagnostic accuracy of the expert endoscopists.

In conclusion, we demonstrated noninferiority but not superiority of AI for gastric cancer diagnosis compared with expert endoscopists.



Competing interests

The authors declare that they have no conflict of interest.

Acknowledgments

We thank Keita Otani for assistance with creating Fig. 2 and Fig. 1 s.



Corresponding author

Ryota Niikura, MD PhD
Gastroenterological Endoscopy
Tokyo Medical University
6-7-1 Nishishinjuku
Shinjuku-ku
Tokyo 1600023
Japan   

Publication History

Received: 30 August 2020

Accepted after revision: 12 October 2021

Accepted Manuscript online:
04 October 2021

Article published online:
04 May 2022

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

