CC BY-NC-ND 4.0 · Endosc Int Open 2021; 09(09): E1361-E1370
DOI: 10.1055/a-1507-4980
Original article

A deep learning framework for autonomous detection and classification of Crohnʼs disease lesions in the small bowel and colon with capsule endoscopy

Tomáš Majtner 1, Jacob Broder Brodersen 2, Jürgen Herp 1, Jens Kjeldsen 3, Morten Lee Halling 2, Michael Dam Jensen 4

1 Applied Artificial Intelligence and Data Science, Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
2 Department of Internal Medicine, Section of Gastroenterology, Hospital of South West Jutland, Esbjerg, Denmark
3 Department of Medical Gastroenterology, Odense University Hospital, Odense, Denmark
4 Department of Internal Medicine, Section of Gastroenterology, Lillebaelt Hospital, Vejle, Denmark

Abstract

Background and study aims Small bowel ulcerations are efficiently detected with deep learning techniques, whereas the ability of these techniques to diagnose Crohnʼs disease (CD) in the colon is unknown. This study examined the ability of a deep learning framework to detect CD lesions with pan-enteric capsule endoscopy (CE) and to classify lesions of different severity.

Patients and methods CEs from patients with suspected or known CD were included in the analysis. Two experienced gastroenterologists classified anonymized images into normal mucosa, non-ulcerated inflammation, aphthous ulceration, ulcer, or fissure/extensive ulceration. An automated framework incorporating multiple ResNet-50 architectures was trained. To improve its robustness and ability to characterize lesions, image processing methods focused on texture enhancement were employed.

Results A total of 7744 images from 38 patients with CD were collected (small bowel 4972, colon 2772) of which 2748 contained at least one ulceration (small bowel 1857, colon 891). With a patient-dependent split of images for training, validation, and testing, ulcerations were diagnosed with a sensitivity, specificity, and diagnostic accuracy of 95.7 % (CI 93.4–97.4), 99.8 % (CI 99.2–100), and 98.4 % (CI 97.6–99.0), respectively. The diagnostic accuracy was 98.5 % (CI 97.5–99.2) for the small bowel and 98.1 % (CI 96.3–99.2) for the colon. Ulcerations of different severities were classified with substantial agreement (κ = 0.72).

Conclusions Our proposed framework is in excellent agreement with the clinical standard, and diagnostic accuracies are equally high for the small bowel and colon. Deep learning approaches have a great potential to help clinicians detect, localize, and determine the severity of CD with pan-enteric CE.



Introduction

Crohn's disease (CD) belongs to the group of chronic inflammatory bowel diseases [1]. Cardinal lesions are mucosal ulcerations ranging from small aphthous ulcerations to large ulcers and fissures. Typically, CD has a segmental distribution, and the entire gastrointestinal tract may be involved, although the disease is most often located in the terminal ileum and right colon (ileocecal CD) [2].

In recent years, technological advances have improved modalities for diagnosing and monitoring CD. Capsule endoscopy (CE) is non-invasive, patient-friendly, and highly sensitive for the earliest lesions of CD [3] [4]. In patients with suspected CD, a normal ileocolonoscopy, and no obstructive symptoms, the European Society of Gastrointestinal Endoscopy (ESGE) and the European Crohnʼs and Colitis Organization (ECCO) recommend CE as the first-line modality for investigating the small bowel [5] [6]. Colon CE was introduced in 2006, and pan-enteric CE is now available, allowing a direct and detailed evaluation of the entire gastrointestinal mucosa. However, the role of pan-enteric CE in patients with suspected or known CD remains to be established.

The camera pill captures more than 50,000 images of the gastrointestinal tract, and a significant limitation of CE is the time-consuming manual video analysis. In previous series, reading times above 40 to 50 minutes were reported for small bowel CE [3], and pan-enteric CE takes more than 60 minutes to interpret for CD. Hence, lesions may be missed due to reader fatigue or distraction. Better ways to optimize the work of the GI specialist without affecting the diagnostic accuracy of CE would be helpful in clinical practice.

The use of artificial intelligence (AI), especially deep learning techniques, has received great attention in recent years. Multiple clinical settings have been studied, including the ability to analyze endoscopy images and aid clinical decision-making [7] [8] [9]. A recent meta-analysis showed a high sensitivity and specificity of deep learning techniques for ulcer detection in the small bowel [8], whereas the ability to diagnose CD in the colon is unknown. Results are promising, and AI could have a pivotal role in the future of non-invasive diagnosis of CD with pan-enteric CE. The aim of this study was to examine the ability of a deep learning framework to detect CD lesions in single images of the small bowel or colon captured with pan-enteric CE, to localize lesions, and to characterize lesions of different severity.



Patients and methods

Study design

Patients with suspected or known CD were recruited from three centers in the Region of Southern Denmark managing adult patients with inflammatory bowel diseases. All patients were prospectively enrolled in a clinical trial examining non-invasive modalities for diagnosing suspected CD (http://ClinicalTrials.gov Identifier NCT03134586) or assessing treatment response in patients with known CD (http://ClinicalTrials.gov Identifier NCT03435016).

CD was clinically suspected in patients with diarrhea and/or abdominal pain for more than 1 month (or repeated episodes of diarrhea and/or abdominal pain) associated with a fecal calprotectin > 50 mg/kg and at least one additional finding suggesting CD: elevated inflammatory markers, anemia, fever, weight loss, perianal abscess/fistula, a family history of inflammatory bowel disease, or suspicion of CD after sigmoidoscopy.

Patients with an established diagnosis of CD based on ECCO criteria [10] were included if they had clinical disease activity (Harvey-Bradshaw Index ≥ 5 or Crohnʼs Disease Activity Index ≥ 150), endoscopic activity (Simple Endoscopic Score for Crohnʼs disease ≥ 3), and a clinical indication for medical treatment with corticosteroids or biological therapy.

All patients had a standardized work-up including medical history, physical examination, blood and stool samples, ileocolonoscopy, pan-enteric CE, magnetic resonance imaging enterocolonography, and bowel ultrasound.



Capsule endoscopy procedure

Pan-enteric CE was performed with the PillCam Crohn's capsule (Medtronic, Dublin, Ireland) after overnight fasting and bowel preparation with 2 + 2 L of polyethylene glycol (Moviprep) as previously described by ESGE [11]. Videos were analyzed with the PillCam Software v9.



Image selection and classification

Images with a normal mucosa or CD lesions located in the small bowel or colon were manually searched and randomly collected by three gastroenterologists with experience in CE and inflammatory bowel diseases (M.D.J., J.B.B. and M.L.H.). Images were anonymized and assigned into one of the following 13 categories by authors M.D.J. and J.B.B.:

  • Small bowel: normal mucosa, normal mucosa with lymphoid hyperplasia, normal mucosa with bubbles and/or debris, non-ulcerated inflammation, aphthous ulceration, ulcer, fissure/large ulcer

  • Colon: normal mucosa, normal mucosa with bubbles and/or debris, non-ulcerated inflammation, aphthous ulceration, ulcer, fissure/large ulcer

The following definitions were used for image classification:

  • Normal: No or minimal luminal content (estimated < 1 mm in size) and mucosa without erythema, edema or mucosal breaks

  • Debris: Dark fluid or solid luminal content without surrounding erythema or mucosal break

  • Bubbles: Luminal pocket of air reflecting the flashlight

  • Aphthous ulceration: Small superficial mucosal break with surrounding erythema (estimated < 5 mm in size)

  • Ulcer: Mucosal break with loss of substance and fibrin

  • Fissure: Longitudinal ulcer

  • Large ulcer: Ulcer involving > 50 % of the lumen

  • Non-ulcerated inflammation: Erythema and edema without mucosal break

In case of disagreement, a consensus decision was reached. If multiple lesions were seen in the same image, the most severe lesion determined the overall classification. Image classification is illustrated in  [Fig. 1].

Fig. 1 Examples of image classification. a Normal colon. b Normal colon with debris. c Aphthous ulceration in the colon. d Ulcer in the colon. e Fissures in the colon. f Large ulcer in the terminal ileum.


Image processing

After manual classification, the collected images were preprocessed to enable effective training of the deep learning algorithms. Since the original images contain text overlays near the corners, Chan-Vese segmentation via graph cuts was used to extract a binary mask and segment the relevant region of the image [12].

Four different algorithms were employed for textural improvements:

Contrast increase: The original image is split into three individual intensity channels (red, green, and blue; RGB image). In each channel, an intensity adjustment was applied that saturates the bottom 1 % and the top 1 % of all pixel values. Finally, the adjusted channels were merged back to form an enhanced RGB image.
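This saturation step can be sketched as follows (a Python/NumPy reimplementation for illustration; the study used MATLAB, so function names and exact percentile handling here are assumptions, not the authors' code):

```python
import numpy as np

def stretch_contrast(img, low_pct=1.0, high_pct=99.0):
    """Saturate the bottom/top 1 % of pixel values in each RGB channel,
    then rescale the remaining range to [0, 255]."""
    out = np.empty_like(img)
    for c in range(img.shape[2]):               # process R, G, B independently
        ch = img[..., c].astype(np.float64)
        lo, hi = np.percentile(ch, (low_pct, high_pct))
        ch = np.clip((ch - lo) / max(hi - lo, 1e-9), 0.0, 1.0)
        out[..., c] = (ch * 255).astype(np.uint8)
    return out
```

Saturating 1 % of pixels at each end ensures that a few extreme outliers do not compress the usable intensity range of the mucosa.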

Histogram equalization: The input RGB image was converted to the HSV color space, which describes colors similarly to how the human eye tends to perceive them. In this color space, the hue (H) channel specifies the base color, the saturation (S) channel represents the vibrancy of the color and captures the amount of gray in a particular color, and the value (V) channel, in conjunction with the S channel, describes the intensity or brightness of the color. In the next step, contrast-limited adaptive histogram equalization (CLAHE) was applied to the V channel [13]. This non-parametric equalization operates on small regions of the image, computes a histogram for each region, and uses these histograms to redistribute the lightness values of the image. In the last step, the enhanced HSV image was converted back to the RGB color space.

Gradient X: The original RGB image was converted to a single grayscale image by forming a weighted sum of the red, green, and blue components. In the next step, the X-directional gradient of the grayscale image was extracted. The resulting image was then copied into the three color channels of RGB to form the output image.
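A sketch of this transformation (Python/NumPy reimplementation; the standard luminance weights and the use of gradient magnitude for display are assumptions, since the text does not specify them):

```python
import numpy as np

def gradient_x_image(img):
    """Convert RGB to grayscale via a weighted sum of the channels, take the
    horizontal (x-direction) gradient, and replicate it into 3 channels."""
    gray = img @ np.array([0.2989, 0.5870, 0.1140])    # weighted sum of R, G, B
    gx = np.gradient(gray, axis=1)                     # central differences along x
    gx = np.clip(np.abs(gx), 0, 255).astype(np.uint8)  # magnitude for display
    return np.stack([gx, gx, gx], axis=2)              # back to a 3-channel image
```

The gradient responds strongly at mucosal breaks and fold edges, which is presumably why it adds texture information the raw image lacks.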

Dehazing: First, the complement of the original RGB input was computed, and a dehazing algorithm that relies on a dark channel prior was applied. The algorithm was originally designed to reduce atmospheric haze and is based on the observation that haze-free images contain pixels with low intensity in at least one color channel, which is also the case in CE images. At the end, the complement was taken again and used as the enhanced output image.
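The complement-dehaze-complement pipeline can be sketched as below. This is a simplified illustration: the dark channel is taken per pixel (window size 1) and the transmission map is not refined, unlike the full dark-channel-prior method, and the `omega`/`t_min` values are typical defaults assumed here, not taken from the paper:

```python
import numpy as np

def enhance_via_dehazing(img, omega=0.85, t_min=0.1):
    """Complement -> dehaze (dark channel prior) -> complement."""
    comp = 255.0 - img.astype(np.float64)        # complement image
    dark = comp.min(axis=2)                      # per-pixel dark channel
    a = comp.reshape(-1, 3)[dark.argmax()]       # atmospheric light estimate
    a = np.maximum(a, 1.0)                       # avoid division by zero
    t = 1.0 - omega * (comp / a).min(axis=2)     # transmission estimate
    t = np.maximum(t, t_min)[..., None]          # floor the transmission
    dehazed = (comp - a) / t + a                 # recover scene radiance
    return np.clip(255.0 - dehazed, 0, 255).astype(np.uint8)
```

Working on the complement turns the bright, washed-out regions of CE images into "haze", which the dark channel prior is designed to remove.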

For better illustration of all four described methods, their visual outcomes on three random samples are provided in  [Fig. 2]. All possible subsets of these image transformations were evaluated, but the highest performance was achieved when all variants (the original image plus the four new variants) were considered together. This demonstrates that each transformation contributes to the robustness of our proposed system.

Fig. 2 Illustration of applied texture enhancement methods on three random samples from our collected data. The five columns correspond to original images, contrast increase images, histogram equalization images, gradient x images, and dehazing images, respectively.

Using the image processing steps described above, five separate datasets were created: one with the original images and four with enhanced images. For each dataset, a separate deep learning model was trained, and at the end, the results were merged into a single classification output. The configuration of all five models, however, was the same; the only difference between them was whether they were trained on the original set of images or on a set of images with a specific texture enhancement method.



Splits for training and validation

Two separate data splits were performed to evaluate our automated framework:

  • Random split: In this split, 70 % of all input images from each category were randomly chosen for training, 10 % for validation, and the remaining 20 % were used as an independent test set.

  • Patient split: In this split, all images from a single patient were used either for training or for testing. The ratio between training, validation, and testing samples was kept as close as possible to that of the random split.
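The patient split can be sketched as follows (an illustrative Python implementation; the greedy assignment by cumulative image count is an assumption about how the target ratios were approximated with whole patients):

```python
import random

def patient_split(image_ids_by_patient, train_frac=0.7, val_frac=0.1, seed=42):
    """Assign whole patients to train/validation/test so that no patient's
    images leak across splits. Fractions are applied to the image count,
    as closely as whole patients allow."""
    patients = sorted(image_ids_by_patient)
    random.Random(seed).shuffle(patients)
    total = sum(len(v) for v in image_ids_by_patient.values())
    train, val, test, seen = [], [], [], 0
    for p in patients:
        imgs = image_ids_by_patient[p]
        if seen < train_frac * total:                # fill training first
            train.extend(imgs)
        elif seen < (train_frac + val_frac) * total: # then validation
            val.extend(imgs)
        else:                                        # the rest is the test set
            test.extend(imgs)
        seen += len(imgs)
    return train, val, test
```

Keeping all of a patient's images on one side of the split is what prevents the framework from "recognizing" a lesion it has already seen during training.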

Because the size of the training dataset was not sufficient for a deep learning algorithm, data augmentation was employed. In the training part of the dataset, each image was rotated by 90°, 180°, and 270°. Together with a mirroring operation applied to the original and rotated samples, the augmentation step yielded seven new training samples derived from each original image.
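The rotation-plus-mirroring scheme above generates the eight elements of the dihedral group D4 (the original image plus seven variants), which can be sketched as:

```python
import numpy as np

def augment(image):
    """Return the original image plus 7 augmented variants: rotations by
    90°, 180°, 270°, and a mirrored copy of the original and each rotation."""
    rotations = [np.rot90(image, k) for k in range(4)]  # 0°, 90°, 180°, 270°
    mirrored = [np.fliplr(r) for r in rotations]        # horizontal mirror of each
    return rotations + mirrored
```

These transformations are label-preserving for CE frames, since an ulcer remains an ulcer under any rotation or mirroring of the capsule's view.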

The training process used a fine-tuning approach known as transfer learning [14]. Stochastic gradient descent with momentum was utilized with an initial learning rate of 0.001 and a mini-batch size of 8 images. Each model was trained for 30 epochs. In this work, only results obtained on the independent test set, which was not used during the training process, are reported.
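For reference, a single SGD-with-momentum parameter update has this form (the learning rate matches the 0.001 stated above; the momentum coefficient 0.9 is a common default and an assumption here, as the text does not state it):

```python
def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update: accumulate a running direction in
    `velocity`, then move the parameter `w` along it."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity
```

Momentum smooths the noisy mini-batch gradients, which matters with a small batch size of 8 images.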

A ResNet-50 architecture pre-trained on ImageNet was employed [15]. The last three layers were removed and replaced with a new fully-connected layer, a softmax layer, and a classification layer. The last layer was set to classify images directly into our 13 desired categories. All images were resized to the appropriate input size using bicubic interpolation, and all tests were performed using MATLAB R2019b.

After training, all five models were evaluated on the test images. For each variant of a test image, the softmax probabilities for each output class were extracted. At the end, the corresponding probabilities were multiplied (five values for each output class, one from each model), and the test image was assigned to the output class with the highest product. An illustration of this approach is provided in  [Fig. 3].
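The merging step reduces to a product of per-model class probabilities followed by an argmax, as sketched here (illustrative Python; the paper's pipeline is in MATLAB):

```python
import numpy as np

def merge_predictions(softmax_outputs):
    """Merge softmax outputs from several models by multiplying the class
    probabilities element-wise and picking the class with the highest product."""
    probs = np.asarray(softmax_outputs)    # shape: (n_models, n_classes)
    product = probs.prod(axis=0)           # one product per output class
    return int(product.argmax())           # index of the winning class
```

Multiplying probabilities acts like a conjunction: a class only scores highly if every model assigns it non-negligible probability, so a single confident dissenter can veto a class.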

Fig. 3 Illustration of the classification process.


Statistics

Manual classification of images served as the reference standard. The sensitivity, specificity, and diagnostic accuracy of our automated framework for detection of CD lesions were calculated from 2 × 2 contingency tables with 95 % confidence intervals (CI). For the overall evaluation of sensitivity and specificity, a lesion was considered true positive if it was detected in accordance with manual reading irrespective of the localization (i. e. an ulceration in the small bowel classified as an ulceration in the colon is a true positive for detection of ulceration overall). Agreement between the gold standard and our automated framework for lesion classification was assessed with kappa statistics. Kappa values were interpreted as follows: absence of agreement 0, slight agreement < 0.20, fair agreement 0.21 to 0.40, moderate agreement 0.41 to 0.60, substantial agreement 0.61 to 0.80, and almost perfect agreement > 0.81, as proposed by Landis and Koch [16].
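These metrics follow directly from the 2 × 2 counts; a minimal Python sketch (a reimplementation for illustration; the study's analysis was performed in MATLAB):

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and diagnostic accuracy from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: gold standard,
    columns: automated framework)."""
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(len(confusion))
    ) / n ** 2
    return (observed - expected) / (1 - expected)
```

For example, the overall patient-split counts reported in Table 3 (TP 447, TN 885, FP 2, FN 20) reproduce the stated 95.7 % sensitivity, 99.8 % specificity, and 98.4 % accuracy.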



Ethics

The above-mentioned studies were approved by the Local Ethics Committee of Southern Denmark (S-20150189 and S-20170188) and the Danish Data Protection Agency (journal number 16/10457 and 18/11210). All patients gave informed consent before participation including permission to use anonymized CE videos for additional analysis.



Results

A total of 38 patients were included in the study, of which 33 were examined for clinically suspected CD and five had an established diagnosis of CD. After ileocolonoscopy with biopsies, pan-enteric CE, and MR-enterocolonography, 31 patients were diagnosed with active CD. Ulcerations were located in the small bowel, colon, and small bowel plus colon in 12, 10, and 9 patients, respectively.

Overall, 7744 anonymized image frames (small bowel 4972, colon 2772) were manually collected and annotated. Of these, 2748 contained at least one ulceration (small bowel 1857, colon 891). A total of 408 images showed non-ulcerated inflammation in patients with concomitant lesions consistent with CD or an established diagnosis of CD. The number of images and specific lesions used for training, validation, and testing in both splits is shown in  [Table 1].

Table 1

Number of images used for training, validation, and testing in both considered splits.

| | Random split: Training (after augmentation) | Validation | Testing | Patient split: Training (after augmentation) | Validation | Testing | Total |
|---|---|---|---|---|---|---|---|
| Small bowel | | | | | | | |
| Normal | 712 (5,696) | 101 | 204 | 714 (5,712) | 101 | 202 | 1017 |
| Normal with bubbles/debris | 1415 (11,320) | 202 | 406 | 1,429 (11,432) | 202 | 392 | 2023 |
| Lymphoid hyperplasia | 32 (256) | 4 | 10 | 32 (256) | 4 | 10 | 46 |
| Non-ulcerated inflammation | 21 (168) | 2 | 6 | 22 (176) | 2 | 5 | 29 |
| Aphthous ulceration | 514 (4,112) | 73 | 148 | 538 (4,304) | 73 | 124 | 735 |
| Ulcer | 504 (4,032) | 72 | 144 | 520 (4,160) | 72 | 128 | 720 |
| Fissure/large ulcer | 280 (2,240) | 40 | 82 | 285 (2,280) | 40 | 77 | 402 |
| Colon | | | | | | | |
| Normal | 150 (1,200) | 21 | 44 | 154 (1,232) | 21 | 40 | 215 |
| Normal with bubbles/debris | 901 (7,208) | 128 | 258 | 916 (7,328) | 128 | 243 | 1287 |
| Non-ulcerated inflammation | 266 (2,128) | 37 | 76 | 270 (2,160) | 37 | 72 | 379 |
| Aphthous ulceration | 184 (1,472) | 26 | 54 | 193 (1,544) | 26 | 45 | 264 |
| Ulcer | 237 (1,896) | 33 | 68 | 254 (2,032) | 33 | 51 | 338 |
| Fissure/large ulcer | 203 (1,624) | 28 | 58 | 219 (1,752) | 28 | 42 | 289 |
| Total | 5,419 (43,352) | 767 | 1,558 | 5,546 (44,368) | 767 | 1,431 | 7744 |

Lesion classification

Our automated framework was evaluated on three different levels. The first level is a multiclass classification considering the 13 classes used in the training process. For the patient split, the algorithm was tested on 1431 image frames ( [Table 2]). The agreement between the automated framework and manual reading was substantial (κ = 0.74). Using a random split of images for training, validation, and testing, an almost perfect agreement was achieved on 1558 images (κ = 0.89).

Table 2

Algorithm testing on 1431 images with a patient split used for training.

| Gold standard \ Deep learning framework | Colon aphthae | Colon debris | Colon fissure | Colon non-ulc. inflamm. | Colon normal | Colon ulcer | SB aphthae | SB fissure | SB non-ulc. inflamm. | SB normal | SB debris | SB lymph. hyp. | SB ulcer | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Colon aphthae | 34 | 2 | 0 | 3 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 45 |
| Colon debris | 0 | 223 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 243 |
| Colon fissure | 6 | 1 | 0 | 6 | 0 | 20 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 42 |
| Colon non-ulc. inflamm. | 21 | 2 | 1 | 39 | 1 | 2 | 1 | 0 | 4 | 0 | 1 | 0 | 0 | 72 |
| Colon normal | 0 | 6 | 0 | 0 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 40 |
| Colon ulcer | 9 | 1 | 0 | 8 | 0 | 27 | 0 | 5 | 0 | 0 | 0 | 0 | 1 | 51 |
| SB aphthae | 0 | 0 | 0 | 0 | 0 | 0 | 109 | 0 | 4 | 0 | 0 | 0 | 11 | 124 |
| SB fissure | 0 | 0 | 1 | 0 | 0 | 5 | 3 | 50 | 1 | 0 | 3 | 0 | 14 | 77 |
| SB non-ulc. inflamm. | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 5 |
| SB normal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 186 | 16 | 0 | 0 | 202 |
| SB debris | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 374 | 0 | 1 | 392 |
| SB lymph. hyp. | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 8 | 0 | 10 |
| SB ulcer | 1 | 0 | 0 | 1 | 0 | 8 | 46 | 39 | 0 | 0 | 4 | 0 | 29 | 128 |
| Total | 71 | 236 | 2 | 58 | 50 | 65 | 164 | 105 | 9 | 203 | 404 | 8 | 56 | 1431 |

The matrix displays the number of images according to their classification with the gold standard and deep learning framework depending on the location. Lesions were assigned to one of 13 predefined categories. The inter-modality agreement was substantial (κ = 0.74).

The framework reliably distinguished between the small bowel and colon. For the random split, only four of 558 colon images were misclassified as small bowel (29 of 493 for the patient split), and only seven of 1000 small bowel images were misclassified as colon (18 of 938 for the patient split).



Diagnostic accuracy

The second set of tests focused on the diagnostic accuracy for detection of CD. For the patient split, the automated framework detected ulcerations consistent with CD with a sensitivity, specificity, and diagnostic accuracy of 95.7 % (CI 93.4–97.4), 99.8 % (CI 99.2–100), and 98.4 % (CI 97.6–99.0), respectively ( [Table 3]). The diagnostic accuracy was similar for lesions located in the small bowel and colon: 98.5 % (CI 97.5–99.2) and 98.1 % (CI 96.3–99.2), respectively. For detection of CD including non-ulcerated inflammation, the sensitivity, specificity, and diagnostic accuracy were 96.1 % (CI 94.2–97.6), 99.9 % (CI 99.4–100), and 98.5 % (CI 97.7–99.0), respectively.

Table 3

Diagnostic accuracy, sensitivity and specificity for detection of ulcerations in the small bowel and colon in patients with suspected or known Crohnʼs disease.

| | TP | TN | FP | FN | Accuracy (%) | 95 % CI | Sensitivity (%) | 95 % CI | Specificity (%) | 95 % CI |
|---|---|---|---|---|---|---|---|---|---|---|
| Patient split: Small bowel | 317 | 602 | 2 | 12 | 98.50 | 97.50–99.18 | 96.35 | 93.72–98.10 | 99.67 | 98.81–99.96 |
| Patient split: Colon | 130 | 283 | 0 | 8 | 98.10 | 96.29–99.18 | 94.20 | 88.90–97.46 | 100 | 98.70–100 |
| Patient split: Overall | 447 | 885 | 2 | 20 | 98.38 | 97.55–98.98 | 95.72 | 93.46–97.36 | 99.77 | 99.19–99.97 |
| Random split: Small bowel | 359 | 620 | 0 | 15 | 98.49 | 97.52–99.15 | 95.99 | 93.47–97.74 | 100 | 99.41–100 |
| Random split: Colon | 174 | 302 | 0 | 6 | 98.76 | 97.31–99.54 | 96.67 | 92.89–98.77 | 100 | 98.79–100 |
| Random split: Overall | 533 | 922 | 0 | 21 | 98.58 | 97.83–99.12 | 96.21 | 94.26–97.64 | 100 | 99.60–100 |

Data are shown for the two different splits of images used for training, validation, and testing of the deep learning framework.

TP, true positive; TN, true negative; FP, false positive; FN, false negative.

For the random split, the sensitivity, specificity, and diagnostic accuracy were 96.2 % (CI 94.3–97.6), 100 % (CI 99.6–100), and 98.6 % (CI 97.8–99.1), respectively, with similar results for lesions located in the small bowel and colon ( [Table 3]). Ulcerations plus non-ulcerated inflammation were detected with a sensitivity of 97.2 % (CI 96.6–98.3), specificity of 100 % (CI 99.6–100), and diagnostic accuracy of 98.8 % (CI 98.2–99.3).



Severity of Crohnʼs lesions

Images were grouped according to the type of lesion irrespective of their location in the small bowel or colon, and the ability of the automated framework to determine the severity of lesions was compared with manual reading. For the patient split, normal mucosa, aphthous ulcerations, ulcers, and fissures/large ulcers were classified with substantial agreement (κ = 0.72,  [Table 4]). Using a random split for training and testing, the agreement was almost perfect (κ = 0.90).

Table 4

Classification of images with the gold standard and deep learning framework according to the severity of ulcerations regardless of their localization.

| Gold standard \ Deep learning framework | Normal | Aphthae | Ulcer | Fissure/large ulcer | Total |
|---|---|---|---|---|---|
| Normal | 885 | 0 | 1 | 1 | 887 |
| Aphthae | 9 | 146 | 14 | 0 | 169 |
| Ulcer | 14 | 56 | 65 | 44 | 179 |
| Fissure/large ulcer | 11 | 9 | 39 | 60 | 119 |
| Total | 919 | 211 | 119 | 105 | 1354 |

Data are shown for the patient split used for training. The inter-modality agreement for severity of lesions is substantial (κ = 0.72).



Discussion

CE is patient-friendly and non-invasive, and, compared to cross-sectional imaging, highly sensitive for the earliest lesions of CD [4] [17]. Additional information obtained with CE about the proximal distribution of CD affects the prognosis and medical treatment [1] [18] [19]. Hence, CE is the preferred method for examining the small intestine in patients with suspected CD without obstructive symptoms [5] [6]. With the Crohnʼs capsule, pan-enteric evaluation in one procedure is now feasible. Although the role of pan-enteric CE in CD is not yet established, it could play a major role in a future algorithm for noninvasive diagnosis and monitoring of CD.

The risk of capsule retention, the required bowel preparation, and the time consumed by video analysis are important limitations for the clinical use of pan-enteric CE. Our study addresses the use of deep learning algorithms for optimizing the video analysis. At present, there are no evidence-based recommendations regarding the optimal reading protocol for analyzing CE recordings [20]. With existing software, reading times can be reduced by increasing the frame rate or the number of images seen simultaneously, or by using a quick view function (i. e. only a fraction of images is shown). Increasing the speed, however, results in lower detection rates [21]. Although missing lesions is undesirable, these techniques may be justified in patients with diffuse involvement of the gastrointestinal tract, e. g. CD. Deep learning algorithms are attractive because of their potential for fast video analysis while maintaining a high diagnostic accuracy.

Previous studies in this field were focused on the small bowel. In a retrospective study by Aoki et al. including 5800 images of erosions and ulcers, and 10,000 normal images, lesions were detected with a 90.8 % diagnostic accuracy and an AUC of 0.958 [22]. Interestingly, the degree of obscuration due to bubbles, debris, and bile reduced the sensitivity, regardless of the lesion size. The false negative rate was 19.4 % and 8.5 % in patients with major and minor obscuration, respectively (P = 0.001). Klang et al. developed a deep learning algorithm for the automated detection of small bowel ulcers in patients with CD [23]. With 7391 images of ulcerations and 10,249 images of normal mucosa, a diagnostic accuracy of 96.7 % was achieved. The algorithm required a median of less than 3.5 minutes to analyze a complete small bowel CE. A recent meta-analysis on this topic showed sensitivity and specificity of 95 % (CI 89–98) and 94 % (CI 90–96), respectively for ulcer detection in the small bowel with deep learning algorithms [8].

The largest retrospective study performed so far was not included in the meta-analysis, however. Ding et al. collected 113,426,569 images from 6970 patients examined with small bowel CE performed for various indications [24]. In this extensive multi-center analysis, automated CE analysis achieved a per-lesion sensitivity of 98.1 % (CI 96.0–99.2) and a specificity of 100 % (CI 99.9–100) for detection of ulcers. Inflammation was diagnosed with a 93.9 % sensitivity (CI 92.6–94.9). The deep learning algorithm identified abnormalities with a higher sensitivity and significantly shorter reading times compared to manual analysis (5.9 ± 2.2 minutes vs. 96.6 ± 22.5 minutes, P < 0.001).

To the best of our knowledge, this is the first study to examine the use of deep learning for detection of CD in both the small bowel and colon. It is also the first study to apply texture enhancement methods to capsule endoscopy images. In 7744 images collected from patients with clinically suspected or known CD, our automated framework diagnosed ulcerations with an almost perfect sensitivity and specificity (> 95 %) compared to manual analysis by two gastrointestinal experts. In our test set of 1558 images, only four colon images (about 0.26 % of all test samples) and seven small bowel images (about 0.45 % of all test samples) were misclassified with respect to localization. These misclassifications were typically caused by rare artifacts in the input images.

It should be emphasized that we did not use a grading scale to evaluate the bowel cleansing and image quality in each frame, although non-diagnostic CEs (i. e. with a large amount of debris precluding a complete examination) were excluded from the analysis. Instead, we randomly selected images from patients examined for CD and classified them according to the type of lesion and the presence of debris or bubbles. Our aim was to create an algorithm that could discriminate a normal mucosa with debris or bubbles from CD lesions. We achieved a similarly high diagnostic accuracy for detection of ulcerations in the small bowel and colon. Although image quality was not included in our analysis, the impact of obscuration found by Aoki et al. [22] did not result in a lower sensitivity for detection of CD in the colon.

Endoscopic disease severity is currently based on validated scores with ileocolonoscopy or CE: Crohn's Disease Endoscopic Index of Severity (CDEIS), Simple Endoscopic Score for Crohn's Disease (SES-CD), Lewis Score or Capsule Endoscopy Crohn's Disease Activity Index (CECDAI) [17]. Common denominators in these scores are ulcer size and the affected surface. No previous study of deep learning algorithms included lesion characterization, which is fundamental for determining the disease severity. In this study, ulcerations were classified as aphthous ulcerations, ulcers, or fissures/large ulcers with a substantial to almost perfect agreement compared to manual reading. In a recent study, Barash et al. found an agreement between manual reading and deep learning of 67 % for discriminating ulcers of different severity with small bowel CE (grades 1–3 from mild to severe) [25]. There was excellent accuracy when comparing grade 1 ulcerations with grade 3 ulcerations (specificity and sensitivity of 0.91 and 0.91, respectively). These results encourage a future role of deep learning algorithms for autonomous assessment of the disease severity in CD.

There are some limitations to this study. First, two different splits for training and testing were applied. With the random split, there is a risk of bias in favor of the automated classification because images from the same patient are included for both training and testing. Hence, the algorithm may recognize lesions with a similar appearance from the same patient, which tends to increase the diagnostic accuracy. With the patient split, images from the same patient were used either for training or for testing. This, however, tends to lower the diagnostic accuracy because of the variance in visual appearance between patients (color, lighting, debris, bubbles, lesion types, localization, etc.). Validation of our results in a larger cohort will overcome this issue. Second, the number of patients was limited and the analysis was retrospective, although patients were recruited from two ongoing prospective studies of patients with suspected or known CD based on accepted clinical criteria. Third, this study, like previous studies, included static images of normal mucosa and CD lesions, and the results cannot be generalized to full-length CEs. Our results need validation on full-length video sequences; this step is pivotal before clinical implementation of the framework. Fourth, the algorithm performed equally well in the small bowel and colon, but we did not include a grading scale to evaluate the bowel cleansing and image quality in each image frame. Finally, data augmentation was used to increase the number of samples. This is very common in studies employing deep learning techniques; it should be emphasized that this process applies only to training the algorithm.



Conclusions

In conclusion, we built a robust and efficient framework for automated recognition of CD lesions of various severities located in the small bowel and colon. The technical solution relies on combining multiple pre-trained deep learning models with a unique image preprocessing step. The framework was extensively evaluated in different testing scenarios, and we report results in almost perfect agreement with the clinical standard. These results are promising for future automated diagnosis of CD. Deep learning approaches have great potential to help clinicians detect, localize, and determine the severity of CD with pan-enteric CE.



Competing interests

The authors declare that they have no conflict of interest.


Corresponding author

Michael Dam Jensen
Lillebælt Hospital – Internal Medicine, Section of Gastroenterology
Beriderbakken 4, 7100 Vejle
Denmark   
Fax: +79406888   

Publication History

Received: 20 January 2021

Accepted: 03 May 2021

Publication Date:
16 August 2021 (online)

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

