Open Access
CC BY 4.0 · Journal of Digestive Endoscopy 2024; 15(04): 243-249
DOI: 10.1055/s-0044-1800916
Review Article

Capsule Endoscopy Technology: A New Era in Digestive Tract Examination

Kang-ming Huang*
1   Department of Gastroenterology, Sanming First Hospital, Fujian Medical University, Sanming, China
Hua-bin Qiu
2   Department of Gastrointestinal Endoscopy, Sanming First Hospital, Fujian Medical University, Sanming, China
Yinghan Deng*
1   Department of Gastroenterology, Sanming First Hospital, Fujian Medical University, Sanming, China
Lian-hui Wu
1   Department of Gastroenterology, Sanming First Hospital, Fujian Medical University, Sanming, China
Hong-bin Chen
1   Department of Gastroenterology, Sanming First Hospital, Fujian Medical University, Sanming, China
 

Abstract

Capsule endoscopy (CE) is a groundbreaking advance in gastrointestinal (GI) examination. Noninvasive, painless, and convenient, it has swiftly become a crucial tool for diagnosing and treating digestive diseases. As artificial intelligence (AI) and machine learning (ML) progress, the capabilities of CE are expanding beyond imaging of the GI tract toward procedures such as biopsy and targeted drug delivery. For this review, five reputable repositories (Scopus, PubMed, IEEE Xplore, ACM Digital Library, and ScienceDirect) were systematically searched for all original publications on CE from 2001 to 2024. The review provides an overview of the current status and identified limitations of CE and highlights the significant role that AI and ML are projected to play in its future development.


Introduction

In the past, the diagnosis and treatment of small intestine diseases were inconvenient due to the lack of tools that could examine the small bowel (SB).[1] Since the emergence of SB capsule endoscopy (CE) in 2001, this situation has changed.[2] This innovative, noninvasive procedure offers unparalleled clarity in observing the SB's mucosal lining, all without exposing patients to radiation.[3] CE excels in its noninvasive nature, the possibility of repeated use, and its high degree of patient tolerance. Looking ahead, the fusion of CE with the advancements in artificial intelligence (AI) is set to elevate its significance in gastrointestinal (GI) diagnostics and treatment, heralding the arrival of a new era in precision medicine.[4] [5]

Current research efforts focus on integrating AI to enhance image interpretation, shorten analysis time, and reduce the likelihood of human error. In addition, with the integration of AI, CE has made progress in areas such as image transmission, battery life, capsule motility, and targeted drug delivery systems. This article elaborates on the potential applications of AI in CE.


Application of AI in the Preparation of the Gastrointestinal Tract before Capsule Endoscopy

Currently, a unified standard for the optimal bowel cleansing medication prior to CE has not been established.[6] The caliber of bowel preparation is pivotal, as it directly influences the examination's quality and the precision of diagnostic outcomes.

One study compared a regimen of 2 L of polyethylene glycol (PEG) solution the night before the test, 5 mL of simethicone 20 minutes before the test, and 5 mg of metoclopramide with a control regimen in which patients took no solid food after 7 p.m. the day before CE and no liquid food in the 4 hours before CE. Visualization and completion rate did not differ significantly between the observation group and the control group; instead, the preparation regimen increased patient discomfort.[7] This is in line with a recently published result by Estevinho et al, who found that patients who underwent bowel preparation the night before the examination had worse visualization, diagnostic field of view, and vasodilatation detection rate.[8] Other studies have likewise reported that purgative bowel preparation may not be superior to a clear liquid diet.[7] [9] Bowel preparation with laxatives improves visualization and detection rates in colonoscopy, but whether preparation with 4 or 2 L of PEG before CE does the same still requires further investigation. Future multicenter randomized controlled studies are therefore needed, with adequate sample sizes, validated definitions of diagnostic output and visual quality, standardized bowel preparation protocols, and homogeneous patient populations, and they should pay attention to patient tolerability.

With the increasing application of AI in CE, the standards for judging intestinal cleanliness in CE images have become more unified and efficient, which matters for the continuous improvement of bowel cleansing quality. AI has been integrated to enhance the assessment of bowel cleansing scores, which is crucial for high-quality imaging of the GI tract through CE. Several scales have been developed to classify bowel preparation for CE; however, their application is limited by poor interobserver agreement. Mascarenhas Saraiva et al[10] trained a deep learning network on 35,269 frames of colonic mucosa; the resulting algorithm evaluated the quality of bowel preparation with a sensitivity of 91%, a specificity of 97%, and an overall accuracy of 95%. This good discriminative ability is crucial for the future application of CE. Human judgment of CE images is inconsistent, so reading results vary between readers. Ju et al[11] compared the judgments of five gastroenterologists with those of an AI system: when assessing the cleanliness of bowel images, the five experts' judgments diverged, while the AI's judgments were consistent. Ribeiro et al[12] designed a convolutional neural network (CNN) based on 12,950 CE images obtained from two clinical centers in Porto, Portugal. The quality of bowel preparation in each image was classified as excellent (mucosa visible on ≥90% of the image surface), satisfactory (50 to 90% of the mucosa visible), or unsatisfactory (<50% of the mucosa visible). The CNN's predictions were compared with the classification established by the consensus of three CE experts, which is currently considered the gold standard for assessing cleanliness. The algorithm achieved an overall accuracy of 92.1%, with a sensitivity of 88.4%, a specificity of 93.6%, a positive predictive value of 88.5%, and a negative predictive value of 93.4%, demonstrating its ability to accurately classify bowel preparation for CE.
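The three-level grading and the performance figures quoted above can be made concrete with a minimal Python sketch. It is an illustration only, not the cited authors' code: the function names are hypothetical, the visible-mucosa fraction is assumed to be already available for each image, and the confusion-matrix counts are invented for the example rather than taken from any study.

```python
def grade_bowel_preparation(visible_mucosa_fraction):
    """Map the visible-mucosa fraction (0.0 to 1.0) to the three grades
    described in the text: >=90% excellent, 50 to <90% satisfactory,
    <50% unsatisfactory."""
    if not 0.0 <= visible_mucosa_fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if visible_mucosa_fraction >= 0.90:
        return "excellent"
    if visible_mucosa_fraction >= 0.50:
        return "satisfactory"
    return "unsatisfactory"


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard metrics for a binary (one-vs-rest) decision, computed
    from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only for illustration (not study data).
example = diagnostic_metrics(tp=90, fp=10, tn=190, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because sensitivity and specificity are defined per class, a three-class grader would typically report them one class at a time against the other two, which is the assumption made here.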


Endurance of the Capsule Endoscopy

Operating time is a prerequisite for CE performance. According to a study by Ou et al,[13] extending the device's battery life could be an effective way to raise the procedure's completion rate. CE devices typically operate for 8 to 15 hours.[14] When comparing capsule endoscopes, battery life, pixel count, imaging frequency, capsule size, reception technology, and field of view all affect the completion rate of the examination and the assessment of the field of vision ([Table 1], [Fig. 1]). Battery life is critical because it can limit the duration of the procedure: 16.5% of examinations are cut short by battery depletion.[15] One innovative approach is a self-sustaining battery that exploits the digestive system's own energy, potentially by using gastric fluid as a continuous source of electrolytes, thereby significantly prolonging endurance.[16] [17] Its major obstacle is low output power. A recent study by Ilic et al[18] introduced an edible rechargeable battery composed of common food ingredients and additives. This emerging design uses redox reactions to power devices, yet it struggles to deliver sufficient power density. To keep pace with the evolving needs of CE, researchers are also exploring wireless energy transfer, which harnesses electromagnetic waves to provide a steady power source.[19] In the future, an oral solution might be used to increase the endurance of capsule batteries.

Fig. 1 Common mainstream capsules images.
Table 1 Parameters of different capsule endoscopes

| Brand/manufacturer | Battery life | fps | Optical angle | Clarity/resolution | Size | Reception technology |
|---|---|---|---|---|---|---|
| Given Imaging (under Medtronic), Israel | 7 ± 1 h (some models in the PillCam series) | 14 | 140 deg | 256 × 256 pixels | 11 × 26 mm (diameter × length) | Wireless power transfer |
| IntroMedic, Korea | Over 11 h | 3 | 170 deg | 320 × 320 pixels | 11 × 24 mm | Human body communication technology |
| Olympus | Over 10 h | Adjustable | 160 deg wide angle | 3,968 × 2,974 pixels | 11 × 26 mm | Wireless power transfer |
| Ankon Technologies Co. Ltd | 6–8 h | 2 | 100 deg | 480 × 480 pixels | 27 × 11.8 mm | Magnetic control technology |
| Jinshan Technology Group Co., Ltd. | Exceeding 10 h (OMOM capsule endoscopy) | 2–5 | 170 deg | 512 × 512 pixels | 13 × 27.9 mm | Wireless power transfer |
| Shenzhen Jifu Medical Technology Co., Ltd | 6–8 h | 4 | 145 deg ultra-wide field of view | Patented trilens, high resolution | 17 × 11.8 mm | Macrophotography, wireless image transmission technology |
| CapsoVision | 15–19 h | 12–20 | 360 deg panoramic side view | | 11.5 × 26 mm | Human body communication technology data transmission system |

Abbreviation: fps, frames per second.
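Frame rate and operating time together determine how many images a reader has to review. The short Python sketch below works through that arithmetic under simplifying assumptions: a constant frame rate for the whole operating time, and a hypothetical playback speed that is not taken from any platform. The function names are illustrative.

```python
def expected_frame_count(frames_per_second, operating_hours):
    """Frames captured at a constant rate over the whole operating time."""
    return round(frames_per_second * operating_hours * 3600)


def review_minutes(frame_count, playback_frames_per_second):
    """Time needed to watch every frame once at a fixed playback speed."""
    return frame_count / playback_frames_per_second / 60


# Illustrative inputs: 2 fps for 8 h, reviewed at an assumed 20 frames/s.
frames = expected_frame_count(2, 8)
print(frames)                      # 57600
print(review_minutes(frames, 20))  # 48.0
```

A capsule with an adjustable or adaptive frame rate would not follow this simple product, so the result is best read as an order-of-magnitude estimate.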



Capsule Endoscope Localization and Motion

The movement of a capsule through the GI tract is typically passive, relying on natural peristalsis to propel the device. This may leave some areas unobserved and raise the likelihood of missed lesions. To counteract this, CE devices with controlled active motion have been engineered, enabling directed movement of the capsule.[20] [21] Extensive research has since been conducted on integrating functional modules into active CE to expand its capabilities, including biopsy, drug delivery, and tattooing.[22] [23] The novel tattoo capsule endoscope (TCE) under investigation is designed to deliver ink submucosally to the digestive organs as a supplementary marker for surgical locations.[24] It achieves active, multidirectional movement through the interaction between an externally controlled magnetic field and a permanent magnet within the system. The research team tested the proposed TCE in vitro on fresh pig intestine segments and found that it could be guided to the target and deliver the tattoo agent into the tissue.

Magnetic field control is a highly promising avenue. The magnetic actuation system relies on a capsule shell made of magnetic material,[25] which can be formed by mixing neodymium-iron-boron magnetic powder with silicone resin,[26] or on a capsule modified with magnetic material at one end. The capsule's movement is directed by an external magnetic source positioned outside the body.[27] The field is produced by devices such as a portable permanent magnet or a robotic arm, as well as by electromagnets capable of varying the field intensity. The magnetic field exerts torques and translational forces on the capsule, enabling movement, speed control, orientation, positioning, and precise imaging.[28] Alternative technologies, including magnetic resonance imaging (MRI), computed tomography, ultrasound, X-ray, and gamma-ray imaging, could potentially aid in positioning the capsule. However, their integration with CE is challenging because imaging must be sustained throughout the procedure, which can last up to 8 hours. In assessing these technologies, various factors must be taken into account; positioning accuracy is important but not the only indicator. For example, although methods such as MRI and X-ray provide high accuracy, running them for extended periods is not feasible, and the radiation exposure associated with X-ray-based methods is undesirable. Magnetically controlled capsule endoscopy (MCCE) also has its limitations. The clinical evidence for detecting gastric lesions, especially gastric cancer, is still limited.[29] MCCE lacks the advantages of traditional endoscopy in assessing gastric fluids, performing biopsies of lesions, or carrying out endoscopic treatments, and MCCE with biopsy capability is still in the preclinical research phase. Compared with traditional endoscopy, MCCE requires a longer time to examine the GI tract, has higher requirements for GI preparation, and incurs higher examination costs.
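The rotational and translational effects of an external field on a magnetic capsule follow from the standard magnetic dipole relations: the torque is τ = m × B, and for a dipole aligned with the z axis the force along z is F_z = m_z · ∂B_z/∂z. The Python sketch below evaluates both. It treats the capsule magnet as an ideal point dipole, and the moment, field, and gradient values are hypothetical numbers chosen only to show the calculation, not parameters of any MCCE system.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dipole_torque(moment, field):
    """Torque in N·m on a dipole with moment m (A·m^2) in a field B (T)."""
    return cross(moment, field)


def dipole_force_z(moment_z, field_gradient_z):
    """Force in N along z on a z-aligned dipole: F_z = m_z * dBz/dz."""
    return moment_z * field_gradient_z


# Hypothetical values: moment along x, uniform field along y.
torque = dipole_torque((0.1, 0.0, 0.0), (0.0, 0.05, 0.0))
force = dipole_force_z(0.1, 0.5)  # assumed gradient of 0.5 T/m
print(torque, force)
```

The torque turns the capsule toward the field direction, while a force arises only where the field has a gradient, which is why a uniform field can orient a capsule but cannot pull it.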

Advancements in AI algorithms have significantly enhanced the capabilities of video localization in the field of GI imaging. This technology capitalizes on the dynamic changes such as distortions, curls, and shape variations present in the visual data of the GI tract. Unlike some other techniques, video localization does not need additional devices for better positioning; it works by analyzing the raw video frames. There are two main types of video localization: topographical video segmentation and motion estimation.[30] Topographical video segmentation uses image features such as color, texture, and movement to segment the video into different organ-specific areas, which aids in precise localization.[31] [32] Motion estimation, which is based on visual odometry (VO),[33] determines the exact location by analyzing how point features change between the frames captured by the capsule's camera. This method for wireless capsule endoscopy (WCE) localization was initially put forward by Iakovidis et al, who used a Java-based video analysis framework to speed up the development of intelligent video analysis tools. Later improvements included the use of artificial neural networks to boost the VO method, enhance geometric calculations, and increase the accuracy of positioning.[34] VO not only locates the capsule but also provides directional information by calculating the movement and rotation of specific points in the video frames. However, relying solely on video-based positioning might not meet the precision needs for WCE localization, and the low frame rate of video transmission along with the speed of image recognition could result in significant delays.
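The core idea of motion estimation, recovering movement and rotation from how matched point features shift between two frames, can be sketched in a few lines of Python. This is a deliberately simplified two-dimensional least-squares fit in the image plane, not the method of Iakovidis et al and not a full visual odometry pipeline, which would also handle feature matching, outliers, and camera geometry. The function name and the sample points are hypothetical.

```python
import math


def estimate_rigid_motion_2d(points_prev, points_curr):
    """Least-squares rotation angle (rad) and translation that map matched
    2D feature points in the previous frame onto the current frame."""
    n = len(points_prev)
    if n < 2 or n != len(points_curr):
        raise ValueError("need at least two matched point pairs")
    cx_p = sum(x for x, _ in points_prev) / n
    cy_p = sum(y for _, y in points_prev) / n
    cx_c = sum(x for x, _ in points_curr) / n
    cy_c = sum(y for _, y in points_curr) / n
    s_dot = 0.0
    s_cross = 0.0
    for (xp, yp), (xc, yc) in zip(points_prev, points_curr):
        ax, ay = xp - cx_p, yp - cy_p   # centered previous point
        bx, by = xc - cx_c, yc - cy_c   # centered current point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Translation is what remains after rotating the previous centroid.
    tx = cx_c - (cos_t * cx_p - sin_t * cy_p)
    ty = cy_c - (sin_t * cx_p + cos_t * cy_p)
    return theta, (tx, ty)


# Hypothetical features rotated by 0.1 rad and shifted by (2, -1).
prev_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
curr_pts = [
    (math.cos(0.1) * x - math.sin(0.1) * y + 2.0,
     math.sin(0.1) * x + math.cos(0.1) * y - 1.0)
    for x, y in prev_pts
]
print(estimate_rigid_motion_2d(prev_pts, curr_pts))
```

Chaining such frame-to-frame estimates gives a trajectory, but small errors accumulate over thousands of frames, which is one reason video-only positioning may fall short of the precision the text calls for.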


Image Transmission

Propelled by peristalsis, the capsule traverses the GI tract, capturing images and sending them wirelessly to a storage device worn by the patient. It operates for approximately 8 hours, capturing images at a rate of 35 frames per second.[35] To reduce power consumption and relieve the bottleneck of wireless communication bandwidth, a 2010 study[36] presented a low-complexity video encoding method based on Wyner–Ziv theory and applied it to image transmission in capsule endoscopes. Moving complex encoding steps (such as motion estimation) to the receiver side simplified encoding at the transmitter. The method achieved lossless compression using only 30% of the original video image data when the channel signal-to-noise ratio reached 3 dB, and its advantages were confirmed in a comparison with the JPEG standard.

Various studies have focused on later generations of WCE that capture images in Bayer format, in which each pixel records a single color sample instead of three, thereby reducing the data volume to one-third.[37] However, with the rapid development of image sensor technology, Bayer images with high frame rates, good quality, and high resolution still exceed the transmitter's bandwidth. Many compression methods for Bayer-format images have been proposed to achieve higher compression performance.[38] These methods, however, discard some of the information in the original image, which leads to errors. Lossless and high-quality compression methods based on structural transformations also exist,[39] [40] [41] [42] but their high computational demands make them unsuitable for wireless endoscopy. Moreover, they involve multichannel compression, which increases hardware requirements and costs.
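The one-third figure follows from simple counting: a full-color image stores three samples per pixel, while a Bayer mosaic stores one. The Python sketch below checks this for a 256 × 256 frame, a resolution listed in Table 1. The 8-bit sample depth and the function name are assumptions made for illustration.

```python
def raw_frame_bytes(width, height, samples_per_pixel, bits_per_sample=8):
    """Uncompressed size of one frame in bytes."""
    return width * height * samples_per_pixel * bits_per_sample // 8


rgb_bytes = raw_frame_bytes(256, 256, samples_per_pixel=3)    # full color
bayer_bytes = raw_frame_bytes(256, 256, samples_per_pixel=1)  # Bayer mosaic

print(rgb_bytes)                # 196608
print(bayer_bytes)              # 65536
print(bayer_bytes / rgb_bytes)  # one-third of the full-color volume
```

The saving comes at the cost of demosaicing on the receiver side, where the two missing color samples per pixel are interpolated from neighboring pixels.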

CE generally uses radio frequency for data transmission, which is categorized by frequency into low, intermediate, high, and microwave bands. Low-frequency designs are simple and penetrate tissue well, but their reliance on larger electronic components hinders the miniaturization of CE. Most commercial CE devices employ dual-frequency communication, typically around 400 MHz. However, the 300 kHz channel bandwidth allowed in this band makes it very difficult to provide the data rates required for high-quality real-time video. These limitations leave narrowband transmission increasingly unable to keep pace with advances in CE technology. Ultra-wideband communication,[43] [44] capable of data rates exceeding 100 Mb/s, significantly enhances video quality while reducing power consumption, making it an attractive choice for emerging research on CE wireless interfaces.
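Why a 300 kHz channel struggles with video can be seen by comparing the raw bit rate of a modest image stream with the Shannon capacity of the channel, C = B · log2(1 + SNR). The Python sketch below does this under stated assumptions: the frame size and frame rate are the 256 × 256 pixels and 14 fps listed in Table 1, the 8 bits per pixel and the 20 dB signal-to-noise ratio are hypothetical, and the capacity is a theoretical upper bound that real links do not reach.

```python
import math


def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Theoretical upper bound on the error-free bit rate of a channel."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)


def raw_video_bitrate_bps(width, height, bits_per_pixel, fps):
    """Bit rate of uncompressed video."""
    return width * height * bits_per_pixel * fps


capacity = shannon_capacity_bps(300e3, snr_db=20)  # assumed SNR
required = raw_video_bitrate_bps(256, 256, bits_per_pixel=8, fps=14)

print(f"channel bound: {capacity / 1e6:.2f} Mb/s")
print(f"raw video:     {required / 1e6:.2f} Mb/s")
print(f"compression needed: at least {required / capacity:.1f}x")
```

Under these assumptions even a low-resolution stream exceeds the bound several times over, so the capsule must compress, lower the frame rate, or move to a wider channel such as ultra-wideband.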


Application of Artificial Intelligence in Capsule Endoscopy

AI-Assisted Capsule Endoscopy Review

Each CE examination may produce as many as 50,000 images, of which only one or two may be meaningful. Preparing an image report takes ∼30 minutes, during which the reader's attention may lapse, increasing the missed diagnosis rate. To overcome these limitations and save time, several AI-based viewing modes are available on current CE platforms.[45] Adjustment mode, automatic viewing mode, and Omni mode are programs that analyze overlapping images and adjust playback speed or discard redundant frames. Unfortunately, compared with the slowest viewing mode, the fastest viewing mode misses as many as 12% of some lesions.[46] At present, therefore, such software has little effect on cost and viewing time. However, some studies have shown that applying a deep learning CNN system to CE reading can reduce reading time without reducing the detection rate of erosive and ulcerative lesions.[47] [48] [49] In the future, AI will need to rely more on deep learning to save time and reduce missed diagnoses.
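Discarding redundant frames can be illustrated with a small Python sketch that keeps a frame only when it differs enough from the last frame kept. This is a toy stand-in for the idea, not the algorithm behind any commercial viewing mode: frames are assumed to be flat lists of grayscale intensities, and the difference measure, threshold, and function names are hypothetical.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute intensity difference between two equal-sized frames."""
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_non_redundant(frames, threshold):
    """Return indices of frames that differ from the last kept frame
    by more than the threshold; the first frame is always kept."""
    kept = []
    last_kept = None
    for index, frame in enumerate(frames):
        if last_kept is None or mean_abs_difference(frame, last_kept) > threshold:
            kept.append(index)
            last_kept = frame
    return kept


# Four tiny hypothetical frames: two near-duplicate pairs.
frames = [
    [10, 10, 10, 10],
    [11, 10, 10, 9],
    [80, 90, 85, 70],
    [81, 90, 85, 70],
]
print(select_non_redundant(frames, threshold=5))  # [0, 2]
```

The threshold sets the trade-off the text describes: a higher value shortens the review but raises the chance that a frame showing a small lesion is dropped as a duplicate.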

AI can not only improve the efficiency and accuracy of image reading[4] [50] but also assist in the early identification of tumor changes. In the esophagus, for instance, the primary application of AI has been in detecting dysplasia and tumors. Barrett's esophagus is a precancerous condition for esophageal adenocarcinoma,[51] yet endoscopists may miss ∼25% of high-grade dysplasia and esophageal adenocarcinoma lesions within it. de Groof et al[52] used a large database of nearly 500,000 images of Barrett's esophagus to develop a computer-aided detection (CADe) system that achieved an accuracy of 89% in detecting early tumors in patients with Barrett's esophagus, outperforming all 53 endoscopists included in the study. Similarly, in the stomach, Luo et al[53] conducted a large prospective multicenter case–control study in which AI learned from over 1 million images of 84,424 patients. They developed a computer-assisted diagnostic system that not only achieved good results in distinguishing malignant from benign lesions but also helped determine the depth of invasion,[54] [55] which is of great significance for subsequent tumor treatment. Researchers[56] have also leveraged AI to detect SB lesions, training AI systems on a vast number of images to form a CADe system for identifying and diagnosing small intestinal diseases. These advancements showcase the potential of AI to enhance the diagnosis of GI diseases by analyzing the extensive image data produced by CE more accurately and efficiently. The detection of polyps and tumors is another important goal of CE examinations that rely on CADe systems. Polyps resemble the background tissue in color and texture, making them harder to detect than ulcers and vascular lesions, which are more visually distinct, and the miss rate for small intestinal polyps is correspondingly higher.[57] So far, machine learning (ML) methods have been less accurate for polyps and tumors than for other types of lesions. Despite these challenges, several research groups are applying novel deep learning methods to improve the detection of small intestinal tumors.[30]


AI-Assisted Judgment of Bleeding in Capsule Endoscopy

Unexplained GI bleeding is the most common indication for CE. Early CADe systems for bleeding relied on ML with manual feature extraction. Accuracies of 81 to 98% have been reported for ML-based algorithms, though real-world performance likely differs.[30] Various investigations have used ML, CNNs, and computational algorithms to identify intestinal angioectasias, achieving notable sensitivity and specificity.[58] [59] [60] [61] AI also facilitates the assessment of intestinal mucosal bleeding during CE by estimating blood levels in the digestive tract and thereby inferring active bleeding from the small intestine's lining.[62] [63] [64] For small intestinal ulcers, early ML techniques identified ulcers with an accuracy between 89.5 and 95.4%.[30] CADe of ulcers has evolved over the past 15 years, though ulcers are visually subtle compared with frankly red or bleeding lesions. Although erosions and ulcers are related pathological processes, they are visually distinct, and reliably grouping them together is a challenge for AI algorithms. In 2009, Pan et al[65] developed a CNN to detect bleeding images using color and texture features. The study used a total of 150 complete CE videos, 3,172 bleeding images, and 11,458 nonbleeding images to test the algorithm, which achieved an image-level sensitivity of 93.1% and specificity of 85.6%. In 2018, Fan et al trained a CNN on 3,250 ulcer images and 4,910 erosion images to detect ulcers and erosions. Using expert endoscopy as the gold standard, they achieved sensitivities of 96.8 and 93.67% and specificities of 94.79 and 95.98% for ulcers and erosions, respectively.[66]
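What a hand-crafted color feature looks like can be shown with a toy Python example that flags a frame when a sufficient share of its pixels is strongly red. It is purely illustrative: it is not the method of Pan et al or of any cited system, the two thresholds are arbitrary and unvalidated, and the pixel values are invented.

```python
def red_share(pixel):
    """Share of the red channel in an (R, G, B) pixel; 0.0 for black."""
    red, green, blue = pixel
    total = red + green + blue
    return red / total if total else 0.0


def fraction_strongly_red(pixels, min_red_share=0.6):
    """Fraction of pixels whose red share reaches min_red_share."""
    if not pixels:
        raise ValueError("frame has no pixels")
    hits = sum(1 for pixel in pixels if red_share(pixel) >= min_red_share)
    return hits / len(pixels)


def flag_possible_bleeding(pixels, min_red_share=0.6, min_fraction=0.1):
    """Flag a frame when enough of its pixels are strongly red.
    Both thresholds are hypothetical, not clinically validated."""
    return fraction_strongly_red(pixels, min_red_share) >= min_fraction


# Invented frames: 3 saturated red pixels among 7 pinkish mucosa pixels,
# and a frame of mucosa-colored pixels only.
bleeding_like = [(200, 20, 20)] * 3 + [(120, 100, 90)] * 7
mucosa_only = [(120, 100, 90)] * 10

print(flag_possible_bleeding(bleeding_like))  # True
print(flag_possible_bleeding(mucosa_only))    # False
```

Healthy mucosa is itself reddish, so a single color rule of this kind separates classes poorly on real images, which helps explain the move toward texture features and learned representations described above.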

AI also has its limitations; for example, it has not surpassed experienced endoscopists in making final diagnoses for individual patients. AI therefore mainly serves to assist endoscopists and improve their work efficiency. Moreover, AI is unlikely to replace the human eye, as humans are ultimately responsible for the final endoscopic report.[5]

In the future, with further deep learning, AI may use the tissue color bar of CE to address the following problems: (1) determining whether examination of the entire SB is complete; (2) helping the examiner accurately inform the subject of the junction between the jejunum and ileum; (3) accurately indicating the duodenal papilla and the ileocecal valve; (4) accurately flagging suspected lesions; and (5) removing controversial language from CE report templates (e.g., statements that the patient cannot be told where the jejunum and ileum are demarcated) ([Table 2]).

Table 2 Summary of studies in the literature review on AI application in capsule endoscopy

| Author and ref. no. | Field | Year of publication | Country | Research methods/technology | Conclusion |
|---|---|---|---|---|---|
| Daniel and Rana[25] | Capsule motility | 2020 | India | Magnetically assisted | Magnetic control capsule motility |
| Mascarenhas Saraiva et al[10] | Bowel preparation | 2023 | UK | Retrospective study | Improving intestinal preparation evaluation standards |
| Ju et al[11] | Bowel preparation | 2023 | Korea | Control study | AI evaluation of the bowel is superior to human evaluation |
| Ribeiro et al[12] | Bowel preparation | 2023 | Portugal | Deep learning | The deep learning algorithm can be used to classify the quality of bowel preparation |
| Hosoe et al[45] | Image | 2016 | Japan | Randomized | The reading time has significantly decreased |
| Kyriakos et al[46] | Reading | 2012 | Greece | Controlled trial | The reading frequency that can be safely substituted for slower models in clinical practice was determined |
| Hwang et al[47] | Image reading | 2021 | Korea | Control study | Improving the detection rate of small lesions and ulcers using convolutional neural networks |
| Otani et al[48] | Lesion detection | 2020 | Japan | Control study | AI can diagnose small intestinal disease |
| Aoki et al[49] | Lesion detection | 2020 | Japan | Deep learning | Reduce the reading time |
| de Groof et al[52] | Image reading | 2020 | The Netherlands | Deep learning | Improve early detection of Barrett's esophagus |
| Cho et al[54] | Lesion detection | 2019 | Korea | Machine learning | Distinguish between benign and malignant lesions |
| Ueyama et al[55] | Lesion detection | 2021 | Japan | Deep learning | Improve the diagnostic identification and lesion detection rate of early gastric cancer |
| Cardoso et al[56] | Lesion detection | 2022 | Portugal | Control study | Enhancing lesion detection efficiency |
| Tsuboi et al[61] | Lesion detection | 2020 | Japan | Deep learning | Small bowel angioectasia |
| Fan et al[66] | Lesion detection | 2018 | China | Deep learning | Detect small intestinal ulcer and erosion |



Conclusion

The evolution of CE technology has been remarkable, transitioning from an initial capability of mere imaging within the small intestine to the present-day magnetically navigated capsules that can actively traverse the GI tract. The future promises an expansion of CE's role to include biopsies and therapeutic interventions. The application of AI is expected to augment diagnostic precision and diminish the incidence of overlooked diagnoses. Although the exploration of CE's potential is still primarily confined to porcine models or ex vivo GI tracts, its clinical prospects are encouraging. To fully harness the potential of CE in clinical settings, it is imperative to refine the performance of CE's intelligent technologies and appreciate the benefits of interdisciplinary technological convergence.



Conflict of Interest

None declared.

Authors' Contribution

K.-m.H. and H.-b.C. initiated the study design; K.-m.H., Y.D., and H.-b.Q. were responsible for collecting original articles and writing the article; H.-b.C. and L.-h.W. were responsible for reviewing article quality and English calibration.


Source of Support

None.


* These authors have contributed equally to this work and share first authorship.



Address for correspondence

Hong-bin Chen
Department of Gastroenterology, Sanming First Hospital, Fujian Medical University
No. 29, Dongxin 1st Road, Ledong Street, Sanming City 365000
China   

Publication History

Article published online:
30 December 2024

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India

