Introduction
Colonoscopy is a common procedure and considered the gold standard for the investigation
of the colon and rectum. Cancer screening has led to an increase in colonoscopy procedures
and thus a need for educating more endoscopists [1] [2]. Learning the skills needed to perform a safe colonoscopy is not an easy task, and
it is imperative to ensure that every single endoscopist is competent before being
allowed to practice independently [3]. The importance of regular assessment of individual endoscopists has been recognized
by the major gastroenterology societies [4] [5]. Recommended quality indicators are cecal intubation rate (CIR) above 90 %, cecal
intubation time (CIT) less than 16 minutes, and a minimum number of performed colonoscopies [2] [6] [7] [8]. Unfortunately, many performed procedures are needed to make CIR and CIT statistically
valid indicators of technical competence, making them unsuitable for ensuring the
competence of new endoscopists [9]. Furthermore, a specific number of performed procedures does not ensure that each
endoscopist is competent [9]. To deliver high-quality care for patients, assessment of competence needs to be based on direct observation
using tools such as the Gastrointestinal Endoscopy Competence Assessment Tool (GiECAT), Global Assessment
of Gastrointestinal Endoscopy Skills (GAGES), Assessment of Competency in Endoscopy
(ACE), or Direct Observation of Procedural Skills (DOPS) [10] [11] [12] [13] [14]. These assessment tools focus on technical performance and cognitive skills. Unfortunately,
they are prone to observer bias and present a formidable workload for expert supervisors/assessors,
in turn reducing compliance [2] [4] [15] [16]. Nerup et al. developed a computerized assessment tool, the Colonoscopy Progression
Score (CoPS), to allow an automatic and unbiased assessment of colonoscopy skills [17]. The CoPS metric relied on computer-based image analysis of the ScopeGuide video
and could only assess the progression of the colonoscope on its route from anus to
cecum based on the two-dimensional positions of the tip. Direct access to XYZ-coordinates
from the whole length of the colonoscope could allow the development of a system that
assesses more than just the route of the tip, such as looping, tip control, and optimal
path to ensure a safe and smooth progression. However, solid validity evidence proving
meaningful measurements is necessary before the system can be trusted to assess competence
in colonoscopy.
This study aimed to develop and collect validity evidence for a computerized assessment
tool, called the 3D-Colonoscopy Progression Score (3D-CoPS).
Methods
Development of the assessment system
The system utilizes a Magnetic Endoscopy Imaging system called ScopeGuide (UPD-3,
Olympus, Tokyo, Japan) where electromagnetic coils are built in along the length of
the colonoscope. Each coil generates a magnetic field that is picked up by receiver
coils in a receiver dish and processed in the UPD-3 device. For normal use, the UPD-3
system renders 2D images of the shape of the colonoscope to assist the operator in
understanding and handling certain issues during a procedure. We collected XYZ-coordinates
directly from the UPD-3 unit through an Olympus receiver box (UCES-3) five times per
second (5 Hz) during insertion of the colonoscope.
The aim was to use the XYZ data to create a combined score of progression (3D-CoPS)
based on five different measures: 1. Travel length; 2. Tip progression; 3. Chase efficiency;
4. Shaft movement without tip progression; and 5. Looping.
Travel length: The optimal path from anus to cecum reflects the shortest possible
distance. The measurement is the travel length of the tip from anus to cecum provided
in cm ([Fig. 1]). Higher scores indicate a longer distance traveled, whereas a low score represents
a shorter path to the cecum.
Fig. 1 Travel length is measured by the cumulative distance in cm between all the data points
from the first coil, i.e., the tip of the colonoscope.
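As a minimal sketch of the travel length measure, assume the tip samples are available as a list of (x, y, z) tuples in cm. The function name and the sample values are hypothetical and are not taken from the study's code.

```python
import math

def travel_length_cm(tip_positions):
    """Sum the straight-line distances between consecutive tip samples (cm)."""
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(tip_positions, tip_positions[1:]):
        total += math.sqrt((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2)
    return total

# Hypothetical tip samples (cm), e.g. three consecutive 5 Hz readings.
samples = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
print(travel_length_cm(samples))  # segment lengths 5 and 12 add up to 17.0
```

Because every back-and-forth movement of the tip adds to the sum, the result can exceed the length of the colon itself.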
Tip progression: A score of progression has previously been described in a 2D format
(Colonoscopy Progression Score) using coordinates of the tip’s position during insertion
[17]. This was based on tracking the video feed to get X, Y coordinates (i.e., the frontal
plane). Getting data directly from the ScopeGuide system increases the precision because
adding a third dimension (Z) provides information regarding the progression in the
transverse plane of the phantom model. The tip progression score is based on the assumption
that the optimal path line is defined by the shape of the colonoscope when the tip
is positioned in the cecum. The score is calculated by summing the minimal distance
from each data point collected from the tip to the optimal path line ([Fig. 2]). A low score indicates a smoother and more even progression during insertion.
Fig. 2 Tip progression is the cumulative distance in cm from all the data points from the
first coil, i.e., the tip of the colonoscope, to the optimal path line defined by
the shape of the colonoscope when the tip is positioned in the cecum.
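The tip progression measure can be sketched as a point-to-polyline distance problem. The sketch assumes that the optimal path line is approximated by straight segments between the coil positions recorded when the tip is in the cecum. That piecewise-linear approximation, the function names, and the sample data are assumptions for illustration only.

```python
import math

def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    ab = _sub(b, a)
    denom = _dot(ab, ab)
    # Project p onto the segment and clamp the projection to its end points.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, _dot(_sub(p, a), ab) / denom))
    closest = tuple(ai + t * abi for ai, abi in zip(a, ab))
    diff = _sub(p, closest)
    return math.sqrt(_dot(diff, diff))

def tip_progression_cm(tip_positions, optimal_path):
    """Sum, over all tip samples, of the minimal distance to the path line."""
    segments = list(zip(optimal_path, optimal_path[1:]))
    return sum(
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in tip_positions
    )

# Hypothetical data (cm): a straight path line and three tip samples.
path = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
tips = [(5.0, 3.0, 0.0), (12.0, 0.0, 0.0), (2.0, 0.0, 4.0)]
print(tip_progression_cm(tips, path))  # distances 3, 2 and 4 add up to 9
```

Since the score is a sum over all samples, it grows both with the distance from the path line and with the number of samples, that is, with the duration of the insertion.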
Chase efficiency: A straight colonoscope is in principle an instrument that will respond
more instantly to both torque and in/out movements. This situation is ideal since
the applied force needed to advance is less compared to a more bent colonoscope. When
the shaft of the colonoscope starts to bend it is less likely to travel in the same
path as the tip and the colonoscope is displaced. The measurement is the distance
over time (meter per second) between neighboring coils, meaning that if coil 1 travels
from A to B and coil 2 travels from C to D, then the calculated distance is the distance
between B and D. The calculation is done for all the coils. The measurement evaluates
how closely a coil follows the same path as the coil in front. The displacement is
small (low score) if the coils follow one another closely, and a large displacement
(high score) represents a greater distance between the coils from one time point to
another ([Fig. 3]).
Fig. 3 Chase efficiency is a score of how closely a coil follows the same path as the coil
in front (meter per second). A shift away from the coil in front increases the score
and represents a greater distance between the coils from one data point to the next.
Shaft movement without tip progression: Ideally the whole colonoscope moves concurrently,
but this is not always the case. This measure refers to the situation where the tip
of the colonoscope stands still but the rest is moving and potentially stretching
the colon and thus mesenteric tissue. A movement cannot be considered concurrent if
the tip of the colonoscope is lodged while the shaft is moving (a low score). A concurrent
movement is when the tip of the colonoscope and the shaft are moving equally (a high
score). The analysis is a comparison of the movement differences between the tip and
the shaft (the scale has no physical unit) ([Fig. 4]).
Fig. 4 Shaft movement without tip progression is a measurement of movement differences between
the tip and the shaft. Increasing score reflects a greater difference between the
movements of the tip compared to the shaft. The score has no physical unit.
Looping: The occurrence of loops during a colonoscopy is correlated with patient discomfort,
pain, or even incomplete procedures [18]. A quantified measure of the amount of looping during a procedure has never been
investigated. In 2002 Rogen et al. introduced an automatic classification method of
protein structure, called Writhe [19]. Writhe is a statistic for how knotted a protein is, and we applied the same method
to describe the colonoscope. The quantification of scope crossings provides us with
a score for how much the colonoscope is crossing its own path in 3D space during
the procedure ([Fig. 5]). The scale has no physical unit, is time-independent, and goes from 0 to 1. A score
of 0 represents a completely straight colonoscope whereas a score of 1 represents
a complete loop in either direction. The total score is the area under the curve where
the Y axis is the Writhe value, and the X axis is time. A threshold for meaningful
looping was set at 0.5 because we wanted to measure meaningful loops and not general
bending of the colonoscope. Furthermore, only the first 70 cm of the colonoscope was
analyzed to avoid including bends and loops outside the phantom model in the score.
Fig. 5 Looping is a measurement of how looped the colonoscope is during a colonoscopy. A
score of 0 reflects a completely straight colonoscope (right side) and a score of
1 reflects a bent colonoscope overlapping itself (left side). The score has no physical
unit.
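The aggregation step of the looping score (area under the Writhe curve above the threshold) could be sketched as follows. The sketch assumes that a Writhe value has already been computed for each 5 Hz sample and that samples at or below the threshold contribute nothing. It also assumes that the sign of the Writhe only encodes loop direction and can be dropped, and that time is counted in seconds. The text does not spell out these details, and the function name and data are hypothetical.

```python
def looping_score(writhe_series, threshold=0.5, dt=0.2):
    """Rectangle-rule area under the |Writhe| curve, counting only
    samples whose magnitude exceeds the threshold.

    dt is the sampling interval; 0.2 s corresponds to 5 Hz (assumed unit).
    """
    return sum(abs(w) * dt for w in writhe_series if abs(w) > threshold)

# Hypothetical Writhe values for five consecutive samples.
series = [0.1, 0.6, 0.8, 0.4, -0.7]
print(round(looping_score(series), 2))  # (0.6 + 0.8 + 0.7) * 0.2 = 0.42
```

A different time unit (for example, counting samples instead of seconds) would only rescale the score by a constant factor and would not change the ranking of performances.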
Gathering validity evidence for the newly developed assessment tool: We gathered evidence
of validity according to Messick’s contemporary framework and its five sources: content,
response process, internal structure, relationship to other variables, and consequences
[20]. These categories of evidence need to be collected to support the construct validity
of inferences.
The measurements are meant to measure the core technical skills of colonoscopy. Content
evidence for the measurements was based on a previously validated assessment
tool and phantom model [17] [21], and the opinions of an expert panel consisting of colonoscopists and engineers.
Response process was ensured by a standardized introduction to the equipment and the
phantom model, and all data were recorded in a uniform file format. When
using multiple items in a test that are intended to measure the same thing (technical
performance), a goal is to know to what extent the items measure the same thing (reliability
of the test). Internal consistency was explored using Cronbach’s alpha. We assumed
that the data followed a probability distribution, which enabled us to test the
assumption that the 3D-CoPS is correlated with experience, i.e., relationship to other
variables (using Pearson’s r). To set a standard for technical competence based on the 3D-CoPS, the contrasting
groups method was used (consequences) to calculate a pass/fail score.
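For reference, Cronbach's alpha for k items is k/(k − 1) × (1 − sum of item variances / variance of the summed scores). The following is a minimal sketch using only the Python standard library. The function name and the toy data are hypothetical, and the study itself reports using SPSS and STATA for the statistics.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per measurement (item), each list holding
    one score per performance, in the same order.
    """
    k = len(items)
    item_variance_sum = sum(statistics.variance(scores) for scores in items)
    # Total score of each performance across all items.
    totals = [sum(per_performance) for per_performance in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Toy example with two items scored on four performances.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

When items are on different scales, as the five measurements here are, alpha is usually computed on standardized scores, with all items oriented in the same direction.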
Participants and equipment
Twenty-four volunteer physicians were included and divided into two groups based on
clinical experience, defined by colonoscopy volume: 12 novices with fewer than
50 colonoscopies and 12 experienced endoscopists with more than 140 colonoscopies. A standardized
introduction lasting 30 minutes was given to novices without former experience. All
participants were given written and oral information and signed a consent form before
entering the study. Demographics are shown in [Table 1].
Table 1
Demographics.
| Group | 12 novices | 12 experienced | P value |
| --- | --- | --- | --- |
| Mean age (range) | 29.5 (25–32) | 55.5 (41–66) | < 0.001 |
| Sex (Female/Male) | F = 8 (67 %) / M = 4 (33 %) | F = 1 (8 %) / M = 11 (92 %) | 0.002 |
| Mean years since graduation (range) | 1.9 (0–8) | 28.5 (16–39) | < 0.001 |
| Mean colonoscopies performed (range) | 7.4 (0–34) | 6,863 (2,000–10,000) | < 0.001 |
| Mean colonoscopies per year (range) | 7.4 (0–34) | 512 (300–1000) | < 0.001 |
| Mean gastroscopies performed (range) | 48 (0–200) | NA | NA |
| Simulator experience (yes/no) | 10/2 | 12/0 | 0.166 |
The study was done using a setup with an Olympus colonoscope (CF-H180DL, Evis Exera
II video center CV-180, Olympus Medical System Ltd, Tokyo, Japan), the ScopeGuide
(UPD-3, Olympus, Tokyo, Japan), and a phantom model (Kyoto Kagaku Colonoscopy Training
Model, Japan). The ScopeGuide was connected to an Olympus receiver box (UCES-3,
Olympus, Tokyo, Japan) and an Intel NUC computer device (NUC7i3BNK, Intel Corporation,
Santa Clara, California, United States) running Windows 10.
The training model contains a 130-cm long rubber colon which can be configured into
different scenarios. Each participant performed twice on the same case (case 3) with
an alpha loop formation in the sigmoid colon. A maximum of 10 minutes was allowed
to reach the cecum. Data collection (XYZ-coordinates) was started at the insertion
of the colonoscope into the anus and ended when the participant had a clear view of
the cap representing the cecum. No feedback was given during or in between the performances.
The data were independent of the UPD-3 unit settings and saved on the Intel NUC computer
device. The five measurements and 3D-CoPS were developed in Python 3.7, and the XYZ
dataset was used to calculate the measurements for each participant.
Sample size
The sample size was calculated based on data from a previous trial [17]. A significance level of 5 % and a power of 0.9 required a minimum of 6 participants
in each group.
Ethics
The regional committee of ethics evaluated and approved the study (H-17040471). All
participants were provided with oral and written information regarding the trial.
Participation was voluntary; no material goods were donated to the participants. The
trial was registered (December 22, 2017) at ClinicalTrials.gov with trial identification
number NCT03401723.
Statistics
Statistical analysis was done in IBM SPSS statistics (PASW, version 22; SPSS Inc,
Chicago, Illinois, United States) and STATA software version 14.0 (College Station,
TX). The level of statistical significance was set at P < .05 for all tests. Internal
consistency was evaluated using Cronbach’s alpha. To allow comparison between the
five measurements and the two groups, the scores were standardized by converting them to
z-scores for each performance. Consequently, a z-score of 1 was equivalent to one
standard deviation higher than the mean score. The coefficient of variation was calculated
as the standard deviation divided by the mean. The correlation between performances
in the first and second attempt (i.e., test-retest reliability) was explored using
Pearson’s r [22]. Comparison of the mean scores of the two performances between groups was made by
using an independent samples t-test for continuous data and the Pearson chi-square
test for categorical data.
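The two descriptive statistics named above can be written out in a few lines. This sketch uses only the Python standard library. The function names and toy numbers are hypothetical, and the study's own analysis was run in SPSS and STATA.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def pearson_r(x, y):
    """Pearson correlation between paired observations x and y."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Toy first-attempt and second-attempt scores for three participants.
first_try = [1.0, 2.0, 3.0]
second_try = [2.0, 4.0, 6.0]
print(pearson_r(first_try, second_try))       # perfectly linear, so 1.0
print(coefficient_of_variation([2.0, 4.0, 6.0]))  # 2 / 4 = 0.5
```

For test-retest reliability, `x` would hold each participant's score on the first attempt and `y` the score on the second attempt.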
Results
Data collection was conducted from October 2017 to February 2018. Twelve experienced
and 12 novices participated. One participant in the experienced group was excluded
due to a large, unintended movement of the phantom that affected the whole dataset.
All novices but one reached the cecum twice within the time limit of 10 minutes.
Each of the five measurements was analyzed individually; statistically significant
differences in performance were demonstrated between the novices and experienced participants
in all measurements ([Table 2]).
Table 2
Mean score for the five measurements and cecal intubation time.
| Measurement | Novices, mean (SD) | Experienced, mean (SD) | P |
| --- | --- | --- | --- |
| Travel length (cm) | 737 (397) | 378 (155) | < .001 |
| Tip progression (cm) | 3418 (2073) | 1525 (944) | < .001 |
| Chase efficiency (meter per second) | 250 (69) | 188 (46) | .001 |
| Shaft movement without tip progression (no physical unit) | 0.65 (0.12) | 0.79 (0.11) | < .001 |
| Looping (no physical unit) | 617 (622) | 223 (165) | .006 |
| Cecal intubation time (minutes:seconds) | 4:35 (2:39) | 1:35 (0:45) | < .001 |
Internal consistency of the 3D-CoPS was high (Cronbach’s alpha = 0.78), indicating
that all five outcome measures were highly correlated, i.e., measured the same construct.
3D-colonoscopy progression score: Mean scores of the five measurements, based on the
two performances, were calculated from the standardized z-scores. The measurements
were weighted equally (20 %) and combined into a single score of progression, the 3D-CoPS. The
novices scored significantly lower than the experienced participants: –0.454 (SD 0.707) vs. 0.495
(SD 0.303), P < 0.001 ([Fig. 6]).
Fig. 6 Box plot of the 3D-Colonoscopy Progression Score (3D-CoPS) showing outliers (*) and the passing
score, represented by the dotted line. The blue box is the first try and the green
box is the second try.
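The combination of standardized measurements into one equally weighted score can be sketched as below. One detail is an assumption inferred from the results rather than stated in the text: the z-scores of measures where a lower raw value is better (travel length, tip progression, chase efficiency, looping) are sign-flipped, so that a higher composite score means better performance. Function names and the toy data are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def composite_score(measures, higher_is_better):
    """Equally weighted mean of oriented z-scores, one per performance.

    measures: dict mapping measure name to a list of raw scores.
    higher_is_better: dict mapping measure name to True or False (assumed).
    """
    names = list(measures)
    weight = 1.0 / len(names)  # 20 % each when there are five measures
    oriented = {}
    for name in names:
        sign = 1.0 if higher_is_better[name] else -1.0
        oriented[name] = [sign * z for z in z_scores(measures[name])]
    n_performances = len(measures[names[0]])
    return [
        sum(weight * oriented[name][i] for name in names)
        for i in range(n_performances)
    ]

# Toy data for two of the five measures and three performances.
raw = {"travel_length": [300.0, 400.0, 500.0], "shaft_movement": [0.8, 0.7, 0.6]}
direction = {"travel_length": False, "shaft_movement": True}
print(composite_score(raw, direction))
```

In this toy example the first performance has the shortest travel length and the most concurrent shaft movement, so it receives the highest composite score.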
Pearson correlation revealed strong test-retest reliability between
the two tests (r = 0.86, P < 0.001) ([Fig. 7]).
Fig. 7 Correlation of mean 3D-Colonoscopy Progression Score (3D-CoPS) during first and second
try. Blue colored dots are the experienced and the green dots are the novices.
A pass/fail score was set based on the contrasting groups method ([Fig. 8]).
Fig. 8 Distributions of 3D-Colonoscopy Progression Score (3D-CoPS) between groups. Passing
score set by contrasting groups method at the intersection of the groups. Blue colored
dots are the experienced and the green dots are the novices.
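One common way to locate the intersection used by the contrasting groups method is to fit a normal curve to each group and solve for the point where the two densities are equal. The sketch below assumes normal fits and takes the reported group means and SDs as inputs. The function names are hypothetical, and the value it prints is not the passing score reported by the study, which may have been derived from the observed distributions.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def contrasting_groups_cutoff(mean_low, sd_low, mean_high, sd_high):
    """Score between the group means where the fitted normal curves cross."""
    # Setting the two log-densities equal gives a*x^2 + b*x + c = 0.
    a = 1 / (2 * sd_high ** 2) - 1 / (2 * sd_low ** 2)
    b = mean_low / sd_low ** 2 - mean_high / sd_high ** 2
    c = (mean_high ** 2 / (2 * sd_high ** 2)
         - mean_low ** 2 / (2 * sd_low ** 2)
         + math.log(sd_high / sd_low))
    if abs(a) < 1e-12:
        # Equal spreads: the equation is linear with a single crossing.
        return -c / b
    root = math.sqrt(b ** 2 - 4 * a * c)
    for x in ((-b + root) / (2 * a), (-b - root) / (2 * a)):
        if mean_low <= x <= mean_high:
            return x
    raise ValueError("no crossing point between the group means")

# Reported 3D-CoPS: novices -0.454 (SD 0.707), experienced 0.495 (SD 0.303).
cutoff = contrasting_groups_cutoff(-0.454, 0.707, 0.495, 0.303)
print(round(cutoff, 3))
```

Performances scoring below the cutoff would be classified as failing and those at or above it as passing.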
Based on the passing score, one experienced participant failed both tests and one experienced participant failed
the first try. Three novices passed both tests and the rest failed both the first
and second test. The 3D-CoPS and clinical experience had a moderate correlation (r = 0.61, P < 0.001).
Discussion
We developed a computerized assessment tool, the 3D-CoPS, to assess technical competence
in simulated colonoscopy. Five different measures were created based on XYZ-coordinates
from along the length of the colonoscope: 1. Travel length; 2. Tip progression; 3. Chase
efficiency; 4. Shaft movement without tip progression; and 5. Looping. The 3D-CoPS
and the results on each of the five measurements revealed a statistically significant
difference between groups in performance. Internal consistency and reliability of
the results were good, and evidence of validity was established.
Technical skills are often evaluated by using single measures (e. g. direct observation
of procedural skills or cecal intubation rate). Safe and smooth steering of the colonoscope
relies on more than just one technical aspect during insertion; hence, combining several
technical measures increases reliability and reduces possible bias [23]. By combining the five different measurements into a single score, the 3D-CoPS,
we reduce the weight of each measurement, and therefore reduce misinterpretation based
on single parameters. Unexpectedly, even for the experienced endoscopists the travel
length from anus to cecum was 2.9 times the length of the rubber colon and for novice
endoscopists, 5.6 times. This is not in line with previous reports on the distance
from the anus to the cecum (computed tomography [CT] scan: 189 cm [range, 75–257]
and colonoscopy: 83 cm [range, 49–150]) [24] [25]. Results from CT scans are closer to the real length of the rectum and colon, compared
with the length measured during colonoscopy. During a colonoscopy the sigmoid colon
tends to be shortened during de-looping maneuvers, hence the length may be underreported.
The traveled distance is not the same as the shortest distance from A to B. This indicates
that even for an experienced endoscopist there is room for improvement to optimize
the distance traveled.
Tip progression is the cumulative distance from the tip to the optimal path line from
anus to the cecum. A high score indicates an uneven route during insertion. Although
some degree of stretching of the colon is inevitable as the instrument pushes inward,
with subtle movements, the endoscopist may be able to achieve almost “direct” passage
with minimal stretch and bends, hence minimizing the distance from the tip to the
optimal path line. Similar motor skills have been described in the DOPS and Mayo Colonoscopy
Skills Assessment Tool (MCSAT) as tip control and safe endoscope advancement techniques.
However, the assessment is based on the amount of verbal guidance or hands-on assistance
during a procedure, making it prone to observer bias. Furthermore, the assessment
tools are designed to evaluate a training program’s ability to meet specific requirements
rather than assessing a single procedure [10] [26] [27].
Pain during a colonoscopy is most likely to occur in the sigmoid colon, and 80 % of
reports are due to loops or straightening the colon [28]. To traverse the colon as gently and as rapidly as possible, the operator needs
to keep the shaft as straight as possible, avoid losing “one-to-one” movements and
avoid over-angulation of the tip [29]. Experienced endoscopists often try to avoid situations where the tip becomes lodged
and further attempts to advance the colonoscope lead to the shaft stretching the
colon wall. Shaft movement without tip progression and looping are supposed to resemble
situations where the colonoscope stretches the colon wall and potentially induces
discomfort, pain, or even risk. Novice endoscopists, who may not know the techniques
needed to prevent unwanted events, may instead forcefully push the colonoscope, creating
loops or shaft movements without tip progression.
The looping scores indicate that experienced endoscopists had a less bent colonoscope
and spent less time with loop formation during insertion. This is in line with previous
studies investigating loop management in a simulation-based setup [21] [30]. From a clinical perspective, the time spent from anus to cecum on the phantom was
very short even for those with experience ([Table 2]) [31] [32]. Moreover, patients in the clinic represent a much more heterogeneous group with
various anatomic formations, and loop management in the clinic poses a greater challenge
compared to our setup [32] [33].
Direct observation assessment tools assess numerous domains of the colonoscopy procedure,
one of which is technical aspects, such as pace, tip control, and scope handling [12] [27]. These are essential aspects when evaluating the trainee during a procedure but
are based on a subjective interpretation of the assessment tools and are therefore
subject to rater bias [16] [34]. The simulated environment offers an opportunity to deconstruct the colonoscopy
procedure into its parts based on the five different measurements. This allows trainees
to identify and comprehend the important procedural steps and train single aspects
to enhance performance.
The major endoscopy societies around the world recommend regular assessment of endoscopists
in general. Direct observation by a supervisor assessing the trainee and procedure-related
quality measures, such as cecal intubation rate, time spent, and complications, are frequently
used [35]. Experts within the field of endoscopy have been calling not only for a means for
objective assessment but also for a continuous evaluation to follow a learning curve
as trainees progress. The assessment tools are prone to bias and are resource demanding,
which reduces compliance regarding their continued use. Sedlack et al. followed all
gastroenterology fellows in training at the Mayo institution during a 3-year study.
The overall reported compliance rate with completing the assessment tool on each procedure
was 62 % but was initially as low as 21 % [27]. Preisler et al. explored the correlation between CoPS and patient-experienced discomfort
but had to exclude 29 % of recordings [36]. Excluding user errors, the compliance was as high as 89 % (67/75) favoring automatic
assessment.
A meta-analysis investigating the effect and type of feedback in simulation-based
training found that feedback increases performance [37]. Traditionally, supervisors are needed to provide feedback when training, but supervisors
remain a scarce resource, which is the reason why self-practice is gaining ground
in the simulation centers. Continuous and automatic assessment as typically seen in
the virtual reality simulators provides the trainees with easily accessible feedback
and access to the learning process. In the clinic, however, continuous assessments
remain difficult without supervisors, and the workload involved decreases compliance;
hence, the number of assessed procedures needed for meaningful learning curves becomes unattainable
[38]. The 3D-CoPS circumvents these problems by being automatic, unbiased, and feasible, with
the possibility of assessing performance, giving feedback, and providing access to learning curves in both
a simulation and a clinical environment.
Limitations
First of all, assessing colonoscopy involves much more than just technical performance, and
3D-CoPS does not assess technical skills such as mucosa visualization and the ability
to handle therapeutic tools. Furthermore, non-technical skills, such as pathology
identification and the management of patient discomfort have been acknowledged as
essential components [39]. Until recently Olympus was the only manufacturer with a magnetic endoscopic imaging
system; accordingly, 3D-CoPS has been tested only on Olympus equipment. The study
was conducted in a simulation-based setup that offered a standardized and feasible
model for gathering initial validity evidence. A possible limitation was the missing
resemblance to the clinic, but a meta-analysis investigating the use of simulation-based
assessments found a positive correlation with patient-related outcomes [40]. However, clinical studies should be performed to explore the validity and usefulness
of the system in colonoscopies on patients.
We used a relatively small sample of physicians; nevertheless, we found significant
differences in each of the five measurements and the 3D-CoPS. Validation studies have
been criticized for the expert-novice comparison [41]. When comparing a proficient group (e. g., experts) with one that is not (e. g.,
true beginners such as medical students), a large difference in competence is to be
expected. The novice group in this study was made up of physicians with varying experience,
ensuring a very high baseline capability compared with other studies within the same
field, leading to a stricter pass/fail standard than if all novices were “true beginners.”
The inclusion criterion for novices was having performed fewer than 50 clinical
colonoscopies; only four novices had no clinical experience, one of whom had already
participated in simulation-based training programs and had passed the tests in colonoscopy
and gastroscopy. The rest had mixed experience with up to 200 clinical gastroscopies
and 34 clinical colonoscopies. This might explain why some novices performed so well
and why the pass/fail standard was so high.
The selected case was straightforward with an alpha loop in the sigmoid colon. The
ease of the case may have interfered with the discriminative ability between the more
experienced physicians by introducing a ceiling effect to the assessment tool. Conversely,
increasing the level of difficulty would have increased the risk of the novices not
being able to complete the case. During a colonoscopy, the endoscopist might need
to change the position of the patient to advance the colonoscope. In our study, the
phantom model was fixed in the supine position. A position change of the phantom model
shifts all the coils approximately 90 to 180 degrees in either direction and thus affects
travel length, tip progression, and chase efficiency. Shaft movement without tip progression
is less likely to be affected because the measurement is the difference in movement
between the tip and the shaft, whereas looping remains completely unaffected by external
movement. A position change may unintentionally favor or disfavor our measurement
scores since the position change could be meaningful for further advancement during
a procedure. As a result, position change during a procedure requires mathematical
adaptation in all our measurements to account for the change in the dataset. Ongoing
clinical studies are dealing with these issues.
In theory, 3D-CoPS is fully implementable in the clinic and can continuously assess
endoscopists. Furthermore, 3D-CoPS could add to existing training programs, providing
the trainee and the supervisors with information on technical skills and learning
curves [14] [42]. Moreover, the tool could be used logistically to optimize time schedules for patients
in control programs. A patient with a previous colonoscopy would have a score of procedural
difficulty; this information could be used in planning to match the technical competence
of the endoscopist and to estimate the time needed to complete the colonoscopy. Quality
measures for gastrointestinal endoscopy units have been defined to characterize high-quality
endoscopy. Many of these procedure-related indicators demand retrospective registrations
and system developments for data tracking and interpretation. Furthermore, the effort
needed to gather the relevant information has led to a low response rate [4]. Computerized automatic assessment of every colonoscopy could provide the units
with a procedure-specific quality indicator based on their entire production.
Conclusion
In conclusion, the study presents a novel, real-time computerized assessment tool
for colonoscopy with strong evidence of validity based on Messick’s framework. XYZ-coordinates
from coils along the length of the colonoscope were sampled, five different technical
measurements were developed and built into a combined score of progression, the 3D-CoPS. With
further development, 3D-CoPS could provide feedback for trainees, aid in the certification
process, and help ensure competent performance of colonoscopies.