Introduction
Certification for esophagogastroduodenoscopy (EGD) can be long and arduous, with a
median time to certification of 1.7 years for gastroenterologists, and 2.9 years for
surgical trainees [1]. A preexisting concern associated with the Shape of Training era was a reduction
in gastroenterology specialty training time from 5 to 4 years [2]. This has been compounded by the COVID-19 pandemic, with endoscopy training activity initially falling from a mean of 1,930 to 133 procedures per week, and service provision collapsing to 5 % of pre-COVID levels [3]. Contemporary recovery, now approaching 50 % [4], is predominantly service related, addressing the accumulated clinical backlog. The associated training deficit remains unaddressed and will surely require additional measures [5] [6].
In the surgical arena, simulation training has been shown to significantly augment learning curves across a range of surgical specialties, with two-fold leftward shifts in learning curve trajectory described in minimally invasive general surgery [7], gynecology [8], urology [9], and orthopedic surgery [10]. Existing learning curve analyses of skill acquisition in EGD have highlighted several factors, including total procedure count, that influence trainee competence [11] [12], yet the impact of simulator-based training on learning curve trajectory in EGD is unknown. Supporting evidence is sparse, and although simulation curricula exist, their verification is limited by scant and unvalidated longitudinal performance data [13] [14].
A recent UK survey reported that 93.4 % of trainees were concerned regarding their acquisition of competencies, and that 82.6 % required an extension of specialty training [13], demonstrating a plain and pressing need for iterative, high-quality, validated training adjuncts [6]. Simulation has been reported to enhance training more than any other modality, and its benefit is inversely proportional to experience [15] [16]. The provision of endoscopy simulation exposure varies across centers: a validated simulation pathway to improve competencies, aimed at novice endoscopists, would be invaluable in supporting the rollout of this modality should simulator-acquired skill translate into clinical practice.
EndoSim (Surgical Science, Gothenburg, Sweden) is a novel endoscopic virtual reality (VR) simulator incorporating a flexible curriculum that generates task-specific metrics, developed iteratively and mapped to specific areas of the Joint Advisory Group on Gastrointestinal Endoscopy (JAG) Direct Observation of Procedural Skills (DOPS) tool [17]. Confirming the face validity of the EndoSim SPICE, which focuses on basic scope handling and OGD skill acquisition using a series of benchmarked exercises, would enable further study into the relevance of observed differences in clinical practice.
Methods
A prospective observational cohort validation study of the EndoSim VR endoscopic simulator (Surgical Science, Gothenburg, Sweden) was performed between January 1 and April 30, 2021.
Participants and setting
A pilot cohort of four independent expert endoscopists, defined as healthcare professionals with a job plan including a routine weekly independent endoscopy session, none of whom had prior EndoSim experience, tested 12 EndoSim exercises and rated the face validity of each exercise on a Likert scale of 1 to 5 (1 very poor, 5 very good). Feedback on each exercise was collected by means of bespoke questionnaires, including free-text comments and suggestions for improvement. Iterative development resulted in a provisional final 13-exercise simulation-based pathway.
A cohort of 10 experts subsequently completed the 13-exercise pathway twice. The first, familiarization, run was disregarded, and the metric values from the second run were analyzed to determine validated benchmark values. The sample size was informed by a previous study reported by Brown et al of the Surgical Science LapSim simulator, a sibling high-fidelity surgical simulation training platform [7]. Ethical approval was granted by Cardiff University, School of Medicine (SMREC 20/117).
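To make the benchmarking step concrete, the sketch below shows how median and quartile values could be derived per metric from the experts' second-run results. It is an illustration only: the data layout, variable names, and example values are hypothetical, not study data.

```python
# Illustrative derivation of benchmark values (median and quartiles) per
# metric from the experts' second-run results. All values are hypothetical.
import numpy as np

# One list of 10 expert values per metric (hypothetical example data)
second_run = {
    "total_time_s": [163, 145, 210, 98, 180, 120, 227, 101, 160, 175],
    "mucosal_collisions": [5, 4, 9, 3, 6, 5, 4, 8, 5, 7],
}

benchmarks = {
    metric: {
        "median": float(np.median(values)),
        "q1": float(np.percentile(values, 25)),
        "q3": float(np.percentile(values, 75)),
    }
    for metric, values in second_run.items()
}
print(benchmarks)
```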
Description of simulator, procedural module, and simulator measurements
A Simcart, table-mounted, height-adjustable EndoSim VR endoscopy simulator with integrated haptic technology was used (EndoSim: Surgical Science Sweden AB) ([Fig. 1]).
Fig. 1 A Simcart, table-mounted, height-adjustable EndoSim Virtual Reality (VR) endoscopy
simulator with integrated haptic technology. (EndoSim: Surgical Science Sweden AB).
The system consisted of a software program run on an Intel Core i7 processor (Intel
Corporation, Santa Clara, California, United States) using Windows 10 Pro (Microsoft
Corporation, Redmond, Washington, United States). The computer was equipped with 8 GB of internal RAM, an NVIDIA GeForce RTX 2060 graphics card (NVIDIA Corporation, Santa Clara, California, United States), a 27-inch monitor, and a virtual endoscopic
interface, including a gastroscope or colonoscope with accessory channel and accessory
tool. In this study, the 2020 version of the system was utilized. Exercises from Fundamental Endoscopy Skills 1, Fundamental Endoscopy Skills 2, and Upper GI Gastroscopy Intubation were chosen by a focus group consisting of a consultant gastroenterologist, a surgical registrar, and a Surgical Science software development representative. Each exercise was mapped against the JAG DOPS tool so that the skills examined spanned as many domains as possible. Where multiple exercises assessed the same skill, the focus group agreed upon the optimum exercise to include in the simulation pathway. The exercises were further deconstructed, by metric, to provide immediate computer-generated feedback aligned to the following DOPS domains: scope handling, angulation and tip control, pace and progress, visualization, and patient comfort.
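As an illustration of this deconstruction (the groupings below paraphrase only a subset of the metrics in Table 3; the names and helper function are ours, not the EndoSim implementation), the metric-to-domain mapping can be thought of as a simple lookup used to group computer-generated feedback:

```python
# Illustrative grouping of simulator metrics under JAG DOPS domains.
# This paraphrases a subset of Table 3 and is not the full 35-metric mapping.
DOPS_DOMAINS = {
    "scope handling": ["colonoscope rotation", "insertion path length"],
    "angulation and tip control": ["missed targets", "knob rotation up/down"],
    "pace and progress": ["total time", "time to papilla"],
    "visualization": ["lumen seen", "stomach visualized"],
    "patient comfort": ["max torque", "max insertion force"],
}

def feedback_by_domain(metric_results):
    """Group raw metric results under their DOPS domain for feedback."""
    return {
        domain: {m: metric_results[m] for m in metrics if m in metric_results}
        for domain, metrics in DOPS_DOMAINS.items()
    }

# Example: feedback_by_domain({"total time": 163, "lumen seen": 100})
```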
Pilot exercises can be found in [Table 1]. Some exercises were modified to improve their face validity (the degree to which the exercise replicates the skills being tested), and a new course, named Validation Study, was developed. The exercises included in the proposed training pathway are listed in [Table 2] and can be seen at: https://www.youtube.com/watch?v=SFl3Mqz4StQ&list=PLuJxB-uznJ-Wbryk64lxG_9WR5SEvh1kg. Two identical EndoSim machines were used: one in the Welsh Institute of Minimal Access Therapy (WIMAT), Cardiff, and one at Southmead Hospital, Bristol. This was an independent study; Surgical Science had no access to the study data.
Table 1
Variation in expert Likert scores related to pilot exercises.

Exercise | Median score [IQR] | P value
Mucosal examination | 5 [4.5–5] | 1.000
Examination | 4.5 [4–5] | 0.686
Knob handling | 4.5 [4–5] | 0.686
Visualize colon 1 | 4 [4–4.5] | 0.343
Scope handling | 4 [4–4.5] | 0.343
Navigation skill | 4 [3.75–4] | 0.057
Retroflexion | 4 [3.5–4] | 0.057
Photo and probing | 3.5 [2–5] | 0.486
Navigation tip/torque | 3.5 [2.5–4.5] | 0.200
ESGE photo | 2 [1–3.5] | 0.029[1]
Loop management 1 | 2 [1–3] | 0.029[1]
Loop management 2 | 1.5 [1–2.5] | 0.029[1]

ESGE, European Society of Gastrointestinal Endoscopy.
[1] P values were generated using the Mann-Whitney U test to compare Likert scores per exercise against the highest rated (Mucosal Examination).
Table 2
Variation in expert Likert scores across validation study exercises.

Exercise | Median score [IQR] | P value
Visualize colon 1 | 4.5 [4–5] | 1.00
Visualize colon 2 | 4.5 [4–5] | 1.00
Scope handling | 4.5 [3–5] | 0.796
Examination | 4 [4–5] | 0.796
Navigation skill | 4 [4–5] | 0.853
Mucosal examination | 4 [4–5] | 0.739
Knob handling | 4 [4–5] | 0.529
Photo and probing | 4 [3.5–5] | 0.579
Retroflexion | 4 [2–5] | 0.218
Navigation tip/torque | 3.75 [3–4] | 0.105
ESGE photo | 3.75 [3–4] | 0.105
Intubation case 3 | 3 [2–3] | 0.004[1]
Loop management | 3 [1–3] | 0.001[1]

ESGE, European Society of Gastrointestinal Endoscopy.
[1] P values were generated using the Mann-Whitney U test to compare Likert scores per exercise against the highest rated (Visualize Colon 1).
Statistical analysis
Statistical analysis appropriate for nonparametric data (Kruskal-Wallis, Mann-Whitney U) was performed using SPSS 27 (IBM SPSS Statistics for macOS, Version 27.0; IBM Corp., Armonk, New York, United States). Statistical significance was taken at P < 0.05.
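For readers wishing to reproduce this style of analysis outside SPSS, the sketch below shows the equivalent nonparametric tests in Python with SciPy; the Likert scores used are hypothetical placeholders, not study data.

```python
# Equivalent nonparametric tests in SciPy; Likert scores are hypothetical.
from scipy.stats import kruskal, mannwhitneyu

scores = {
    "Mucosal examination": [5, 5, 4, 5],
    "ESGE photo": [1, 2, 2, 3],
    "Loop management 1": [1, 2, 2, 3],
}

# Kruskal-Wallis: overall variation in scores across exercises
h_stat, p_overall = kruskal(*scores.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_overall:.3f}")

# Mann-Whitney U: each exercise against the highest-rated one,
# mirroring the per-exercise comparisons in Table 1 and Table 2
reference = scores["Mucosal examination"]
for name, values in scores.items():
    if name == "Mucosal examination":
        continue
    u_stat, p = mannwhitneyu(reference, values, alternative="two-sided")
    flag = " (significant at P < 0.05)" if p < 0.05 else ""
    print(f"{name}: U = {u_stat:.1f}, P = {p:.3f}{flag}")
```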
Results
Pilot exercise
Four experts completed the 12 pilot exercises. Median Likert scores for each exercise can be found in [Fig. 2], with qualitative feedback presented in Supplementary Table 1. There was variation in Likert scores across exercises (median [IQR] 4 [3–4.75]; P < 0.005), with the Loop Management 1, Loop Management 2, and European Society of Gastrointestinal Endoscopy (ESGE) photo exercises receiving the lowest face validity scores (median 2, 1.5, and 2 respectively; P = 0.029, [Table 1]).
Fig. 2 Evaluation of each pilot exercise by 4 expert endoscopists (Likert scores – 1: Very
poor to 5: Very good)
Validation study
Ten experts completed the 13-exercise simulator-based training pathway comprising 35 individual metrics, amounting to a total of 859 metric values. Overall expert performance demonstrated equivalence (P = 0.992). Variation in individual metric values related to individual expert performance can be found in [Table 3]. There was equal representation of consultant gastroenterologists and consultant surgeons, with no variation in performance between specialty roles (P = 0.472). All experts had performed at least as many OGDs (median 2,500 [2,000–5,000]) as colonoscopies (median 1,500 [100–2,500]). The face validity of each exercise varied among experts (median Likert score 4 [3–4.75], P = 0.003), with the Loop Management and Intubation Case 3 exercises receiving the lowest scores (median Likert score 3 in both cases; P < 0.001 and P = 0.004 respectively, [Table 2]). Median Likert scores for each exercise can be found in [Fig. 3].
Table 3
Variation in metric values related to performance of 10 experts.

DOPS category | Metric | Unit | Median [IQR] | P value
Scope handling | Colonoscope rotation | Degrees | 2758 [1540–4142] | 0.912
Scope handling | Slot collisions | Number | 3 [2–5] | 0.437
Scope handling | Insertion path length | mm | 1114 [883–1664] | 0.434
Scope handling | Targets photographed | % | 100 [100–100] | 1.000
Scope handling | All photo targets complete | Yes/no | 1 [1–1] | 0.437
Scope handling | Deviations from 45 degrees | Number | 3 [3–12] | 0.437
Angulation and tip control | Missed target | Number | 0 [0–1] | 0.437
Angulation and tip control | Knob rotation left/right | Degrees | 240 [63–964] | 0.026
Angulation and tip control | Knob rotation up/down | Degrees | 1622 [846–3655] | 0.268
Angulation and tip control | Probed outside of target | Number | 3 [2–6] | 0.437
Angulation and tip control | Targets probed | % | 100 [100–100] | 1.000
Angulation and tip control | Into trachea | Yes/no | 0 [0–0] | 1.000
Angulation and tip control | Collisions against mucosa | Number | 5 [4–9] | 0.038
Angulation and tip control | Average photo quality | % | 100 [95–100] | 0.437
Angulation and tip control | Tip path length | mm | 3102 [2383–6266] | 0.955
Angulation and tip control | Targets aligned | % | 100 [100–100] | 1.000
Angulation and tip control | Red out | Number | 0 [0–1] | 0.437
Angulation and tip control | Time in red out | Seconds | 0 [0–1.25] | 0.437
Pace and progress | Total time | Seconds | 163 [101–227] | 0.069
Pace and progress | Time to papilla | Seconds | 62 [44–74] | 0.187
Visualization | Targets seen | % | 100 [100–100] | 0.437
Visualization | Targets inspected | % | 95 [90–100] | 0.126
Visualization | Lumen seen | % | 100 [100–100] | 0.037
Visualization | Lumen inspected | % | 99 [98–99] | 0.109
Visualization | Stomach visualized | % | 97 [93–99] | 0.259
Visualization | Duodenum visualized | % | 46 [42–49] | 0.365
Visualization | Papilla reached | Yes/no | 1 [1–1] | 1.000
Patient comfort | Max torque | Newton | 0.3 [-0.1–3.4] | 0.437
Patient comfort | Max insertion force | Newton | 7.5 [2.9–19.3] | 0.437
Miscellaneous | Tool unprotected | mm | 1212 [277–3602] | 0.849
Miscellaneous | Side view assistance | Seconds | 0 [0–11] | 0.027
Miscellaneous | Net insufflation | – | 0 [0–0] | 1.000
Miscellaneous | Time in excess insufflation | Seconds | 0 [0–0] | 0.423
Miscellaneous | Percentage of time insufflation | % | 1.5 [0–7] | 0.075
Miscellaneous | Excess insufflations | Number | 0 [0–0] | 0.423

DOPS, direct observation of procedural skills; IQR, interquartile range.
Fig. 3 Evaluation of each exercise by 10 expert endoscopists (Likert scores – 1: Very poor
to 5: Very good)
A validated VR endoscopy training pathway (SPICE), with clearly defined performance metrics (pass marks), was developed; the benchmark metric values populating the EndoSim simulator can be found in Supplementary Table 2.
Achieving the median expert performance was deemed equivalent to full marks, i.e., 100 %. Allowance buffers were created after review of the relevant published literature [7] [18] [19] [20]. For metrics where higher scores related to improved performance, for example percentage of mucosa visualized, the lower quartile provided a buffer to achieve the minimum pass mark, with scores increasing incrementally up to a maximum of 100 %, the expert median value. Where higher scores related to poorer performance, for example more mucosal collisions, the upper quartile provided the buffer to achieve a minimum pass. Participants must pass all metrics in every exercise to achieve an overall pass.
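To illustrate how such a benchmark scoring rule could operate, the sketch below implements our reading of it: the relevant quartile earns an assumed minimum pass mark, the expert median (or better) earns 100 %, and a linear scale is assumed in between. The function, its parameters, and the 50 % minimum pass value are hypothetical, not the EndoSim implementation.

```python
# Sketch of the benchmark scoring rule described above (our interpretation;
# the linear scale and the 50 % minimum pass mark are assumptions).
def metric_score(value, expert_median, buffer_quartile,
                 higher_is_better=True, minimum_pass=50.0):
    """Score one metric: the buffer quartile earns the minimum pass mark,
    the expert median or better earns 100; linear in between (assumed)."""
    if higher_is_better:      # e.g. percentage of mucosa visualized
        if value >= expert_median:
            return 100.0
        if value < buffer_quartile:
            return 0.0        # below the buffer: fails this metric
        frac = (value - buffer_quartile) / (expert_median - buffer_quartile)
    else:                     # e.g. number of mucosal collisions
        if value <= expert_median:
            return 100.0
        if value > buffer_quartile:
            return 0.0
        frac = (buffer_quartile - value) / (buffer_quartile - expert_median)
    return minimum_pass + frac * (100.0 - minimum_pass)

# An overall pass requires passing every metric in every exercise.
def overall_pass(all_metric_scores, minimum_pass=50.0):
    return all(s >= minimum_pass for s in all_metric_scores)

# Example: stomach visualized 95 % against an expert median of 97 % and
# lower quartile of 93 % (values from Table 3) -> mid-range pass.
print(metric_score(95, expert_median=97, buffer_quartile=93))  # 75.0
```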
Discussion
This is the first study to investigate the potential of a validated VR simulation
pathway with benchmark performance metrics providing computer-generated feedback set
by expert operators and linked with specific domains of DOPS global task performance.
The principal finding was equivalent and consistent expert performance across all 13 set simulation exercises. Thirty-one of 35 metrics (89 %) were equivalent, with four (11 %) exhibiting variation, namely rotation control, mucosal collisions, luminal visualization, and side view assistance (an EndoSim training tool offered to visualize intraluminal scope position).
These findings bolster those reported by Siau et al, who confirmed the construct validity of EndoSim to discriminate between expert, intermediate, and novice performance [21], though further work was suggested to appraise the face validity and explore the relevance of the observed differences to clinical practice, leading to the development of expert benchmark scores in this study.
Simulation-based curricula have existed since the 1980s, incorporating a human body Cardiology Patient Simulator named “Harvey” [22] alongside a simulated core curriculum, supported by an additional slide deck [23]. The simulator-trained group achieved a two-fold performance improvement in multiple-choice knowledge and skills tests, in both simulator and live clinical settings, when compared with standard patient-based training [24]. With specific regard to endoscopy simulation curricula, the historical focus has predominantly been colonoscopy. Grover et al reported a structured progressive learning curriculum of increasing difficulty using the EndoVR endoscopy simulator, resulting in improved performance at colonoscopy, as measured by a 10 % improvement in JAG DOPS scores, in a single-blinded randomized controlled trial of 37 novice endoscopists [25] [26]. This improvement was further augmented by over 10 % when incorporating simulated non-technical skills training, and was sustained at 6 weeks [27]. With regard to VR training across the three procedures of OGD, flexible sigmoidoscopy, and colonoscopy, the most pertinent study is the Cochrane systematic review reported by Khan et al in 2018 [14]. Eighteen trials were included (421 participants; 3,817 endoscopic procedures). The quality of evidence was rated as moderate, low, or very low due to risk of bias, imprecision, and heterogeneity; consequently, a meta-analysis was not performed. There was insufficient evidence to determine the effect on competency composite score (mean difference 3.10, 95 % CI –0.16 to 6.36; 1 trial, 24 procedures; low-quality evidence). The most positive conclusion was that VR training, compared with no training, likely provided participants with some benefit, as measured by independent procedure completion (RR 1.62, 95 % CI 1.15 to 2.26; 6 trials, 815 procedures; moderate-quality evidence). Moreover, VR training in combination with conventional training appeared advantageous over VR training alone. With specific regard to upper gastrointestinal endoscopy simulation, there has been debate regarding the face and content validity of the Simbionix GI Mentor II [28] and the ability to draw conclusions regarding concurrent validity from a pilot study of eight novice endoscopists. Ferlitsch and colleagues [29] have since reported that the same simulator shortened the time taken to intubate the duodenum and improved technical accuracy in the simulator-trained group, with results maintained for up to 60 endoscopies. This study sought to establish the face validity of the EndoSim simulator over a wider range of endoscopic handling domains, mapped to the JAG DOPS parameters.
The study has several inherent limitations. Any simulator-based training pathway represents an adjunct to, not a replacement for, live clinical hands-on learning. The EndoSim simulator does not replace scenarios best experienced in front-line medical practice, such as consent, gastrointestinal lesion recognition, and the management of complications, and it does not address pre- and post-procedure skills as recorded by the DOPS tool. Similarly, management of findings is beyond the scope of this simulator, which focuses on acquisition of scope handling skill; this skill is addressed both clinically and in other areas of Health Education and Improvement Wales’ SPRINT program. SPRINT (a Structured PRogramme for INduction and Training) is an existing initiative to improve OGD training delivery to novice endoscopists; it incorporates integrated simulator and lesion recognition training with endoscopic non-technical skills, and has been reported to shorten the time taken for trainees to complete the requisite 200 procedures stipulated for JAG accreditation [30]. The EndoSim SPICE development focused on basic scope handling and examination of the upper gastrointestinal tract. The pilot study revealed poorer face validity of the representative lower gastrointestinal exercises, which measure very limited metrics; Loop Management and Intubation Case 3 scored less well (poor to fair; Likert 1–3) in both the pilot and validation studies.
Face validity is subjective. The Loop Management exercise measured only the time taken to complete the procedure and was not considered by experts to discriminate between poor and good performance; consequently, it should not contribute to the overall score and pass mark equating to competency. Loop management, an important skill in lower gastrointestinal endoscopy, falls outside the basic scope handling remit of this training pathway and requires extra-simulator techniques such as patient positioning and abdominal pressure [31]. This corroborates the findings reported by Dyke et al that teaching loop resolution is difficult to achieve through simulation alone [32]. The Intubation Case 3 exercise was developed from loop management and measured maximum insertion force and maximum torque as well as time taken; nevertheless, its face validity was still considered poor when compared with other exercises. Arguably, an alternative measure of face validity for lower gastrointestinal exercises could use a trainer’s subjective opinion of their value as a training tool in individual trainee-specific cases. Such an approach, however, would not provide expert-level metrics or benchmarks, and moreover would require an equal number of faculty trainers to trainees, removing one potential benefit and efficiency of a simulator-based training pathway. The scope handling skills developed in other exercises are transferable and therefore applicable, conferring benefit to all groups of novice trainees in both upper and lower gastrointestinal endoscopy.
Issenberg et al support the importance of feedback in facilitating simulated learning, alongside repetitive practice and curriculum integration [33]. Cross-referencing the deconstructed skills, as measured by metric per exercise, against the JAG-validated DOPS tool has allowed focused simulator-generated feedback grouped into the following domains: scope handling, angulation and tip control, pace and progress, visualization, and patient comfort.
Conclusions
This study demonstrated very good face validity of the EndoSim SPICE for providing early skills development for OGD, despite the inherent limitations of using computer-based programs to teach patient-based skills. Moreover, the training pathway provides immediate, computer-generated feedback aligned with specific domains of DOPS global task performance, adding value to existing simulation curricula. Simulators offer a valuable aid to the modalities available for education in high-risk, reproducible training scenarios. A better understanding of their role in early training, and their optimization and incorporation into the wider elements of the emerging curriculum alongside knowledge acquisition, is critical, especially during recovery from the impact of the COVID-19 pandemic and the resultant deficit in endoscopy service and training provision. A validated VR endoscopy SPICE, informed by expert-level benchmarks and aligned to JAG DOPS domains, provides the basis to define simulation’s training role. The training pathway should be evaluated in a novice endoscopist setting, compared with simulator-naïve novice controls, to assess the translation of simulator-learned skill into clinical practice. Such an approach will be an essential component of successfully embedding such programs into endoscopy training.