Introduction
Advanced endoscopy fellowships have traditionally been taught through an apprenticeship
model, wherein trainees perform procedures under the supervision of an experienced
endoscopist [1]. In this model, the supervisors’ subjective assessment and procedural volume have
been used as surrogates for competency. A growing emphasis on ensuring graduating
trainees are ready to practice effectively has supported the shift toward competency-based
medical education (CBME) for advanced endoscopic procedures. CBME allows for documentation
of learner acquisition of the requisite skills needed for independent practice while
offering formative feedback to the trainee based on their measured performance throughout
the training period [2]
[3]. Accurate, timely, and meaningful assessment tools for advanced endoscopy trainees
(AETs) are critical to track achievement of cognitive and technical milestones [4].
There is growing concern over the use of arbitrary procedural volume thresholds in advanced
endoscopy training. Learning curve analyses have reported that the number of endoscopic
retrograde cholangiopancreatography (ERCP) procedures to achieve competence in bile
duct cannulation can range from 79 to more than 300 [5]. These findings support the notion that uniformly applying volume thresholds neglects
the variable rates at which trainees learn. Recent reports have revealed that some
trainees fail to reach competency by the end of their fellowship [6]
[7]. Given the higher rates of potentially serious complications with ERCP compared
to other endoscopic procedures [8], efforts to systematically document trainees’ competence and demonstrate readiness
for practice are needed.
Canadian advanced endoscopy training programs do not currently have uniform curricular
or assessment requirements [9]. Recommendations from the American Society for Gastrointestinal Endoscopy (ASGE)
suggest the use of a task-specific competence assessment tool to facilitate grading
of technical and cognitive skills continuously throughout fellowship training [4]
[10]. Several such tools, including The endoscopic ultrasound (EUS) and ERCP Skills Assessment
Tool (TEESAT) and the ERCP Direct Observation of Procedural Skills (DOPS) tool, have
been used to evaluate trainees in the United States and United Kingdom [11]
[12]. In this study, we examined Canadian AETs’ learning curves and achievement of competence
using an ERCP assessment tool with strong evidence of validity.
Methods
This multicenter prospective study of AETs was conducted at five academic institutions
for the 2017–2018 training cycle. All participants provided informed consent. This
study received approval from the research ethics board at each participating site.
Setting
The five sites in this study were St. Michael’s Hospital in Toronto, University of
British Columbia, University of Calgary, University of Alberta, and the University
of Ottawa. The advanced endoscopy fellowships at these institutions are focused on
ERCP and EUS. AETs were eligible to participate if they were enrolled in an advanced endoscopy
program and had completed at least two prior years of core gastroenterology or five
years of general surgery training. Prior to the start of the fellowship, AETs completed
a questionnaire on prior ERCP experience.
Assessment protocol
Trainees were assessed throughout their fellowship year. After completion of an initial
25 ERCPs to allow for trainee orientation, attending endoscopists assessed every fifth
procedure and provided formative feedback to trainees. Data on the procedure, trainee,
rater, and site were collected prior to each assessment.
Assessments were conducted using the United Kingdom Joint Advisory Group on Gastrointestinal Endoscopy (JAG) ERCP DOPS tool, a direct observation assessment tool that grades the technical, cognitive, and integrative skills required for ERCP [11]. Prior to the study period, all attending endoscopists underwent a five-hour training session during which they reviewed the ERCP DOPS items, scores, and anchors; assessed example cases; and discussed ratings with one another. Trainers were asked to complete
assessments immediately after procedures and provide specific feedback based on ERCP
DOPS domains.
This formative tool was developed through multidisciplinary consensus and has strong
validity evidence for direct observational assessment of ERCP [11]. It contains 27 task-specific items within six domains: pre-procedure, intubation
and positioning, cannulation and imaging, selected therapies, post-procedure, and
non-technical skills. Assessors rated each item, as well as overall performance, on a four-point scale based on the level of supervision required. Assessors also graded
the difficulty of each procedure as grade 1 (deep cannulation of duct of interest
via main papilla, biliary sampling, and biliary stent removal or exchange), grade
2 (biliary stone extraction ≤ 10 mm, treatment of bile leaks or extrahepatic strictures,
and prophylactic pancreatic stent insertion), grade 3 (biliary stone extraction ≥ 10 mm,
minor papilla cannulation, treatment of pancreatic strictures, removal of pancreatic
duct stones ≤ 5 mm, and treatment of strictures in hilum or above), or grade 4 (removal
of internally migrated pancreatic stent, removal of pancreatic stones ≥ 5 mm, removal
of intrahepatic stones, and ERCP after Whipple or Roux-en-Y bariatric surgery) using
an ASGE framework on procedural complexity [13].
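For illustration only, the sketch below (in Python) shows one way such a complexity grade could be recorded alongside each assessment; the mapping simply restates the ASGE-based criteria above, and the names and structure are our own, not part of the framework or the study's data collection forms.

# Illustrative lookup of the ASGE-based procedural complexity grades described above.
# Hypothetical helper; used only to tag each assessed procedure with its criteria.
COMPLEXITY_CRITERIA = {
    1: "deep cannulation via main papilla, biliary sampling, biliary stent removal or exchange",
    2: "biliary stones <= 10 mm, bile leaks, extrahepatic strictures, prophylactic pancreatic stent",
    3: "biliary stones >= 10 mm, minor papilla cannulation, pancreatic strictures, "
       "pancreatic duct stones <= 5 mm, strictures in the hilum or above",
    4: "internally migrated pancreatic stent, pancreatic stones >= 5 mm, intrahepatic stones, "
       "ERCP after Whipple or Roux-en-Y bariatric surgery",
}

def describe_complexity(grade: int) -> str:
    """Return the criteria summary for an ASGE complexity grade (1-4)."""
    if grade not in COMPLEXITY_CRITERIA:
        raise ValueError(f"grade must be 1-4, got {grade}")
    return COMPLEXITY_CRITERIA[grade]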
Outcomes
The primary outcome was overall performance on the ERCP DOPS, rated on a four-point scale: 1 (maximal supervision), 2 (significant supervision), 3 (minimal supervision), and 4 (competent for independent practice) [12]. Maximal supervision was selected if the supervisor undertook the majority of tasks and decisions and delivered constant verbal prompts, while competent for independent practice was selected if no supervision was required (Supplementary file 2). Secondary outcomes were performance on the individual skills in the ERCP DOPS tool, including technical (e.g., selective cannulation, sphincterotomy, biliary stenting, and tissue sampling) and non-technical (e.g., communication and teamwork, situational awareness, leadership, and judgment and decision-making) domains.
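To make the rating scheme concrete, the short sketch below (in Python) encodes the four-point supervision scale and the rating-to-success mapping applied in the analysis that follows (ratings of 3 or 4 count toward competence); the enum and function names are our own illustrative choices and are not part of the ERCP DOPS tool.

from enum import IntEnum

class SupervisionRating(IntEnum):
    """Four-point overall supervision scale from the ERCP DOPS tool."""
    MAXIMAL = 1       # supervisor undertakes most tasks/decisions, constant prompts
    SIGNIFICANT = 2
    MINIMAL = 3
    INDEPENDENT = 4   # competent for independent practice, no supervision required

def counts_toward_competence(rating: SupervisionRating) -> bool:
    # A procedure counts toward competence if rated 3 (minimal supervision)
    # or 4 (competent for independent practice).
    return rating >= SupervisionRating.MINIMAL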
Statistical analysis
We used cumulative sum (CUSUM) analysis to create learning curves for overall supervision
ratings and for individual elements of the ERCP DOPS by plotting successful outcomes
of consecutive procedures performed each month over the duration of the training program
[14]. CUSUM allows for continuous monitoring of a trainee’s performance and detection
of deviations from predefined standards [15]
[16]
[17]. When used in training programs, it can enable earlier recognition of deficiencies
and provision of feedback to address them [18]
[19]
[20].
For the CUSUM calculation, each successful outcome decreases the cumulative score by s and each unsuccessful outcome increases it by 1 − s. We defined competence (a successful outcome) as an ERCP DOPS rating of 3 (minimal supervision) or 4 (competent for independent practice) for ≥ 80 % of the procedures in a month, which is often the goal of ERCP training programs [21] [22]. The value of s is determined by a predefined acceptable failure rate (p0), which represents the failure rate for competent practitioners, and an unacceptable failure rate (p1, where p1 − p0 represents the maximum acceptable level of human error), which is typically two- to five-fold higher than p0 [14].
We used a p0 score of 0.2, which has been used in a previous ERCP learning curve analysis
[23] and is in line with the 80 % goal success rate of ERCP programs [21]
[22], and a p1 score of 0.5. Decision limits were calculated based on the above failure rates and on Type I (false-positive) and Type II (false-negative) error rates (α and β) of 0.1 each [14]. On the learning curve plots, if the CUSUM curve
crosses the higher decision limit (unacceptable failure) from below, the trainee has
reached the preset unacceptable failure rate. If the CUSUM curve crosses the lower
decision limit (competence) from above, the trainee has achieved competence. STATA
statistical software was used for analysis.
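As a worked illustration of the calculation described above (and not the authors' STATA code), the sketch below computes the per-outcome score s and the decision limits from p0, p1, α, and β under the standard sequential probability ratio formulation of CUSUM, then builds a learning curve and flags boundary crossings. The function names and the example sequence of monthly outcomes are hypothetical.

import math

def cusum_parameters(p0=0.2, p1=0.5, alpha=0.1, beta=0.1):
    """Per-outcome score s and decision limits for a CUSUM learning curve.

    Assumes the standard sequential probability ratio formulation in which
    each success subtracts s from the cumulative score and each failure adds 1 - s.
    """
    llr_fail = math.log(p1 / p0)                 # evidence contributed by a failure
    llr_success = math.log((1 - p0) / (1 - p1))  # evidence contributed by a success
    denom = llr_fail + llr_success
    s = llr_success / denom                      # ~0.34 for p0 = 0.2, p1 = 0.5
    upper = math.log((1 - beta) / alpha) / denom   # unacceptable-failure limit (~+1.59)
    lower = -math.log((1 - alpha) / beta) / denom  # competence limit (~-1.59)
    return s, lower, upper

def cusum_curve(monthly_success, p0=0.2, p1=0.5, alpha=0.1, beta=0.1):
    """Build the CUSUM curve for a sequence of monthly outcomes.

    monthly_success[i] is True when >= 80% of that month's assessed procedures
    were rated 3 (minimal supervision) or 4 (competent for independent practice).
    """
    s, lower, upper = cusum_parameters(p0, p1, alpha, beta)
    score, curve = 0.0, []
    for success in monthly_success:
        score += -s if success else (1 - s)
        curve.append(score)
    reached_competence = any(x <= lower for x in curve)
    unacceptable_failure = any(x >= upper for x in curve)
    return curve, reached_competence, unacceptable_failure

# Hypothetical trainee: two unsuccessful months followed by ten successful ones.
curve, competent, unacceptable = cusum_curve([False, False] + [True] * 10)
print(round(curve[-1], 2), competent, unacceptable)  # ends below the competence limit

With p0 = 0.2, p1 = 0.5, and α = β = 0.1, this formulation yields s ≈ 0.34 and symmetric decision limits of roughly ±1.6, so a trainee starting from a neutral score crosses the competence limit after about five consecutive successful months; the exact parameterization used in the published analysis may differ.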
Results
Eleven trainees were invited and all participated in this study. The median number
of cases trainees had performed prior to their fellowship was 50 (interquartile range [IQR] 25–400).
Our study sample consisted of 261 ERCP procedures evaluated using the ERCP DOPS tool.
The actual number of procedures performed by each trainee was higher given that every
fifth procedure was evaluated. The median number of evaluations was 49 per site (IQR 31–76) and 15 per trainee (IQR 11–45). Based on the ASGE procedural complexity framework, 42 procedures (16 %) were rated grade 1, 163 (63 %) grade 2, 52 (20 %) grade 3, and three (1 %) grade 4. There were 191 (73 %) native papilla cases. The overall cannulation rate by trainees was 82 % (214/261), and the native papilla cannulation rate was 78 % (149/191). Procedure characteristics are summarized
in [Table 1].
Table 1
Summary of procedure characteristics.
Characteristic | n (%) (N = 261)
Overall difficulty of cases
  Grade 1 | 42 (16 %)
  Grade 2 | 163 (63 %)
  Grade 3 | 52 (20 %)
  Grade 4 | 3 (1 %)
Successful cannulation by trainees | 214 (82 %)
Number of cases with a native papilla | 191 (73 %)
Successful cannulation of native papilla by trainees | 149 (78 % of native papilla cases)
Learning curves
Using a predefined competence threshold of 80 % of the procedures in a month receiving
a “minimal supervision” or “competent for independent practice” rating, all 11 (100 %)
trainees reached competence for overall supervision by the end of their fellowship
([Fig. 1]).
Fig. 1 Cumulative sum analysis learning curves for overall level of supervision, tissue sampling, and wire management, with the upper solid line signifying the unacceptable failure threshold and the lower solid line representing the competence threshold.
Trainees reached competency in all pre-procedural, intubation and positioning, post-procedure,
and endoscopic non-technical skills items by six months (Supplementary file 1). Within the cannulation and imaging domain, trainees reached competency for selective
cannulation at 12 months, wire management at six months, sphincterotomy at six months,
stone therapy at six months, and stenting at 12 months ([Fig. 1]). Trainees did not reach competency for tissue sampling or sphincteroplasty during
their fellowship. ERCP DOPS items and the percentage rated as competent throughout
the fellowship year are detailed in [Table 2].
Table 2
Competence in ERCP DOPS items by 3-month intervals.
Item (% of assessments rated as competent) | First quarter | Second quarter | Third quarter | Fourth quarter
Overall supervision | 58 | 83 | 91 | 96
Pre-procedure
  Indication | 92 | 93 | 100 | 100
  Risk | 91 | 93 | 100 | 100
  Preparation | 91 | 90 | 100 | 100
  Equipment check | 86 | 92 | 99 | 100
  Consent | 92 | 93 | 100 | 100
  Sedation and monitoring | 95 | 92 | 99 | 100
Intubation and positioning
  Intubation of esophagus and duodenum | 91 | 90 | 100 | 100
  Visualization and position relative to ampulla | 80 | 90 | 96 | 100
  Patient comfort | 92 | 92 | 99 | 100
Cannulation and imaging
  Selective cannulation | 70 | 73 | 89 | 95
  Wire management | 81 | 90 | 97 | 97
  Image quality and interpretation | 81 | 93 | 96 | 99
  Decision about appropriate therapy | 83 | 93 | 99 | 99
  Sphincterotomy | 77 | 89 | 86 | 93
  Sphincteroplasty | 54 | 75 | 91 | 73
  Stone therapy | 77 | 97 | 94 | 93
  Tissue sampling | 77 | 71 | 100 | 65
  Stenting (metal and plastic) | 79 | 75 | 97 | 73
  Actions to minimize pancreatitis | 85 | 88 | 100 | 100
  Complications | 96 | 95 | 94 | 100
Post-procedure
  Report writing | 90 | 92 | 100 | 100
  Management plan | 89 | 93 | 99 | 100
Endoscopic non-technical skills
  Communication and teamwork | 89 | 95 | 99 | 100
  Situational awareness | 88 | 95 | 96 | 100
  Leadership | 85 | 95 | 97 | 100
  Judgment and decision making | 85 | 95 | 99 | 100
ERCP, endoscopic retrograde cholangiopancreatography; DOPS, direct observation of procedural skills.
Discussion
We evaluated learning curves using CUSUM analysis for ERCP among AETs at five Canadian
advanced endoscopy programs. Using the ERCP DOPS tool, all of the trainees achieved
competency for overall procedure performance within their fellowship year. Trainees
achieved competence in all non-technical domains as well, including communication,
teamwork, situational awareness, leadership, and judgment and decision-making. With
respect to technical skills, competence was achieved in all tasks related to cannulation
and imaging except for tissue sampling.
Previous studies on ERCP learning curves using the TEESAT are concordant with our
findings of trainees achieving competence in ERCP during their fellowship [6]
[12]. Trainees achieved biliary cannulation rates of over 80 % by the end of their fellowship, in keeping with the goals of most endoscopy training programs [22]. While trainees did not meet the 85 % and 90 % thresholds suggested for practicing endoscopists by the British Society of Gastroenterology and the ASGE, respectively [21]
[22], a report tracking AET performance suggested that they achieve > 90 % cannulation
by the end of their first year of independent practice, even if they did not achieve
competence during their fellowship [24].
Our learning curve data provide important insight into development of competency for
specific technical and non-technical skills. Trainees reached the competency threshold
for sphincterotomy and stone therapy prior to selective cannulation, in keeping with
a learning curve analysis by Ekkelenkamp and colleagues [7]. Additionally, trainees in our study achieved competency in non-technical skills
early in their fellowship. Deficiencies in these skills are associated with adverse
patient events [25]
[26]. Non-technical skills, such as situational awareness, judgment and decision-making, and teamwork, help endoscopy teams (e.g., trainee, supervisor, endoscopy nurse) understand
their roles, anticipate and respond to unexpected or challenging circumstances, and
prevent errors through open communication [27]. Although direct comparison with other studies is difficult because most ERCP tools focus
only on technical and cognitive skills [4]
[28]
[29], our findings may reflect ongoing efforts to formalize and integrate non-technical
skills training into core gastroenterology curricula [27]
[30].
Importantly, among the technical skills needed for independent practice, trainees did not reach the competency threshold in tissue sampling or sphincteroplasty. ERCP brush sampling of biliary strictures has to date been considered a low-complexity task [31]; however, the few studies that have examined learning curves for this skill suggest otherwise. Results from the United Kingdom demonstrate that ERCP tissue sampling is one of the last specific tasks for which trainees achieve competency during their training [11]. Non-diagnostic sampling of biliary strictures can lead to further costly and more
invasive interventions such as cholangioscopy, EUS-guided biopsy, percutaneous biopsy
and unnecessary surgery [32]
[33]. Trainees in our study also struggled with sphincteroplasty, in keeping with the United Kingdom-based study [11], in which learners did not reach competency by 300 procedures. ERCP sampling and sphincteroplasty are not highly emphasized in current ERCP training curricula despite being core skills for this procedure [10]. Advanced endoscopy training programs may consider supplementing live training with
simulators or animal models [34]
[35]
[36]
[37]. Supervisors may also benefit from offering targeted feedback and helping trainees generate learning plans for these tasks. This can be enabled through the use of direct observation tools that include tissue sampling and sphincteroplasty, such as the DOPS
[28]
[29]. Additionally, training programs can create more resources to highlight evidence-based
approaches to achieving competency for sphincteroplasty and tissue sampling to improve
diagnostic yield [38].
Our study has several important limitations. First, our relatively small sample size
of procedures limited our ability to evaluate trainees' competence based on more stringent
definitions of competence and acceptable failure rates. The sample size also precluded
meaningful regression analyses to identify trainee characteristics associated with
success. Second, we did not account for factors that may have introduced bias, such as trainees' prior ERCP experience, differing success rates for native versus non-native papillae, and how cases of varying difficulty were assigned to trainees by their supervisors.
Third, we did not track clinical outcomes of the patients included in this study or attempt to correlate them with ERCP DOPS scores. Fourth, we applied the competency threshold defined for cannulation to all other ERCP DOPS domains; while this is intuitively acceptable, it is not grounded in data. Fifth, this study did not include all advanced endoscopy
programs in the country, limiting the generalizability of our findings. Sixth, certain
advanced techniques such as precutting or double-guidewire cannulation are often not
part of advanced endoscopy training and thus were not evaluated in this study. Finally,
as with other studies using subjective assessment methods to evaluate learning curves,
unmeasured biases, such as raters' awareness that trainees were approaching the end of their training, may have influenced supervising endoscopists' assessments.
Conclusions
Despite these limitations, our study shows that Canadian AETs are graduating from
fellowship programs with acceptable levels of competence for overall ERCP performance
and for the majority of specific intra-procedural tasks. We also add to the growing
body of literature in ERCP training that supports the shift away from volume-based
training and toward assessing well-defined competencies using pre-established thresholds.
With the understanding that trainees will acquire skills at different rates [39], the incorporation of assessment tools to generate individual learning curves can
help identify deficiencies, enable goal-directed and actionable feedback, and allow
trainees to generate learning plans [40]. Learning curves may also help identify areas in which many trainees are deficient
and that may require supplementary training, such as tissue sampling. In addition,
aggregate data from multi-institution samples of trainees can establish competency
thresholds at a national level to be used for credentialing purposes [18]
[41]. Meaningful assessment practices with validated tools can improve learning for trainees
and help ensure they achieve the knowledge and skills needed for high-quality advanced
endoscopic care.