Introduction
Self-assessment of endoscopic performance by trainees allows for regulation of learning and skill acquisition [1] [2]. The Joint Advisory Group on Gastrointestinal Endoscopy (JAG) recommends that trainees incorporate self-assessment practices into their self-regulated development [3] and the American Society for Gastrointestinal Endoscopy (ASGE) provides tools for self-assessment of endoscopic performance [4]. To be an effective source of feedback, self-assessment must be accurate.
The endoscopic literature, however, has shown that novices have inaccurate self-assessment [5] [6] [7] [8]. A recent cross-sectional study on colonoscopy found that novices have less accurate self-assessments compared to more experienced endoscopists [6], consistent with studies in other procedure-related domains [9] [10]. In addition, a study of simulated polypectomy found a weak correlation between self- and externally-assessed performance scores among novices [5].
Video-based feedback has been proposed to remedy deficiencies in self-assessment ability. Several studies found that allowing novices to review videos of their own performances [11], of a benchmark performance (i.e., a video of an expert completing the procedure) [8], or both [12], improved self-assessment accuracy. The comparative impact of these video-based interventions on endoscopists’ self-assessment accuracy, however, is unclear. Moreover, combined use of self-video review and benchmark video review has not been investigated in a procedural setting. The aim of this study was to ascertain the comparative effectiveness of three different video-based interventions on self-assessment accuracy of endoscopic competence in esophagogastroduodenoscopy (EGD).
Material and methods
This single-blinded, parallel-arm, prospective randomized controlled trial was conducted at a tertiary care academic center. Approval was granted by the St. Michael’s Hospital Research Ethics Board (14 – 160) and written informed consent was obtained from all participants. Reporting of the findings followed the CONSORT statement [13]. All authors reviewed and approved the final manuscript. No changes to methodology were made after trial commencement.
Participants
One author (MAS) used purposive sampling to recruit novice endoscopists, defined as individuals who had performed fewer than 20 previous EGDs in the clinical and/or simulated settings [14]. Participants were randomized with an allocation ratio of 1:1:1 to one of the following three groups: (1) self-video review (SVR); (2) benchmark video review (BVR); or (3) self- and benchmark video review (SBVR). Randomization was conducted by one author (RK) using a sealed-envelope technique. The random allocation sequence was generated by another author (CW). It was not possible to blind participants to their assigned group.
Procedure
The study methodology is summarized in [Fig. 1]. The EndoVR endoscopy simulator was used for all assessments (CAE Healthcare Canada, Montreal, Quebec, Canada). This simulator models an EGD by using an endoscope that is inserted into a computer-based module and displays the esophageal lumen of a virtual patient on a screen. This simulator was chosen because it offers a wide range of EGD cases of variable difficulty and complexity [14] [15]. Two EGD cases were used during testing: Case 1, which represented a 42-year-old male with epigastric pain and a pre-pyloric ulcer; and Case 2, which represented a 41-year-old female with dysphagia and esophageal candidiasis.
Fig. 1 Flowchart of study methodology.
Pre-intervention assessment
All participants completed a written questionnaire to collect information on demographic and background characteristics, including age, sex, level of training, and previous experience with endoscopic procedures. Each participant completed an EGD case on the VR simulator (Case 2). A maximum of 15 minutes was allotted for case completion. All participants were video recorded during each of their procedures (as described below). Participants received no external feedback regarding their performance during the assessments.
Video-based interventions
After completion of the first case, participants received a video-based intervention according to the group to which they were randomized. The SVR, BVR, and SBVR interventions were modeled on the modes of video delivery used in previous studies of self-video review [11], benchmark video review [11], and combined self- and benchmark video review [12].
SVR group
The SVR group was provided with access to footage of their own performance of their first EGD case. Participants had 15 minutes to review the video and could cue forward and backward at their own discretion.
BVR group
The BVR group was provided with access to a benchmark video of the simulated EGD case (Case 2) which featured a demonstration of the task as completed by an experienced endoscopist (> 500 endoscopic procedures). Participants had 15 minutes to review the video and could cue forward and backward at their own discretion.
SBVR group
The SBVR group was provided access to footage of their own performance and the benchmark performance during a 15-minute period. They could cue forward and backward and switch between the videos at their own discretion.
Post-intervention assessments
After completion of their assigned video-based intervention, each participant then completed the same simulated EGD case (Case 2) again, followed by a new case (Case 1). A maximum of 15 minutes was allotted for completion of each case. As before, all performances were recorded.
Assessment tools
Performances of the simulated EGD procedures were assessed using the Gastrointestinal Endoscopy Competency Assessment Tool (GiECAT), a direct observational assessment tool with strong evidence of reliability and validity in the clinical [16] [17] and simulated [18] [19] settings. The GiECAT is composed of a global rating scale (GRS) and a structured checklist. Only the GRS component of the GiECAT was used, as the items on the GRS are transferable across endoscopic procedures [20]. The GRS assesses seven domains (technical skill; strategies for scope advancement; visualization of mucosa; independent procedure completion (need for assistance); knowledge of procedure; interpretation and management of findings; and patient safety) using a 5-point Likert scale with descriptive anchors reflective of the degree of autonomy demonstrated by the endoscopist. Ratings of the seven items are totaled to generate scores from 7 to 35. Percentage scores can also be calculated.
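To make the scoring arithmetic concrete, the short sketch below (a minimal illustration with hypothetical ratings, not study data) totals seven GRS item ratings and converts the total to a percentage of the 35-point maximum; the percentage convention shown here is an assumption for illustration, and the original GiECAT publication [17] defines the exact calculation.

```python
# Minimal sketch of GiECAT GRS scoring with hypothetical item ratings.
# Seven domains, each rated 1-5, give totals from 7 to 35.

def giecat_grs_total(item_ratings):
    """Sum the seven GRS item ratings (each rated 1-5)."""
    assert len(item_ratings) == 7 and all(1 <= r <= 5 for r in item_ratings)
    return sum(item_ratings)

def giecat_grs_percentage(item_ratings, max_score=35):
    """Express the GRS total as a percentage of the maximum score (assumed convention)."""
    return 100.0 * giecat_grs_total(item_ratings) / max_score

example_ratings = [3, 2, 3, 2, 4, 3, 3]  # hypothetical ratings for the seven domains
print(giecat_grs_total(example_ratings))                 # 20
print(round(giecat_grs_percentage(example_ratings), 1))  # 57.1
```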
Assessments
Video recordings
All three simulated EGD cases performed by each participant were recorded. The protocol for videotaping and editing the video feed of the endoscope’s intraluminal view and the endoscopist’s hands was adapted from a previous study [21]. Segments of audio and/or video that identified the endoscopist were edited to ensure anonymity. In addition, participants’ video review period was video-recorded, which allowed for calculation of the time spent viewing the assigned video(s).
External assessment
Video recordings of all three EGD cases were assessed by two blinded raters (experienced endoscopists who had each completed > 500 procedures) using the GiECAT GRS. Raters were asked to watch each video in its entirety and to use the full range of responses. A second rater was assigned a subset of the videos to allow assessment of interrater reliability.
Self-assessment
Participants assessed their own performance at four time points: “Assessment 1a,” immediately after their first simulated EGD; “Assessment 1b,” immediately after completion of their assigned intervention (SVR, BVR, or SBVR), which involved a reappraisal of their first procedure; “Assessment 2,” immediately after completion of their second simulated EGD; and “Assessment 3,” immediately after completion of their third simulated EGD. Participants self-assessed their EGD performance at each time point using the GiECAT GRS and were asked to use the full range of responses. The time period from Assessment 1a to Assessment 3 was no more than 1 hour, as participants were allotted a maximum of 15 minutes to complete each EGD.
Outcome measures
We determined the between- and within-group impacts of the three video-based interventions on self-assessment accuracy for simulated EGD. Self-assessment accuracy was determined by comparing ratings assigned by participants and external assessors on the GiECAT GRS.
Sample size calculation
Based on previous work using educational interventions in endoscopic training, we estimated that 17 participants would be required per group [18]. Accordingly, we recruited a total of 51 participants.
Statistical analysis
Demographic variables, endoscopic experience, and time spent on the respective video interventions were summarized using descriptive statistics. Calculation of the GiECAT GRS percentage score was adapted from the original paper [17]. The mean of the two raters’ assessments was used; when a score from only one rater was available, that rater’s score was used. The second rater assessed the performances of 31 participants (61 %). For these performances, interrater reliability of the video-based expert assessments was calculated using the intraclass correlation coefficient (ICC2,1), a 2-way random-effects model for average measures.
To determine self-assessment accuracy, two approaches were used, based on recommendations from the method comparison literature [22] and a previous study examining self-assessment accuracy of endoscopic competence [6]. First, to determine overall self-assessment accuracy of participants at baseline (i.e., prior to the intervention), the ICC1,1 (1-way random-effects model, for both single measures [individual rater] and average measures [average of the 2 raters’ scores]) was calculated using the GiECAT GRS scores assigned by external assessors and by participants for a single EGD procedure. Second, a Bland-Altman analysis was used to compare agreement between self- and externally-assessed GiECAT GRS scores at baseline (i.e., Assessment 1a) among the three groups [23].
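For readers who wish to reproduce these agreement analyses, the sketch below gives a minimal Python implementation of the one-way random-effects ICC (single and average measures) from standard ANOVA mean squares, together with the Bland-Altman mean difference and 95 % limits of agreement. The scores are hypothetical placeholders rather than study data, and the published analyses were run in SPSS; two-way forms such as ICC2,1 would be computed from the corresponding two-way ANOVA decomposition.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC from an (n subjects x k raters) array.
    Returns (ICC(1,1), single measures; ICC(1,k), average measures)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    ms_between = k * np.sum((subject_means - x.mean()) ** 2) / (n - 1)      # between-subjects mean square
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))   # within-subjects mean square
    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average

def bland_altman_limits(external, self_assessed):
    """Mean difference (external - self) and 95 % limits of agreement."""
    d = np.asarray(external, dtype=float) - np.asarray(self_assessed, dtype=float)
    mean_d, sd_d = d.mean(), d.std(ddof=1)
    return mean_d, (mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d)

# Hypothetical baseline GiECAT GRS percentage scores for five participants.
external_scores = [60.0, 45.7, 71.4, 54.3, 48.6]
self_scores = [54.3, 51.4, 62.9, 57.1, 42.9]
print(icc_oneway(np.column_stack([external_scores, self_scores])))
print(bland_altman_limits(external_scores, self_scores))
```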
To evaluate the impact of the video-based interventions on self-assessment accuracy, absolute difference scores (ADS) between externally- and self-assessed GiECAT GRS scores among the three groups were determined. To determine if there was a between-group effect, Kruskal-Wallis tests were completed for the ADS among the three groups at each assessment (Assessment 1a, 1b, 2, 3). To determine if there was a within-group effect, Friedman tests were completed for the ADS over the four assessment time points (Assessment 1a, 1b, 2, 3) for each group.
All analyses were conducted using SPSS 20 (IBM, Armonk, NY, United States). Interpretation of the ICC followed suggested guidelines, wherein values of 0.21 – 0.40 are considered “fair,” 0.41 – 0.60 “moderate,” 0.61 – 0.80 “substantial,” and > 0.80 “almost perfect” [24]. Significant effects on the Kruskal-Wallis and Friedman tests were further analyzed using Mann-Whitney U tests and Wilcoxon signed-rank tests, respectively. Multiple post hoc comparisons were corrected for using the Dunn-Sidak adjustment, following a pairwise approach [25]. Effect size was calculated using eta squared (η²) for Kruskal-Wallis tests and Kendall’s W for Friedman tests [26]. For all statistical tests, an alpha of 0.05 was set as the cut-off for statistical significance.
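As a further illustration, the sketch below mirrors this workflow with SciPy: absolute difference scores are compared across groups with a Kruskal-Wallis test and across time points with a Friedman test, a Dunn-Sidak-adjusted alpha is derived for post hoc pairwise comparisons, and effect sizes are computed. The data are hypothetical, and the eta-squared and Kendall’s W formulas shown are common conventions rather than the study’s SPSS output.

```python
from scipy import stats

# Hypothetical absolute difference scores (|external - self|, in percentage points)
# for the three groups at one assessment time point (5 participants per group).
ads_svr = [7.1, 10.0, 5.7, 14.3, 8.6]
ads_bvr = [5.7, 2.9, 7.1, 4.3, 5.7]
ads_sbvr = [11.4, 14.3, 10.0, 17.1, 12.9]

# Between-group comparison (Kruskal-Wallis) with an eta-squared effect size.
h_stat, p_kw = stats.kruskal(ads_svr, ads_bvr, ads_sbvr)
n_total, n_groups = 15, 3
eta_squared = (h_stat - n_groups + 1) / (n_total - n_groups)

# Within-group comparison across the four assessments (Friedman) for one group,
# with Kendall's W as the effect size.
ads_1a = [8.6, 5.7, 11.4, 7.1, 10.0]
ads_1b = [5.7, 2.9, 7.1, 4.3, 5.7]
ads_2 = [7.1, 5.7, 8.6, 10.0, 7.1]
ads_3 = [14.3, 11.4, 12.9, 15.7, 14.3]
chi2, p_fr = stats.friedmanchisquare(ads_1a, ads_1b, ads_2, ads_3)
n_subjects, n_timepoints = 5, 4
kendalls_w = chi2 / (n_subjects * (n_timepoints - 1))

# Dunn-Sidak-adjusted alpha for m pairwise post hoc comparisons.
m_comparisons = 3
alpha_sidak = 1 - (1 - 0.05) ** (1 / m_comparisons)

# Post hoc pairwise tests, run only when the omnibus test is significant.
u_stat, p_mwu = stats.mannwhitneyu(ads_bvr, ads_sbvr, alternative="two-sided")
w_stat, p_wsr = stats.wilcoxon(ads_1b, ads_3)

print(p_kw, eta_squared, p_fr, kendalls_w, alpha_sidak, p_mwu, p_wsr)
```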
Results
A total of 51 novice endoscopists were randomized and completed the study. Participant demographics and endoscopic experience are outlined in [Table 1]. Interrater reliability for the two video-based external raters was good, as indicated by ICC2,1 values of 0.73 (95 % CI, 0.43 – 0.87), 0.88 (0.74 – 0.94), and 0.73 (0.42 – 0.87) for Assessments 1a, 2, and 3, respectively.
Table 1 Endoscopist participant demographic characteristics and previous endoscopic experience.

| Characteristic | SVR group (n = 17) | BVR group (n = 17) | SBVR group (n = 17) |
|---|---|---|---|
| Age (years), median (IQR) | 27.0 (8.0) | 27.0 (8.0) | 27.0 (7.0) |
| Sex | | | |
| Male, no. (%) | 14 (82.4) | 12 (70.6) | 12 (70.6) |
| Female, no. (%) | 3 (17.6) | 5 (29.4) | 5 (29.4) |
| Level of training or practice | | | |
| Medical student, no. (%) | 6 (35.3) | 5 (29.4) | 6 (35.3) |
| Resident, no. (%) | 8 (47.1) | 11 (64.7) | 9 (52.9) |
| Staff/attending, no. (%) | 3 (17.6) | 1 (5.9) | 2 (11.8) |
| Hand dominance | | | |
| Right, no. (%) | 17 (100) | 15 (88.2) | 17 (100) |
| Left, no. (%) | 0 (0) | 2 (11.8) | 0 (0) |
| Endoscopic experience | | | |
| Number of previous colonoscopies completed, median (IQR) | 0 (2.0) | 0 (2.0) | 0 (0) |
| Number of previous EGDs completed, median (IQR) | 0 (4.0) | 0 (2.0) | 0 (0) |
BVR, benchmark video review; EGD, esophagogastroduodenoscopy; IQR, interquartile range; SVR, self-video review; SBVR, self- and benchmark video review
Median time spent on the self-video review and on the benchmark video was 14 minutes, 34 seconds (IQR 4 minutes, 3 seconds) and 13 minutes, 12 seconds (IQR 6 minutes, 44 seconds) by the SVR and BVR groups, respectively. Median time spent by the SBVR group on the self-video review and on the benchmark video was 8 minutes, 1 second (IQR 4 minutes, 48 seconds) and 6 minutes, 47 seconds (IQR 2 minutes, 6 seconds), respectively.
Self-assessment accuracy
Baseline
Overall, there was substantial agreement between the external and self-assessments for the GiECAT GRS at baseline (i.e., Assessment 1a), as evidenced by an ICC1,1 (average measures) of 0.74 (95 % CI, 0.48 – 0.88). In the Bland-Altman analysis, the mean of the differences between externally assessed and self-assessed GiECAT GRS scores was 4.2 (SD = 11.4) ([Fig. 2]). All but three data points fell within the 95 % limits of agreement: two participants in the SBVR group fell above the upper limit and one participant in the SBVR group fell below the lower limit. There were no systematic differences between the three groups.
Fig. 2 Bland-Altman plot.
Effects of video-based interventions
The ADS for all assessments using the GiECAT GRS among the three groups are presented in [Table 2]. There was a significant effect of group for the absolute difference between externally- and self-assessed GiECAT GRS scores for procedure 1b (Kruskal-Wallis chi-squared = 9.782, P = .008, η² = 0.17). There were no significant differences for procedure 1a (Kruskal-Wallis chi-squared = 4.122, P = .127), procedure 2 (Kruskal-Wallis chi-squared = 1.602, P = .449), or procedure 3 (Kruskal-Wallis chi-squared = 1.132, P = .519). Post hoc analysis indicated that the BVR group had a significantly smaller ADS compared with the SBVR group on procedure 1b (P = .005). There were no other significant differences.
Table 2 Absolute difference scores between externally- and self-assessed GiECAT GRS scores for participants in the SVR, BVR, and SBVR groups. Values are median ratings with the interquartile range in parentheses.

| Procedure[1] | ADS (%): SVR | ADS (%): BVR | ADS (%): SBVR | P value[2]: SVR vs. BVR | P value[2]: SVR vs. SBVR | P value[2]: BVR vs. SBVR |
|---|---|---|---|---|---|---|
| 1a | 7.1 (12.1) | 5.7 (10.0) | 11.4 (9.6) | NS | NS | NS |
| 1b | 10.0 (13.6) | 5.7 (7.9) | 14.3 (14.3) | NS | NS | 0.005[3] |
| 2 | 5.7 (13.6) | 7.1 (13.2) | 10.0 (12.5) | NS | NS | NS |
| 3 | 14.3 (14.3) | 14.3 (12.5) | 6.4 (18.2) | NS | NS | NS |

ADS, absolute difference percentage score; GiECAT, Gastrointestinal Endoscopy Competency Assessment Tool; GRS, global rating scale; NS, not significant (at P < .05)
1 Note that procedures 1a and 1b correspond to the periods before and after completing the assigned video-based intervention, respectively.
2 Significant differences between groups (P < .05); post hoc comparisons were carried out using Mann-Whitney U tests.
3 Denotes a significant difference (P < .05)
There was a significant effect of time for the BVR group (Friedman chi-squared = 9.402, P = .024, η² = 0.06) and for the SBVR group (Friedman chi-squared = 10.352, P = .016, η² = 0.07). There was no significant effect of time for the SVR group (Friedman chi-squared = 1.432, P = .698). Post hoc analysis indicated that the BVR group had a significantly higher ADS on Assessment 3 compared with Assessment 1b (P = .030) and the SBVR group had a significantly lower ADS on Assessment 3 compared with Assessment 1b (P = .016). There were no other significant differences.
Discussion
We report the first study to assess the comparative effectiveness of different video-based interventions aimed at improving self-assessment accuracy of procedural skills. We found that benchmark video review on its own was beneficial in the short term only, whereas self-video review in isolation was not. In addition, we found that benchmark video review paired with self-video review improved self-assessment accuracy over time. Self-assessment is an essential skill wherein individuals monitor their own learning and performance [27]. Accurate self-assessment requires adequate agreement between one’s own assessment and an external standard [28]. Although novice endoscopists have been shown to have inaccurate self-assessment [29], our findings suggest that their abilities can be enhanced using video-based interventions.
There are several potential explanations for our results. Participants in the SVR group may have based their self-assessments on an overall impression of their performance, which did not change with self-video review alone because they had no appropriate external standard against which to compare their own performance [30]. The benchmark video, on the other hand, likely provided an advantage to the BVR group, as novices could use the expert performance to help identify flaws in their own endoscopic skills. This is consistent with a previous study in sigmoidoscopy, in which general surgery residents had improved self-assessment accuracy after watching an expert performance [8].
Given our finding that the BVR group had improved self-assessment accuracy compared with the SBVR group in the short term, we hypothesize that the benefit of the benchmark video alone may be attributable to the lower cognitive load required to process a single video. Conversely, participants in the SBVR group may have initially been challenged to effectively process both videos within the allotted time. With time, however, participants in the SBVR group may have been better able to reflect on their own video and the degree to which their performance met the benchmark standard, thereby informing their self-assessment. The finding that self-video review is beneficial only when combined with benchmark video review is consistent with previous work on this subject [31]. In addition, video-based feedback appears to mitigate the Dunning-Kruger effect, whereby novices are unaware of their own skill deficiencies and the least competent are most likely to overestimate their level of performance [32]. Accurate self-assessment requires appropriate external standards for measuring one’s performance and the ability to judge the extent to which one’s own performance meets those standards. Providing novices with a video of their own performance as well as a benchmark performance likely enhances self-assessment accuracy because it provides trainees with high-quality data that they can use to interpret their own performance and compare it to an explicit standard.
This study has several limitations. First, we used the GiECAT GRS to evaluate EGD performance, as there are no EGD-specific assessment tools with strong validity evidence. Although the GiECAT GRS has been validated for use in colonoscopy, it lacks comprehensive validity evidence for EGD [17]. In addition, we did not use a control group (i.e., no video intervention), so we are unable to determine whether participants’ self-assessment accuracy would have improved over time with no intervention. A previous study, however, suggested that a control group would show no improvement [8]. Finally, our study evaluated the self-assessment accuracy of participants within a single day. It is possible that differences between groups would change over a longer observational period.
Overall, video-based interventions can improve the accuracy of self-assessment of endoscopic competence among novices. In particular, benchmark performances, in combination with self-video review, may help to better inform these assessments. Our findings have several implications. First, video-based interventions may be integrated into existing endoscopic training curricula [18] [19] [33] to facilitate recognition of performance deficits among novices. Video recording has demonstrated benefits as a tool for external assessment and debriefing [34] and, based on our findings, it may also be used to improve learning by promoting accurate self-assessment. Ensuring that trainees have an accurate perception of their endoscopic competence may facilitate their learning, as several studies in the education literature have demonstrated that trainees are more receptive to external feedback, and more likely to act on it, when it aligns with their self-perception [35] [36].
Conclusion
Research has shown that it is critical for trainees to have an accurate perception of their abilities, as their own opinions, as opposed to external assessments, predominantly influence the generation of learning goals [36]. An online compendium of benchmark videos for major endoscopic procedures, featuring a variety of presentations and techniques, would be a useful resource for novices. The American Society for Gastrointestinal Endoscopy’s extensive database of videos could be updated to include annotations of key aspects of performance in reference to an assessment standard to facilitate self-assessment. Future studies are required to investigate video-based interventions targeting other endoscopic procedures and to evaluate their impact on self-assessment accuracy over a longer time period.