Introduction
Endoscopic ultrasound (EUS)-guided tissue acquisition (TA) is the first choice for establishing
a tissue diagnosis in suspected pancreatic cancer [1]. The increasing use of neoadjuvant chemotherapy for pancreatic carcinoma, and the
fact that neoadjuvant treatments require pathological confirmation of the diagnosis,
have rendered the quality of EUS-guided TA of solid pancreatic lesions ever more important
[2] [3]. Proficiency in EUS-guided TA can only be reached in centers in which all its aspects,
including TA, tissue handling, microscopic assessment and reporting, are safeguarded.
Feedback on performance is key to improving quality [4].
In 2015, the American Society for Gastrointestinal Endoscopy (ASGE) defined the following
key performance indicators (KPIs) for EUS-guided TA in solid pancreatic lesions: rate
of adequate sample (RAS) with a performance target of 85 %, diagnostic yield of malignancy
(DYM) with a performance target of 70 %, and sensitivity for malignancy (SFM) with
a performance target of 85 % [5]. RAS mainly reflects the quality of the process within the endoscopy suite (TA,
preparation of smears, including transport to the cytopathology lab), whereas DYM
and SFM reflect the quality of the entire process, including patient selection, specimen
preparation, microscopic assessment and reporting.
Currently, quality control for the yield of EUS-guided TA is not customary or required
for centers performing EUS-guided TA. Quality measurements for EUS-guided TA procedures
were previously described as a monitoring tool during the development of academic
or regional EUS programs [6] [7] [8]. Wani et al. used cumulative sum (CUSUM) curves to describe the development of competence in advanced
endoscopy trainees performing both EUS and endoscopic retrograde cholangiopancreatography (ERCP) [9] [10] [11] [12] [13]. CUSUM curves reflect the development of quality over time relative to predefined
performance targets.
In 2015 the Dutch Quality in Endosonography Team (QUEST) was founded. This is a regional
EUS interest group, consisting of endosonographers and pathologists from five community
hospitals in the Netherlands. QUEST aims to improve performance of EUS-guided TA by
providing feedback on KPIs of individual centers based on a prospective registration
of consecutive EUS-guided TA procedures of solid pancreatic lesions. This has led
to improvements in RAS (80 % to 95 %), DYM (28 % to 64 %), and SFM (63 % to 84 %)
comparing the results of an initial retrospective analysis of yield of EUS-guided
TA to the first 21 months of prospective registration [14].
This study evaluated the use of CUSUM curves to monitor performance of contributing
centers regarding the yield of EUS-guided TA of solid pancreatic lesions. Using this
tool, we aimed to assess trends in KPIs over time, and explore potential benefits
of CUSUM curves as a feedback tool.
Patients and methods
This was a prospective, multicenter, quality improvement study of consecutive EUS-guided
TA procedures on solid pancreatic lesions conducted in five community hospitals in
the Netherlands. The local medical ethics committee (METC Zuidwest Holland 17–038)
approved the study protocol. Informed consent was obtained from all patients. The
study is registered in the Dutch trial registry (NTR) with trial number NL9470.
Study population and data collection
All patients aged 18 and older with a solid pancreatic lesion with high suspicion
of malignancy who underwent an EUS-guided TA procedure were eligible for this study.
Primary outcome parameters were CUSUM-derived learning curves with RAS and DYM as
input parameters. RAS was defined as the proportion of procedures yielding a specimen sufficient
for cytopathological and/or histopathological analysis. DYM was defined as the proportion
of procedures yielding a “suspicious for malignancy” or a “malignant” diagnosis. The
secondary outcome parameter was SFM. SFM was defined as the total of true positives
(“suspected malignancy” or “malignancy” based on EUS-guided TA with a malignancy as
final diagnosis) divided by all patients with a final diagnosis of malignancy.
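To make these definitions concrete, the following minimal Python sketch computes the three proportions from per-procedure records. The record layout (`Procedure`, `adequate`, `call`, `final_malignant`), the function names, and the toy data are illustrative assumptions and do not reflect the format of the study registry.

```python
from dataclasses import dataclass

# Pathology calls counted as malignant for the KPIs, as defined in the text.
MALIGNANT_CALLS = {"suspicious for malignancy", "malignant"}


@dataclass
class Procedure:
    """One EUS-guided TA procedure (hypothetical record layout)."""
    adequate: bool          # specimen sufficient for analysis
    call: str               # pathology classification of the specimen
    final_malignant: bool   # reference-standard diagnosis


def ras(procs):
    """Rate of adequate sample: adequate procedures / all procedures."""
    return sum(p.adequate for p in procs) / len(procs)


def dym(procs):
    """Diagnostic yield of malignancy: malignant calls / all procedures."""
    return sum(p.call in MALIGNANT_CALLS for p in procs) / len(procs)


def sfm(procs):
    """Sensitivity for malignancy: true positives / final malignant diagnoses."""
    malignant = [p for p in procs if p.final_malignant]
    return sum(p.call in MALIGNANT_CALLS for p in malignant) / len(malignant)


# Toy data (invented for illustration): 10 procedures, 8 with a malignant final diagnosis.
toy = (
    [Procedure(True, "malignant", True)] * 5
    + [Procedure(True, "suspicious for malignancy", True)]
    + [Procedure(True, "atypical", True)]
    + [Procedure(False, "non-diagnostic", True)]
    + [Procedure(True, "benign", False)] * 2
)
print(f"RAS={ras(toy):.2f} DYM={dym(toy):.2f} SFM={sfm(toy):.2f}")
```

Note that RAS and DYM share the same denominator (all procedures), whereas SFM is restricted to patients whose final diagnosis is malignant.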
Collected data on EUS-guided TA procedures included: patient demographics, localization
of the pancreatic mass, hospital, endosonographer, pathologist, needle diameter (< 22-gauge
or 22-gauge), type of needle (fine-needle aspiration [FNA]/fine-needle biopsy [FNB]),
number of passes, use of suction (slow withdrawal of stylet or vacuum suction), availability
of rapid on-site specimen evaluation (ROSE), and the result of the cytopathological
and/or histopathological evaluation of the EUS-guided TA specimen. Based on current
practice guidelines and previous experience of our group, endosonographers were advised
to perform at least three passes with FNA needles or at least two passes with FNB
needles (unless ROSE detected sufficient material for diagnosis earlier), and to use
vacuum suction [14] [15]. All other techniques and materials used were at the discretion of the local clinicians
and according to local availability of equipment and hospital standards.
The results of cytopathological and/or histopathological evaluation were classified
as follows: non-diagnostic, benign, atypical, suspicious for malignancy, and malignant.
Neuroendocrine tumors were classified as malignant. For the purpose of this study
“suspicious for malignancy” and “malignant” were both considered malignant. All types
of pancreatic and periampullary malignancies were considered a malignant reference
standard. The gold standard for a malignant diagnosis was based on either histopathological
diagnosis after surgical resection or progression of disease compatible with malignancy
during a minimum of 12 months of follow-up.
Feedback on performance
Regional interest group meetings were organized three times a year. Prior to meetings,
all contributors received data regarding the performance of their individual center
accompanied by anonymized benchmark data from the other centers. At the regional interest
group meetings, the results of prospective registration, best practices, guidelines,
and difficult cases were discussed. Until 2017, feedback on performance overall and
per center was provided as RAS, DYM, and SFM (proportions). From 2018 onward, visual
feedback by means of CUSUM curves of RAS and DYM was also provided. At meetings all
data (numbers and CUSUM curves) were presented (in an anonymized fashion) and subsequently
discussed. Participating endosonographers and pathologists were invited to reflect
on changes in the direction of the curves provided. Significant changes in the direction
of a curve were subjected to further analysis, the results of which were discussed
separately with the practitioners from the centers involved, prior to the next general
meeting. At a subsequent meeting, the results of these analyses were presented and
discussed, with emphasis on potential learning opportunities for all participants.
All gastroenterologists and pathologists involved had completed their training at
least 3 years before the start of this study [14].
Statistics
Cumulative sum analysis (CUSUM)
Each EUS procedure was scored as a success (adequate sample/malignant outcome) or
failure (inadequate sample/non-malignant outcome). Each success adds a score s to the
cumulative sum, and each failure subtracts (1 – s). Each procedure is a dot on
the learning curve, which is created by plotting the cumulative sum of all cases in
chronological order.
The acceptable rates (P0) and unacceptable rates (P1) were defined based on the ASGE
KPIs and a previous publication by Eltoum et al. [16]. For inadequate samples, we designated 10 % as the acceptable (P0) and 15 % as the unacceptable
(P1) rate. For a non-malignant outcome of the EUS-guided TA, P0 was defined as 25 % and
P1 as 30 %.
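The scoring rule can be sketched in a few lines of Python. The article does not state how s is derived from P0 and P1; the sketch below assumes the commonly used log-likelihood-ratio form of CUSUM, so `cusum_score` is an assumption rather than the study's confirmed formula. The function names are hypothetical.

```python
import math


def cusum_score(p0, p1):
    """Per-case score s from acceptable (p0) and unacceptable (p1) failure rates.

    Assumption: the standard log-likelihood-ratio CUSUM form, not a formula
    quoted from the article.
    """
    P = math.log(p1 / p0)
    Q = math.log((1 - p0) / (1 - p1))
    return Q / (P + Q)


def cusum_curve(successes, s):
    """Cumulative sum in chronological order: +s per success, -(1 - s) per failure."""
    total, curve = 0.0, []
    for ok in successes:
        total += s if ok else -(1 - s)
        curve.append(total)
    return curve


s_ras = cusum_score(0.10, 0.15)  # inadequate sample: P0 = 10 %, P1 = 15 %
s_dym = cusum_score(0.25, 0.30)  # non-malignant outcome: P0 = 25 %, P1 = 30 %

# Three procedures in chronological order: success, success, failure.
example = cusum_curve([True, True, False], s=0.25)
```

Under this assumed formula, s always lies between P0 and P1, so a center whose failure rate sits at the acceptable level produces a slowly rising curve.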
Decision limits
Two decision limits (h1 and h0) were calculated. The decision limits are calculated
based on type I (α) and type II (β) errors. A type I error is the risk of rejection
of a true null hypothesis and a type II error is the risk of non-rejection of a false
null hypothesis. The formulas that are used to calculate h0 and h1 were previously
described [16]. The meaning of the decision limits in relation to the curve can be explained as
follows [17] [18]:
- If the learning curve crosses the upper decision limit, the failure rate is within the preset acceptable range and it reflects high quality.
- If the learning curve crosses the lower decision limit, the failure rate is above the preset unacceptable rates and an intervention is needed.
- If the learning curve remains between the two decision limits, the performance is within the preset acceptable range.
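The three rules above can be expressed as a short Python sketch. The article refers to [16] for the formulas and does not report the α and β values used; the code therefore assumes the standard CUSUM decision-limit formulas and α = β = 0.1, mirrored so that successes move the curve upward. Both the formulas and the error rates are assumptions, and the function names are hypothetical.

```python
import math


def decision_limits(p0, p1, alpha=0.1, beta=0.1):
    """Return (lower, upper) limits for a curve on which successes go up.

    Assumptions: standard CUSUM limit formulas; alpha and beta of 0.1 are
    placeholders, since the article does not report the values used.
    """
    P = math.log(p1 / p0)
    Q = math.log((1 - p0) / (1 - p1))
    a = math.log((1 - beta) / alpha)
    b = math.log((1 - alpha) / beta)
    upper = b / (P + Q)    # crossing upward: failure rate acceptable
    lower = -a / (P + Q)   # crossing downward: failure rate unacceptable
    return lower, upper


def interpret(value, lower, upper):
    """Map a CUSUM value to one of the three readings listed above."""
    if value >= upper:
        return "acceptable (high quality)"
    if value <= lower:
        return "unacceptable (intervention needed)"
    return "between limits"


lower, upper = decision_limits(0.10, 0.15)  # RAS: P0 = 10 %, P1 = 15 %
print(round(lower, 2), round(upper, 2), interpret(0.0, lower, upper))
```

With equal α and β the two limits are symmetric around zero; choosing smaller error rates widens the band and delays any signal.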
CUSUM charts
CUSUM charts were constructed using Excel. Each success (adequate sample/malignant
outcome) contributes to an upward slope of the CUSUM curve. Each failure (inadequate sample/non-malignant
outcome) contributes to a downward slope of the CUSUM curve. A downslope curve means that
the key performance indicator is not met. A horizontal curve indicates that quality
is up to standards. An upslope curve signifies quality is above the predefined key
performance indicator threshold.
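The relation between slope and failure rate can be made explicit with a short derivation. As a sketch, assume that procedures fail independently with a constant probability f, and let ΔC denote the change in the cumulative sum caused by one procedure (+s for a success, −(1 − s) for a failure); f and ΔC are notation introduced here for illustration.

```latex
\begin{align*}
\mathbb{E}[\Delta C] &= (1 - f)\,s - f\,(1 - s) = s - f
\end{align*}
```

The curve therefore rises on average when f < s, runs horizontally when f = s, and falls when f > s. If s is derived with the commonly used log-likelihood-ratio formula (an assumption, since the article does not give the formula), P0 = 10 % and P1 = 15 % yield s ≈ 0.124 for RAS, so a horizontal curve corresponds to a failure rate of roughly 12 %, between the acceptable and unacceptable rates.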
Multivariable analysis
To investigate the association of RAS and DYM with procedure characteristics, we fitted
logistic mixed models. Given the limited number of inadequate samples, only two parameters
(suction: yes/no and ROSE: yes/no) could be included in the RAS model.
The model for the DYM included the variables suction type (no, slow withdrawal of
stylet or vacuum), ROSE, number of passes (continuous), needle size (< 22-gauge, 22-gauge)
and needle type (FNA or FNB). In both models we used endoscopist specific (random)
intercepts to take into account that samples obtained by the same endoscopist may
not be independent. The model for DYM also included a pathologist specific (random)
intercept. Both models were fitted in the Bayesian framework, which allowed us to
include observations for which some of the covariates were missing. We used normal
priors with mean 0 and standard deviation 100 for all regression coefficients. The
Bayesian models were fitted using Markov chain Monte Carlo, with the help of the freely
available and widely used “JAGS” software [19] that uses Gibbs sampling and provides a wide range of samplers to sample from full-conditional
distributions that do not have a closed form. Results are presented as posterior mean
and 95 % confidence interval (CI). Calculations were performed in R version 4.0.2
(2020–06–22) (R Core Team 2020) and the package JointAI 1.0.0.9000 [20]. Missing observations were imputed during the analysis.
Results
From January 2015 until December 2018, 431 EUS-guided TA procedures on solid pancreatic
lesions in 403 individual patients were included. The median age of the patients was
68 years (range 27–88), and 51 % were men. During follow-up, a pancreatic or periampullary
malignancy (reference standard) was diagnosed in 87 % of all cases. Per hospital,
two to four endosonographers were involved in these procedures. The number of pathologists
involved varied widely, from eight to sixteen per hospital ([Table 1]).
Table 1
Characteristics of the participating patients and hospitals.
| Characteristic | Total cohort (n = 403) | Hospital A (n = 79) | Hospital B (n = 88) | Hospital C (n = 81) | Hospital D (n = 94) | Hospital E (n = 61) |
| --- | --- | --- | --- | --- | --- | --- |
| Sex male, n (%) | 206 (51 %) | 43 (54 %) | 42 (48 %) | 40 (49 %) | 54 (57 %) | 27 (44 %) |
| Median age in years (range) | 68 (27–88) | 70 (42–86) | 68 (43–86) | 68 (27–87) | 67 (33–88) | 68 (35–88) |
| Reference standard malignant, n (%) | 351 (87 %) | 69 (87 %) | 77 (88 %) | 68 (84 %) | 81 (86 %) | 56 (92 %) |
| Number of endoscopists involved | 15 | 2 | 4 | 2 | 3 | 4 |
| Number of pathologists involved | 39 | 16 | 8 | 8 | 8 | 14 |
Rate of adequate sample overall and per hospital
A total of 399 of 431 procedures yielded an adequate sample. Hence, RAS was 93 % for
the complete cohort (range 86 %–99% among individual hospitals). The ASGE-defined
KPI of RAS ≥ 85 % was met overall and in each of the individual hospitals ([Table 2]). This can also be appreciated from the upslope direction of the overall learning
curve drawn for this parameter (Supplementary Fig. 1). The RAS learning curves of the individual hospitals indicate adequate and stable
quality (curves between the decision limits) in Hospitals A, B, and E, and adequate
and improving quality in Hospitals C and D (Supplementary Fig. 2, Supplementary Fig. 3, Supplementary Fig. 4, Supplementary Fig. 5,
Supplementary Fig. 6).
Table 2
Values of RAS, DYM and SFM for the complete cohort and per hospital.
| Hospital | No. of procedures | RAS, n (%) | DYM, n (%) | SFM |
| --- | --- | --- | --- | --- |
| A | 87 | *75 (86 %)* | 53 (61 %) | 68 % |
| B | 91 | *82 (90 %)* | 57 (63 %) | 71 % |
| C | 90 | *87 (97 %)* | 59 (66 %) | 79 % |
| D | 100 | *99 (99 %)* | *75 (75 %)* | *87 %* |
| E | 63 | *56 (89 %)* | 41 (65 %) | 73 % |
| Total cohort | 431 | *399 (93 %)* | 285 (66 %) | 76 % |
Italics: equal or above ASGE performance target.
RAS, rate of adequate sample; DYM, diagnostic yield of malignancy; SFM, sensitivity for malignancy.
Diagnostic yield of malignancy overall and per hospital
A total of 285 of 431 procedures yielded a malignant diagnosis. Therefore, the overall
DYM was 66 % (ranging from 61 %–75 % in the individual hospitals). This is below the
KPI of DYM ≥ 70 % ([Table 2]). The overall learning curve of this parameter has a downslope direction (crossing
the lower decision limit) until January 2018 ([Fig. 1a]). From this point onward, the curve has a more horizontal direction between the
newly constructed decision limits, indicating an adequate and stable quality throughout
2018 ([Fig. 1a] and [Fig. 1b]).
Fig. 1 DYM CUSUM learning curve of the complete cohort. a January 2015 to December 2018. b January 2018 to December 2018. DYM, diagnostic yield of malignancy.
In only one of the contributing hospitals (Hospital D) the KPI of DYM ≥ 70 % was met
overall ([Table 2]). However, the learning curves of the individual hospitals for this parameter developed
from an initial downslope (Hospitals B and E) or horizontal direction (Hospitals C
and D) into a horizontal (Hospitals B, C, and E) or an upslope direction (Hospital
D) ([Fig. 2a], [Fig. 3a], Supplementary Fig. 7a, Supplementary Fig. 8a, Supplementary Fig. 9a). This indicates a gradual improvement in these centers up to an adequate quality
level in 2018.
Fig. 2 DYM CUSUM curve of hospital B. a January 2015 to December 2018. b January 2018 to December 2018. Black arrow marks the decrease in evaluating pathologists
from nine to three. DYM, diagnostic yield of malignancy.
Fig. 3 DYM CUSUM curve of hospital C. a January 2015 to December 2018. b October 2017 to December 2018. Black arrows mark the temporary absence of one experienced
cytopathologist. DYM, diagnostic yield of malignancy.
The CUSUM curve for Hospital B started with a downward slope and in January 2018,
the curve suddenly improved to a horizontal slope ([Fig. 2a] and [Fig. 2b]).
The curve of Hospital C initially showed stable and adequate quality until May 2017. From
this point onward, the curve showed a remarkably short and sharp downslope,
after which it returned to a more horizontal direction from September 2017
onward ([Fig. 3a] and [Fig. 3b]). This indicates a 4-month episode during which a significantly lower number of
malignant diagnoses were made. During these 4 months, a high proportion of specimens
(40 %) was graded as atypical, compared with the episodes prior to May 2017 (4 %)
and from September 2017 onward (11 %) (Supplementary Table 1). The 4-month episode coincided with the temporary absence of the most experienced
cytopathologist in this center, who had been involved in all cytopathological evaluations
of pancreatic lesions in the previous years in this hospital.
Sensitivity for malignancy overall and per hospital
The overall SFM for the contributing hospitals throughout the 4 years of this study
was 76 %, ranging from 68 % to 87 % among different hospitals. The KPI of SFM ≥ 85 %
was not met in four of five contributing hospitals. The developments in the learning
curves regarding DYM suggest improvement in quality in the majority of these centers.
In 2018, the final year of this study, the overall SFM was 85 %, ranging from 69 %
to 96 % among the centers. In this year, the KPI of SFM ≥ 85 % was met in three of
five centers (Supplementary Table 2).
FNB versus FNA needles
A total of 282 FNA procedures and 127 FNB procedures were performed. The outcomes of
FNA and FNB procedures were similar (Supplementary Table 3). The use of FNB needles did not increase over time.
Multivariable analysis
Nine observations for which all covariates were missing were excluded from the analysis.
Missing values in the remaining 422 observations were imputed (missing values: suction
type 4.7 %, needle brand 2.8 %, number of passes 2.1 %, needle size 1.7 %, needle
type 0.9 %, ROSE 0.2 %, and suction 0.2 %). The use of any type of suction and the
presence of ROSE were positively associated with RAS, with odds ratios of 3.2, 95 %
CI (1.1–7.8) and 2.8, 95 % CI (1.1–8.4), respectively ([Table 3]). There was no clear evidence that any of the covariates considered was associated
with DYM ([Table 3]).
Table 3
Odds ratios and corresponding 95 % CIs for the logistic mixed models for RAS and DYM.
| Covariate (RAS model) | OR | 95 % CI | Covariate (DYM model) | OR | 95 % CI |
| --- | --- | --- | --- | --- | --- |
| Use of suction (vacuum and/or slow withdrawal of stylet) | 3.2 | 1.1–7.8 | No suction | 0.7 | 0.3–1.6 |
| ROSE | 2.8 | 1.1–8.4 | Vacuum suction | 1.1 | 0.5–2.3 |
| | | | ROSE | 1.5 | 0.9–2.4 |
| | | | Number of passes | 1 | 0.8–1.4 |
| | | | < 22G needle (FNA and/or FNB) | 1.5 | 0.4–4.9 |
| | | | 22G needle (FNA and/or FNB) | 0.9 | 0.6–1.5 |
| | | | FNB | 1.1 | 0.7–2.1 |
There were missing values in seven covariates, with a percentage of missing observations per variable ranging from 0 % to 5 %. These missing observations were imputed during the analysis.
RAS, rate of adequate sample; DYM, diagnostic yield of malignancy; OR, odds ratio; CI, confidence interval; FNA, fine needle aspiration; FNB, fine needle biopsy; ROSE, rapid on-site evaluation.
Feedback and interpretation of curve deflections
During the 4 years of prospective registration, the following changes were reported
by contributing practitioners. Hospitals A, D and E requested ROSE on a regular basis,
which they did not do before. Hospital A started with ROSE halfway into 2016, Hospital
D from January 2018 onward, and Hospital E at beginning of 2016. In Hospitals B and
C, there were changes in the number of pathologists involved in EUS-guided TA procedures
of the pancreas. In Hospital B, the group of pathologists that reviewed pancreatic
samples collected with EUS was downsized from eight to three in January 2018. The
most experienced cytopathologist from Hospital C was temporarily absent during a 4-month
period in 2017.
The times at which the events described above took place are marked with an arrow
in [Fig. 2a], [Fig. 3a], Supplementary Fig. 6, Supplementary Fig. 7a, Supplementary Fig. 8, Supplementary Fig. 8a, and Supplementary Fig. 9a.
Discussion
This study evaluated the performance of five community hospitals regarding the yield
of EUS-guided TA of solid pancreatic lesions using CUSUM curves to assess trends in
quality over time and explored potential benefits of CUSUM curves as a feedback tool.
Throughout the 4 years of this study, all three ASGE-defined KPIs improved. The KPI
of RAS ≥ 85 % was met consistently in most of the centers and overall (93 %). The KPI
of DYM ≥ 70 % was not met overall across the study period from 2015 to 2018, but
DYM eventually reached 75 % overall in 2018. Similarly, the KPI of SFM ≥ 85 % was not
met overall from 2015 to 2018, but SFM improved to 85 % in 2018. Because not all ASGE-defined
KPIs are consistently met in each center, feedback on performance and analyses for
potential improvements are indicated and ongoing.
The diagnostic yield of EUS-guided TA for solid pancreatic lesions is considered a
benchmark for quality measurements in EUS [1]. However, the majority of studies on which the ASGE-defined KPIs are based were performed
in tertiary care facilities [21]. Moreover, the majority of publications on EUS-guided TA in solid pancreatic lesions
were controlled trials focusing on discrete factors influencing the yield, i.e., different
types and diameters of needles, use of suction, the use of ROSE, or the optimal number
of passes to perform [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34]. Therefore, when comparing the current study to these previous publications, it
cannot be ruled out that differences regarding patient selection may have influenced
yield of EUS-guided TA. Nevertheless, questioning the generalizability of the benchmark
data should never be an excuse to stop monitoring and improving performance.
To improve quality of EUS-guided TA, it is necessary to provide feedback on performance.
For providing feedback, CUSUM-derived learning curves have several advantages over
tables with numbers. First, their interpretation is easy and does not require any
knowledge about specific KPI values (a downward trend is not good, a horizontal line
is good, and an upward trend is better). Second, they allow determination of best
practices and comparison among peers. Third, they provide a more detailed picture
of development over time, allowing for focused analysis of performance within specific
timeframes [35]. The analysis of the sudden downslope deflection in the DYM curve of Hospital C,
coinciding with the 4-month absence of a senior cytopathologist, is an excellent example
of this. Analysis of this specific example teaches us how vulnerable the multistep
process of EUS-guided TA is, being dependent on each factor or operator involved.
Therefore, the discriminating advantage of learning curves for feedback over tables
with numbers is that they provide additional learning opportunities.
RAS and DYM are obviously related. However, because CUSUM curves of these variables
reflect quality relative to a predefined quality target, they do not necessarily develop
in the same direction. An upward RAS curve, therefore, does not mean the DYM curve
has to be upward as well. In other words: Having a sample that contains at least a
couple of cells from the target organ (adequate sample) does not automatically mean
that a pathologist will be confident about the malignant origin of the lesion. This
can lead to a RAS above the performance target and a DYM and SFM below the performance
target.
Supported by feedback provided by CUSUM analyses, several changes regarding protocols
and/or staff involved were made in individual hospitals. In Hospital C today, a pathology
report regarding pancreatic cytology or histopathology can only be finalized after
consent of a dedicated cytopathologist. Several hospitals implemented routine use
of ROSE and the number of pathologists involved was reduced in one of the centers.
Although multivariable analysis supports a benefit of suction and ROSE for RAS only,
an overall positive effect of these changes can be assumed. After all, with
a RAS of 85 %, the lowest acceptable level according to ASGE definitions, the SFM
can never exceed 85 %, and a DYM ≥ 70 % in patients with solid pancreatic lesions
becomes difficult to achieve.
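This ceiling can be illustrated with a back-of-the-envelope calculation. The sketch assumes that an inadequate sample never yields a malignant call, that inadequacy is unrelated to the final diagnosis, and that false-positive calls are negligible. Here π is the proportion of procedures with a malignant reference standard (87 % in this cohort) and Se_adequate is the sensitivity among adequate samples; both symbols are introduced for illustration only.

```latex
\begin{align*}
\text{SFM} &= \text{RAS} \times \text{Se}_{\text{adequate}} \le \text{RAS} = 0.85 \\
\text{DYM} &\approx \pi \times \text{SFM} \le 0.87 \times 0.85 \approx 0.74 \\
\text{DYM} \ge 0.70 &\;\Longrightarrow\; \text{Se}_{\text{adequate}} \ge \frac{0.70}{0.87 \times 0.85} \approx 0.95
\end{align*}
```

Under these assumptions, a center operating at the minimum acceptable RAS would need to classify about 95 % of adequate samples from malignant lesions as suspicious or malignant to reach the DYM target.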
To our knowledge, this is the largest prospective multicenter study of EUS-guided
TA of solid pancreatic lesions from community hospitals and the first to implement
CUSUM-derived learning curves as a tool for monitoring and improving the KPIs of these
procedures. Previous publications on the use of CUSUM curves in EUS-guided TA investigated
performance of either cytopathologists or endoscopy trainees [9] [10] [11] [12] [13] [16]. In contrast to these studies, we used CUSUM curves to evaluate the entire process
defining quality and yield of these procedures, including the work of both endosonographers
and cytopathologists. Some of the data presented in this study (133 procedures, performed
from January 2015 to September 2016) were previously described in the initial publication
about this community hospital quality initiative [14]. The current study shows ongoing and persistent improvement in performance and introduces
learning curves as a feedback and monitoring tool.
The main limitation of this study is the fact that feedback, either in tables with
numbers or as learning curves, was not provided in real time. Ideally, CUSUM curves would
have been drawn three times a year, enabling contributing centers to respond more
quickly to changes in curve directions. Because of logistic challenges and the time-consuming
nature of data collection, this could not be realized in the current study. Another
limitation is the fact that in the current study, no subtypes of FNB needles were
recorded. Recent publications indicate improved outcome with a subtype of FNB needles
over FNA needles [36]. The fact that no difference between FNA and FNB was detected in our study may be
related to the unclear mix of subtypes of FNB needles used. However, other confounders
such as the endosonographer learning curve for a new type of needle or pathologist
learning curve for evaluating tissue cores may have been involved.
Future directions
Performing EUS-guided TA comes with the responsibility to measure the KPIs of these
procedures. To facilitate this, an automated system is needed that allows EUS procedural
parameters and the corresponding pathology reports to be added on a regular basis. Subsequently,
CUSUM curves can be constructed from KPI data at any point in time, allowing for
constant trend analysis and thereby providing the foundation for quality improvement. We
believe that feedback on KPIs is an essential first step for quality improvement. If
KPIs are not up to par, this should be followed by a cycle of protocol changes and
continued KPI measurements and evaluations (plan-do-check-act cycle), aiming for continuous
improvement of quality and life-long learning opportunities for all collaborators.
Changes in protocol are to be tailored and center-specific, depending on KPI measurements
and available resources. A measure aiming to increase a low adequate sample rate in
a center using 22-gauge FNA needles, three passes, and suction could be, for example,
(1) the introduction of ROSE or (2) the introduction of an FNB needle. If the hospital
involved does not have its own cytopathology lab, implementation of FNB needles could
solve the problem. A measure aiming to increase DYM in a center with an adequate RAS and
a high proportion of atypia diagnoses might be, for example, (1) reorganization of
the workflow in the pathology lab to have all samples evaluated by two cytopathologists
instead of seven; (2) introducing liquid-based cytology instead of smears only; or
(3) introducing the use of FNB needles. There is evidence to support that changes made
“bottom-up” are more likely to be sustained in comparison to changes implemented “top-down”
[37].
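The kind of automated registration described above could be sketched as a small running monitor that is updated each time a procedure and its pathology result are entered. This is a hypothetical design, not the system used in the study (which built its charts in Excel): the class name `CusumMonitor`, the status labels, the assumed α = β = 0.1, and the log-likelihood-ratio formulas for the score and the limits are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CusumMonitor:
    """Running CUSUM for one KPI in one center (hypothetical helper)."""
    p0: float            # acceptable failure rate
    p1: float            # unacceptable failure rate
    alpha: float = 0.1   # assumed type I error
    beta: float = 0.1    # assumed type II error
    value: float = 0.0
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Assumed standard CUSUM formulas; not quoted from the article.
        P = math.log(self.p1 / self.p0)
        Q = math.log((1 - self.p0) / (1 - self.p1))
        self.s = Q / (P + Q)
        self.upper = math.log((1 - self.alpha) / self.beta) / (P + Q)
        self.lower = -math.log((1 - self.beta) / self.alpha) / (P + Q)

    def add(self, success: bool) -> str:
        """Register one procedure and return the current status."""
        self.value += self.s if success else -(1 - self.s)
        self.history.append(self.value)
        if self.value <= self.lower:
            return "alert"
        if self.value >= self.upper:
            return "ok"
        return "monitor"

    def restart(self):
        """Start a new curve, for example after a protocol change."""
        self.value = 0.0
        self.history.clear()


# DYM monitor with P0 = 25 % and P1 = 30 % for non-malignant outcomes.
dym_monitor = CusumMonitor(p0=0.25, p1=0.30)
statuses = [dym_monitor.add(False) for _ in range(20)]  # 20 failures in a row
```

Keeping one monitor per KPI and per center would allow a curve to be inspected at any time, and `restart` mirrors the practice of drawing a new curve with new decision limits after a change in workflow.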
Conclusions
In conclusion, this prospective multicenter study using CUSUM-derived learning curves
for both quality monitoring and feedback demonstrates consistent improvement of the KPIs
RAS, DYM, and SFM over time. It illustrates the benefits of using learning curves
with easy-to-interpret feedback regarding performance of a whole process or its individual
components while also allowing comparison with peers. Use of CUSUM curves is an excellent
way for responsible staff to monitor and scrutinize their performance and improve
the outcome of KPIs up to the desired level.