Key words
ultrasound-power doppler - ultrasound-color doppler - advanced dynamic flow - achilles
tendon - reliability
Introduction
Sonographic examinations have become routine in clinical practice for imaging pathological
changes of the Achilles tendon (AT) tissue in tendinopathy patients [1] [2]. Besides structural
signs of tendon degeneration, the clinical relevance of intratendinous blood flow (IBF)
occasionally observed with Doppler ultrasound (DU) is still a matter of debate [3]. Since
sonographically detected IBF has been reported in 47–88% of symptomatic [3] as well as in
29–35% of asymptomatic ATs [1] [4], the initially assumed exclusive association with tendon
pathology is questionable. Furthermore, studies involving athletes have found an
exercise-induced increase of IBF directly after physical activity, suggesting a physiological
reaction [5] [6] [7] [8]. In contrast, it has been assumed that normal tendon vascularisation
is not detectable with conventional DU due to the slow flow velocity and small size of the
supplying vessels [3] [9].
Common DU modes used in practice and research to examine IBF are power Doppler ultrasound
(PDU) [1] [8] [10] and colour Doppler ultrasound (CDU) [5] [6] [7] [11] [12]. PDU is often
preferred to CDU due to its higher sensitivity in detecting small vessels and low flow
velocities [10] [13] [14], as well as its more detailed visualization of the course of
vessels [15]. On the other hand, the advantage of CDU is the depiction of flow direction and
the measurement of relative flow velocity [15]. Comparing these 2 modes, Reiter et al. [16]
found a specificity of 100% for both PDU and CDU in examining IBF in ATs. In contrast,
Richards et al. [10] found a higher sensitivity with PDU, detecting IBF in 45 ATs of
tendinopathy patients while CDU only revealed IBF in 24 of these tendons. It is argued that
the sensitivity of Doppler modes depends on correctly adjusted, device-specific pre-settings,
e.g., pulse repetition frequency (PRF), colour gain (CG), colour velocity (CV), and
frames per second (fps), as well as on the individual technical performance of the ultrasound
machine [14]. Furthermore, it is assumed that with improving technology the general sensitivity
of both Doppler modes increases [11] [15] and differences between PDU and CDU become
negligible [15]. A more recently developed broadband CDU called “Advanced Dynamic Flow” (ADF)
has been considered a technical improvement over conventional CDU in fetal cardiovascular
diagnostics [17] [18]. Its image quality is comparable to that of B-scan, with higher
sensitivity and precision in visualizing vessels, high resolution and frame rate, and the
capability to display the direction of flow [17]. However, its suitability for tendon blood
flow imaging in comparison to conventional Doppler modes has not yet been investigated.
For quantification of IBF, the first scoring system was defined by Öhberg et al. [19], grading
IBF according to the appearance of vessels inside a tendon. Subsequently, this scoring system
has undergone several modifications [1] [2] [6] [8] [12]. Other approaches were to determine
the total vessel length of IBF [5] [20] or to evaluate the number of colour pixels inside the
tendon [7] [11] [21]. The lack of consensus and knowledge about the reliability of the
different scoring procedures and of IBF assessment itself makes it difficult to compare and
interpret study results. 3 studies were identified that have investigated the reliability of
examining IBF [1] [12] [20]. Sengkerij et al. [1] had trained radiologists assess IBF by
applying the original Öhberg-Score [19]. They achieved excellent inter-observer reliability
with an intraclass correlation coefficient (ICC) of 0.85. Sunding et al. [12] reported a
strong inter-observer reliability of 2 experienced observers using a qualitative scoring
system (ATs: Spearman’s r=0.76, Kappa=0.63; patellar tendons (PTs): Spearman’s r=0.99,
Kappa=0.70). Furthermore, they investigated intra- and inter-observer reliability of examiners
with different levels of experience (ATs: Spearman’s r=0.75–0.92, Kappa=0.59–0.87;
PTs: Spearman’s r=0.88–0.99, Kappa=0.45–0.86). Cook et al. [20] had experienced sonographers
and radiologists quantify the amount of IBF by estimating vessel length (ICC 0.84).
Additionally, they investigated intra-observer reliability by re-scoring the same ultrasound
images (ICC 0.94). However, intra-observer reproducibility of the IBF examination itself
(as opposed to re-scoring stored images) was not investigated.
Due to the use of different ultrasound modes in clinical practice and research and
in the context of the technical improvement of ultrasound devices in recent years [11], the
consistency of PDU, CDU and ADF requires investigation. Furthermore, there
is a lack of knowledge regarding intra-observer reproducibility of AT IBF assessment.
In contrast to daily practice, IBF examinations in research are commonly performed
by experienced investigators/specialists or experienced radiologists [1] [6] [10] [11].
Whether the level of experience of the investigator plays a role in examining IBF using
DU is, however, unclear. Therefore, the aim of this study was (1) to examine the consistency
between PDU, CDU and ADF and to investigate (2) the intra- and (3) the inter-observer
reliability of investigators with different levels of experience in examining IBF
in Achilles tendinopathy patients. It was hypothesized that the 3 Doppler modes show
good consistency and that intra-observer reliability is higher than inter-observer reliability.
Materials and Methods
Participants
18 participants were recruited from the University outpatient clinic. Included were
subjects with acute or chronic uni- or bilateral Achilles tendinopathy, who experienced
tendon pain for at least 2 weeks prior to the investigation. Exclusion criteria were
the presence of systemic illnesses (e.g., rheumatic diseases and hypercholesterolaemia) and
a previous complete rupture and/or surgical treatment of the AT. A standardized clinical
examination by a sports medicine physician, including anamnesis, inspection and palpation
of the Achilles tendons, ensured the correct inclusion. The clinical diagnosis of
Achilles tendinopathy was based on presence of history of pain and pain on palpation
of the tendon [22].
After the participants had given written informed consent, anthropometrical data were collected.
Subsequently, all participants completed the Victorian Institute of Sports Assessment-Achilles
questionnaire (VISA-A), a validated and reliable tool for assessing the severity of
Achilles tendinopathy [23]. The study was approved by the local ethics committee.
Study design
The study was conducted in a test-retest (M1/M2) design performed by an experienced
(EI) and an inexperienced investigator (II). The EI was a sports medicine physician
with 4 years of intensive clinical practice and study participation using tendon and
joint ultrasound including DU techniques [24] [25]. He also performed the clinical
examination of all participants. The II was introduced to tendon ultrasound examination
6 weeks prior to the study. Practice was supervised by an experienced physician. Both
investigators were blinded to the other’s examination. Furthermore, the II was blinded
to the results of the clinical examination. Due to the evident effect of exercise on the
presence of IBF [6] and potential day-to-day variability [26], M1 and M2 were performed
consecutively on the same day, controlling for abstinence from exercise for 2 h prior to
the measurements [4].
Ultrasound examination
Sonography was performed with a high-resolution ultrasound device (Xario SSA-660A,
Toshiba) using a multi-frequency linear transducer at 14 MHz (PLT-1204AT). The subjects
were placed in prone position with knees extended and feet hanging over the distal end
of the table, passively positioned at a 90° angle to the tibia. Left and right ATs
were examined in randomized order with B-mode (gain=80, DR=65, penetration depth=3 cm,
focus 0.5 cm) longitudinally and transversely in order to assess tendon thickness [27].
Since it is assumed that the asymptomatic contralateral side in patients suffering
from unilateral Achilles tendinopathy is frequently affected by an asymptomatic tendinosis
[10], both ATs of all subjects were examined for IBF and included in the analysis. ATs were
investigated from the distal insertion at the calcaneus up to the musculotendinous
junction of the M. soleus about 7 cm proximal to the calcaneus. For each examination
of IBF, II and EI used PDU on both ATs as a reference to detect the location with
the highest amount of Doppler activity [10] [13]. This location was marked on the skin
and was erased after each examination. The
distance of this mark to a second fixed reference mark on the skin at the distal calcaneus
was documented. Absence of IBF was also documented. Subsequently, the previously marked
area was examined with the 3 Doppler modes in randomized order by taking video clips
(sequences of 5 s). Each mode had standardized pre-settings [PDU: CG=42, CV=1.7–2.1 cm/s,
PRF=8.2–10.5 kHz; CDU: CG=40–42, CV=1.7 cm/s, PRF=8.2 kHz; ADF: CG=40–42, CV=1.5 cm/s,
PRF=12.5 kHz] ([Fig. 1]). The box size of the region of interest (ROI) was 2.0 cm wide and 1.5 cm deep.
Fig. 1 Intratendinous blood flow in the Achilles tendon visualized with (a) PDU, (b) CDU, and (c) ADF.
The subject stayed in prone position throughout M1 and M2, which were performed in
identical procedure with randomized order of investigators. The stored video clips
were analysed on a different day. Both investigators analysed their own recordings.
The degree of IBF was graded using a modified Öhberg Score, described by Hirschmüller
et al. [2]. The 6 different grades are defined as: 0=no vessels visible; 1=1–2 vessels within
the ROI; 2=3–5 vessels within the ROI; 3=vessels in up to 30% of the ROI; 4=vessels
in 30–50% of the ROI; 5=vessels in >50% of the ROI. Reliability of this scoring system
was tested in a pilot study with the same investigators grading 67 DU video clips,
resulting in excellent Kendall’s tau b correlation coefficients for intra-observer
(0.95) and inter-observer (0.91) reliability. To ensure identical understanding and
training in application, both investigators were introduced to this scoring system
together and practised its application before commencing the study.
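The grading rules of the modified Öhberg Score can be written down as a short function. The following Python sketch is illustrative only and is not the procedure used by the investigators, who graded video clips visually. The function name, the two inputs (a vessel count and the fraction of the ROI covered by vessels), the rule for switching from counting vessels to estimating ROI coverage (more than 5 vessels), and the assignment of exactly 30% coverage to grade 3 are all assumptions, since the score description leaves these points open.

```python
def modified_oehberg_grade(n_vessels, roi_fraction=0.0):
    """Map an observation to grades 0-5 of the modified Oehberg score.

    Assumptions (not specified in the text): up to 5 vessels are graded
    by their number; above that, ROI coverage decides; a coverage of
    exactly 30% falls into grade 3 and exactly 50% into grade 4.
    """
    if n_vessels < 0 or not 0.0 <= roi_fraction <= 1.0:
        raise ValueError("n_vessels must be >= 0 and roi_fraction within [0, 1]")
    if n_vessels == 0:
        return 0  # no vessels visible
    if n_vessels <= 2:
        return 1  # 1-2 vessels within the ROI
    if n_vessels <= 5:
        return 2  # 3-5 vessels within the ROI
    if roi_fraction <= 0.30:
        return 3  # vessels in up to 30% of the ROI
    if roi_fraction <= 0.50:
        return 4  # vessels in 30-50% of the ROI
    return 5      # vessels in more than 50% of the ROI


# Hypothetical observations: (number of vessels, fraction of ROI covered)
for vessels, fraction in [(0, 0.0), (2, 0.05), (4, 0.10), (8, 0.25), (8, 0.40), (8, 0.60)]:
    print(vessels, fraction, modified_oehberg_grade(vessels, fraction))
```

The sketch makes visible that grades 0–2 rest on a vessel count while grades 3–5 rest on an area estimate, which is one place where two observers may diverge.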
Statistical analysis
Statistical analysis was performed using SPSS (IBM SPSS Statistics Version 20) and
R (Version 3.0.1; Package “irr” Version 0.84). The significance level was set at α=0.05.
Anthropometrical data and tendon thickness assessed by the EI are described as mean±standard
deviation (SD). Descriptive results of the measurements are presented in absolute
and relative (%) values. Analysis of IBF was based on the ordinal scale level of
the applied scoring system. To investigate the consistency of the 3 Doppler modes for
both measurements and both examiners, Kendall’s Coefficient of Concordance (Kendall’s
W; adjusted for ties) and the Fleiss Kappa coefficient were calculated. Pairwise comparison
of all modes as well as intra- and inter-observer reliability were assessed using Kendall’s
tau b correlation coefficient (adjusted for ties) and Cohen’s Kappa coefficient. Kendall’s
coefficients for ordinal data were considered the main results since they represent
the agreement between scores considering the degree of deviation [28], while Kappa
coefficients for categorical data represent only absolute agreement.
Kappa coefficients are interpreted according to Landis and Koch [29] as “poor” (<0.0),
“slight” (0.0–0.20), “fair” (0.21–0.40), “moderate” (0.41–0.60),
“substantial” (0.61–0.80), and “almost perfect” (0.81–1.00). Kendall’s tau b is interpreted
as “negligible” (0.00 to 0.30/0.00 to −0.30), “low” (0.30 to 0.50/−0.30 to −0.50),
“moderate” (0.50 to 0.70/−0.50 to −0.70), “high” (0.70 to 0.90/−0.70 to −0.90), and
“very high” (0.90 to 1.00/−0.90 to −1.00) positive or negative correlation [30].
Kendall’s W results in values ranging from 0 to 1, with 0 indicating no agreement
and 1 indicating perfect agreement. All 36 measured ATs were pooled and analysed
independently of side.
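To illustrate why Kendall’s tau b and Cohen’s Kappa can diverge on the same data, the following Python sketch computes both coefficients from first principles. It is not the SPSS/R code used in the study; the function names and the two score lists are hypothetical and chosen only so that all disagreements are a single grade apart. The label function follows the Landis and Koch categories quoted above, with values falling between two listed ranges assigned to the higher category boundary (an assumption).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: chance-corrected exact agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def kendall_tau_b(a, b):
    """Kendall's tau b: rank correlation with a correction for ties."""
    concordant = discordant = tied_a_only = tied_b_only = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(a, b)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pair tied in both variables: excluded from both terms
        if dx == 0:
            tied_a_only += 1
        elif dy == 0:
            tied_b_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + tied_a_only) * (untied + tied_b_only))


def landis_koch_label(kappa):
    """Verbal category for a kappa value after Landis and Koch."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical grades (0-5) from two measurements of six tendons;
# two of the six grades differ, each by one grade only.
m1 = [0, 1, 2, 3, 4, 5]
m2 = [0, 2, 2, 3, 5, 5]
print("kappa:", round(cohens_kappa(m1, m2), 2))
print("tau b:", round(kendall_tau_b(m1, m2), 2))
```

In this hypothetical example tau b stays above 0.9 while Kappa is markedly lower, because Kappa counts a one-grade difference as a full disagreement whereas tau b only registers that the rank order is largely preserved.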
Results
Descriptive analysis
The 18 participants included in this study had a mean VISA-A score of 71±15, indicating
the presence of Achilles tendinopathy [23] [31] ([Tab. 1]). Of the 36 examined ATs, 24 tendons
were symptomatic and 12 asymptomatic. 6 subjects had bilateral and 12 unilateral AT pain.
[Tab. 2] displays the average tendon thickness of symptomatic and asymptomatic tendons.
Tab. 1 Anthropometrical data [mean±SD].
Gender [♂/♀] | Age [yrs] | Height [m] | Weight [kg] | VISA-A Score
10/8 | 37±11 | 1.77±0.1 | 77.7±19.6 | 71±15
(VISA-A=Victorian Institute of Sports Assessment-Achilles questionnaire)
Tab. 2 Tendon thickness in mm at different locations assessed by EI [mean±SD].
Location [n=number of AT] | Proximal insertion calcaneus | 2 cm proximal to calcaneus | Longitudinal thickest localization | Transversal thickest localization [a-p]
Symptomatic AT [n=24] | 5.0±1.2 | 5.8±1.0 | 7.4±1.8 | 7.0±1.7
Asymptomatic AT [n=12] | 4.7±0.7 | 5.5±0.9 | 6.1±1.1 | 5.8±0.7
All [n=36] | 4.9±1.1 | 5.7±1.0 | 7.0±1.7 | 6.8±1.6
(AT=Achilles tendon; a-p=anterior-posterior)
In M1 IBF was found in 24 of 36 (EI) and in 26 of 36 (II) ATs. In M2 vessels were
detected in 23 of 36 (EI) and in 28 of 36 (II) ATs. 83/79% (EI) and 88/92% (II) of
symptomatic ATs and 33/33% (EI) and 42/50% (II) of the 12 contralateral asymptomatic
ATs had IBF, in M1/M2, respectively. In 94% (EI) and 89% (II) the localization of
maximal IBF (or absence) was identical in M1 and M2. In 72% (M1) and 67% (M2) of examinations,
the localization of maximal IBF (or absence) was identical between the 2 observers.
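The percentages above can be cross-checked against the reported totals. The following Python sketch searches for whole-number tendon counts that reproduce each rounded percentage (24 symptomatic and 12 asymptomatic ATs) and that add up to the reported number of ATs with IBF. The resulting counts are inferred for illustration and are not stated in the text; the function and variable names are hypothetical, and rounding half up is an assumption.

```python
N_SYMPTOMATIC, N_ASYMPTOMATIC = 24, 12


def counts_matching(percent, n):
    """Counts k in 0..n whose share of n, rounded half up, equals percent."""
    return [k for k in range(n + 1) if int(100 * k / n + 0.5) == percent]


# (investigator, measurement): (% symptomatic, % asymptomatic, reported ATs with IBF)
reported = {
    ("EI", "M1"): (83, 33, 24),
    ("EI", "M2"): (79, 33, 23),
    ("II", "M1"): (88, 42, 26),
    ("II", "M2"): (92, 50, 28),
}


def consistent_counts(table):
    """For each entry, list (symptomatic, asymptomatic) counts matching all figures."""
    result = {}
    for key, (pct_sym, pct_asym, total) in table.items():
        result[key] = [
            (s, a)
            for s in counts_matching(pct_sym, N_SYMPTOMATIC)
            for a in counts_matching(pct_asym, N_ASYMPTOMATIC)
            if s + a == total
        ]
    return result


for key, pairs in consistent_counts(reported).items():
    print(key, pairs)
```

Each entry yields exactly one pair of counts, so the percentages and the totals reported for both investigators and both measurements are mutually consistent.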
Consistency of 3 Doppler modes
Kendall’s W between the 3 Doppler modes ranged from 0.97–0.98 for both examiners with
lower absolute agreement of 0.76–0.82 ([Tab. 3]). Furthermore, comparing 2 modes at a time in M1 and M2 for EI and II, Kendall’s
tau b ranged from 0.90–0.98 with lower absolute agreement of 0.64–0.93 ([Tab. 4]). Highest consensus was found for PDU vs. CDU (EI), and lowest consensus for both
PDU and CDU vs. ADF (II and EI) ([Tab. 4]).
Tab. 3 Comparison of 3 Doppler modes in M1 and M2 for II and EI.
Measurement | Fleiss Kappa Coefficient [II/EI] | Kendall’s W [II/EI]
M1 (PDU/CDU/ADF) | 0.78/0.76 | 0.97/0.98
M2 (PDU/CDU/ADF) | 0.76/0.82 | 0.97/0.98
(II=inexperienced investigator; EI=experienced investigator; PDU=power Doppler ultrasound;
CDU=colour Doppler ultrasound; ADF=Advanced Dynamic Flow)
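Kendall’s W with tie adjustment, as reported in Tab. 3, can be computed in a few lines. The following Python sketch treats each Doppler mode as one “rater” and each tendon as one subject; it is a minimal illustration under that assumption and not the R “irr” code used in the study. The function names and the example grades are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Ranks 1..n, with tied values receiving the mean of their positions."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def kendalls_w(ratings):
    """Tie-corrected Kendall's W for m lists of n scores each.

    Undefined (division by zero) if every list is constant.
    """
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    # Tie correction: sum of (t^3 - t) over all groups of tied scores per list
    tie_term = sum(t ** 3 - t for r in ratings for t in Counter(r).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_term)


# Hypothetical grades (0-5) of six tendons in the three modes
pdu = [0, 1, 2, 3, 3, 5]
cdu = [0, 1, 2, 3, 3, 5]
adf = [0, 2, 2, 3, 4, 5]
print("Kendall's W:", round(kendalls_w([pdu, cdu, adf]), 3))
```

Because W is based on rank sums, a mode that shifts a few tendons by one grade lowers W only slightly, which is in line with the pattern of high W values and lower Fleiss Kappa values in Tab. 3.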
Tab. 4 Pairwise comparison of Doppler modes in M1 and M2 for II and EI.
Doppler modes | M1: Cohen’s Kappa [II/EI] | M1: Kendall’s tau b [II/EI] | M2: Cohen’s Kappa [II/EI] | M2: Kendall’s tau b [II/EI]
PDU vs. CDU | 0.81/0.89 | 0.92/0.98 | 0.82/0.93 | 0.94/0.98
PDU vs. ADF | 0.82/0.75 | 0.93/0.93 | 0.68/0.78 | 0.90/0.94
CDU vs. ADF | 0.71/0.64 | 0.90/0.92 | 0.79/0.74 | 0.92/0.92
(II=inexperienced investigator; EI=experienced investigator; PDU=power Doppler ultrasound;
CDU=colour Doppler ultrasound; ADF=Advanced Dynamic Flow)
Intra- and Inter-Observer Reliability
For II, Kendall’s tau b for repeated examinations with the 3 Doppler modes was slightly
lower than for EI (0.84–0.87 vs. 0.90–0.92). Absolute agreement was lower, ranging from 0.56–0.71 (II) and 0.72–0.78 (EI) ([Tab. 5]).
In total, 56 of 108 (M1) and 54 of 108 (M2) scores matched between investigators across
all DU examinations. [Fig. 2] and [Fig. 3] depict the distribution of scores of both
investigators for M1 and M2.
Fig. 2 Distribution of IBF-Scores for M1 (N=36) for the inexperienced (black, II) and experienced
investigator (purple, EI); PDU=power Doppler ultrasound; CDU=colour Doppler ultrasound;
ADF=Advanced Dynamic Flow.
Kendall’s tau b for inter-observer comparison was 0.64–0.69 for M1 and 0.68–0.70 for
M2. Absolute agreement ranged from 0.30–0.46 in M1 and from 0.35–0.39 in M2 ([Tab. 5]).
Tab. 5 Intra- and inter-observer reliability for 3 Doppler modes.
Comparison: coefficient | PDU | CDU | ADF
Intra-observer [M1 vs. M2]: Kappa Coefficient [II/EI] | 0.71/0.78 | 0.56/0.74 | 0.57/0.72
Intra-observer [M1 vs. M2]: Kendall’s tau b [II/EI] | 0.87/0.91 | 0.85/0.90 | 0.84/0.92
Inter-observer [II vs. EI]: Kappa Coefficient [M1/M2] | 0.46/0.39 | 0.39/0.35 | 0.30/0.35
Inter-observer [II vs. EI]: Kendall’s tau b [M1/M2] | 0.64/0.69 | 0.69/0.68 | 0.68/0.70
(II=inexperienced investigator; EI=experienced investigator; PDU=power Doppler ultrasound;
CDU=colour Doppler ultrasound; ADF=Advanced Dynamic Flow)
Discussion
The main purpose of the study was to investigate the consistency and reliability of
3 DU modes which are used in clinical practice and research to assess IBF in ATs.
Additionally, results of an EI and II were analysed to explore the relevance of routine
in applying DU. Consistency between PDU, CDU, and ADF showed excellent agreement for
both investigators. Reliability during re-examination was very high for the EI and
high for the II. However, inter-observer comparison resulted in only moderate correlation [30].
Consistency of 3 Doppler modes
Overall, PDU, CDU, and ADF revealed excellent consistency during successive examinations
with Kendall’s W ranging from 0.97–0.98 (II, EI). This indicates comparable sensitivity
and applicability of the 3 modes. Pairwise comparison also showed very high Kendall’s tau b
correlation coefficients between all modes (0.90–0.98) with the highest correlation for PDU
vs. CDU (EI) and the lowest correlation for PDU and CDU vs. ADF (II, EI). Contrasting this,
recommendations suggest a superiority of PDU over CDU in imaging IBF [10] [13] [14] [15],
while ADF has not been directly compared to either of them. Researchers argue that
PDU is more sensitive and precise in detecting minimal and slow blood flow compared to
CDU and should be the mode of choice [10] [13] [14] [15]. Strikingly, the studies conducted
so far to examine IBF in ATs overall reveal a more frequent use of CDU than PDU [14].
Consequently, not only the Doppler mode itself but also the quality of technical
equipment and pre-settings seem to influence sensitivity and quality of imaging [14] [15].
The results presented here are in line with the view that continuous development and
improvement of ultrasound devices result in increasing sensitivity [11], and that differences
between Doppler modes become negligible [15].
ADF has been developed to improve the precision of vascular imaging [17]. An examination of
the fetal vascular system concluded that ADF is superior to CDU regarding quality criteria
such as “overpainting of vessel walls”, “discrimination from neighbouring vessels” and
“following the course of the vessels” [17]. Our results for consistency suggest that the
sensitivity of ADF is equivalent but not superior to PDU or CDU when it comes to IBF imaging
in ATs. However, it may be speculated that the lower absolute agreement between PDU/CDU and
ADF is associated with a higher sensitivity of ADF, which might be partly masked by the
inaccuracy of the scoring system ([Fig. 3]). Although depiction of flow direction and
velocity may not be of primary interest for the examination of IBF, the more precise
discrimination of discrete vessels using ADF seems advantageous in scoring single
intratendinous vessels ([Fig. 3]).
Fig. 3 Distribution of IBF-Scores for M2 (N=36) for the inexperienced (black, II) and the
experienced investigator (purple, EI); PDU=power Doppler ultrasound; CDU=colour Doppler
ultrasound; ADF=Advanced Dynamic Flow.
Reliability
Re-examination resulted in very high correlation coefficients for EI ranging from
0.90–0.92. Additionally, results of II, who performed identical examinations, showed
slightly lower test-retest correlations ranging from 0.84–0.87. While re-examination
revealed good repeatability for all modes independent of observer-experience, comparison
between the 2 investigators uniformly showed much lower accordance with correlation
coefficients of 0.64–0.70. In only 69% of all scanned tendons (M1 and M2) did both examiners
agree on the location showing the highest amount of blood flow. This suggests that
observer experience might have an influence on detection of the amount of IBF in ATs.
Nevertheless, the high Kendall’s coefficients for consistency as well as for intra-
and inter-observer reliability in comparison to variably lower Kappa coefficients
indicate that the degree of deviation in non-matching scores was very small.
The results for inter-observer reliability were considerably lower than previously
reported in the literature. 3 studies have investigated the reliability of IBF assessment
in ATs and PTs; 2 found “excellent” inter-observer correlations for experienced investigators
with ICC 0.85 [1] and 0.84 [20], respectively. One study reported lower observer agreement
for ATs (Spearman’s r=0.76, Kappa=0.63) but higher agreement for PTs (Spearman’s r=0.99,
Kappa=0.70) [12]. For reliable assessment of IBF, 2 tasks demand the investigator’s skills.
First, the investigator has to manually perform the DU examination and detect the highest
amount of blood flow [32]. This requires minimal probe pressure to avoid obliteration of
vessels [1] [8] [27], patience to not overlook small vessels, and steady handling of the
probe to prevent artefacts [33]. Second, the investigator has to evaluate and score the DU
image adequately [32]. The lower inter-observer reliability in the present study compared to
previous research might be due to the lower experience of one investigator in applying
ultrasound. The evaluation of DU images with the applied scoring system itself was considered
very reliable in the prior pilot study.
Another aspect that could explain the difference from the results of previous studies
is the type of evaluation system used to quantify IBF. Sengkerij et al. [1] scored the
amount of blood flow using a modified Öhberg score, grading 0–3 vessels
as “0” to “3+” and more than 3 vessels as “4+”. This did not seem applicable in the
present study due to its inaccuracy for the higher number of blood vessels detectable
with modern devices. The 4-grade score used by Sunding et al. [12], distinguishing no, mild,
moderate and severe IBF, is comparable to scores used in rheumatology
[34] but seemed too unspecific for quantification of IBF. Cook et al. [20] investigated the
total vessel length in the ultrasound image in mm. Although this
grading system showed excellent results for the reliability of re-scoring the same image
[20], it remains debatable whether the assessment of total vessel length is relevant for
clinical practice. The score applied in the present study was recently established for
examining IBF in ATs with ADF imaging [2] [4] and showed very good reliability in a pilot
study. It quantifies the number of vessels in the region of interest, also for a high amount
of blood flow, and is convenient for the clinical setting.
Limitations
A limitation of this study is the consecutive performance of all measurements for
test and retest on a single day. This design was chosen to eliminate confounding factors
on the presence of IBF during re-examination, such as day-to-day variability [26] or
varying prior physical activity [6]. Exercise abstinence before M1 was standardized, since
there is conflicting evidence on the effect of exercise on the presence of IBF [6] [7] [9]
[11] and a lack of knowledge about the duration of its influence in DU examinations [1] [2]
[6] [7] [8]. Additionally, the measurement procedure and device pre-settings, e.g., CG, PRF,
CV, filter, and box size of the ROI, were standardized to eliminate any further confounders
of IBF detectability [8] [14] [15].
Another limitation was the procedure of determining the highest amount of IBF in ATs
in each examination by using PDU as reference. This procedure was necessary for defining
the location of maximal blood flow so that the subsequent examinations with all 3
randomized modes yielded comparable results. It cannot be ruled out that
this had an influence on the results for consistency of Doppler modes.
As a consequence of the findings of preceding studies, the generalizability of the results
for mode consistency is limited to comparable technical standards and individually optimized
pre-settings. To ensure the highest possible reliability of IBF assessment, future research
and application should insist on uniform procedural and technical standardization as well as
controlling for physical activity prior to all DU examinations. Furthermore, it is recommended
that DU re-examinations should be performed by the same investigator [12]. The effect of
exercise on IBF presence and the necessary duration to control for
exercise prior to ultrasound examinations remain to be clarified in further studies.
Additionally, in the course of increasing sensitivity of ultrasound devices to slower
blood flow and smaller vessels, a more precise scoring system should be considered
to differentiate between potential physiological and pathological IBF as already proposed
by Boesen et al. [7] [11].
Conclusion
IBF assessment in AT with PDU, CDU, and ADF revealed excellent consistency, thus indicating
equivalent applicability of all 3 modes in research and clinical practice. Reproducibility
of re-examination by the same examiner was good independent of experience. Inter-observer
comparison, however, only revealed moderate reliability indicating the challenge in
DU examinations to assess the amount of IBF. It is recommended to perform DU re-examinations
of IBF by the same investigator. Further investigations might clarify whether the suggested improved precision of ADF imaging
allows for a more precise quantification of IBF using an adequate evaluation system.