Keywords ACL - reliability - universal goniometer - inclinometer - smartphone app
Knee range of motion (ROM) is a common clinical measurement used to monitor the progress
of patients after an anterior cruciate ligament (ACL) injury, as one out of three
patients will experience loss of knee extension ROM at 12-month follow-up after ACL
reconstruction.[1] This deficit could be attributed to multiple reasons including, but not limited
to, inflammation and swelling, surgical problems (e.g., cyclops lesion, graft impingement
or suboptimal placement, intra-articular loose bodies), persistent muscle guarding,
and pain inhibition.[1][2] Inadequate knee extension ROM postsurgery is associated with gait defects, altered
biomechanics, knee osteoarthritis, impaired quadriceps function, and poorer patient-reported
outcomes.[2] Together, these findings highlight the importance of identifying knee extension deficits in
ACL patients as early as possible during the rehabilitation process.
In daily practice, objective, reproducible, and reliable measurements of clinical
outcomes, such as knee extension, are paramount when documenting the treatment efficacy
of any rehabilitation intervention. Consequently, the reliability of the tools used
should be established so that clinicians can determine whether differences during
repeated testing are related to measurement error of the device or represent actual
changes in the outcome itself.[2]
Health care practitioners have used several devices for measuring knee extension
ROM, as well as different positions (e.g., supine and prone). The most common device
is the universal (long-armed) goniometer.[3] Recently, an increasing number of clinicians report the use of inclinometers and
smartphone apps to measure ROM.[3–8] Several studies have been conducted investigating reliability and minimal detectable
change (MDC) for these devices with the results ranging widely from poor to excellent.[9–17] In these studies, the populations varied, with most recruiting either patients with
total knee arthroplasty or healthy participants. The heterogeneity of these results
and the absence of studies conducted on ACL patients preclude recommendations from
being made in this population for whom knee extension ROM is an important clinical
measure.[18][19]
Another key measurement property that has direct clinical implementation is the MDC.
The MDC reflects the smallest change in measurement that is beyond the measurement
“noise” and therefore is the minimum amount that needs to be overcome before ascribing
any “real” change in a measure to a change in patient status.[20 ]
As neither the reliability (inter- and intrarater) nor the MDC is available for these
methods of measuring knee extension ROM in a population of ACL-injured patients, the
primary aim of this study is to describe the intrarater, interrater, and test–retest
reliabilities of the goniometer, inclinometer, and smartphone application for measuring
the knee extension ROM in ACL patients. Additionally, a secondary objective of this
study is to establish the MDC associated with these methods, which will better inform
clinical choices in ACL-injured patients.
Methods
This observational cohort study was conducted at the assessment unit of Aspetar
Orthopaedic and Sports Medicine Hospital, Doha, Qatar. It used a single-visit design,
and ethical approval was obtained from the Aspire Zone Foundation Institutional
Review Board (E202202031).
Participants
Participants of this study were 92 patients undertaking rehabilitation after ACL injury
referred to the assessment unit, as part of routine testing between February 2021
and August 2022. Patients were included if they were treated conservatively or surgically
after an ACL injury. A pilot study using 20 participants was performed in February
2021, to estimate the sample size for the current study. Prior to the pilot study,
two training and familiarization sessions of 2 hours each were performed by the two
testers. This pilot investigation showed an intraclass correlation coefficient (ICC2,1) value of ∼0.85, 0.90, and 0.90 for the universal goniometer, inclinometer, and smartphone
app, respectively. The subsequent a priori power analysis (testing the null hypothesis
H0: ρ = 0.8 vs. H1: ρ > 0.8, where ρ is the population ICC) suggested that a minimum of 50 participants was required to achieve a
power of 80% (α = 0.05) with two measurements per participant.[21]
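To illustrate how an a priori calculation of this kind can be set up, the Python sketch below applies a widely used closed-form approximation for reliability studies based on the ICC (Walter, Eliasziw, and Donner). The study cites reference 21 for its power analysis but does not state the formula or the ICC expected under H1, so the choice of formula, the function name icc_sample_size, and the alternative values tried here are assumptions and not the authors' procedure.

```python
import math
from statistics import NormalDist


def icc_sample_size(rho0: float, rho1: float, n_ratings: int = 2,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of participants needed to test H0: ICC = rho0
    against the one-sided alternative H1: ICC = rho1 > rho0, given
    n_ratings measurements per participant (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    # Express each ICC as a variance ratio theta = rho / (1 - rho).
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + n_ratings * theta0) / (1 + n_ratings * theta1)
    k = 1 + (2 * (z_alpha + z_beta) ** 2 * n_ratings) / (
        math.log(c0) ** 2 * (n_ratings - 1))
    return math.ceil(k)


if __name__ == "__main__":
    # Assumed alternatives taken from the pilot ICC values (0.85 and 0.90).
    for rho1 in (0.85, 0.90):
        print(rho1, icc_sample_size(0.80, rho1))
```

Under these assumptions, an expected ICC of 0.90 gives a requirement of 46 participants, which is of the same order as the 50 reported, whereas an expected ICC of 0.85 requires several times more. The result is therefore highly sensitive to the alternative value assumed.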
Procedures
The knee extension measurement is part of the mandatory periodic assessment for patients
with ACL injuries undertaking their rehabilitation at our facility. The routine testing
protocol includes, in order of execution: clinical testing, instrumented laxity test,
functional (movement) testing, and isokinetic strength testing. Participants were
informed regarding the nature of the reliability study, and if they consented, we
performed the additional knee extension measurements during the assessment visit ([Fig. 1 ]). Two independent, experienced physical therapists (M.P. and R.K.) performed the
knee extension measurements in a single session. To prevent carryover effects (i.e.,
systematic effects of one test on the other), the test sequence was randomized.
Fig. 1 Flow chart of the study design.
At the beginning of the routine testing protocol, the patients undertook three knee
extension measurements with three different devices in a randomized order: universal
goniometer, inclinometer, and smartphone app. One examiner (M.P.) performed two measurements
with each device to determine the intrarater reliability. Between the two measurements,
the examiner had to return the instrument to storage and repalpate the anatomical
landmarks. A second examiner (R.K.) performed one measurement with each instrument
to determine the interrater reliability. At the end of the entire routine testing
protocol, participants were measured again by tester 1, to evaluate the test–retest
reliability.
Results were recorded separately by an administrative assistant who ensured each tester
was blinded to the results of the other; however, the practitioners were not blinded to
the measurement method.
Knee Extension Measurements
Participants were supine with their heels resting on a 10-cm thick box and asked to
relax such that their knee fell into passive knee extension without any overpressure.
For the inclinometer measurement, the Empire Magnetic Polycast Protractor with 1 degree
markings was used. This device operates with the assistance of gravity, eliminating
the need for resets between measurements. The edge of the base of the device was placed
5 cm inferior to the tibial tuberosity, at the edge between the muscle belly of the
peroneal muscles and the tibia. Gentle pressure was applied, so the base of the device
would be stable and as parallel as possible to the anterior edge of the tibia ([Fig. 2 ]).
Fig. 2 Demonstration of the three measuring devices for knee extension deficit. (A) Inclinometer,
(B) smartphone app, and (C) universal goniometer measurements.
For the smartphone app measurement, we used the “iHandy Level Free” app (Android and
iOS versions were in common use in the rehabilitation department). The application
was calibrated on a stable, level surface prior to each measurement using a spirit
level. During testing, the phone was placed on the anterior border of the tibia, medial
to the peronei, 5 cm inferior to the tibial tuberosity, in the same position as the
inclinometer. The phone was always facing the same direction and the same side of
the phone was touching the limb of the patient to minimize any differences in measurement
due to the orientation of the gyroscope of the device ([Fig. 2 ]).
For the universal goniometer measurement, a Baseline 360-degree plastic goniometer
was used, with 1-degree markings. Two transparent plastic 30 cm rulers were aligned
and attached to the arms of the goniometer to extend its length and facilitate accurate
positioning. The proximal arm was placed at the greater trochanter of the hip and
parallel to the longitudinal axis of the femur. The axis was placed at the lateral
femoral condyle. The distal arm of the goniometer was placed with its end over the
lateral malleolus ([Fig. 2 ]).
Statistical Analyses
Descriptive statistics of the participants were calculated for age, height, weight,
gender, and sport performed. Intrarater and interrater ICC2,1 values and their 95% confidence
intervals (CIs) were calculated. A two-way mixed-effects model with absolute agreement
was used for the intrarater, interrater, and test–retest ICC analyses. An ICC ≥0.90 was
described as “excellent,” between 0.75 and 0.89 as “good,” between 0.50 and 0.74 as
“moderate,” and below 0.50 as “poor.”[22] Additionally, the MDC at the 95% confidence
level (MDC95) was calculated for the three different methods. To examine systematic variability
between the measurements, Bland–Altman plots and joint plots were constructed and
analyzed for each of the possible comparisons ([Supplementary Material]). All analyses were performed using SPSS v.28 (IBM Corporation, Armonk, NY) and
JMP v.16 (SAS Institute Inc., Cary, NC, 1989–2023).
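For readers who wish to see how a single-measurement, absolute-agreement ICC is obtained from raw scores, the following Python sketch computes it from the mean squares of a two-way analysis of variance and applies the interpretation bands given above. The analyses in this study were run in SPSS and JMP, so this is not the authors' code; the function names icc_2_1 and classify_icc are hypothetical. The point estimate for single-measurement absolute agreement is the same whether raters are treated as random or fixed effects.

```python
def icc_2_1(scores):
    """Single-measurement, absolute-agreement ICC from a two-way layout.

    scores: list of rows, one row per participant, each row holding the
    k ratings (e.g., one per tester or per repeated measurement)."""
    n = len(scores)       # participants
    k = len(scores[0])    # ratings per participant
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-participant mean square
    msc = ss_cols / (k - 1)                # between-rating mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def classify_icc(icc):
    """Interpretation bands as described in the Statistical Analyses."""
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"
```

Applied to the interrater point estimates in Table 1, classify_icc returns "poor" for 0.36 and "good" for 0.80 and 0.79.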
Results
There were 84 male and 8 female participants. The mean age of the participants was
29.5 years (standard deviation [SD]: 9.4 years, and range 17–52 years). Their average
height was 174 cm (SD: 8.3) and the average weight was 81.8 kg (SD: 16.5). Forty athletes
were participating at a competitive level and 52 recreational athletes were included.
Seventy-two patients were evaluated at an average of 6.0 months (SD: 4.7) following
ACL reconstruction, while 20 conservative/preoperative patients were assessed at an
average of 4.4 months (SD: 3.9) postinjury. Out of the 72 patients who were treated
surgically, there were 48 with hamstring graft, 18 with bone-patellar tendon-bone
graft and 6 with quadriceps graft. Twenty-five patients underwent meniscal repair,
17 underwent meniscectomy, and 30 did not require any meniscal intervention. Among
the six patients with cartilage injuries, only one underwent surgical treatment as
determined by the treating surgeon.
Intrarater ICC
Intrarater reliability was excellent for all three methods ([Table 1 ]) ranging from 0.92 to 0.94.
Table 1
Intrarater, interrater, and test–retest ICC2,1 and MDC95 results
Measure                            Universal goniometer   Inclinometer       Smartphone app
Intrarater ICC (95% CI)            0.93 (0.90–0.95)       0.94 (0.91–0.96)   0.92 (0.89–0.95)
Interrater ICC (95% CI)            0.36 (0.17–0.53)       0.80 (0.71–0.86)   0.79 (0.70–0.86)
Test–retest ICC (95% CI) (n = 75)  0.83 (0.74–0.89)       0.89 (0.84–0.93)   0.86 (0.79–0.91)
MDC95 intrarater (deg)             3.5                    2.0                2.2
MDC95 interrater (deg)             10.4                   3.7                4.0
MDC95 test–retest (deg) (n = 75)   5.4                    2.6                2.9
Abbreviations: CI, confidence interval; ICC, intraclass correlation coefficient; MDC,
minimal detectable change.
Interrater ICC
The interrater results differed between the three devices ([Table 1 ]). The inclinometer and the smartphone app had good interrater reliability with an
ICC2,1 of 0.80 and 0.79, respectively. The reproducibility for the universal goniometer
was poor with an ICC2,1 of 0.36.
Test–Retest ICC
Only 75 of the total 92 participants were able to perform the final (retest) measurement
at the end of the assessment procedure due to personal time constraints. The test–retest
reproducibility was good for the three devices, ranging from 0.83 to 0.89 ([Table 1 ]).
MDC95
The intrarater, interrater, and test–retest MDC95 values are shown in [Table 1]. The inclinometer had the smallest intrarater, interrater, and test–retest MDC95 values,
followed by the smartphone app and then the universal goniometer, which displayed
the largest (worst) values of the three approaches.
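The text does not state the formula used to derive MDC95. The Python sketch below assumes the conventional definition, in which the standard error of measurement is SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The function names and the SD of 3.0 degrees in the example are hypothetical and serve only to show the arithmetic.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement from the between-participant SD
    and the reliability coefficient (assumed conventional formula)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd: float, icc: float) -> float:
    """Minimal detectable change at the 95% confidence level.
    The sqrt(2) term reflects that a change score involves two measurements."""
    return 1.96 * math.sqrt(2.0) * sem(sd, icc)


def exceeds_mdc(first: float, second: float, mdc: float) -> bool:
    """True if the observed change is larger than the MDC."""
    return abs(second - first) > mdc


if __name__ == "__main__":
    # Hypothetical SD of 3.0 degrees combined with an ICC of 0.80.
    print(round(mdc95(3.0, 0.80), 2))
    # A 3-degree change judged against MDC95 values reported in Table 1.
    print(exceeds_mdc(8.0, 5.0, 2.0))   # inclinometer, intrarater
    print(exceeds_mdc(8.0, 5.0, 10.4))  # universal goniometer, interrater
```

Read against Table 1, a 3-degree change recorded by the same examiner with the inclinometer (intrarater MDC95 2.0 degrees) exceeds the measurement error, whereas the same change recorded by two different examiners with the universal goniometer (interrater MDC95 10.4 degrees) does not.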
Mean Differences, Bland–Altman Plots, and Joint Plots
The mean differences between measurements for each of the nine comparisons are presented
in [Table 2], which shows no systematic error except for the interrater comparison with the app (1.145-degree
bias). Bland–Altman plots and joint plots revealed no variation in error across the
ranges of motion observed ([Supplementary Figs. S1–S18]).
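As a minimal sketch of the quantities behind a Bland–Altman analysis, the Python function below computes the mean difference (bias), its standard error, and the 95% limits of agreement (bias ± 1.96 SD of the differences) for paired measurements. The function name bland_altman and the example readings are hypothetical; they are not data from this study.

```python
import math
from statistics import mean, stdev


def bland_altman(first, second):
    """Bias, standard error of the bias, and 95% limits of agreement
    for two paired series of measurements (in degrees)."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)                 # sample SD of the differences
    se = sd / math.sqrt(len(diffs))   # standard error of the mean difference
    return {
        "bias": bias,
        "se": se,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }


if __name__ == "__main__":
    # Hypothetical knee extension readings from two testers.
    tester_1 = [1.0, 2.0, 3.0, 4.0]
    tester_2 = [0.0, 2.0, 2.0, 5.0]
    print(bland_altman(tester_1, tester_2))
```

A bias whose confidence interval excludes zero, as for the interrater app comparison in Table 2, indicates a systematic difference between the two sets of measurements.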
Table 2
Mean differences, 95% CIs, standard errors, and p-values for each of the comparisons
Comparison                 Mean difference (deg)   95% CI              Standard error   p-Value
Intrarater inclinometer    −0.054                  −0.205 to 0.096     0.076            0.4765
Interrater inclinometer    −0.036                  −0.310 to 0.238     0.139            0.7965
Test–retest inclinometer   0.201                   −0.047 to 0.450     0.126            0.1119
Intrarater app             0.001                   −0.149 to 0.152     0.076            0.9886
Interrater app             −1.145                  −1.442 to −0.849    0.150            <0.0001
Test–retest app            −0.125                  −0.385 to 0.135     0.132            0.3441
Intrarater goniometer      0.177                   −0.129 to 0.484     0.155            0.2557
Interrater goniometer      0.511                   −0.218 to 1.240     0.369            0.1685
Test–retest goniometer     0.387                   −0.141 to 0.915     0.267            0.1500
Abbreviation: CI, confidence interval.
Discussion
In this study, the first to examine the reliability of measuring knee extension ROM
in patients after ACL injury, we demonstrated excellent intrarater reliability for
all three devices, suggesting that they can be deemed appropriate for clinical use
when measurements are performed by a single individual. Conversely, interrater
reliability varied markedly, with the inclinometer and the
smartphone app demonstrating good reliability, while the universal goniometer exhibited
poor reliability. Test–retest reliability was good for all three devices. These findings should inform clinical practice.
Inclinometer
The inclinometer displayed the best intrarater, interrater, and test–retest results
among the three devices, although intrarater reproducibility was excellent for all
three. We hypothesize that the main
reason for this is the minimal palpation that is required when using this device
and the lack of any calibration requirement, as opposed to the smartphone app. Additionally,
both testers in this study reported that the inclinometer was the easiest device to
operate of the three.
This standard inclinometer has been used in several studies.[23–25] Unfortunately, direct comparisons of their results with our study could not be made
since the methodology used, the statistical approach, and the target populations were
different. Specifically, Maltais et al (2019) conducted a study on children with cerebral
palsy, Reurink et al (2013) with acute hamstring injured patients, and Piva et al
(2006) used patellofemoral pain patients. All the previously mentioned studies included
measurements of the knee joint, yet none of them assessed knee extension in supine.
The intrarater results of Maltais et al (2019) were slightly lower than ours (ICC:
0.87). This can be attributed to the longer period between the two measurements of
the same tester, on average, 5.4 days, whereas in the current study, the second measurement
was repeated with the patient in the same position only 30–60 seconds after the first
measurement. The interrater ICC was slightly higher compared with our study, with
an ICC of 0.86. Short time intervals between the measurements (5 minutes between the
two testers) and standardization of the patient's position could be a possible explanation
since a second physiotherapist was helping to stabilize the pelvis of the subjects.
It should be noted, from a methodological perspective, that shorter time
intervals between measurements and a consistent patient position across measurements
can positively affect ICC results.[14][26] In the study of Reurink et al (2013), the number of participants and the population
characteristics were similar to our research, albeit hamstring rather than ACL-injured
patients. Only interrater ICC values were calculated, 0.77 for the injured limb and
0.69 for the uninjured limb. The time intervals between the measurements were not
mentioned, and the testing position of the patient differed from that in our
study; consequently, direct comparisons of results could not be made. Finally, Piva
et al (2006) performed four different lower limb measurements with an inclinometer
in patients with patellofemoral pain syndrome. Apart from the Craig's test, where
the interrater ICC was rated as “poor,” 0.45 (95% CI: 0.10–0.70), for the other three
tests, the interrater ICC scores were excellent, ranging from 0.91 to 0.97, slightly
higher than the results observed in the current study, perhaps attributable to the
patient's position and the different statistical approach used.
Smartphone App
The second approach used in this study was the smartphone app “iHandy.” The intrarater,
interrater and test–retest results were slightly poorer compared to the inclinometer.
After using the smartphone app extensively internally, we concluded that the sensors
of the smartphone devices are sensitive to movements in cardinal planes.[5 ] Consequently, even the slightest movement when transferring the device from the
table where we calibrated the device to the patient's limb could affect the results,
likely explaining the slightly worse results compared with the inclinometer. Future
research may examine whether improved training or modifications to the apps could
mitigate this. Additionally, during the study, recalibrating the device after each
measurement was time consuming, and aligning it to the tibia proved challenging due
to the protective phone cases used by the testers, unlike the inclinometer. Moreover,
the smartphone app was the only device where we observed a statistically significant
interrater mean difference (1.1 degrees). We noticed several measurements which were
almost exactly the same in absolute terms but opposite in sign, for example, 4.5 degrees
for tester 1 and −4.3 degrees for tester 2. We suspect that operator error may be to blame, as
the smartphone app displays only a numeric result, which makes it relatively easy to
mistakenly record a small flexion value for slight hyperextension or vice versa. Finally,
while this was the least expensive option (if the physiotherapist already owns a smartphone)
with the app being free, these results should be weighed when considering clinical
implementation.
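The opposite-sign pattern described above can be screened for in recorded data. The Python sketch below flags paired readings that have opposite signs but nearly equal magnitudes. The function name possible_sign_flips, the 0.5-degree tolerance, and the 1.0-degree minimum magnitude are hypothetical choices made for illustration, and no such screening was part of this study.

```python
def possible_sign_flips(rater_1, rater_2, tolerance=0.5, min_magnitude=1.0):
    """Return indices of paired readings (in degrees) that differ in sign
    but agree in magnitude to within `tolerance`.

    Readings smaller than `min_magnitude` are ignored, because values
    close to zero can change sign through ordinary measurement error."""
    flagged = []
    for index, (a, b) in enumerate(zip(rater_1, rater_2)):
        opposite_sign = a * b < 0
        similar_size = abs(abs(a) - abs(b)) <= tolerance
        large_enough = min(abs(a), abs(b)) >= min_magnitude
        if opposite_sign and similar_size and large_enough:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # The first pair reproduces the example in the text (4.5 vs. −4.3 degrees);
    # the remaining pairs are hypothetical.
    tester_1 = [4.5, 2.0, -0.3]
    tester_2 = [-4.3, 2.1, 0.2]
    print(possible_sign_flips(tester_1, tester_2))
```

Flagged pairs are candidates for a recording error only and would still need to be checked against the original record sheets.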
Knee extension was measured using a smartphone app in two studies.[11][13] Different instrument placement, smartphone apps, patients' position, and active
rather than passive knee extension were used across those studies, so direct comparisons
with our results could not be made. The intrarater ICC results ranged from 0.94 to
0.96, being similar or slightly higher compared with our study.
Pereira et al (2017) reported an interrater ICC of 0.10 which is markedly worse than
the results reported here. The authors of this study note that the presence of pain
during the measurements could be an explanation for the poorer reliability observed.
In our study, the patient was lying supine, and the knee was extended passively without
any overpressure. Pain was not reported by any patients in this study, nor have we
observed this in routine clinical practice.
Universal Goniometer
For the universal goniometer, while the intrarater reproducibility was excellent,
the interrater ICC results for the goniometer were the lowest of the three devices
and were characterized as poor. This can be attributed to the difficulty of accurately
palpating the anatomical landmarks, especially in patients with a large soft tissue
mass at the lateral hip. Such difficulties in palpation can influence the reliability
scores, as even the slightest misplacement of the instrument on the anatomical landmarks
can lead to measurement discrepancies.[11][13][14] Finally, both testers expressed a preference for the other two devices due to less
time required to perform measurements and easier handling compared with the goniometer.
Previous studies measuring knee extension showed similar results for intrarater
reproducibility, with ICC scores ranging between 0.82 and 0.99.[9][11][13][16][27–29] Only in the study of dos Santos et al (2012) was the intrarater reliability markedly
lower (ICC 0.49), which was likely a result of the time between the two measurements
being the longest, at 7 days apart.
The interrater ICC results of the study of Pereira et al (2017) and Peters et al (2011)
were lower than ours with reported ICC of 0.05 and 0.21, respectively. In the study
of Pereira et al (2017), the measurements were performed only a few days after the
surgery (8.5 ± 7.4 days, PO group) which could explain the lower interrater scores
since pain was reported during the measurements. The results of the healthy (likely
pain-free) population of their study (HS group) were similar to our results with an
interrater ICC of 0.40. The studies of Brosseau et al (2001), Lenssen et
al (2007), Pandya et al (1985), dos Santos et al (2012), and Verhaegen et al (2010)
used similar patient position and methodology to our study. Still, they reported better
results with an interrater ICC ranging between 0.55 and 0.93. For the studies of Lenssen
et al (2007), dos Santos et al (2012), and Verhaegen et al (2010), we are not aware
whether the patient changed position between the measurements of the two testers or remained
in the same position, which could have positively affected the ICC values. Finally,
in the study of Pandya et al (1985), the measurements were recorded to the nearest
5 degrees, affecting the precision of the measurement and, consequently, the ICC results.
MDC95
MDC95 values are perhaps more clinically important than correlation coefficients as they
represent the minimal level of change in the measurements that can be attributed to
a “real” change in ROM beyond measurement error of the device. In this study, the
inclinometer exhibited the smallest MDC95 for all three measurements. The smartphone app exhibited the second smallest intrarater,
interrater, and test–retest MDC95 values, 0.2, 0.3, and 0.3 degrees greater than those of the inclinometer, respectively. The results
of the universal goniometer were poorer, especially for the interrater value, which
was 10.4 degrees. Clinically, this device is only of use in patients with severe knee extension
deficits.
Test–Retest—No “Warming-up” Effect Seen
Another purpose of this study was to investigate if performing the measurements at
the end of the assessment routine would affect the knee extension measurements, that
is, was there any “warming-up” effect on knee extension flexibility. For this reason,
a third measurement was performed at the end of the assessment. The number of the
participants was smaller (n = 75) as some patients were unable to remain after all the standard testing was completed
due to personal time constraints. Overall, the test–retest reliability was good but
slightly lower than the intrarater results. Importantly, there appeared to be no bias
from the tests done at the start of the assessment routine compared with the end.
We had postulated that the warming-up effect of the testing process (which includes
a formal cycling warm-up, isokinetic knee strength testing, jump testing, etc.) would
result in increased knee flexibility at the end of the session, which was not the
case. From a clinical perspective, the test–retest MDC95 values were 0.6 and 0.7 degrees higher
than the intrarater values for the inclinometer and the smartphone app, and 1.9 degrees higher for the universal goniometer.
These deviations, especially for the inclinometer and the smartphone app, are relatively
small, which suggests that measurements of knee extension can be performed at any time
during an assessment session with relatively minor effects on the ultimate result.
Where accuracy is critical, it is suggested that the order of tests should be standardized.
Limitations
The recruited patients were mostly adult males which reflects our clinical caseload
in this consecutive series, but we should be cautious in extrapolating these results
to females and adolescents. Additionally, this study was a single-visit study and
the time intervals between the measurements were short; therefore, the results cannot be
assumed to apply to tests done at longer intervals. It is crucial to consider this aspect when extrapolating
the findings to clinical settings where multiple measurements on the same patient may
be taken days or weeks apart. It should be noted that we modified the universal goniometer
with the addition of plastic extensions to facilitate placement over the proximal
and distal landmarks, yet this was our least reliable measure. We suspect that without
this addition, aligning the goniometer arms visually would result in even worse
reliability. Furthermore, the testers used two different smartphone
devices with different operating systems (Android and iOS), which could influence these
results.[5][30] Finally, it was infeasible to blind the participants and testers to the measurement
methods, which may have influenced the results.
Conclusion
The inclinometer and the smartphone app displayed the best intrarater and interrater
reliability. Although the smartphone app achieved similar results, we suggest
the inclinometer as the device of choice, as it is inexpensive and was reported to
be easier to use than the universal goniometer or the smartphone app. In a clinical
setting where only one practitioner will conduct the measurements, all three devices
had satisfactory results.