Validity of Routinely Reported Rutherford Scores Reported by Clinicians as Part of Daily Clinical Practice

Abstract
Routinely reported structured data from the electronic health record (EHR) are frequently used for secondary purposes. However, it is unknown how valid routinely reported data are for reuse. This study aimed to assess the validity of Rutherford scores routinely reported by clinicians, as an indicator of the validity of structured data in the EHR. This observational study compared clinician-reported Rutherford scores with medical record review Rutherford scores for all visits at the vascular surgery department between April 1, 2016, and December 31, 2018. Free-text fields with clinical information were extracted for all visits to assign the medical record review Rutherford score, after which agreement with the clinician-reported Rutherford score was assessed using Fleiss' kappa. A total of 6,633 visits were included for medical record review. Agreement between clinician-reported and medical record review Rutherford scores was substantial for the left leg (κ = 0.62, confidence interval [CI]: 0.60–0.63) and the right leg (κ = 0.62, CI: 0.60–0.64). When missing clinician-reported Rutherford scores were excluded, agreement increased to almost perfect for the left leg (κ = 0.84, CI: 0.82–0.86) and the right leg (κ = 0.85, CI: 0.83–0.87). Expert judgment was rarely the deciding factor (11 of 6,633 visits). The substantial agreement between clinician-reported and medical record review Rutherford scores could be an indicator of the validity of routinely reported data. Depending on its purpose, secondary use of routinely collected Rutherford scores is a viable option.

The Rutherford classification is an example of routinely reported structured data in the EHR that is frequently used as an inclusion criterion in studies and is also relevant in daily practice, as it may guide further treatment. 5 The Rutherford classification captures the different stages of peripheral artery disease (PAD), a common, progressive, and frequently studied disease in which narrowed arteries increasingly reduce blood flow to the legs. 6 Six stages of increasing PAD severity and one asymptomatic stage are distinguished. 7 Each stage is defined by patient symptoms (i.e., pain intensity and location), Doppler pressures, and the presence or absence of ulcers. Whereas the difference between some stages may be clear, for example, because they can be distinguished by the presence of wounds, it may be more subtle for other stages and less straightforward when the criteria are not at hand, resulting in interclinician variation in the scores reported as part of daily clinical practice.
Retrospective studies mostly rely on the Rutherford scores reported by clinicians in the EHR or the health insurance system to decide on the inclusion of patients. 8,9 Some studies manually check the correctness of the structured data of all patients to determine who fulfills the inclusion criteria, which is time-consuming and may not be feasible for large databases. 10 To the best of our knowledge, previous retrospective studies have not reported on the validity of Rutherford scores assigned by clinicians in the EHR. Therefore, it is unknown whether Rutherford scores routinely collected by clinicians as part of daily clinical practice are valid for reuse. The aim of the present study is to assess the validity of Rutherford scores routinely reported by clinicians as part of daily clinical practice, as an indicator of the validity of routinely collected structured EHR data in general and of their potential for secondary use in research or quality assessment.

Methods
Design
An observational study design was used to compare clinician-reported Rutherford scores with medical record review Rutherford scores. This study is part of a larger retrospective observational study comparing treatments among Critical Limb Threatening Ischemia (CLTI) patients, approved by the ethics committee of UZ/KU Leuven (reference number: s64053), for which all CLTI patients had to be identified.

Terminology
Throughout this article, "clinician-reported Rutherford score" refers to the Rutherford scores reported by clinicians as part of daily clinical practice and entered as structured data in the EHR. "Medical record review Rutherford score" refers to the Rutherford score assigned by a dedicated reviewer who examined the clinical information available in the free-text fields in which clinicians report relevant symptoms, wounds, and other findings. 3

Clinician-Reported Rutherford Scores
From April 2016, the Rutherford scores for all inpatient and outpatient vascular surgery visits of the University Hospitals Leuven (tertiary care hospital, hereafter UH Leuven) could be reported as structured data in the EHR (Nexuzhealth), and this reporting remained in use for the entire study period. Since July 2017, the type of consultation (e.g., outpatient, emergency care, or hospitalized consultation) has also been reported per hospital visit. Reporting by clinicians was part of routine care; it was not enforced by hard stops in the EHR when fields were left empty, but it was promoted by the clinical leadership. 11 The reporting was designed to fit the workflow of busy clinicians, who spend most of their time caring for patients. The reporting form contains structured fields for information about the patient and their medical situation, including Rutherford scores. At each inpatient or outpatient visit, Rutherford scores are to be reported separately for the left and the right leg. Clinicians have nine radio buttons to classify the vascular status of each leg. The first seven options are Rutherford 0 to 6; the other two options, "amputation" and "acute ischemia," were added so that every situation could be reported, ensuring that missing scores are genuinely missing values rather than cases in which none of the options applied. For each visit, clinicians also describe the clinical condition of the patient in free-text fields.
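Because the nine options cover every situation a clinician can encounter, an empty selection is the only way a score can be missing. A minimal Python sketch of that design is shown below, assuming a hypothetical encoding; the names `RutherfordOption`, `VisitReport`, and `missing_legs` are illustrative and do not describe the actual Nexuzhealth data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RutherfordOption(Enum):
    """The nine radio-button options per leg (hypothetical encoding)."""
    R0 = 0
    R1 = 1
    R2 = 2
    R3 = 3
    R4 = 4
    R5 = 5
    R6 = 6
    AMPUTATION = "amputation"
    ACUTE_ISCHEMIA = "acute ischemia"


@dataclass(frozen=True)
class VisitReport:
    """Hypothetical structured record for one visit; None means no button was selected."""
    visit_id: str
    left: Optional[RutherfordOption]
    right: Optional[RutherfordOption]

    def missing_legs(self) -> List[str]:
        """Legs without a selected option. Since every situation has an option,
        these are genuine missing values rather than 'not applicable'."""
        return [leg for leg, value in (("left", self.left), ("right", self.right)) if value is None]


report = VisitReport("visit-001", left=RutherfordOption.R5, right=None)
print(report.missing_legs())  # ['right']
```

Keeping "amputation" and "acute ischemia" inside the same closed set, rather than as free text, is what allows a missing value to be interpreted unambiguously in the later analyses.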

Medical Record Review Rutherford Score
The Rutherford scores and EHR free-text fields with clinical information were extracted for all inpatient and outpatient visits at the vascular surgery department between April 1, 2016, and December 31, 2018. Each hospital visit was extracted separately, because a patient's PAD severity may change over time, which would be reflected in a different score at a subsequent visit. These data were imported into an electronic case report form (e-CRF) within REDCap (Research Electronic Data Capture) for further review. REDCap is a secure web application for creating and managing databases. 12 For each visit, the clinician-reported Rutherford score was imported, together with all EHR free-text fields containing narrative information and, if available, the results of noninvasive vascular tests that were potentially relevant for assigning the Rutherford score. These text fields included information about pain intensity, pain location, disease progression, wound situation, medical history, and Doppler pressures, which together constitute the information that may be needed to distinguish between Rutherford stages. All fields with routinely collected data were locked immediately after import into the e-CRF so that they could not be changed. The reviewer then assigned a Rutherford score for each leg separately in a new field, following the predefined criteria of Rutherford et al, 7 for all patients, including those with missing clinician-reported Rutherford scores. Records for which no Rutherford score could be assigned based on the available text fields were left empty. All records were reviewed by one reviewer (L.T.) to ensure the most consistent application of the scoring criteria. In case of uncertainty, cases were discussed with a second reviewer (L.vdH.); if no consensus was reached, a third reviewer (I.F.), an experienced vascular surgeon, was consulted.

Definitions
Medical record review Rutherford scores are based on the documented clinical symptoms, separately for each leg. If needed, Doppler pressures are checked. The same definitions apply to clinician-reported Rutherford scores, but knowledge of these definitions, and thereby the accuracy of the scores, may differ between clinicians. Clinician-reported Rutherford scores are mostly based on the clinical symptoms reported by the patient, as Doppler pressures are usually not (yet) available at the time a score is assigned. Rutherford 0 is assigned when no symptoms of PAD are documented. Rutherford 1, 2, and 3 denote patients with mild, moderate, and severe claudication symptoms, respectively. 7 A walking distance of >100 m is used for mild symptoms and <100 m for moderate symptoms. Since the Rutherford classification does not specify severe claudication symptoms, we defined these as a patient being able to walk only a few meters or only indoors. Rutherford 4 is assigned for clinical symptoms of rest pain, defined as intractable foot and ankle pain at rest for more than 2 weeks. Rutherford 5 includes patients with minor tissue loss or ulceration. Rutherford 6 denotes patients with major tissue loss and gangrene. 7 "Amputation" is recorded if an amputation occurred above the ankle; for minor amputations, the score is reported according to the guidelines. 7 "Acute ischemia" is recorded if ischemia resulted from an acute cause, with a sudden increase of pain over several hours to days. CLTI is defined as Rutherford 4 to 6 and non-CLTI as Rutherford 0 to 3, separately for each leg.
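The symptom-based definitions above can be read as a decision rule that checks the most severe documented finding first. The Python sketch below assumes that the free-text findings for one leg have already been summarized into the fields of a hypothetical `LegFindings` class; the precedence order (amputation and acute ischemia first, then tissue loss, rest pain, and claudication) and the handling of a walking distance of exactly 100 m are assumptions of this sketch, since the definitions do not state them. Minor amputations, Doppler checks, and the venous-wound rule are not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LegFindings:
    """Hypothetical summary of the free-text findings for one leg."""
    claudication_distance_m: Optional[float] = None  # None: no claudication documented
    only_few_meters_or_indoors: bool = False
    rest_pain_weeks: float = 0.0  # duration of intractable foot/ankle pain at rest
    tissue_loss: str = "none"  # "none", "minor", or "major"
    amputation_above_ankle: bool = False
    acute_onset: bool = False  # sudden increase of pain over hours to days


def rutherford_from_symptoms(f: LegFindings) -> Union[int, str]:
    """Assign a score from documented symptoms; the most severe finding wins (assumption)."""
    if f.amputation_above_ankle:
        return "amputation"
    if f.acute_onset:
        return "acute ischemia"
    if f.tissue_loss == "major":
        return 6  # major tissue loss and gangrene
    if f.tissue_loss == "minor":
        return 5  # minor tissue loss or ulceration
    if f.rest_pain_weeks > 2:
        return 4  # rest pain for more than 2 weeks
    if f.only_few_meters_or_indoors:
        return 3  # severe claudication (study-specific definition)
    if f.claudication_distance_m is not None:
        # Assumption: exactly 100 m is treated as moderate; the text leaves it open.
        return 1 if f.claudication_distance_m > 100 else 2
    return 0  # no documented PAD symptoms


def is_clti(score: Union[int, str, None]) -> bool:
    """CLTI is Rutherford 4 to 6 for a given leg."""
    return isinstance(score, int) and 4 <= score <= 6


print(rutherford_from_symptoms(LegFindings(claudication_distance_m=250)))  # 1
print(is_clti(rutherford_from_symptoms(LegFindings(rest_pain_weeks=3))))  # True
```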
When Doppler pressures are required, Rutherford 1 requires an ankle pressure after exercise of >50 mm Hg that is at least 20 mm Hg lower than the resting value, Rutherford 3 requires an ankle pressure after exercise of <50 mm Hg, and Rutherford 2 covers intermediate ankle pressures. Rutherford 4 requires a resting ankle pressure of <40 mm Hg, and Rutherford 5 and 6 a resting ankle pressure of <60 mm Hg. 7 If a wound or crust is present, that leg is assigned Rutherford 5 or 6, depending on the size. A wound remains a wound until it has completely healed, so the leg keeps Rutherford 5 or 6 until then. A venous wound is assigned Rutherford 0.
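The pressure and wound rules above can likewise be written as small checks. This Python sketch is an illustration under stated assumptions, not the reviewers' actual procedure: the function names are hypothetical, any combination of pressures that meets neither the Rutherford 1 nor the Rutherford 3 criterion is treated as "intermediate" (Rutherford 2), and wound size is reduced to a hypothetical yes/no flag for major tissue loss.

```python
from typing import Optional


def claudication_stage_from_doppler(resting_mmhg: float, post_exercise_mmhg: float) -> int:
    """Rutherford 1-3 from ankle pressures for a leg with claudication.

    Rutherford 3: post-exercise ankle pressure < 50 mm Hg.
    Rutherford 1: post-exercise pressure > 50 mm Hg and at least 20 mm Hg
    below the resting value.
    Rutherford 2: everything in between (assumption: any other combination).
    """
    if post_exercise_mmhg < 50:
        return 3
    if post_exercise_mmhg > 50 and resting_mmhg - post_exercise_mmhg >= 20:
        return 1
    return 2


def resting_pressure_consistent(stage: int, resting_mmhg: float) -> bool:
    """Check the resting ankle-pressure criterion for Rutherford 4-6."""
    if stage == 4:
        return resting_mmhg < 40
    if stage in (5, 6):
        return resting_mmhg < 60
    raise ValueError("resting-pressure criterion is only defined for Rutherford 4-6")


def wound_stage(wound_present: bool, venous: bool, major_tissue_loss: bool) -> Optional[int]:
    """Score a leg with an unhealed wound or crust; None if no wound is present."""
    if not wound_present:
        return None
    if venous:
        return 0  # a venous wound is assigned Rutherford 0
    return 6 if major_tissue_loss else 5  # hypothetical size flag decides 5 vs 6


print(claudication_stage_from_doppler(resting_mmhg=120, post_exercise_mmhg=80))  # 1
```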

Statistical Analysis
Descriptive statistics were used to characterize the patient population, using the mean and standard deviation (SD) for normally distributed variables and the median and interquartile range (IQR) for other variables. In all analyses, the medical record review Rutherford scores were considered the gold standard, because they were assigned by consistently applying the required criteria. For the primary analyses, Rutherford 1 to 3 were combined, because the loosely defined distinction between these stages leaves room for subjectivity, so that different clinicians may assign different scores to similar patients. Moreover, Doppler pressures are not always available when a score is assigned, so a difference between clinician-reported and medical record review scores within Rutherford 1 to 3 could reflect the unavailable pressures rather than the reliability of clinician-reported scores. Because Doppler pressures are not needed to assign Rutherford 4 to 6, the asymptomatic stage (Rutherford 0) and Rutherford 4, 5, and 6 were kept as individual stages. Since the aim of this study is to evaluate the validity of clinician-reported Rutherford scores, and the options to report an amputation or acute ischemia were added merely to ensure completeness, these options were classified as "other scores" and combined with missing scores in the analysis.
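The grouping used in the primary analyses amounts to a mapping from each per-leg score to one of six categories (0, 1–3, 4, 5, 6, and other/missing). A minimal Python sketch, assuming scores are coded as the integers 0 to 6, the strings "amputation" and "acute ischemia", or None for a missing score (a hypothetical coding); the function name is illustrative.

```python
from typing import Optional, Union

Score = Optional[Union[int, str]]  # 0-6, "amputation", "acute ischemia", or None


def primary_analysis_category(score: Score) -> str:
    """Collapse a per-leg score into the categories of the primary analysis."""
    if isinstance(score, int) and 1 <= score <= 3:
        return "1-3"  # claudication stages combined
    if isinstance(score, int) and score in (0, 4, 5, 6):
        return str(score)  # kept as individual stages
    return "other/missing"  # amputation, acute ischemia, or no score


clinician = [2, 5, None, "amputation", 0]
review = [3, 5, 0, "amputation", 0]
pairs = [(primary_analysis_category(c), primary_analysis_category(r))
         for c, r in zip(clinician, review)]
# pairs[0] is ("1-3", "1-3"): a clinician's 2 and a reviewer's 3 now agree.
```

Applying the same mapping to both scores before computing agreement is what makes a disagreement within Rutherford 1 to 3 count as agreement in the primary analysis.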
Fleiss' kappa was calculated to assess the level of agreement between clinician-reported and medical record review Rutherford scores. It was calculated for the left and right leg separately, both with and without missing values, because a missing value may mean that a clinician consciously or subconsciously did not enter a score, which differs from assigning an incorrect score; the two analyses therefore provide complementary information. For some analyses, the totals can differ between legs, because a visit with a Rutherford score for one leg (retained in the analyses) could have a missing value for the other leg. The Fleiss' kappa values were classified into levels of agreement as follows: 0.00 (poor agreement), 0.01–0.20 (slight agreement), 0.21–0.40 (fair agreement), 0.41–0.60 (moderate agreement), 0.61–0.80 (substantial agreement), and >0.80 (almost perfect agreement). 13 In addition, we examined how well clinician-reported scores distinguish between CLTI and non-CLTI patients. Contingency tables were used to calculate the sensitivity, specificity, and positive and negative predictive values (PPV and NPV) of clinician-reported Rutherford scores for correctly identifying CLTI versus non-CLTI patients.
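The agreement statistics described above can be sketched in plain Python. With two raters per leg (clinician and reviewer), Fleiss' kappa compares the observed proportion of agreeing pairs with the agreement expected from the pooled category frequencies; for two raters it coincides with Scott's pi. This is a minimal illustration, not the SPSS procedure used in the study: it returns no confidence interval, the function names are hypothetical, and the handling of values between the stated bands (e.g., 0.605) is an assumption. The 2×2 metrics take counts from a CLTI versus non-CLTI contingency table with the medical record review as reference.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; ratings[i] lists the category each rater gave subject i."""
    if not ratings:
        raise ValueError("no subjects")
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    pooled = Counter()
    mean_agreement = 0.0
    for subject in ratings:
        counts = Counter(subject)
        pooled.update(counts)
        # proportion of agreeing rater pairs for this subject
        mean_agreement += (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
    mean_agreement /= n_subjects
    total = n_subjects * n_raters
    expected = sum((c / total) ** 2 for c in pooled.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (mean_agreement - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Map kappa to the agreement bands listed in the text (boundaries assumed)."""
    bands = [(0.0, "poor"), (0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV of clinician-reported CLTI."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy example: (clinician, reviewer) categories for four legs.
legs = [("0", "0"), ("4", "4"), ("5", "5"), ("0", "4")]
k = fleiss_kappa(legs)
print(round(k, 3), agreement_label(k))  # 0.619 substantial
```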
As a sensitivity analysis, we tested the hypothesis that more Rutherford scores were missing in 2016 than in later years, because clinician reporting was introduced in 2016 and may have taken some time to be fully implemented. To do so, we compared the percentage of missing Rutherford scores in 2016 with that in all later years using a chi-square test.
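The comparison of missing-score proportions amounts to a Pearson chi-square test on a 2×2 table (rows: 2016 versus later years; columns: missing versus reported). The Python sketch below computes it with the standard library only, using the fact that for one degree of freedom the p-value equals erfc(√(χ²/2)). It omits Yates' continuity correction, since the text does not state whether one was applied. The function name is hypothetical, and the counts in the usage line are made up for illustration.

```python
import math
from typing import Tuple

Table2x2 = Tuple[Tuple[int, int], Tuple[int, int]]


def chi_square_2x2(table: Table2x2) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table.

    table[0] = (missing, reported) in 2016; table[1] = (missing, reported) later.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up counts, for illustration only.
stat, p = chi_square_2x2(((10, 90), (20, 80)))
print(round(stat, 3), p < 0.05)  # 3.922 True
```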
Statistical analysis was performed using SPSS Statistics version 25. The significance level was set at p < 0.05 for all tests.

Results
A total of 6,633 visits were included for medical record review. These visits came from 3,281 unique patients, who had 1 to 28 hospital visits each (median, 3; IQR, 1.0–5.0). The mean age of patients at the hospital visit was 67.9 years (SD, 13.8), and 2,109 of the 3,281 unique patients were male. Outpatient consultations were the most common type of hospital visit (36.9%), and most missing values for visit type (2,288 of 2,446) were due to the late start of registering this variable (Table 1).
Tables 2 and 3 show the clinician-reported Rutherford scores compared with the medical record review Rutherford scores for the left and right leg, respectively. The diagonal gives the number of records in which both scores were the same.
International Journal of Angiology © 2023. International College of Angiology. All rights reserved.
Validity of Routinely Reported Rutherford Scores van der Heijden et al.
Clinician reporting identified CLTI patients fairly well, as shown by a PPV of 89.8% for the left leg and 88.3% for the right leg (Tables 4 and 5). Identification of non-CLTI patients was even better, with an NPV of 95.2% for the left leg and 94.9% for the right leg. Notably, the NPV in particular was affected by missing medical record review Rutherford scores, that is, visits for which the EHR contained insufficient information to decide which Rutherford score should be assigned. Sensitivity and specificity were moderate: 71.4% and 78.7%, respectively, for the left leg and 66.5% and 78.8%, respectively, for the right leg. This was mostly due to missing clinician-reported scores.
In the sensitivity analysis, 273 of the 967 clinician-reported scores for the left leg in 2016 (28.2%) were missing; for the right leg, 274 of 967 (28.3%) were missing. The proportion of missing clinician-reported scores was significantly higher in 2016 than in the

Discussion
The present study has shown substantial agreement between clinician-reported and medical record review Rutherford scores, and almost perfect agreement when missing values are not considered. Of the legs classified as CLTI by clinicians, almost 90% were also CLTI at medical record review, and of those classified as non-CLTI, approximately 95% were also non-CLTI. These are conservative estimates, as they were affected by missing medical record review Rutherford scores due to insufficient information in the EHR. Sensitivity and specificity were moderate, mostly because of missing clinician-reported Rutherford scores. Possible explanations for insufficient or missing information in the EHR include lack of time, unawareness of its importance, not understanding the system, or unwillingness to enter all the data. The expert's opinion was rarely the decisive factor (11 of 6,633 visits), which suggests that little discussion is needed to assign a score. As shown by the sensitivity analysis, the percentage of missing clinician-reported scores was higher in the first year than in the remaining period, possibly because surgeons needed time during the first year to become accustomed to structured reporting of Rutherford scores.
Although the validity of clinician-reported Rutherford scores has not been reported before, a previous study investigated the validity of PAD diagnoses in a national patient registry and found a PPV of 71.9% for vascular surgery departments. 14 The present study found considerably higher PPVs of 89.8% and 88.3% for the left and right leg, respectively, indicating a better ability to correctly identify CLTI patients. The high PPVs of the present study are most likely the result of frequent instruction and monthly feedback on (un)completed fields. No other validation studies have been published on the reporting of (a subset of) PAD or on specific routinely reported scores. […] 15–17 This could indicate that the present study and its findings are representative of other routinely reported scores or diagnoses.
Strengths of this study include that all patients who visited the vascular surgery department were included and their scores validated, rather than a sample. This resulted in a large sample size covering both the start of the implementation and the period in which everyone had become used to reporting Rutherford scores, thereby presenting real-world data. However, some limitations should be noted. The validity of clinician-reported data depends on data completeness and the accuracy of reporting. Missing data affected the sensitivity, the specificity, and also the NPV, which may have been underestimated if the lack of documentation means that Rutherford 0 is the appropriate score. In addition, a dedicated reviewer who consistently applies the required criteria can be considered a strength compared with doctors, who may be preoccupied with many things during a consultation (i.e., treating the patient optimally); however, having only one dedicated reviewer is also a limitation, because it may have introduced observer bias and influenced the level of agreement. We tried to minimize this bias through regular discussions with another reviewer. Ideally, however, two independent reviewers would check all records, because judgments may still differ even when the guidelines are followed strictly. Finally, the single-center nature of the study is a limitation: we validated this specific type of clinician reporting within the EHR system of UH Leuven, and it is unknown whether the results can be generalized to other types of hospitals with a different patient mix or to other countries with a different health care system.
The implications of our findings are that clinician-reported Rutherford scores from daily clinical practice can be used to reliably select CLTI patients for clinical research. Assuming that most of the missing medical record review Rutherford scores indicate Rutherford 0, we can be fairly confident that no potentially eligible CLTI patients will be missed. These findings may thereby also act as an incentive to raise awareness of the importance of accurate clinician reporting and of avoiding missing data, particularly because these data need to be reliable if they are reused. The decision whether to use routinely collected data also depends on how precisely patients need to be selected for secondary use. For instance, for studies that require exact agreement on specific Rutherford scores (e.g., only Rutherford 5 and 6), relying solely on routinely collected data may not be sufficiently reliable. Still, even in those cases, it may help to start from a smaller preselection of CLTI patients rather than reviewing all patients. In large quality assessment procedures, a small difference in the patients selected may have little impact on the outcome measures. Future research could focus on validating other routinely collected scores, such as the WIfI (wound, ischemia, and foot infection) Classification System, and on having two dedicated reviewers review cases independently to further minimize observer bias and assess interrater reliability. 18 In addition, disciplines other than surgery may benefit from the results of this study, since it focuses on routinely collected data.

Conclusion
Substantial agreement between clinician-reported Rutherford scores and medical record review Rutherford scores was found, which increased to almost perfect agreement when missing clinician-reported Rutherford scores were excluded, suggesting that the scores clinicians did assign were valid. This agreement, together with a good ability to identify CLTI patients, makes reuse of these routinely collected Rutherford scores a viable option, particularly if these findings stimulate better EHR documentation and fewer missing clinician-reported scores.

Table 1
Baseline characteristics
Abbreviations: BMI, body mass index; IQR, interquartile range; SD, standard deviation.
almost half were assigned a Rutherford 0 score at medical record review (48.1% for the left leg and 46.7% for the right leg), but a considerable part was still classified as other or missing (29.5% for the left leg and 30.8% for the right leg), indicating insufficient information or correctly reported amputations. No acute ischemia was reported by clinicians, whereas 25 (left leg) and 29 (right leg) were classified as such during the

Table 2
Clinician-reported Rutherford scores versus medical record review Rutherford scores for the left leg

Table 3
Clinician-reported Rutherford scores versus medical record review Rutherford scores for the right leg

Table 4
Clinician-reported CLTI diagnosis versus medical record review CLTI diagnosis for the left leg, based on Rutherford scores