Keywords: analytics, documentation burden, EHR system, clinical informatics
Background and Significance
Usage analytics is an emerging source of data for measuring burden related to use of the electronic health record (EHR),[1] investigating the impacts of EHR use on patient care,[2] and evaluating the impact of interventions[1][3] or events (e.g., COVID-19)[4] on EHR use. Analytics data from non-EHR care delivery tools, such as nurse call systems and communication devices, have also been used to measure and predict strain amongst clinicians.[5] Measures of EHR burden have traditionally been subjective, collected through surveys,[6][7][8] including validated instruments for concepts like "burnout" (e.g., the Mini-Z survey[9]). While these approaches allow for characterizing perceived burden and EHR experience,[10] discrepancies between what is estimated by end-users and what is observed in the system have been identified.[8] Moreover, it remains difficult to pinpoint the specific challenges that lead to the burden.[11][12] To identify the root causes of EHR-related burden and evaluate the impact of solutions, usage log data will be a critical component toward enabling data-informed training[13] and optimization of the system.[14][15][16] Over the past decade, EHR usage logs have been increasingly used to gather metrics for documentation burden concepts[1] such as effort (e.g., after-hours work) and time (e.g., average time spent),[17][18] with a recent scoping review[1] reporting the use of this method of data collection within 80% of studies across a variety of hospital settings (e.g., general surgery,[19] primary care, and internal medicine[20][21][22]).
Previous research has demonstrated the need for validity and reliability within EHR-based metrics and reporting.[23][24] Therefore, before employing these tools to measure efficiency, it is important to understand their reliability and validity. To ensure administrators and physicians are confident in using EHR usage log data, the validation process described in this case report is necessary. Given that each organization has unique workflow and documentation practices on the EHR system, it is important that the metrics accurately provide a complete picture of clinicians' usage patterns. A systematic review of EHR logs[25] used for measuring clinical activity demonstrated that only 22% of the 85 studies validated the accuracy of the findings. Currently, two approaches are used for validation: (1) action–activity mappings (by consulting EHR vendors, consensus between researchers, or direct observation) or (2) activity patterns or duration (using direct observation or self-reported data). These "time–motion" studies[25] involve real-time observation of how long a clinician performs a function (e.g., using a stopwatch or TimeCat[26]) and comparison against the usage log readings. Use of this method has been fairly successful in ophthalmology[27] and primary care[28] settings, among others.[29] However, time–motion studies require significant human resources and time to conduct the validation, and can be difficult to scale in organizations with many departments and unique workflows. The presence of an observer may also reduce the comfort of patients and providers within the clinical environment, particularly in a mental health setting. Moreover, given the current pandemic and the rise of telemedicine and distancing requirements for in-person care, it is neither feasible nor appropriate to introduce an observer into the clinical environment. From a privacy perspective, recording physicians' screens also introduces a challenge for privacy preservation, as identifying information such as patient names, images, and diagnoses is unintentionally captured.[30] Thus, a safe, remote, and less invasive approach for validation of the usage log data of EHR systems is needed.
Objectives
In this case report, we introduce an approach using clinical test cases that was implemented
to validate the usage log data for an integrated EHR system in use at a large academic
teaching mental health hospital. While other organizations may have used this approach
in their validation, to our knowledge, this is the first study that discusses this
approach within the academic literature. This approach overcomes the limitations identified
above in current validation approaches and provides an effective way to test out a
large number of workflows with limited resources, and in a manner that is noninvasive
to the clinical environment. We highlight the utility of this approach, including
key considerations for applying this methodology and using EHR log data in practice.
Materials and Methods
The approach for validation of usage log data is composed of three phases ([Fig. 1]). The validation was conducted in an academic teaching mental health hospital located in Toronto, Ontario, and approval was obtained from the organizational Quality Project Ethics Review Board.
Fig. 1 Overview of approach for usage log validation.
Phase 1: Creating Test Cases Based on Real-World Clinical Workflows
Given that the ultimate goal of EHR usage data is to accurately capture the usage patterns of clinicians using the EHR, a guiding principle of this stage was to develop use cases (hereafter called "test runs") that mimicked real-world clinical workflows used by physicians at the organization as closely as possible. To create these test runs, we began by consulting the organization's EHR training modules, which are available to all physicians on our intranet and provide detailed, line-by-line physician workflows for carrying out common tasks within the EHR. Through consultation with senior clinical leadership, including the Chief Medical Informatics Officer (T.T., who is also a practicing physician within the organization), physician super users of our EHR system, and our clinical informatics nurse, we identified common inpatient and outpatient physician workflows and reconciled any differences between training documents and real-world EHR usage. The clinical informatics nurse consulted is solely responsible for EHR education and training for all 400 physicians at our organization and hence well versed in physicians' use of the EHR. Using an agile, iterative approach, consultation with these individuals helped us develop test runs that closely resembled real-world physician workflows (see [Table 1] for example test runs; see the [Supplementary Material] for all test runs [available in the online version]). Since organizations have unique clinical workflows, we wanted to ensure that these test cases were specific to our organization, and therefore did not consult the literature for this step.
Table 1
Examples of test runs used for validation of the usage log data

Example inpatient test run: preparing for inpatient discharge
• Search and open MRN
• Add discharge diagnosis (bipolar disorder)
• Initiate a discharge order for the patient, including the location of discharge
• Complete discharge summary note for the patient

Example outpatient test run: regular follow-up at clinic
• Search and open MRN
• Order "lorazepam"
• Complete an outpatient progress note
• Search and open another MRN
• Review laboratory results
• Complete discharge documentation for the patient

Abbreviation: MRN, medical record number.
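As a purely illustrative aside, each test run in [Table 1] can be thought of as an ordered task checklist that is stepped through during execution in phase 2; the Python sketch below encodes the two example runs in that form. The names and structure are hypothetical and are not part of the study's actual tooling.

# Hypothetical encoding of the Table 1 test runs as ordered task lists,
# so each run can be executed and timed in a consistent order in phase 2.
TEST_RUNS = {
    "inpatient_discharge": [
        "Search and open MRN",
        "Add discharge diagnosis (bipolar disorder)",
        "Initiate a discharge order, including the location of discharge",
        "Complete discharge summary note",
    ],
    "outpatient_follow_up": [
        "Search and open MRN",
        "Order lorazepam",
        "Complete an outpatient progress note",
        "Search and open another MRN",
        "Review laboratory results",
        "Complete discharge documentation",
    ],
}

def print_checklist(run_name: str) -> None:
    """Print the numbered task checklist for a given test run."""
    for step, task in enumerate(TEST_RUNS[run_name], start=1):
        print(f"{step}. {task}")

print_checklist("inpatient_discharge")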
Phase 2: Execution: Conducting the Clinical Workflows in the Test Environment
This phase involved setting up a dedicated mock physician account in a nonproduction version
of the EHR system, so that all activity within this account was recorded. The only
difference between the test environment used within this study and our organization's
production environment is the absence of real patient data. EHR functionality and
modules were identical between the two environments, and our vendor provided confirmation
that the methods used to measure usage metrics were identical between the two environments.
Mock patients that mimic real patients, with representative diagnoses, care plans, etc., were developed within the test environment by our clinical informatics nurses. The
Research Assistant (RA) performed the test runs that were developed in phase 1, using
pauses and variations in documentation time to resemble real-world interruptions and
workflows.
For data collection, two complementary techniques were used: (1) a spreadsheet with automatic time-log capture[31] was developed: when each task within a test run was complete, the RA manually indicated completion in the respective cell, which then automatically logged the time of completion; (2) a screen recording tool was used to capture the execution of each test run, so that task times could be re-measured with a stopwatch. The spreadsheet was used to easily calculate the time spent across each task of a test run (as well as the entire test run), and the screen recording allowed for retrospective review and confirmation of time spent.
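For illustration only, the short Python sketch below mimics the role of the auto-timestamping spreadsheet described above as a simple command-line logger: the RA would confirm each task on completion, and the completion times would be written to a file from which per-task durations can be derived. The function, field, and file names are hypothetical, and this is not the study's actual tool.

# Illustrative stand-in for the auto-timestamping spreadsheet (hypothetical).
import csv
from datetime import datetime

def log_completions(run_name: str, tasks: list[str], out_path: str) -> None:
    """Prompt for confirmation after each task and record its completion time."""
    rows = []
    for task in tasks:
        input(f"[{run_name}] Press Enter when '{task}' is complete... ")
        rows.append({
            "run": run_name,
            "task": task,
            "completed_at": datetime.now().isoformat(timespec="seconds"),
        })
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["run", "task", "completed_at"])
        writer.writeheader()
        writer.writerows(rows)

# Per-task durations can then be computed as the difference between
# consecutive completion timestamps, mirroring the spreadsheet calculation.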
We gathered EHR usage data from the EHR vendor's back-end analytics platform. Because the exact date and time of the interactions in the test account were logged, the EHR vendor was able to extract the most granular level of detail for that time period to allow for comparison. The extracted data were sent to us in a spreadsheet.
Phase 3: Data Analysis: Comparison between Usage Logs and Test Case Observations
To determine if the time extracted from the EHR back-end analytics platform was comparable
to the data we collected through the spreadsheet and screen recording, we converted
all time measures to seconds. We compared (1) time spent per patient for the following
metrics: total time in EHR, documentation time, order time, chart review time, allergies
time, problem and diagnosis time, and (2) counts for the following metrics: patients
seen, notes written.
Utility was defined by the absolute differences and percent differences observed between
the two methods. To explore the discrepancies between recorded values and those found
within the analytics platform, the RA replicated the tasks and consulted with our
vendor to identify the root causes.
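A minimal sketch of this comparison is shown below, assuming both the observed (RA-recorded) and logged (analytics platform) values for each metric are available in seconds. The percent difference convention shown here (relative to the observed value) is one common choice and may not match the platform's or study's exact calculation; the metric names and values are made up for illustration and are not study data.

# Hypothetical sketch of the phase 3 comparison between RA-recorded times
# and analytics-platform times, all expressed in seconds.

def percent_difference(observed: float, logged: float) -> float:
    """Absolute percent difference, relative to the RA-observed value."""
    return abs(logged - observed) / observed * 100.0

def compare_metrics(observed_s: dict[str, float],
                    logged_s: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return (absolute difference in seconds, percent difference) per metric."""
    return {
        metric: (abs(logged_s[metric] - observed_s[metric]),
                 percent_difference(observed_s[metric], logged_s[metric]))
        for metric in observed_s
    }

# Example with illustrative values only (not study data):
observed = {"documentation_time": 420.0, "order_time": 95.0}
logged = {"documentation_time": 470.0, "order_time": 100.0}
print(compare_metrics(observed, logged))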
Results
A total of 10 test runs were conducted by the RA (A.K.) across 3 days, with one system interruption reported in Run 8. Differences in measurements found between the two methods of data collection (i.e., RA-recorded values and the usage analytics platform) are summarized in [Fig. 2] for time-based metrics and in [Fig. 3] for count-based metrics, averaged across the 3 days. Results of independent t-tests performed for all eight metrics are highlighted in [Supplementary Table S1]; however, it should be noted that these results are based on one user with 3 days of data. A summary of measurements extracted from the analytics platform compared with recorded values is outlined in [Supplementary Table S2] (available in the online version). The percent difference between measurements recorded by the RA and the usage analytics platform ranged from 9 to 60%. The discrepancies observed in time in EHR and order time in EHR were relatively small (<20%). Of the 3 days of data collection for the documentation time in EHR and chart review time in EHR metrics, findings from one day yielded large percent differences (57–60%) between the time captured by the usage analytics log and our spreadsheet.
Fig. 2 Measurements recorded by usage analytics platform and test case observations for
time-based metrics (averaged across 3 days).
Discussion
Validation of metrics is often considered a barrier toward the full uptake and use of usage log data to support characterization and mitigation of EHR burden. This is mainly due to the resources and time required to conduct robust time validation studies, hindering their practical execution. This work outlines a feasible and resource-efficient approach for validating usage log data for use in practice. Previous validation studies using screen recordings of EHR sessions have demonstrated that metrics such as total time spent within the EHR correlated strongly with observed metrics (r = 0.98, p < 0.001), where each minute of audit-log-based time corresponded to 0.95 minutes of observed time[32] across a variety of provider roles. Other research within ophthalmology clinics validating EHR log data using time–motion observations has demonstrated that 65 to 69% of the total EHR interaction estimates were ≤3 minutes from the observed timing, and 17 to 24% of the EHR interaction estimates were >5 minutes from the observed timing.[33]
Lessons Learned
We learned the following lessons from our technique:
Ensure partnership with the EHR vendor: this approach allowed for a collaborative review of discrepancies by both the organization and the vendor. We identified some discrepancies amongst our metrics after splitting them by the reported number of patients seen. Upon review with our vendor's developers, we learned that a patient is only counted as seen if the user completes a certain action (e.g., completing a form as opposed to an actual documentation). Additionally, when looking at our after-hours metric, the hours were predetermined by our vendor (6 p.m.–6 a.m.), and this time frame might differ from physicians' work hours. This helped us explain why some values were inflated relative to our own measurements, and also helped us brainstorm other situations (e.g., physicians signing off on residents' notes) that might impact the time spent.
Iteratively test and maximize data transparency: the resource efficiency of this method allows us to adjust the level of depth of our workflows and repeat the runs as necessary (e.g., after EHR upgrades). We began our validation with less complex test runs that repeated the essential tasks in a very controlled setting (results not shown here) and gradually progressed to more sophisticated test runs aligned with our metrics. Thus, this method allows for a step-wise, controlled validation approach that can be embedded as part of the implementation lifecycle. The screen recordings helped us explore the reasons for the discrepancies and also allow for transparency and replicability of the results.
Interpret results appropriately: our findings provide support for the accuracy of the usage log data with respect to our clinical workflows. Previous studies have reported variations ranging from overestimation by 43% (4.3 vs. 3.0 minutes) to underestimation by 33% (2.4 ± 1.7 vs. 1.6 ± 1.2 minutes).[25] For most of our metrics, the discrepancy between the usage analytics platform and our observation was fairly consistent across the 3 days of test runs, which suggests that the metrics are consistently calculated on a day-to-day basis. For large differences within our data (e.g., the 124% underestimation of allergies time), it is important to note that these differences can be amplified when small amounts of time are spent carrying out a particular task (i.e., <30 seconds/patient). Moreover, there is slight variation in the back-end calculation of certain metrics where it might appear that the user is "documenting"; in these cases, back-end timers may have counted the task within a different area of the chart because the documentation sat within a larger section (e.g., a workflow page). While it is difficult to obtain very high accuracy given the nature of log data (e.g., determining activity when the mouse stops moving),[34] these results provided confidence in the level of accuracy (i.e., range of error) we can expect in practice. When differences between observed and measured time are large, organizations can still make use of usage analytics tools to evaluate the impact of initiatives; however, in these cases, organizations will need to consider the percent change rather than the absolute value of time spent in EHR activities pre- and post-initiative.
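As a simple illustration of this last point, the sketch below computes the relative (percent) change in a usage metric before versus after an initiative; the metric and values are made up for illustration and are not study data.

# Relative (percent) change in an EHR usage metric pre- vs. post-initiative,
# used instead of the absolute number of seconds when measurement error is large.

def percent_change(pre: float, post: float) -> float:
    """Percent change from the pre-initiative value."""
    return (post - pre) / pre * 100.0

# e.g., documentation time per patient dropping from 600 s to 480 s:
print(f"{percent_change(600.0, 480.0):.1f}% change")  # prints -20.0% change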
Limitations
Several limitations should be considered for these results and methodology. Foremost, while the test runs created for use in our validation closely resemble physician behavior in the EHR, we did not use real-world observational data from multiple physicians, which would have captured the variety in practice and use of the EHR. The human-factors bias of using a single RA to conduct the tests is an added limitation. We used test runs inclusive of pauses and interruptions to EHR use to resemble real-world interruptions and workflows; however, we recognize that there could be longer workflow interruptions on a clinical unit.
Moreover, we used a test environment (or "instance") of the EHR instead of a production environment. However, since the back-end metrics calculation within the test environment was confirmed (by our vendor) to be identical to how metrics are calculated in the production environment, the impact of the difference in environments is negligible. Finally, the current validation was only conducted at one mental health organization. Despite the context being a large academic mental health hospital, we caution against generalizing these results, as workflows likely differ across organizations.
Future Directions
The findings from the use of this methodology identify a few areas for future consideration.
Foremost, based on our information gathering,[8 ] we only validated a small number of usage log metrics that were considered a priority
for our organization. While we anticipate that this method should suffice for other
metrics, future clinician metrics considered important for reducing EHR burden[3 ] (e.g., medication reconciliation) should also be validated using a similar approach
mentioned in this report. Moreover, it would be useful to explore the application
of this approach to other EHR systems at different organizations. Lastly, we continue
to see a paucity of data validation studies being published. In an effort to promote
transparency and understanding of the utility of usage log data, we encourage other
mental health and nonmental health organizations to share their validation results.
Once validation of usage log data has been conducted, organizations can use appropriate
metrics to measure EHR-related burden prior and after implementing initiatives aimed
at reducing burden.
Conclusion
This case report introduces a novel, minimally invasive approach to validating usage log data. By applying it to usage data at a Canadian mental health hospital,
we demonstrated the flexibility and utility of the approach in comparison to conventional
time–motion studies. Future studies should aim to explore and optimize this methodology
for validation of usage data across various EHR systems and practice settings.
Clinical Relevance Statement
This initiative will provide a feasible and low-burden approach to validating EHR
usage data for further optimization and quality-improvement initiatives.
Multiple Choice Questions
When studying EHR burden, what are some of the EHR analytics metrics to consider?
Correct Answer: The correct answer is option d, all of the above. All three metrics listed above are valuable in measuring the amount of burden caused by the EHR, and could be helpful in measuring the impact of interventions.
When interpreting EHR analytics data, what is a good metric to use for measuring change
due to an intervention?
% difference
Mean difference
Median difference
Standard deviation
Correct Answer: The correct answer is option a, % difference. While the mean or median difference is an absolute measure of change, the percent (%) difference takes into account minor discrepancies between the actual time spent and the time recorded by analytics systems.
Fig. 3 Measurements recorded by usage analytics platform and test case observations for
count-based metrics (averaged across 3 days).