Background and Significance
Health care-associated infections (HCAIs), also termed nosocomial infections, are
infections that patients acquire as a result of care in a health care setting, and they
can cause significant morbidity and mortality.[1] [2] [3] [4] Many HCAIs are associated
with the use of invasive devices such as catheters and ventilators.[1] One type of HCAI,
catheter-associated urinary tract infection (CAUTI), has been identified as a priority
for prevention: the rate of urinary catheterization in hospitals in different countries
has been found to be between 12 and 19%,[5] [6] [7] and in the United States, the risk
of acquiring CAUTI increases by 3 to 7% per catheter day.[6]
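To make the per-day figure concrete, the minimal Python sketch below converts a daily risk into a cumulative risk over a catheterization period (Python is used only for illustration; the dashboard itself is written in R). It assumes, as a simplification, that the daily risk is constant and independent from one day to the next.

```python
def cumulative_cauti_risk(daily_risk: float, catheter_days: int) -> float:
    """Probability of at least one CAUTI over `catheter_days` days.

    Simplifying assumption: the per-day risk is constant and independent
    from day to day, so the chance of staying infection-free is (1 - p) ** n.
    """
    if not 0.0 <= daily_risk <= 1.0:
        raise ValueError("daily_risk must be a probability between 0 and 1")
    if catheter_days < 0:
        raise ValueError("catheter_days must be non-negative")
    return 1.0 - (1.0 - daily_risk) ** catheter_days


if __name__ == "__main__":
    # Lower and upper ends of the 3 to 7% per-day range, over 10 catheter days
    for p in (0.03, 0.07):
        risk = cumulative_cauti_risk(p, 10)
        print(f"daily risk {p:.0%}, 10 catheter days: cumulative risk {risk:.1%}")
```

Under this assumption, 10 catheter days correspond to a cumulative risk of roughly 26% at the low end of the range and roughly 52% at the high end.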
For these reasons, the World Health Organization (WHO), the Gulf Cooperation Council,
and agencies in individual countries, such as the Centers for Disease Control and
Prevention (CDC) in the United States, have published guidance for the prevention,
treatment, and surveillance of CAUTI in health care settings.[6] [8] [9] However,
collecting CAUTI surveillance data, storing them, and leveraging them to inform health
policy has been fraught with challenges. In the United States, the National Healthcare
Safety Network (NHSN) was designed as a tracking system for CAUTI and other HCAIs that
health care facilities can opt into.[10]
While the NHSN presents one solution, the system has been shown to have many problems.
The NHSN requires standardized forms for data collection, and because these forms are
used across the highly diverse U.S. health care system, completing them can be
prohibitive for individual centers.[10] [11] NHSN results are also of questionable use
for evidence-based policy: NHSN rates of CAUTI are extremely sensitive to case
definition features,[12] miss many important cases differentially by location,[13] and
suffer from unacceptably low interrater reliability,[14] to name a few issues. Further,
in 2013 the NHSN acknowledged that problems with its HCAI case definitions affected
surveillance estimates, and it undertook a process of revision.[15]
While the NHSN policies and procedures might represent the most widely used system
for CAUTI surveillance globally, as many countries outside the United States follow
NHSN's lead,[16] [17] [18] it is an incomplete solution. Because of these challenges,
the work of determining optimal reporting and visualization of CAUTI data to guide
health care policy has not yet begun.[19] The authors could find no recommendations,
even from NHSN, for the optimal visual presentation of such data. Improving upon the
NHSN system itself was not feasible, owing to the many technical issues that arise
from using one centralized system for the surveillance of many different types of
infections.
This project is a prototype based on a use-case for a real organization, referred to
hereafter as Organization A. The dashboard solution was custom-developed for a
particular health care corporation and involved subject matter experts from this
organization. To protect privacy, a fictitious scenario based on the real case was
created. Organization A covers a wide geographic territory and is an integrated health
care delivery system with many general and specialty hospitals as well as outpatient
clinics. So that the prototype would be useful for this use-case, intensive care units
and other units that use urinary catheters were selected for inclusion in the dashboard
from four locations: two general hospitals, a specialty heart hospital, and a specialty
cancer center. The dashboard prototype was developed as part of an upgrade of the
CAUTI surveillance system conducted at Organization A.
Methods
This prototype is based on a real use-case, but to protect privacy, a fictitious
scenario based on the real case was created; it is described below.
Setting
In the fictional scenario, a town named Holiday in the Midwestern United States was
created to mimic several of the real settings in Organization A. Organization A's two
general hospitals were represented in the prototype as Holiday General Hospital (HGH)
and the smaller Anderson-Walker Hospital (AWK). The specialty settings were represented
by the Holiday Heart Hospital (HH) and the Holiday Cancer Center (CC). The structure
of the data for this prototype was based on the structure of the data developed for
Organization A, but the data used in this project were fabricated.
Surveillance System Upgrade
Originally, CAUTI surveillance at Organization A was a manual process designed to
track data about each CAUTI case. The director of the infectious disease department
(DI) led educational interventions so that her team was familiar with the NHSN case
definition of CAUTI.[6] When a potential case of CAUTI was identified, the DI was
notified, and the case proceeded to laboratory-confirmed diagnosis and treatment of
the patient. A paper form documenting the case data was then completed and stored
along with the laboratory reports.
However, the data collected were not standardized in any way: they adhered neither to
standards set by regulatory bodies (e.g., the CDC) nor to internally set standards. A
review of the literature revealed that risk factors for CAUTI vary by country and by
risk group (e.g., older age groups, immunocompromised patients),[7] [20] [21] and that
there were no global data standards for the entry, storage, and analysis of laboratory
results. For noncommunicable diseases, the WHO recommended a STEPwise approach to
developing surveillance systems customized to the conditions and risk factors of a
particular population.[22] The STEPwise model was used in redesigning the surveillance
system.
Data Collection
In the new system, a risk factor form called the case surveillance form (CSF) was
developed in Microsoft Word. The DI worked with one author (M.M.W.) to develop the
questions on the CSF, which gathers the following risk factor information: whether the
patient was a transfer; whether they had received antibiotics within 90 days of
admission; whether they had at least one chronic disease upon admission; whether they
were bedridden; whether they were incontinent; whether they were immunocompromised;
whether they had an infection at any site at the time of admission; whether they had
undergone urinary surgery within 7 days of admission; and whether they were a neonate.
Risk factors were selected based on priorities set by the DI and local policy.[22]
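One way to picture a single row of CSF data is as a record with one yes/no field per risk factor. The Python sketch below is illustrative only: the field names, the location code, and the date handling are hypothetical, and the actual form was a Microsoft Word document whose data were analyzed in R.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical field names for the nine CSF risk factors, in form order
RISK_FACTORS = (
    "transfer",
    "antibiotics_90d",          # antibiotics within 90 days of admission
    "chronic_disease",          # at least one chronic disease on admission
    "bedridden",
    "incontinent",
    "immunocompromised",
    "infection_on_admission",   # infection at any site at admission
    "urinary_surgery_7d",       # urinary surgery within 7 days of admission
    "neonate",
)


@dataclass(frozen=True)
class CSFRecord:
    """One row of CSF data: one CAUTI case (illustrative structure)."""
    case_id: str                # confidential CAUTI ID
    location: str               # hypothetical location code, e.g., "HGH"
    admit_date: date
    transfer: bool = False
    antibiotics_90d: bool = False
    chronic_disease: bool = False
    bedridden: bool = False
    incontinent: bool = False
    immunocompromised: bool = False
    infection_on_admission: bool = False
    urinary_surgery_7d: bool = False
    neonate: bool = False


def risk_factors_present(record: CSFRecord) -> list:
    """Names of the risk factors recorded as present for this case."""
    return [name for name in RISK_FACTORS if getattr(record, name)]
```

Keeping the risk factors as a single ordered tuple means the same list can drive both data entry checks and the order of bars in a plot.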
For this demonstration project, one author (M.M.W.) generated data in the same structure
as the data from the CSF (called CSF data), with one case per row. The same author also
determined the structure of the laboratory data (called Laboratory data), which lists
one organism per row together with all its attributes (drug sensitivities and
resistances). Each case is assigned a confidential CAUTI identity (ID), and this
identifier is included in each row of the CSF data. In the Laboratory data, each row
contains the CAUTI ID, the date of the report, and the organism on the report; together,
these three fields form a composite index that makes each row unique. The prototype
therefore demonstrates not a real-time data processing system but a data-warehousing
approach on which a dashboard can be built. Because CAUTI is rare, real-time processing
was not considered necessary.
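The composite index described above can be checked mechanically: no two Laboratory rows may share the same CAUTI ID, report date, and organism. This Python sketch assumes rows are held as named tuples with hypothetical field names and organism abbreviations; it illustrates the rule and is not the authors' R code.

```python
from datetime import date
from typing import NamedTuple


class LabRow(NamedTuple):
    """One row of Laboratory data: one organism on one report (illustrative)."""
    case_id: str                # CAUTI ID, shared with the CSF data
    report_date: date
    organism: str               # standard abbreviation (hypothetical values)
    resistant_to: frozenset = frozenset()
    sensitive_to: frozenset = frozenset()


def duplicate_keys(rows):
    """Return composite keys (case_id, report_date, organism) seen more than once."""
    seen, dupes = set(), []
    for row in rows:
        key = (row.case_id, row.report_date, row.organism)
        if key in seen:
            dupes.append(key)
        else:
            seen.add(key)
    return dupes
```

An empty result means the three-field key uniquely identifies every row, which is the property the warehousing approach relies on.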
Data Structure
[Table 1] summarizes the data structure behind the CAUTI surveillance system at
Organization A, and [Fig. 1] shows the entity-relationship diagram depicting how the
two tables are related and which indexes make each row unique.
Table 1
Data structure behind the CAUTI surveillance system analytic data
| Table | Domain | Data collection | Row entity (indexes) |
|---|---|---|---|
| CSF | Administrative | Data transcribed from medical record | Primary key: Case ID assigned by DI from roster. Composite index: none, but theoretically patient medical record plus admit date |
| | Demographics – Clinical | | |
| | Risk factors for CAUTI | | |
| Laboratory | Pathogen | Transcribed from laboratory report and classified to standard abbreviations | Primary key: Laboratory ID (automatically generated). Composite index: Case ID, date of laboratory report, pathogen. Foreign key to CSF: Case ID |
| | Drugs and resistance | | |
| | Drugs and sensitivity | | |
Abbreviations: CAUTI, catheter-associated urinary tract infection; CSF, case surveillance
form; DI, director of the infectious disease department; ID, identity.
Fig. 1 Entity-relationship diagram. This shows how the two tables described in [Table 1] are related in the surveillance system. FK, foreign key; IDX, index; PK, primary
key.
As shown in [Table 1] and [Fig. 1], the surveillance system uses two data tables, CSF
and Laboratory, each with its own primary key. The tables are related through Case ID,
the primary key of CSF, which is present as a foreign key in every record of the
Laboratory table.
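The relationship in [Table 1] and [Fig. 1] can be expressed as a relational schema. The sketch below uses Python's built-in sqlite3 module purely for illustration; the table and column names are hypothetical, only a few columns are shown, and this is not necessarily how the prototype stored its data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE csf (
    case_id    TEXT PRIMARY KEY,                -- Case ID assigned by the DI
    location   TEXT NOT NULL,
    admit_date TEXT NOT NULL                    -- ISO-8601 date string
);
CREATE TABLE laboratory (
    lab_id      INTEGER PRIMARY KEY AUTOINCREMENT,       -- automatically generated
    case_id     TEXT NOT NULL REFERENCES csf(case_id),   -- foreign key to CSF
    report_date TEXT NOT NULL,
    pathogen    TEXT NOT NULL
);
-- Composite index: at most one row per case, report date, and pathogen
CREATE UNIQUE INDEX idx_lab_case_date_pathogen
    ON laboratory (case_id, report_date, pathogen);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```

With this schema, the database itself rejects a duplicate (case, date, pathogen) row or a Laboratory row whose Case ID has no matching CSF record, rather than relying on the dashboard code to catch them.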
Choices for Data Visualization and Dashboard Development
After the upgrade, one author (M.M.W.) wrote a report for Organization A that included
data visualizations. This experience informed the data visualization choices made in
the dashboard implementation. In contrast to the static report, the dashboard allows
the user to interact with it and update the visualizations in real time.
Choice of Rshiny Dashboard Solution
SAS and SPSS are statistical software packages that have traditionally been used for
health data analytics, but they lack advanced visualization capabilities and are
commercial products.[23] Building a dashboard with the Web visualization approach, in
which Python code supports a Java display, requires knowledge of both Python and Java,
and the result is therefore difficult to maintain. Using a visualization package like
Tableau is limiting in that it does not have its own programming language. Tableau
does, however, have some programming capabilities, with functions like those in
Microsoft Excel, and it interfaces with R to make it more extensible.[24] Also, when
selecting software at the design stage, developers must consider how the final
dashboard will interface with the health care organization's internal data systems.[25]
R is open-source software that allows programmers in the community to contribute
“packages” as optional add-ons to R.[23] Although there are many ways to program in R,
two interfaces are popular: R GUI and RStudio.[23] R GUI simply provides a programming
“console” window where code is run, and script windows where code can be written and
saved. If code run in R GUI generates a plot, the plot opens in a separate window. In
contrast, RStudio is an integrated development environment with advanced visualization
capabilities that interface seamlessly with the World Wide Web, which makes it well
suited to dashboard development.[26] In RStudio, Web applications can be developed in
which running code outputs graphics to a Web browser. Because of this structure, the
code is easy to maintain, and because R is open source, it can be interfaced with most
data systems.
A package, Rshiny,[27] has been developed specifically for building dashboard
visualizations; it can be combined with a constellation of other packages that add
visual and data management capabilities to R. R script files, which are plain text
files saved with the .R extension, allow users to deploy an Rshiny application to a
local or remote server. The application code can incorporate both back- and front-end
components in one long program, or it can be modularized into independent code
segments.[28] Modularization provides flexibility, because modules of code can be
selectively called by the front-end code through available packages and functions.[28]
Ultimately, R's flexibility in building visualizations, coupled with its data
management and analysis capabilities, was the main reason why R, and specifically
Rshiny, was chosen for this prototype. Another important reason was that R is open
source, which offers (1) no cost to obtain the software and (2) the potential to share
the solution across platforms to promote adoption. The packages shiny and
shinydashboard provided the core dashboard functionality described in this
prototype.[27] [29] The plotting package ggplot2[30] was used for the bar charts, and
the package UpSetR was used for the other plots.[31] An online resource called Shiny
from RStudio provides a gallery of dashboards contributed by community users, along
with documentation.[32] RStudio makes available a public Web page called ShinyApps
that provides a space for R developers to share their Rshiny dashboard solutions.[33]
The dashboard solution that is the subject of this article is published there, and
the source code is available on GitHub (https://github.com/NatashaDukach/CAUTI_dashboard/blob/master/app.R).[34]
Dashboard Structure
The dashboard was divided into three topics: visualizing risk factors for a specific
physical location and date range; visualizing rates of detected microorganisms and
patterns of resistance and sensitivity of the organisms to drugs at the specified
location and time; and visually comparing selected plots. The user sets parameters
using controls on the dashboard, and the dashboard reacts by updating the display.
[Fig. 2] shows the default dashboard display (after the splash screen is dismissed).
Fig. 2 Dashboard structure. This is the default display of the dashboard after dismissing
the splash screen. In terms of the dashboard structure, buttons in the upper right
can be clicked to display “about,” “instructions,” and “contact” information. Across
the top are controls that set the parameters for location and time range for data
display. The button with three parallel lines on the upper left can be clicked to
autohide the menu down the left side. In the center of the default display is the
plot visualizing risk factors from the case surveillance form (CSF). On the right,
location and subunit codes are used to indicate specific subunits within locations.
Codes use the logic of location abbreviation, followed by an underscore, followed
by a subunit abbreviation (e.g., HMC_PEDS). Across the bottom, left to right, are
the risk factors that were selected for inclusion in the surveillance system: All
cases, having received antibiotics within 90 days of admission, having a chronic disease,
bedridden at admission, incontinent at admission, immunocompromised at admission,
already colonized with infection at any site upon admission, having had urological
surgery within 7 days of admission, and being a neonate.
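The location and subunit codes described in the caption follow a simple LOCATION_SUBUNIT convention, which makes them easy to split and group. This is a minimal Python sketch, assuming that the location abbreviation itself never contains an underscore (anything after the first underscore is treated as the subunit); codes other than HMC_PEDS are hypothetical.

```python
from collections import defaultdict


def parse_unit_code(code: str) -> tuple:
    """Split a code such as "HMC_PEDS" into ("HMC", "PEDS").

    Assumes the location abbreviation contains no underscore; everything
    after the first underscore is treated as the subunit.
    """
    location, sep, subunit = code.partition("_")
    if not sep or not location or not subunit:
        raise ValueError(f"expected LOCATION_SUBUNIT, got {code!r}")
    return location, subunit


def subunits_by_location(codes):
    """Group subunit abbreviations under their location abbreviation."""
    groups = defaultdict(list)
    for code in codes:
        location, subunit = parse_unit_code(code)
        groups[location].append(subunit)
    return dict(groups)
```

Grouping by location in this way is one plausible basis for offering only the subunits of the selected location in a dependent dropdown.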
This dashboard has several structural components: tabs, controls, and charts. Each
tab is a page consisting of three basic elements: header, sidebar, and body. The
sidebar is the vertical panel on the left side of the dashboard that is used to
navigate among the tabs, and the body of the tab contains all controls and graphical
visualizations. As shown in [Fig. 2], these elements correspond to the three default
screen areas of an Rshiny dashboard: the top band, the sidebar, and the middle section.
The programmer can place tabs on the sidebar to enable tab navigation. Controls capture
user input; [Fig. 2] shows controls both above and to the right of the plot.
[Fig. 2] shows the first (default) tab of the dashboard, which displays a bar plot of
risk factor prevalence from the CSF data. Across the top of [Fig. 2], two controls in
the body allow the user to set parameters: a location dropdown and a date range
selector. In the upper right of the header are three buttons: the “about” button
describes the project and repeats the information displayed on the initial splash
screen, the “instructions” button opens a splash page with directions for using the
dashboard, and the “contact” button opens a splash page with information about how to
contact the authors. In the upper left is the name of the dashboard, and to its right
are three horizontal lines. This is a button that, when clicked, expands the middle
white part of the screen while autohiding the blue vertical menu on the left. Clicking
the same button again toggles back to the default display in [Fig. 2].
Dashboard Features
In Rshiny dashboards, the application code can be roughly classified into two
categories: back-end programming and front-end programming. Raw data are typically
imported into R using back-end scripts that manipulate the data and stage them for
display; hence, back-end files are usually snippets of R code written for a specific
data management task. By contrast, front-end programming comprises the code that
controls the display. Back-end and front-end files are stored in the same application
directory and must be present together for the Rshiny application to function.
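As an example of a back-end staging task, the bar plot on the first tab needs one count per risk factor plus the total number of cases. The Python sketch below computes those counts from case rows represented as dictionaries; the keys are hypothetical, the bar order follows the Fig. 2 caption, and in the dashboard itself this step is done in R before plotting with ggplot2.

```python
from collections import Counter

# Hypothetical keys, in the left-to-right bar order described for Fig. 2
BAR_ORDER = (
    "all_cases",
    "antibiotics_90d",
    "chronic_disease",
    "bedridden",
    "incontinent",
    "immunocompromised",
    "infection_on_admission",
    "urological_surgery_7d",
    "neonate",
)


def risk_factor_bar_data(cases):
    """Return (label, count) pairs for the risk factor bar plot.

    `cases` is an iterable of dicts, one per CAUTI case, whose risk factor
    keys hold True/False; a missing key is treated as False.
    """
    cases = list(cases)
    counts = Counter()
    for case in cases:
        for factor in BAR_ORDER[1:]:
            if case.get(factor, False):
                counts[factor] += 1
    counts["all_cases"] = len(cases)
    return [(label, counts[label]) for label in BAR_ORDER]
```

Returning the pairs in a fixed order keeps the bars in the same positions whenever the filters change, so the front end only has to draw them.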
Four of the five tabs feature a reactive plot with user-manipulated controls, and
the last tab features two plots side by side, with a set of controls determining which
plots are displayed. The first tab displays the risk factor bar plot described above.
The second tab, “Detected Microorganisms,” displays an upset plot showing the patterns
of organisms in cases in the system. The third tab, “Antibiotic Resistance,” and the
fourth tab, “Antibiotic Sensitivity,” display upset plots showing patterns of drug
resistance and sensitivity in cases, respectively. Finally, the “Compare Plots” tab
allows the user to select two of the aforementioned plots so they can be displayed and
compared side by side.
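An upset plot summarizes how often each combination of items occurs. For the “Detected Microorganisms” tab, the underlying quantity can be computed by collecting the set of organisms reported for each case and counting identical sets, which corresponds to the exact-combination intersections that an upset plot draws as bars. This Python sketch is illustrative; the (case ID, organism) pairs and abbreviations are hypothetical, and the dashboard itself uses UpSetR in R.

```python
from collections import Counter, defaultdict


def organism_combinations(lab_pairs):
    """Count cases by their exact combination of detected organisms.

    `lab_pairs` is an iterable of (case_id, organism) pairs, e.g., one per
    Laboratory row; repeated reports of the same organism count once per case.
    """
    organisms_by_case = defaultdict(set)
    for case_id, organism in lab_pairs:
        organisms_by_case[case_id].add(organism)
    return Counter(frozenset(orgs) for orgs in organisms_by_case.values())
```

The same function applies to the resistance and sensitivity tabs if the pairs are (case ID, drug) instead of (case ID, organism).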
Use of Shiny Widgets for Controls
The package shinyWidgets can be installed on top of shiny and shinydashboard to provide
a collection of responsive input controls that users can manipulate to select one or
several items from a dropdown menu.[35] This dashboard uses two types of controls: the
dropdown picker and the date range picker. The dropdown picker is created with the
shinyWidgets pickerInput function, and its available choices can be updated based on
input from other controls. The date range picker is created with the dateRangeInput
function, which allows the user to specify a time range so that only results reported
between two dates are shown (see [Fig. 2]). This date filter restricts the plotted data
to cases with a date of admission within the date range. The values in the location
and date range controls are applied to the visualizations on all tabs, and if these
values change, the visualizations on all tabs are updated.
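Because the location and date range values apply to every tab, it is natural to stage the data through one shared filter that every plot uses. This is a minimal Python sketch, assuming each case is a dict with hypothetical "location" and "admit_date" keys and that both ends of the date range are inclusive (the text does not state how the endpoints are handled).

```python
from datetime import date


def filter_cases(cases, locations, start: date, end: date):
    """Keep cases at a selected location whose admission date is in [start, end].

    Both endpoints are treated as inclusive (an assumption).
    """
    if start > end:
        raise ValueError("start date must not be after end date")
    selected = set(locations)
    return [
        case for case in cases
        if case["location"] in selected and start <= case["admit_date"] <= end
    ]
```

Applying a single filter before any tab-specific processing guarantees that all tabs always describe the same subset of cases.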
Use of Shiny Server for Reactive Output
To enable the shinyWidgets controls to function with real-time reactivity, the shiny
application must be set up with code in a specific format.[36] When the user makes
selections using the controls, input values change, and the new values are passed to
the server side of the code. Shiny keeps track of which outputs depend on which inputs;
when an input value changes, the outputs that depend on it are invalidated, and R
reevaluates them and displays the new output. To facilitate redisplay after the user
changes parameters through the controls, server code refers to each control's current
value through the expression input$x, where x is the ID assigned to that control in the
user interface code. Whenever the user chooses a new value, input$x returns the updated
value, and this change cascades to update the display.[37]
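The dependency-driven updating described above can be illustrated with a toy observer pattern. The Python sketch below is not how Shiny is implemented; the class and function names are hypothetical and chosen only to show the idea that setting an input to a new value re-runs the code that depends on it, while setting it to the same value does nothing.

```python
class ReactiveInput:
    """A toy stand-in for one control's value (loosely analogous to input$x)."""

    def __init__(self, value):
        self._value = value
        self._dependents = []

    def get(self):
        return self._value

    def set(self, new_value):
        """Store a new value and re-run dependents, but only if it changed."""
        if new_value == self._value:
            return
        self._value = new_value
        for callback in list(self._dependents):
            callback()

    def add_dependent(self, callback):
        self._dependents.append(callback)


def observe_event(inputs, handler):
    """Run `handler` whenever any of `inputs` changes (loosely like observeEvent)."""
    for reactive_input in inputs:
        reactive_input.add_dependent(handler)


if __name__ == "__main__":
    location = ReactiveInput("AWK")
    observe_event([location], lambda: print("re-render for", location.get()))
    location.set("HGH")  # prints: re-render for HGH
    location.set("HGH")  # unchanged value: nothing is printed
```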
The functions used to implement reactivity in this dashboard were observeEvent,
updatePickerInput, and renderPlot. The observeEvent function essentially “listens” for
changes in parameters as the user manipulates the controls. An observeEvent should be
set on every control whose changes should dynamically and reactively update the
display. For the dynamic dropdowns in this dashboard, if observeEvent detects a change
of location or date range in the header, it launches the updatePickerInput function,
which then updates the dropdowns that depend on these parameters. For example, if the
location selected were changed from AWK to HGH, observeEvent would detect this, and
updatePickerInput would then change the risk factor dropdown choices (as shown in
[Fig. 2]) to offer only those available for location HGH.
[Fig. 3] shows an example of reactive output facilitated by the observeEvent function:
the upset plot for detected microorganisms. Note that the top controls indicate the
locations and time frame selected. For the data that meet these criteria, only eight
microorganisms were available for plotting. If the user changed the selection of
plotted microorganisms, or changed the header controls, observeEvent would detect the
changes, and the plot would be re-rendered with renderPlot. If the location changed,
the microorganisms in the picklist would also change, and the updatePickerInput
function would update them accordingly.
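The picklist update described above amounts to recomputing the set of choices from the data that pass the header filters. This is a minimal Python sketch, assuming hypothetical dict keys for case and laboratory rows and inclusive date endpoints; in the dashboard, choices computed this way in R would be passed to updatePickerInput.

```python
from datetime import date


def available_organisms(cases, lab_rows, locations, start: date, end: date):
    """Sorted organism abbreviations reported for cases passing the header filters.

    A case passes if its location is selected and its admission date lies in
    [start, end] (endpoints inclusive, an assumption).
    """
    selected = set(locations)
    eligible = {
        case["case_id"] for case in cases
        if case["location"] in selected and start <= case["admit_date"] <= end
    }
    return sorted({row["organism"] for row in lab_rows if row["case_id"] in eligible})
```

Because the choices are derived from the filtered data, the picklist can never offer an organism that would produce an empty plot for the current location and time frame.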
Fig. 3 Example of upset plot for detected microorganisms. The header controls indicate the
locations and time frame selected. On the right, abbreviations for the microorganisms
available for plotting for the locations and time frame selected appear in a dropdown.
For the data that meet these criteria, only eight microorganisms were available for
plotting. These same abbreviations are used on the y-axis of the upset plot.