Results
Search Results and Screening
Overall, we screened 16,589 abstracts; 244 unique studies were included in this review, with 11 instances of study-topic overlap (see [Supplementary Appendix S3], available in the online version, for inclusion/exclusion numbers by outcome). The
most represented topics were in-hospital mortality, pain, and length of stay. The
least commonly represented topics included unit culture, burnout, and AI/ML credibility
and acceptance. [Fig. 1 ] illustrates the most common predictor variables, chief among them demographics, diagnoses, and laboratory tests. Evaluating the most represented topics from year
to year could help in making recommendations for future study areas.
Fig. 1 Twelve most frequent predictor variables among the data science literature relating
to nursing practice.
Several data science methods were identified in the literature as outlined in [Fig. 2 ]. Generalized linear models were the most common and were only included in this review
if they were used in tandem with more advanced methods or were used as prospective,
clinical prediction models. More advanced data science methods were also used, including
supervised ML (n = 102), unsupervised ML (n = 42), neural networks (n = 28), and NLP (n = 19).
Fig. 2 Use of methods among the data science literature relating to nursing practice.
The data extracted from each study are listed in [Supplementary Appendix S4 ] (available in the online version). The following results are presented by topic,
summarizing study designs, data science methods, and implications for nursing practice.
Artificial Intelligence/Machine Learning Credibility and Acceptance
We identified three studies for the topic of AI/ML credibility and acceptance. All
three studies used a retrospective cohort design.[9 ]
[10 ]
[11 ] The data sources varied across studies and included an Italian national injury database,[11 ] nine public computer-aided diagnostic datasets,[10 ] and endoscopic images.[9 ] Study populations included adults[9 ]
[10 ] and an adult/pediatric mixture.[11 ] One study was based in the United States,[10 ] one in Korea,[9 ] and one in Italy.[11 ] Sample sizes ranged from 13 to 76,911 observations.
Studies of AI/ML credibility and acceptance explored various outcomes. These included
violent injury risk prediction[11 ] and colorectal cancer.[9 ] Studies also used a variety of predictor variables, with two[9 ]
[10 ] using demographic and diagnostic data, one using case reports,[11 ] and one using endoscopic scans.[9 ] Studies used several different data science methods to address explainability for
these two primary outcomes. First, a hybrid model using semantic frames and long short-term
memory (LSTM) was used in NLP to extract concepts related to violent injury from notes.[11 ] Second, an adaptive-weighted method was used with a gradient-boosted classifier
(adaBoost) to better understand feature contribution in diagnosis classification.[10 ] Third, Choi et al[9 ] combined class activation and neural network learning to display concerning regions
that their computer-aided colorectal cancer diagnostic approach identified from colonoscopy
images.
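To make the feature-contribution idea concrete, the sketch below trains a boosted classifier on simulated data and prints per-feature importance scores. It illustrates the general technique only, not the adaptive-weighted method or datasets from the reviewed studies; all data and feature names are synthetic.

```python
# Illustrative sketch: a boosted classifier with per-feature contribution scores.
# Data and feature names are simulated placeholders, not study data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("held-out accuracy:", round(clf.score(X_test, y_test), 3))
for name, importance in sorted(zip(feature_names, clf.feature_importances_),
                               key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {importance:.3f}")
```

Ranked importance scores of this kind are one quantitative route toward explainability, similar in spirit to the feature-contribution analysis described above.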
AI/ML-based risk stratification tools can support clinicians in decision-making. This
is especially true for nurses who are expected to be the last check for most treatments
and interventions that patients receive.[12 ]
[13 ] Increased explainability or interpretability may provide nurses with the necessary
explanation that builds trust in AI/ML-based advice to support their expertise. However,
only one of the studies included nurses in the research[10 ] as domain experts. We found that studies approached explainability through quantitative analysis rather than qualitative assessment. For example, these studies tried to
improve explainability by increasing interpretation of contributing predictors[10 ]
[11 ] and visually highlighting concerning anatomical areas for human confirmation.[9 ]
Burnout
In all, only two articles met inclusion criteria for the topic of burnout. One article applied system dynamics modeling[14 ] and the other employed an open trial design.[15 ] Although only two studies could be included for this topic, the data sources
were somewhat novel: one used synthetic data with an unreported sample size[14 ] and the other used mobile devices and sensors to collect primary data from 83 medical
students. Medical students[14 ] and nurses[15 ] were the subjects of the burnout studies. One study[15 ] was based in Portugal, and the other[14 ] used synthetic data not tied to any country (but authors were based in Canada).
The two studies approached burnout from different perspectives: one focused on the stress level of nurses,[15 ] directly measuring physical responses to stress through a smartphone, while the other focused on burnout's effects on quality of care[14 ] using synthetic “clinician-generated” data. The methods were also diverse: the system dynamics modeling study[14 ] followed a more established mathematical approach, while the study that collected sensor and smartphone data[15 ] opted to predict stress levels with an ML model.
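As a minimal, hypothetical sketch of that sensor-to-model workflow (not the reviewed study's actual devices, features, or labels), summary features from wearable data could feed a standard classifier:

```python
# Minimal sketch with synthetic data: predicting a binary stress label from
# summary sensor features. Feature names and labels are invented placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 300
features = pd.DataFrame({
    "mean_heart_rate": rng.normal(75, 10, n),
    "heart_rate_variability": rng.normal(50, 15, n),
    "step_count": rng.integers(0, 12000, n),
    "sleep_hours": rng.normal(6.5, 1.2, n),
})
# Hypothetical label: high vs. low self-reported stress
stress_label = (features["heart_rate_variability"]
                + rng.normal(0, 10, n) < 45).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, features, stress_label, cv=5, scoring="roc_auc")
print("cross-validated AUC:", round(scores.mean(), 3))
```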
Even with the introduction of the quadruple aim in health care,[16 ] clinician well-being does not seem to be a primary focus in the data science literature.
It could be speculated that data relating to clinician burnout is not readily accessible.
Many of the study designs in this review are noted to be retrospective, meaning that
data had, at one point, already been collected. In the health care field, including
nursing, we do not see regular data collection about the clinician. Although the sample is limited, this literature shows that it is possible to analyze burnout in clinicians using data science methods; a remaining challenge is how to facilitate
consistent data capture that is clinician centric. Multiple studies were noted to
address electronic health record (EHR) burden in this review but did not use data
science methods. Future use of data science may be helpful in furthering our understanding
of how to address burnout in health care professionals.
Complex Care (Outpatient)
We identified nine studies for the topic of complex care. Of these studies, five used
a retrospective cohort design.[17 ]
[18 ]
[19 ]
[20 ]
[21 ] Other study designs included a combination of a retrospective cohort and a prognostic
design built on an ML model,[22 ] a prognostic design,[23 ] a longitudinal analysis of a continuously recruited national cohort,[24 ] and a comparative design with a retrospectively identified cohort which was then
matched to a referent cohort from the general population.[25 ] A majority of studies used administrative databases.[17 ]
[18 ]
[19 ]
[21 ]
[22 ] Two studies used EHR data,[20 ]
[23 ] while the remaining two studies used either a data warehouse/registry from the National Patient Register[25 ] or a questionnaire/survey.[24 ] Study populations included older adults with intellectual disability,[25 ] home health care patients,[19 ]
[20 ] patients with complex care needs,[17 ]
[24 ] adults with cancer,[23 ] veterans with diabetes,[22 ] Medicare recipients with dementia,[18 ] and sepsis survivors.[21 ] Most studies were based in the United States, but other study locations included
Sweden,[25 ] Canada,[17 ] and New Zealand.[24 ] Average sample sizes were large, ranging from 7,936 to 275,190 observations.
Studies of complex care explored various outcomes. These included hospitalizations,[17 ]
[18 ]
[19 ] emergency department (ED) use,[17 ]
[19 ] mortality,[21 ]
[22 ]
[23 ] hospice use,[21 ] health care utilization,[20 ]
[25 ] and falls risk.[24 ] Studies used a wide range of predictor or explanatory variables, including home
health care agency characteristics,[19 ] continuity of primary and specialty physician care,[17 ] prognostic indices based on patient demographics, comorbidities, procedure codes,
laboratory values and anthropomorphic measurements, medication history, and previous
health service utilization,[22 ] patients' demographic characteristics, comorbidities,[21 ]
[25 ] clinical characteristics,[21 ] racial/ethnic disparities,[20 ] dementia severity,[18 ] and urinary and fecal incontinence.[24 ] Regression was a popular method, used in eight studies with a variety of approaches. Six studies used either multivariable or multilevel regression models to find predictors
of home health care agency characteristics for hospitalization and emergency room
visits,[19 ] predictors of patients' demographic characteristics, comorbidities, and clinical
characteristics for mortality,[21 ] predictors of racial/ethnic disparities for health care utilization,[20 ] predictors of dementia severity for hospitalization,[18 ] associations of urinary and fecal incontinence with fall risks,[24 ] and associations of different diagnoses and specialist psychiatric health care utilization.[25 ] One study used a Cox's regression model to explore continuity of primary and specialty
physician care for hospitalization and emergency room visits.[17 ] A second study used a combination of regression and ML methods to select variables
associated with mortality risk and create prognostic indices for 5- and 10-year mortality.[22 ] A third study used a gradient-boosted ML algorithm to predict 180-day mortality
among outpatients with cancer.[23 ]
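For readers unfamiliar with the Cox modeling approach used in the continuity-of-care study above, the snippet below fits a proportional hazards model with the lifelines package. It uses the package's bundled example dataset purely as a stand-in; none of the reviewed studies' variables or data are represented.

```python
# Minimal Cox proportional hazards sketch using lifelines' bundled example data.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()            # columns: week (duration), arrest (event), covariates
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()          # hazard ratios with confidence intervals per covariate
```

In a clinical application, the duration column would be time to the event of interest (e.g., hospitalization) and the covariates would be the patient- and care-level predictors described above.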
Outpatient complex care in reviewed studies often occurred in home health care in
the community setting as a continuation or transition of care from hospital settings.
Data science methods relied heavily on administrative databases and sometimes on EHR
data. Management of complex care requires comprehensive data sources and input from the whole health care team, which can obscure nursing-specific data and make them difficult to distinguish. Outcomes and measures for complex care used to build
prediction models reflect the all-encompassing nature of addressing complex care management
that involves the whole health care team. The reviewed studies demonstrated a lack
of electronic data to represent nurses' presence and contributions in home health
care. There also appeared to be a lack of method diversity in building predictive
models or exploring associations of variables and outcomes related to outpatient complex
care.
Emergency Department Visits
For the topic of ED visits, we identified 14 studies. These studies used a retrospective
cohort design except for one,[26 ] which used a prospective cohort design. A vast majority of studies used EHR data, while two studies used administrative and claims data as the primary dataset.[27 ]
[28 ] Study populations included adults in the ED,[26 ]
[27 ]
[28 ]
[29 ]
[30 ]
[31 ]
[32 ]
[33 ]
[34 ]
[35 ]
[36 ]
[37 ] home care patients,[38 ] and a mixture of adult and pediatric ED patients.[39 ] Most studies were based in the United States, but other study locations included
Hong Kong,[27 ] Germany,[32 ] Italy,[39 ] Portugal,[37 ] and South Korea.[34 ]
[35 ] Sample sizes ranged from 199 to 2,910,321 observations.
Outcomes addressed in studies included mortality,[29 ]
[30 ] future posttraumatic stress disorder (PTSD) sequelae,[26 ] the novel coronavirus disease 2019 (COVID-19) infection status,[31 ]
[35 ]
[39 ] ED wait time,[27 ] intensive care unit (ICU) admission from the ED,[37 ] need for head computed tomography (CT),[32 ] cardiac arrest,[34 ] stroke severity,[28 ] and ED utilization.[30 ]
[33 ]
[36 ]
[38 ] ML methods were used in several studies, including the use of logistic regression,
generalized linear models, neural networks, and decision tree–based models to apply
statistical learning to the prediction of deterioration,[30 ]
[37 ] stroke severity,[28 ] COVID-19 diagnosis,[31 ]
[35 ]
[39 ] wait time,[27 ] and need for head CT,[32 ] while other studies used autoregressive integrated moving average models to explore time-dependent
patient flow.[33 ]
[36 ] NLP was an alternative method used in four studies. Two used NLP to predict patient
deterioration,[37 ]
[38 ] one used NLP to extract concepts related to the need for a head CT,[32 ] and another identified stroke-related concepts from notes to aid in stroke scoring.[28 ]
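To illustrate the time-series approach mentioned above (an autoregressive model of patient flow), the sketch below forecasts a simulated series of daily ED visit counts. The series, the weekly pattern, and the ARIMA order are all assumptions for demonstration, not parameters from the reviewed studies.

```python
# Illustrative ARIMA sketch on a simulated daily ED visit series.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=365, freq="D")
# Simulated daily visit counts with a weekly pattern plus noise
visits = 120 + 15 * np.sin(2 * np.pi * dates.dayofweek / 7) + rng.normal(0, 8, 365)
series = pd.Series(visits, index=dates)

model = ARIMA(series, order=(2, 0, 1)).fit()      # order would be tuned on real data
print(model.forecast(steps=7).round(1))           # next week's expected visit volume
```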
The ED is a highly collaborative setting where the medical and nursing domains often
overlap. Most of the ED-specific AI/ML studies were related to both nurses and physicians,
except for one that predicted ICU admission,[37 ] which is in the physician domain. Many studies have the potential to impact nurses' future clinical practice. First, the study by Schultebraucks et al[26 ] prospectively created a risk prediction model for the development of PTSD after ED visits.
This work may influence the discharge teaching that patients at high risk for PTSD
will receive from nurses. Second, three studies modeled ED utilization[33 ]
[36 ] and wait time,[27 ] including the innovative use of weather as a predictive factor. These studies could help address the intractable problem of short-term (surge) nurse staffing, where it is difficult to anticipate who will be entering the ED for care. Third, the study by
Topaz et al[38 ] in the home care setting helps to risk stratify prehospital patients at high risk
for ED visits. This study may help EDs to forecast home care patients that will be
visiting the hospital. Fourth, nurses are increasingly being asked to collect patients' socioeconomic status (SES) data in the hospital. Schuler et al[30 ] used SES data in their modeling to improve health care utilization prediction. Finally, 2020 was the year of the COVID-19 pandemic, with EDs being impacted significantly.
There were fewer ED COVID-19 papers than expected, possibly because ED clinicians
have been too burdened by work demands to publish. However, three studies used data
science methods to help answer ED COVID-19 clinical questions: whether computer vision could be used to aid in diagnosing COVID-19-related pneumonia,[35 ] whether EHR data could predict COVID-19 absent laboratory test confirmation,[39 ] and whether routine blood tests could predict COVID-19.[31 ]
Falls
For the topic of falls, we identified 24 studies that met inclusion criteria. Of these
studies, eight used a retrospective cohort design[40 ]
[41 ]
[42 ]
[43 ]
[44 ]
[45 ]
[46 ]
[47 ]; seven used a prospective cohort design[48 ]
[49 ]
[50 ]
[51 ]
[52 ]
[53 ]
[54 ]; six were secondary analyses of research data obtained from prospective, retrospective,
and cross-sectional studies[55 ]
[56 ]
[57 ]
[58 ]
[59 ]
[60 ]; one used mixed methods wherein data from a public dataset were used in conjunction
with measurements collected from sensors[61 ]; and one was a meta-analysis of prospective cohort and observational studies.[62 ] Ten of the studies used health records as a source of data but in two of these studies,[44 ]
[47 ] it was not clear whether the records were electronic when they were obtained. Several
of the studies, including two of the secondary analyses, incorporated data from mobility
and gait sensors.[48 ]
[49 ]
[51 ]
[53 ]
[55 ]
[60 ]
[61 ] Registries and administrative datasets were used in eight studies,[40 ]
[41 ]
[42 ]
[43 ]
[45 ]
[46 ]
[50 ]
[56 ] while questionnaires or surveys were a source of data for four studies.[49 ]
[51 ]
[57 ]
[60 ] With the exception of one study that employed sensor data from 17-year-old persons,[55 ] all study participants were community-dwelling, inpatient, or outpatient adults. Adults with chronic diseases of all types were included, but three of the studies included adults with specific conditions or from specific services: postpolio syndrome,[51 ] neurology, neurosurgery, hematology, and oncology,[52 ] and neurology.[53 ] Most studies were conducted in the United States, but studies were also completed
in Italy,[49 ] England,[43 ] Japan,[44 ]
[51 ] Poland,[59 ] and South Korea.[52 ] The 11 studies included in the meta-analysis[62 ] were conducted in seven countries. Sample sizes ranged from 42 to 275,940 observations.
The outcome of falls was defined and measured in a variety of ways. In several studies,
the fall was self-reported,[49 ]
[51 ]
[57 ]
[60 ]
[63 ] but if it occurred while the participant was in an inpatient setting or being tracked
in an outpatient setting, it was often documented in medical records, registries,
or administrative databases used to track adverse events.[40 ]
[41 ]
[42 ]
[43 ]
[44 ]
[45 ]
[46 ]
[47 ]
[48 ]
[50 ]
[52 ]
[56 ] These differences in measuring the outcome are important, in that predictive models
may then be more or less accurate, merely because of the accuracy or inaccuracy of
the outcome measurement. Nevertheless, the types of predictors across studies were quite consistent. Age was a demographic predictor in all studies. Gender was tested but was
not always a significant predictor. Diagnoses and/or symptoms of the participants
were tested as predictors in most of the studies.[40 ]
[41 ]
[42 ]
[43 ]
[44 ]
[45 ]
[46 ]
[47 ]
[49 ]
[50 ]
[51 ]
[52 ]
[54 ]
[56 ]
[57 ]
[58 ]
[59 ]
[60 ]
[62 ]
[63 ] Several categories of predictors were noteworthy, including strength, balance, and
gait test scores[40 ]
[46 ]
[47 ]
[48 ]
[49 ]
[50 ]
[51 ]
[52 ]
[53 ]
[54 ]
[55 ]
[56 ]
[57 ]
[59 ]
[60 ]
[61 ]
[62 ]
[63 ] and nutritional status.[42 ]
[56 ]
[59 ] In 15 studies, prediction models were developed and evaluated with regression models.[40 ]
[42 ]
[43 ]
[44 ]
[45 ]
[46 ]
[47 ]
[51 ]
[52 ]
[56 ]
[57 ]
[59 ]
[60 ]
[62 ]
[63 ] Data science methods also leveraged supervised and unsupervised ML methods, including
neural networks for developing risk prediction models, improving prediction of fall
risk, and automating selection of data from electronic records for use in fall risk
prediction algorithms.[41 ]
[45 ]
[46 ]
[48 ]
[49 ]
[50 ]
[52 ]
[53 ]
[54 ]
[55 ]
[58 ]
[61 ]
Data science studies included in this review appeared to reveal a step forward in
methods for predicting fall risk. Various activity monitors and robotics technology
are capable of creating large datasets of time series tracings that can be examined
for patterns suggesting motor movements or muscle weaknesses that predispose a person
to falls.[48 ]
[49 ]
[53 ]
[55 ]
[60 ]
[61 ] Preprocessing and analysis of such datasets present major challenges that are difficult
to manage using traditional statistical techniques and programs, but it is now possible
to use ML and other data science methods to determine the patterns in data that are
associated with the devastating outcome of falls. From what is observed in the studies
published in 2020, the use of devices and sensors is likely to increase in the future
exploration of factors that predict falls.
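The sensor-based studies above typically reduce raw movement signals to summary features before modeling. The following sketch shows that general pattern on simulated accelerometer traces; the feature definitions, signals, and labels are invented for illustration and do not reproduce any reviewed study's pipeline.

```python
# Minimal sketch: hand-crafted features from simulated accelerometer recordings
# feeding a fall-risk classifier. All signals and labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)

def gait_features(signal):
    """Simple summary features from one accelerometer recording."""
    return [signal.mean(), signal.std(), np.abs(np.diff(signal)).mean()]

# 200 simulated recordings of 500 samples each, with a synthetic risk label
recordings = rng.normal(0, 1, size=(200, 500)) + rng.normal(0, 0.5, size=(200, 1))
labels = (recordings.std(axis=1) + rng.normal(0, 0.1, 200) > 1.05).astype(int)

X = np.array([gait_features(r) for r in recordings])
clf = LogisticRegression(max_iter=1000)
print("cross-validated accuracy:",
      round(cross_val_score(clf, X, labels, cv=5).mean(), 3))
```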
Healthcare-associated Infections
We identified 11 studies for the topic of health care–associated infections (HAIs).
Of these 11 studies, five used a retrospective cohort design,[64 ]
[65 ]
[66 ]
[67 ]
[68 ] three used an observational design,[69 ]
[70 ]
[71 ] two used a case-control design,[72 ]
[73 ] and one used a prospective cohort design.[74 ] A vast majority of studies solely used EHR data, while two studies supplemented EHR
data with breath sensor data[73 ] and National Database of Nursing Quality Indicator (NDNQI) and Catheter Associated
Urinary Tract Infection (CAUTI) datasets.[71 ] One study used the National Institutes of Health (NIH) Gene Expression Omnibus data.[72 ] Most studies focused on adult inpatients, while three studies included adult surgical
patients[64 ]
[65 ]
[67 ] and one study focused on pediatric cardiology surgery patients.[70 ] Most studies were based in the United States but other study locations included
Taiwan,[73 ] Italy,[66 ] China,[70 ] France, and Switzerland.[74 ] Patients were the unit of analysis for most studies, while one study analyzed ICU
admissions,[71 ] one examined operative events,[64 ] and one focused on hospitalizations.[68 ] Sample sizes varied widely from study to study, ranging from 20 to 897,344 observations.
Studies explored various outcomes and included candidemia infection,[74 ] cardiac implantable device infections,[67 ] CAUTIs,[71 ]
Clostridium difficile infections (CDIs),[69 ] urinary tract infection (UTIs),[66 ]
[68 ] surgical site infections (SSIs),[64 ]
[65 ] and ventilator-associated pneumonia (VAP).[72 ]
[73 ] The majority of studies used some combination of demographic, diagnosis, laboratory,
vital sign, and/or medication data as predictor variables. Several studies used additional
unique predictors such as data on patient movement,[69 ] nurse staffing,[71 ] breath compounds,[73 ] and differentially expressed genes.[72 ] Several different data science methods were used. First, logistic regression was
used to predict HAI outcomes in three studies.[67 ]
[70 ]
[74 ] Many studies compared the predictive performance of various supervised ML models
including support vector machines,[66 ]
[71 ]
[73 ] neural networks,[66 ]
[68 ]
[73 ] decision trees,[68 ]
[71 ]
[73 ] and/or random forest models.[72 ]
[73 ] Two studies used multilayer perceptrons,[65 ]
[72 ] one study used naïve Bayes' classification[73 ] and one study conducted social network analysis.[69 ] NLP was used in one study to extract data from clinical notes and operative reports
for surveillance of SSIs,[64 ] and another study used text mining of clinical notes to inform model development
and case identification.[67 ]
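A recurring pattern in this topic is comparing several supervised classifiers on the same infection outcome. The sketch below shows that comparison loop on simulated data; the models are generic scikit-learn implementations, and no reviewed study's cohort, features, or tuning choices are represented.

```python
# Illustrative model-comparison sketch for a simulated, imbalanced HAI label.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=15,
                           weights=[0.92, 0.08], random_state=3)
models = {
    "svm": make_pipeline(StandardScaler(), SVC()),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=3),
    "naive_bayes": GaussianNB(),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: cross-validated AUROC = {auc:.3f}")
```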
Of the studies using data science techniques to examine or predict HAIs, two specifically
addressed nursing implications and included nurse authors.[68 ]
[71 ] Park et al demonstrated a knowledge discovery and data mining approach and aimed
to describe techniques that could be used to further nursing practice and guide nursing
professionals in the use of data science methods. Zachariah et al[68 ] described the benefit of risk stratification systems in relieving the burden on
nurses to complete and document traditional risk assessment forms. While Mancini et
al[66 ] do not specifically name nurses as a target audience, they do describe their data-science-as-a-service
system as an online, user-friendly platform that can help domain experts, such as clinicians, validate simple predictive models. From these studies, nursing administrators may gain valuable insights into the role of intrahospital transfers in HAIs to inform patient-placement strategies,[69 ] as well as the use of predictive risk models in dressing type selection to prevent SSIs and the estimated cost savings.[65 ] For nurses interested in exemplars of data visualization techniques, the publication
by Cai et al[72 ] showcases some impressive data visualizations.
Health Care Utilization and Costs
For the topic of health care utilization and costs, 24 articles were included in this
review. Most were retrospective cohort studies,[75 ]
[76 ]
[77 ]
[78 ]
[79 ]
[80 ]
[81 ]
[82 ]
[83 ]
[84 ]
[85 ]
[86 ]
[87 ] while six were prospective cohort studies,[88 ]
[89 ]
[90 ]
[91 ]
[92 ]
[93 ] four used a cross-sectional design,[94 ]
[95 ]
[96 ]
[97 ] and one used a survey for primary data collection.[98 ] Most studies used the EHR and administrative databases to collect data but three
used surveys,[87 ]
[96 ]
[98 ] two used public datasets,[78 ]
[95 ] one used mobile phone data,[97 ] one used images,[90 ] and one used data from social media.[92 ] All studies were adult based with the exception of one study examining families.[98 ] Most studies were based in the United States, with the exception of three from Singapore,[83 ]
[85 ]
[93 ] one from China,[80 ] one from Brazil,[77 ] one from Italy,[88 ] one from Canada,[90 ] and one from the United Kingdom.[87 ] Sample sizes ranged from 190 to 780,295 observations.
While most studies focused on cost and included some form of cost analysis, several
studies examined behaviors related to costs such as predicting health care utilization,[83 ] quantifying reliance on health care services,[84 ] verifying complete surgical removal of tumors,[90 ] and predicting no-shows.[93 ] As expected, many studies incorporated costs and insurance status as predictor variables
but patient-reported variables were also common. Most studies used supervised ML.
Unsupervised learning and linear models were also common, and often multiple models were compared in search of the most accurate. One study conducted geospatial analysis.[98 ]
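Because many of these studies compared multiple models in search of the most accurate, a minimal version of that workflow is sketched below for a continuous cost-like outcome. The data are simulated and the two models are generic choices, not those of any reviewed study.

```python
# Minimal sketch: comparing a linear model and a tree ensemble on a simulated
# continuous cost outcome.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1500, n_features=12, noise=20.0, random_state=5)
candidates = [
    ("linear_regression", LinearRegression()),
    ("random_forest", RandomForestRegressor(n_estimators=200, random_state=5)),
]
for name, model in candidates:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {r2:.3f}")
```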
As would be expected, many of the articles included in this review used data science
methods to predict cost. Such information would be helpful to hospital administration,
but this does not necessarily pertain directly to nursing practice. Instead, nursing
may focus its efforts on developing interventions to increase adherence to care. One
example may be following up with patients who are predicted to have a high risk of missing an important magnetic resonance imaging (MRI) appointment. While this information would be important for executives to know and could help avoid loss of revenue, nursing
can use this as an opportunity to support continuity of care.
Hospitalization
We identified 21 studies for the topic of hospitalization. Of these studies, 13 used
a retrospective cohort design,[68 ]
[86 ]
[99 ]
[100 ]
[101 ]
[102 ]
[103 ]
[104 ]
[105 ]
[106 ]
[107 ]
[108 ]
[109 ] 1 used an observational design,[110 ] 1 used a cross-sectional design,[111 ] 2 adopted a prognostic approach,[30 ]
[112 ] 2 performed a longitudinal analysis,[113 ]
[114 ] and 2 used survey data.[115 ]
[116 ] A vast majority of studies used EHR data, while the remaining eight studies used
administrative databases[100 ]
[106 ]
[107 ]
[109 ]
[111 ]
[114 ] or surveys as the primary collection tool.[115 ]
[116 ] Study populations included adults[30 ] with chronic illnesses,[100 ]
[105 ]
[107 ]
[114 ] pediatrics,[86 ]
[106 ] veterans,[116 ] COVID-19 patients,[99 ]
[101 ]
[102 ]
[103 ]
[108 ]
[110 ]
[113 ] hospice patients,[104 ] inpatients,[68 ]
[112 ] and Medicare recipients.[109 ]
[111 ]
[115 ] All studies were based in the United States. Sample sizes ranged from 207 to 3,100,000
observations.
Studies of hospitalization explored various areas. These included hospitalization,[101 ]
[107 ]
[108 ] hospital readmissions,[110 ] hospitalization rates,[30 ]
[105 ]
[106 ]
[114 ] hospitalization risks,[103 ]
[113 ]
[116 ] health care utilization,[86 ]
[109 ] level-of-care requirements,[102 ] medication orders,[112 ] risk of urinary tract infections (UTI) during hospitalization,[68 ] risk for critical COVID-19,[99 ] risk of live discharge,[104 ] ischemic strokes,[111 ] recovery of function following hospitalization,[115 ] and the Functional Independence Measure (FIM) instrument score.[100 ] Interestingly, most studies used a cluster of various characteristics as predictor
variables including demographics,[30 ]
[100 ]
[103 ]
[105 ]
[106 ]
[108 ]
[110 ]
[111 ]
[113 ]
[116 ] sociodemographic,[106 ]
[110 ]
[113 ] or neighborhood SES[30 ] or neighborhood level characteristics,[104 ] patient level characteristics,[104 ] clinical data,[30 ]
[68 ]
[99 ]
[100 ]
[101 ]
[102 ]
[103 ]
[105 ]
[108 ]
[110 ]
[113 ]
[116 ] medication data,[112 ] social determinants of health (SDH) data,[86 ]
[111 ] administrative data,[100 ] claims data,[30 ] patient reported outcomes,[108 ]
[116 ] geriatric syndrome risk factors,[109 ] air quality,[106 ] and cost trajectories.[115 ] Two studies used a single variable, either body mass index[107 ] or food swamp severity,[114 ] as their predictor variable. Regression was applied in most of the studies. The
remaining studies used ML,[30 ]
[68 ]
[86 ]
[99 ]
[102 ]
[112 ] NLP,[102 ] and geospatial coding.[110 ]
[114 ] Studies used ML to build a prediction model of clinical data for risk for critical
COVID-19,[99 ] level-of-care requirements,[102 ] and risk of UTI during hospitalization,[68 ] SDH data for health care utilization,[86 ] and EHR data for medication orders.[112 ]
The reviewed studies demonstrated a broad range of foci, from unique patient populations
and conditions to health care management and utilization. Data science methods employed
in these studies incorporated mostly EHR data sources in addition to administrative
databases and occasionally survey data. Study outcomes and variables were often a
cluster of characteristics that branched into the administrative and clinical domains
and occasionally neighborhood- and community-level characteristics. Nursing-specific data were embedded and not easily distinguished. There appears to be a continued need for nursing-specific considerations in studies related to hospitalization using data
science methods. However, many outcomes and variables have great implications for
nursing care because nursing plays a critical part in health care teams.
In-Hospital Mortality
We identified 59 studies for the topic of in-hospital mortality.[117 ]
[118 ]
[119 ]
[120 ]
[121 ]
[122 ]
[123 ]
[124 ]
[125 ]
[126 ]
[127 ]
[128 ]
[129 ]
[130 ]
[131 ]
[132 ]
[133 ]
[134 ]
[135 ]
[136 ]
[137 ]
[138 ]
[139 ]
[140 ]
[141 ]
[142 ]
[143 ]
[144 ]
[145 ]
[146 ]
[147 ]
[148 ]
[149 ]
[150 ]
[151 ]
[152 ]
[153 ]
[154 ]
[155 ]
[156 ]
[157 ]
[158 ]
[159 ]
[160 ]
[161 ]
[162 ]
[163 ]
[164 ]
[165 ]
[166 ]
[167 ]
[168 ]
[169 ]
[170 ]
[171 ]
[172 ]
[173 ]
[174 ]
[175 ] While the majority of studies used a retrospective cohort design, 11 used a prospective
approach,[120 ]
[122 ]
[129 ]
[140 ]
[143 ]
[166 ]
[169 ]
[170 ]
[172 ]
[174 ]
[175 ] and two were meta-analysis studies.[154 ]
[157 ] The majority of studies used EHR data, while 16 studies used some kind of registry
data,[117 ]
[118 ]
[121 ]
[127 ]
[128 ]
[131 ]
[132 ]
[138 ]
[140 ]
[141 ]
[142 ]
[150 ]
[153 ]
[155 ]
[156 ]
[161 ]
[174 ] four studies used questionnaires/surveys,[166 ]
[169 ]
[170 ]
[172 ] and two studies used administrative data.[134 ]
[167 ] With the emergence of public databases containing COVID-19 data in 2020, many studies drew on these databases.[120 ]
[123 ]
[128 ]
[131 ]
[140 ]
[142 ]
[143 ]
[146 ]
[169 ] Study populations primarily comprised adults (sometimes limited to subpopulations
such as those with chronic illnesses [e.g., Takada et al[164 ] and Sukmark et al[163 ]]), but three studies included pediatric populations,[138 ]
[154 ]
[161 ] one study included newborns,[150 ] and two studies included older patients aged over 65 years.[121 ]
[155 ] Sample sizes ranged from 15 to 9,000,000 observations.
To predict in-hospital mortality, studies used several methodological approaches.
The majority of studies used a regression model including Cox's proportional hazards[131 ]
[167 ] and mixed effect models.[141 ]
[158 ] More contemporary techniques included neural networks,[117 ]
[118 ]
[125 ]
[126 ]
[127 ]
[134 ]
[139 ]
[142 ]
[165 ]
[171 ]
[173 ] random forests,[124 ]
[126 ]
[133 ]
[135 ]
[139 ]
[142 ]
[144 ] gradient boosting,[127 ]
[128 ]
[129 ]
[130 ]
[134 ]
[135 ]
[139 ]
[140 ]
[144 ]
[168 ] and NLP.[127 ] Four studies[137 ]
[149 ]
[159 ]
[173 ] leveraged unsupervised methods, with or without supervised methods. Almost all studies
performed some level of validation, such as bootstrapping, cross-validation, or a
hold-out approach. A variety of predictors were used as input for these models. Almost
all studies included demographics and medical diagnoses. The majority of the studies also used medications and some sort of diagnostic data (e.g., laboratory values,
images, vital signs, or surgical data). Some studies used COVID-19-specific data.[120 ]
[123 ]
[128 ]
[131 ]
[140 ]
[142 ]
[143 ]
[146 ]
[169 ] Some studies used clinical notes,[127 ]
[145 ]
[154 ]
[170 ]
[173 ] and two studies used socioeconomic data.[126 ]
[167 ] The inclusion of such a variety of variables was possible because the majority of the studies had large sample sizes.
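Since nearly all of these studies reported some form of validation, the sketch below pairs two of the tactics named above: a hold-out split for the headline estimate and a bootstrap of the test set for a confidence interval around AUROC. The data, model, and sample sizes are placeholders, not any reviewed study's cohort.

```python
# Minimal sketch: hold-out validation plus a bootstrap confidence interval
# for AUROC, on simulated data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20,
                           weights=[0.9, 0.1], random_state=11)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=11)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

rng = np.random.default_rng(11)
boot_aucs = []
for _ in range(500):
    idx = rng.integers(0, len(y_test), len(y_test))   # resample the test set
    if len(np.unique(y_test[idx])) < 2:               # AUROC needs both classes
        continue
    boot_aucs.append(roc_auc_score(y_test[idx], scores[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"hold-out AUROC = {roc_auc_score(y_test, scores):.3f} "
      f"(bootstrap 95% CI {lo:.3f}-{hi:.3f})")
```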
Notable aspects of the in-hospital mortality literature include the use of frailty
as a predictor in two studies, either as a way of predicting mortality or as a better
clinical measure for symptom representation,[160 ]
[162 ] as well as the use of portable lung ultrasound findings as predictors.[122 ] Although many studies included vital signs, which are often collected by nurses,
there were no studies evaluating how other aspects of nursing care delivery can predict
in-hospital mortality. The use of publicly available datasets (e.g., Awad et al,[124 ] Baxter et al,[127 ] and Kong et al[144 ]) facilitates reproducibility and allows future investigators to explore additional
data science methods, including the use of novel predictors, such as innovative features
generated from text data. Notably, there were limited pediatric/neonatal population
studies and limited inclusion of socioeconomic predictors which could be opportunities
for future research.
Length of Stay
We identified 26 studies regarding the prediction of hospital length of stay that
used data science methods. Twenty-three studies used a retrospective cohort design,[126 ]
[133 ]
[159 ]
[173 ]
[176 ]
[177 ]
[178 ]
[179 ]
[180 ]
[181 ]
[182 ]
[183 ]
[184 ]
[185 ]
[186 ]
[187 ]
[188 ]
[189 ]
[190 ]
[191 ]
[192 ]
[193 ]
[194 ] while three were prospective cohort studies.[129 ]
[195 ]
[196 ] Data sources mostly used administrative databases[126 ]
[133 ]
[179 ]
[180 ]
[182 ]
[186 ]
[191 ]
[192 ]
[194 ] and EHRs,[129 ]
[133 ]
[176 ]
[183 ]
[184 ]
[188 ]
[190 ]
[192 ]
[196 ] while other studies used publicly available datasets,[159 ]
[173 ]
[178 ]
[187 ]
[189 ] data warehouses and registries,[133 ]
[177 ]
[180 ]
[195 ] paper clinical notes,[193 ] paper patient records,[185 ] research electronic data capture systems,[188 ] trial datasets,[181 ] questionnaires,[196 ] and routine bedside monitors.[176 ] Sample sizes ranged from 143 to 2,997,249 patients. Study populations included surgical
patients,[133 ]
[159 ]
[177 ]
[179 ]
[181 ]
[182 ]
[183 ]
[195 ]
[196 ] ICU patients,[173 ]
[176 ]
[178 ]
[187 ]
[189 ]
[190 ] medical-surgical patients,[126 ]
[129 ]
[180 ]
[191 ] patients presenting to the ED,[184 ]
[188 ]
[193 ]
[194 ] and psychiatric patients.[185 ]
[186 ]
[192 ] Most studies were conducted using U.S. patient data,[129 ]
[133 ]
[159 ]
[173 ]
[176 ]
[177 ]
[178 ]
[181 ]
[182 ]
[183 ]
[187 ]
[189 ]
[191 ] while other studies used patient data from Australia,[126 ]
[179 ]
[193 ]
[194 ] Brazil,[186 ]
[188 ] Canada,[195 ]
[196 ] China,[180 ] England,[190 ] Germany,[192 ] Switzerland,[185 ] and Taiwan.[184 ]
Studies about length of stay also investigated other outcomes, such as mortality,[126 ]
[129 ]
[133 ]
[159 ]
[173 ]
[178 ]
[181 ]
[187 ]
[188 ]
[196 ] clinical and functional complications (e.g., surgical, respiratory complications,
or disability),[126 ]
[133 ]
[159 ]
[183 ]
[192 ]
[196 ] readmission,[129 ]
[173 ]
[182 ] discharge destination,[126 ]
[183 ]
[193 ] patient-reported outcome measures,[182 ] patient phenotyping,[178 ] and hospital admission.[188 ] Demographic data were used as a predictor variable in all studies, while another
common predictor was medical diagnosis.[126 ]
[133 ]
[159 ]
[177 ]
[178 ]
[181 ]
[182 ]
[183 ]
[186 ]
[188 ]
[190 ]
[191 ]
[192 ]
[195 ] Other predictors used clinical data,[126 ]
[176 ]
[177 ]
[178 ]
[180 ]
[181 ]
[184 ]
[185 ]
[187 ]
[190 ]
[194 ]
[195 ] laboratory tests,[126 ]
[129 ]
[133 ]
[173 ]
[178 ]
[181 ]
[182 ]
[187 ]
[189 ]
[190 ] vital signs,[126 ]
[129 ]
[133 ]
[173 ]
[176 ]
[178 ]
[184 ]
[187 ]
[189 ] hospitalization data (e.g., admission/discharge data and hospital characteristics),[126 ]
[129 ]
[133 ]
[159 ]
[186 ]
[191 ]
[192 ]
[194 ] surgery data,[133 ]
[177 ]
[179 ]
[181 ]
[182 ]
[195 ]
[196 ] anthropometric data,[159 ]
[177 ]
[178 ]
[181 ]
[184 ]
[187 ]
[195 ] scales/instruments,[126 ]
[180 ]
[188 ]
[192 ]
[196 ] social data,[126 ]
[181 ]
[185 ]
[186 ]
[195 ] medications,[129 ]
[133 ]
[177 ]
[183 ] insurance status/type,[133 ]
[179 ]
[180 ]
[191 ] clinical notes,[173 ]
[184 ]
[193 ] services used,[183 ]
[186 ]
[194 ] and data collected by nurses using the International Classification of Functioning,
Disability and Health (ICF).[180 ] Studies used supervised ML algorithms,[126 ]
[129 ]
[133 ]
[159 ]
[176 ]
[177 ]
[181 ]
[183 ]
[185 ]
[187 ]
[189 ]
[192 ]
[193 ]
[194 ] generalized linear models,[178 ]
[180 ]
[182 ]
[184 ]
[186 ]
[188 ]
[190 ]
[191 ]
[192 ]
[195 ]
[196 ] deep learning models,[126 ]
[173 ]
[178 ]
[179 ]
[181 ]
[187 ]
[189 ]
[193 ] as well as unsupervised ML algorithms,[176 ]
[186 ]
[187 ] and NLP[184 ]
[193 ]. Among the supervised ML methods, random forest was one of the most used classification
algorithms.[126 ]
[133 ]
[159 ]
[176 ]
[177 ]
[181 ]
[183 ]
[189 ]
[193 ] Deep learning architectures, such as neural networks, were applied in studies with
a large amount of data. Unsupervised algorithms used clustering methods to mine datasets
and find patient data features to be used for predicting length of stay. NLP was used
to extract data from clinical notes for predicting length of stay and discharge destinations.
Interestingly, more than one data science method was used in some studies. For example,
in one study,[187 ] supervised, unsupervised, and deep learning algorithms were applied to develop a
predictive model for determining length of stay. In another study,[193 ] supervised ML, deep learning algorithms, and NLP were used to predict length of stay
and discharge destination.
Future prospective studies are needed for external validation of the models developed.
Unstructured data (e.g., clinical notes) and structured data (e.g., administrative
data) have commonly been used in the studies. However, we did not find any study that
used a combination of both. Further studies are required to incorporate these two
types of data in the same prediction model because patient information is typically
found in both unstructured and structured data. Nursing-generated data were mentioned in only two studies, which used nursing notes and assessment data based on a nonmedical classification
(i.e., the ICF). Nurses represent the largest health care profession worldwide and
the profession that generates the most data about the patient condition; therefore,
failing to use these nursing-generated data could become a significant issue. Further
studies should use both unstructured and structured nursing-generated data (e.g.,
standard nursing terminologies) jointly with the commonly used predictors to build
prediction models.
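As a hypothetical illustration of the combination suggested above, structured fields and free-text nursing notes can be joined in a single pipeline. Every column name, note, and value below is invented; the point is only to show that mixing the two data types in one length-of-stay model is technically straightforward.

```python
# Hypothetical sketch: structured features plus note text in one LOS model.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

records = pd.DataFrame({
    "age": [67, 45, 81, 59],
    "num_comorbidities": [3, 1, 5, 2],
    "nursing_note": [
        "patient ambulating with assistance, wound dressing intact",
        "independent with ADLs, no acute distress",
        "requires two-person assist, fall precautions in place",
        "tolerating diet, pain controlled with oral medication",
    ],
    "length_of_stay_days": [6, 2, 11, 4],
})

preprocess = ColumnTransformer([
    ("structured", StandardScaler(), ["age", "num_comorbidities"]),
    ("note_text", TfidfVectorizer(), "nursing_note"),
])
model = Pipeline([("prep", preprocess),
                  ("reg", RandomForestRegressor(n_estimators=100, random_state=0))])
X = records.drop(columns="length_of_stay_days")
model.fit(X, records["length_of_stay_days"])
print(model.predict(X).round(1))
```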
Pain
Of the 27 studies identified for the topic of pain, 14 used
a prospective cohort design,[197 ]
[198 ]
[199 ]
[200 ]
[201 ]
[202 ]
[203 ]
[204 ]
[205 ]
[206 ]
[207 ]
[208 ]
[209 ]
[210 ] 11 used an observational design,[200 ]
[204 ]
[205 ]
[207 ]
[210 ]
[211 ]
[212 ]
[213 ]
[214 ]
[215 ]
[216 ] 6 used a retrospective cohort design,[211 ]
[214 ]
[215 ]
[217 ]
[218 ]
[219 ] 4 used a randomized control trial,[201 ]
[212 ]
[220 ]
[221 ] 1 used a cross-sectional design,[222 ] and 1 used mixed methods.[223 ] Most studies used questionnaire/survey data, but eight used administrative databases,[206 ]
[207 ]
[208 ]
[210 ]
[212 ]
[220 ]
[221 ]
[222 ] seven used mobile devices/sensors,[200 ]
[203 ]
[204 ]
[205 ]
[210 ]
[216 ]
[220 ] and four used a data warehouse or registry.[198 ]
[203 ]
[208 ]
[214 ] Studies were mostly conducted with adults in the outpatient setting, but four
were inpatient[197 ]
[201 ]
[211 ]
[223 ] and one was done with a pediatric population.[205 ] Although many studies were conducted in the United States, others included China,[213 ]
[214 ]
[215 ]
[222 ] Australia,[207 ] Canada,[202 ] the Netherlands,[199 ]
[212 ] Germany,[208 ]
[210 ]
[211 ] Norway,[201 ] Finland,[204 ] South Korea,[203 ] Argentina,[219 ] Portugal,[197 ] Japan,[209 ] and Spain.[206 ] Sample sizes ranged from 10 to 6,316 observations.
Studies explored various outcomes including surgical applications such as determination
of postsurgical measures based on residual pain,[197 ] predicting patellofemoral pain 1 year after intervention,[201 ] predicting neuropathic pain,[202 ] predicting chronic pain of 7 to 10 years into the future,[217 ] predicting complex regional pain syndrome,[207 ] predicting pain relief for knee osteoarthritis patients,[209 ] detection of pain,[210 ]
[214 ]
[216 ]
[222 ] and pain intensity estimation/classification.[205 ]
[213 ]
[215 ]
[220 ] Other outcomes focused on pain as a predictor of anxiety and depression, coronary
heart disease,[199 ] health status,[218 ] noncancer pain as a predictor of brain aging,[208 ] and length of stay.[211 ] For patients with low back pain, societal cost[212 ] and clinical and sociodemographic predictors of increased disability[221 ] have been studied. Some outcomes focused on the data science method as a clinical
tool such as NLP of pain context from clinical notes.[219 ]
[223 ] There were several novel data sources included, such as the use of physiologic signals
from electroencephalograms (EEG),[213 ] electromyography,[204 ]
[220 ] spectrograms,[205 ] electrodermal activity,[216 ] sensor data,[200 ] MRIs,[198 ]
[208 ]
[214 ]
[222 ] kinematics/motion data,[203 ]
[210 ]
[220 ] and medical images.[197 ]
[198 ]
[208 ]
[210 ]
[214 ]
[215 ]
[222 ]
Many of these studies have significant impact on nursing, most notably in situations
where pain cannot be feasibly assessed (e.g., patients who are nonverbal). The ability
to use data science methods for analyzing facial expressions, medical images, vital
signs, and other biomechanical data could augment existing conventional methods in
classifying and quantifying the pain experience. Using EEG and electromyography data has
high potential for improving pain assessment. Leveraging ML on geospatial and kinematic
data can provide benefits not just for nursing assessment but also in other medical/health
disciplines.
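As a rough sketch of how physiologic signals can be turned into pain-related features, the code below computes band-power features from simulated EEG-like epochs and fits a classifier. The sampling rate, frequency bands, labels, and signals are all assumptions for illustration, not a reproduction of any reviewed study.

```python
# Illustrative sketch: spectral features from simulated EEG-like epochs feeding
# a high- vs. low-pain classifier. Everything here is synthetic.
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
fs = 256                      # assumed sampling rate in Hz

def band_power(signal, low, high):
    freqs, psd = welch(signal, fs=fs)
    mask = (freqs >= low) & (freqs < high)
    return psd[mask].mean()

signals = rng.normal(0, 1, size=(150, fs * 4))      # 150 four-second epochs
labels = rng.integers(0, 2, 150)                     # synthetic pain labels
# Make the toy task learnable: add extra signal power to "high pain" epochs
signals[labels == 1] += 0.6 * rng.normal(0, 1, size=(int(labels.sum()), fs * 4))

X = np.array([[band_power(s, 4, 8), band_power(s, 8, 13), band_power(s, 13, 30)]
              for s in signals])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:",
      round(cross_val_score(clf, X, labels, cv=5).mean(), 3))
```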
Patient Safety
We identified seven studies exploring patient safety. The majority of studies were
retrospective cohort designs.[52 ]
[224 ]
[225 ]
[226 ]
[227 ] Two used cross-sectional designs.[228 ]
[229 ] Four studies used patient safety or incident reports as primary data for analysis,[224 ]
[227 ]
[228 ]
[229 ] two used EHR data,[52 ]
[226 ] and one used a publicly available dataset.[225 ] Study populations primarily consisted of adult inpatients who had an event or safety
report submitted during their inpatient stay.[52 ]
[226 ]
[227 ]
[228 ]
[229 ] Studies were based in the United States,[224 ]
[225 ]
[226 ]
[229 ] China,[227 ] Korea,[52 ] and the United Kingdom.[228 ] Sample sizes ranged from 348 to 1,740,770 observations.
Studies explored various outcomes, including predicting allergic reactions,[229 ] classifying medication incidents,[227 ] identifying falls incidents from event reports,[52 ]
[226 ] identifying drug-to-drug interactions,[225 ] and classifying the contents of safety reports.[224 ]
[228 ] Data science methods included NLP,[226 ] deep neural networks,[227 ]
[229 ] support vector machines,[225 ]
[228 ] logistic regression,[52 ] and naïve Bayes' classification.[224 ]
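Several of these methods are standard text-classification tools; the short sketch below shows the naive Bayes variant on a handful of invented incident reports. The reports, labels, and categories are fabricated for illustration only and are far smaller than any real safety-report corpus.

```python
# Minimal sketch: TF-IDF plus naive Bayes to categorize free-text safety reports.
# The reports and labels below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reports = [
    "patient found on floor beside bed, no injury noted",
    "wrong dose of heparin administered, physician notified",
    "patient slipped in bathroom during ambulation",
    "medication given two hours late due to pharmacy delay",
]
labels = ["fall", "medication", "fall", "medication"]

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(reports, labels)
print(classifier.predict(["unwitnessed fall, patient found beside bed"]))
print(classifier.predict(["incorrect dose of insulin administered"]))
```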
Maintaining patient safety in the inpatient setting requires a high level of diligence
and oversight by members of the health care team and primarily rests with nurses who
provide the majority of care while patients are hospitalized. Patient safety studies
using data science methods could advance the health care team's ability to intervene
before events occur or improve the efficiency and accuracy in the classification of
patient safety events, so that improvement activities are more focused. While studies
of patient safety and the reporting of patient safety events are directly related
to the daily work of nurses and their diligence at the bedside, only one study was
led by a nurse.[52 ] Two other studies included one nurse in the study team.[224 ]
[226 ]
Pressure Injuries
We identified 13 studies for the topic of pressure injuries (PIs). Of these 13 studies,
7 studies used a retrospective cohort design,[230 ]
[231 ]
[232 ]
[233 ]
[234 ]
[235 ]
[236 ] 3 used a prospective cohort design,[237 ]
[238 ]
[239 ] 1 used a clinical trial,[240 ] 1 used a cross-sectional design,[241 ] and 1 used secondary data analysis.[242 ] A variety of data sources were used for the studies, including EHR data,[233 ]
[234 ]
[238 ]
[239 ] data warehouses,[230 ]
[231 ]
[235 ]
[236 ] a publicly available dataset,[232 ] sensor data,[240 ]
[242 ] and surveys as the primary collection tool.[237 ] The samples across studies were adult patients admitted in hospitals,[230 ]
[231 ]
[232 ]
[234 ]
[236 ]
[238 ]
[239 ] adults in residential hospices,[237 ] elderly patients in nursing homes (NHs),[241 ] Medicare beneficiaries,[235 ] and adults (unspecified).[240 ]
[242 ] Six studies were based in the United States,[230 ]
[231 ]
[232 ]
[234 ]
[235 ]
[236 ] with other study locations including Brazil,[239 ] Canada,[240 ] France,[242 ] Indonesia,[241 ] Italy,[237 ] South Korea,[238 ] and Taiwan.[233 ] Sample sizes ranged from 12 to 2,091,058 observations.
Most studies used the incidence rate of PIs as the outcome, except for one study[234 ] that projected PI closure and two studies[240 ]
[242 ] that explored PI images. Various data science methods were used to detect or predict
PIs including logistic regression,[230 ]
[231 ]
[232 ]
[234 ]
[239 ] generalized estimating equations,[237 ] multiple regression,[235 ] path analysis,[241 ] supervised ML,[233 ]
[236 ]
[238 ] and imaging processing.[240 ]
[242 ] The predominant predictor variables used across studies included demographics and
diagnoses,[231 ]
[232 ]
[233 ]
[234 ]
[235 ]
[236 ]
[237 ]
[239 ] followed by clinical assessment data,[231 ]
[233 ]
[235 ]
[236 ]
[237 ]
[238 ]
[239 ]
[241 ] the Braden scale,[231 ]
[232 ]
[236 ]
[237 ]
[239 ] laboratory tests,[233 ]
[234 ]
[236 ]
[238 ]
[239 ] and medications.[232 ]
[236 ]
[239 ] Two studies used organizational factors such as nursing unit characteristics, nurse
job satisfaction, facility types, or rural/urban hospital location.[230 ]
[235 ]
The prevention and management of PIs remains a challenge. The prediction models developed
in these studies can help nurses screen high-risk groups and manage risk factors of
PIs. The predictive models could create a monitoring system that provides real-time
warnings of PI onset or worsening trajectory to nurses and other health care providers
and prompt them to personalize PI prevention interventions. Bed sheet sensors, combined with PI classification or prediction modeling, could support an automated feedback system that maps body pressure and prompts posture changes or pressure redistribution, allowing remote monitoring.[240 ] Repositioning in bed could be rescheduled or individualized according to patient
conditions. Also, the study by Baernholdt et al[230 ] on the predictive impact of organizational factors on PI rates suggests that hospitals
should focus on organizational structures to improve nurses' work environments and
workflow, so that nurses can enhance PI interventions. Although these predictive models
are promising, the generalizability and overfitting possibility need to be carefully
considered due to the high heterogeneity of samples across studies and the small sample
sizes in some studies. Further validation studies of such risk prediction models are
needed.
Readmissions
We identified nine studies for the topic of readmissions. Of these nine studies, seven
used a retrospective cohort design[243 ]
[244 ]
[245 ]
[246 ]
[247 ]
[248 ]
[249 ] and two used a prospective cohort design.[244 ]
[246 ] Seven studies primarily used EHR data[243 ]
[244 ]
[245 ]
[246 ]
[247 ]
[248 ]
[250 ] stored in a data warehouse of the affiliated facility,[243 ]
[246 ]
[248 ]
[249 ]
[250 ] with one in combination with other data sources that included mobile device sensor
data[244 ] and one with a governmental administrative database (Medicare).[245 ] Study populations included adults in hospital intensive care,[173 ]
[250 ] those hospitalized with medical conditions,[243 ] those having had cardiac surgery in a progressive care unit (PCU),[244 ] and those who utilized Medicare services.[245 ] Two studies focused on Medicare patient data,[245 ]
[249 ] and one study of Medicare patients included encounter information from a nonhospital
setting (i.e., inpatient rehabilitation, skilled nursing, and home health services).[245 ] Sample sizes ranged from 100 patients[244 ] to over 1 million patients.[249 ] Data in each study were collected from health care facilities in the United States.
Risk prediction outcomes in each study included acute care readmissions occurring
within 7, 30, or 90 days of hospital discharge. One study looked at readmission back to
the ICU.[250 ] In addition to acute care readmissions, some data were used to predict length of
stay of postoperative cardiac patients,[244 ] hospital or 180-day mortality,[246 ] and elective surgery mortality at 30 and 90 days.[249 ] Studies generally included predictor variables comprising length of stay, gender,
number of recent admissions, age, surgical procedure, admission location (e.g., ED,
clinic, and physician referral), insurance type, diagnosis, procedures, medication,
vital signs, and comorbidities. Methods used to predict readmissions included ML,[173 ]
[243 ]
[244 ]
[246 ]
[247 ]
[248 ]
[249 ]
[250 ] NLP,[173 ]
[246 ]
[248 ]
[249 ]
[250 ] general linear regression,[173 ]
[243 ]
[247 ]
[248 ] a combination of statistical modeling and ML,[245 ] and a neural network combining structured and unstructured data.[173 ] Interestingly, Saleh et al[248 ] used an existing 30-day prediction model to compare strengths of predictors in 7-day
readmissions. Only one study focused on social determinants of health that may be
predictors of readmission.[247 ]
Hospital readmission within a 30-day (or shorter) time interval is viewed as a quality metric by the Medicare program and other insurers. Reimbursement
changes are occurring in government programs that incentivize hospitals for quality
and penalize hospitals if quality metrics are not maintained. Nurses have a role in
assessment, planning, and implementation of an accurate discharge plan that can help
identify patients most at risk for readmission due to health condition, comorbidities,
or other risk factors. ML, NLP, and predictive modeling with EHR data can provide
valuable information to assist in risk identification of importance to nursing care
and discharge planning. As structured and unstructured data in the health record can
be combined through multimodal architectures to support understanding of risk reduction, nurses can use these data in the care of at-risk populations.
Staffing
We identified four studies for the topic of staffing. One study used a prospective
cohort design[251 ] and the remaining three used a retrospective cohort design.[252 ]
[253 ]
[254 ] Three of these studies were conducted in the United States[252 ]
[253 ]
[254 ] with one study[251 ] using data from a single ICU in an Italian medical center. All studies used a combination
of EHR data plus administrative or systems data.[251 ]
[252 ]
[253 ]
[254 ] Study populations varied: two studies used an adult medical-surgical population,[251 ]
[253 ] one used a NH population,[252 ] and one used a pediatric population.[254 ] Scheduling or workload studies were not discovered in the search. Sample sizes ranged
from 148 to 30,679 observations.
Operational outcomes comprised pediatric readmissions,[254 ] the prediction of adverse events,[251 ] leaving the ED without being seen,[253 ] and infection risk.[252 ] Unadjusted logistic regression was used to evaluate each response on one tool (insurance type, home medical equipment, home nursing, home therapy, and others), with weighted scores assigned to each category. The Sanson research group (2020) developed a tool (the Patient Acuity and Complexity Score) for their study of the prediction of adverse events and sought to discriminate between patients who had and had not experienced a serious event in the unit to which they were discharged after intensive care. In a study of NH quality,[252 ] tree-based gradient-boosting algorithms were used to evaluate the risk of COVID-19
infection (the presence of at least one confirmed COVID-19 resident in the NH). A
logistic regression model and a two-layer feedforward neural network were also developed
using the identified stable predictors (including the number of care personnel/1,000
feet) to serve as benchmark predictive models for comparison.
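The benchmarking setup described above (a logistic regression and a two-layer feedforward network used as reference models) can be sketched generically as below. The facility-level features are simulated placeholders, and the layer sizes are arbitrary assumptions rather than the reviewed study's configuration.

```python
# Illustrative sketch: logistic regression and a two-layer feedforward network
# as benchmark classifiers on simulated facility-level features.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=800, n_features=10, random_state=4)
benchmarks = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "two_layer_mlp": make_pipeline(StandardScaler(),
                                   MLPClassifier(hidden_layer_sizes=(16, 8),
                                                 max_iter=2000, random_state=4)),
}
for name, model in benchmarks.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: cross-validated AUROC = {auc:.3f}")
```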
Interestingly, only one study reported a traditional measure of staffing,[252 ] the number of patients per nurse. A new variable, leaving without being seen,[253 ] could spark further interest in the layered relationships of systems/administrative
data when coupled with what is traditionally termed “clinical data,” particularly
when clear administrative implications emerge, as is the case in this study (administrative actions on ED process variables, e.g., wait times). The collection of data in 1-hour
increments[253 ] could also prove a necessary improvement in studies with administrative variables
(e.g., door-to-provider time), yet will demand further methodological scrutiny if
the variability of certain hourly measures (number of persons in waiting room) outdistances
that of nurse or other provider variables known to impact outcomes.
Unit Culture
Only one study explored a unit cultural element using data science methods. This study
used the Hospital Survey of Patient Safety Culture to predict if a patient safety
event would be voluntarily reported.[255 ] This study was conducted in the United States with a sample size of 526,645 survey
responses.
The study used regression techniques to validate that many elements of patient safety culture influence the likelihood that a patient safety event would be voluntarily
reported. Some examples of these elements include communication openness, teamwork,
staffing, and hand-offs and transitions. Outcomes explored in this study included
frequency of events reported, near-miss events, no potential for harm events, and
potential for harm events.
The study included in this review explored how a culture of patient safety influenced
voluntary reporting of patient safety events. While an argument could be made that
this article may be better suited to the patient safety category, we included it as a unit cultural element because of the impact unit-level dynamics have on creating a patient safety culture. More exploration of unit culture using data science methods is needed to help explain the behaviors of leaders and nurses that promote positive cultures on patient units.