Summary
Background: Because 5% of patients incur 50% of healthcare expenses, population health managers
need to be able to focus preventive and longitudinal care on those patients who are
at highest risk of increased utilization. Predictive analytics can be used to identify
these patients and to better manage their care. Data mining permits the development
of models that surpass the size restrictions of traditional statistical methods and
take advantage of the rich data available in the electronic health record (EHR), without
limiting predictions to specific chronic conditions.
Objective: The objective was to demonstrate the usefulness of unrestricted EHR data for predictive
analytics in managed healthcare.
Methods: In a population of 9,568 Medicare and Medicaid beneficiaries, patients in the highest
5% of charges were compared with an equal number of patients with the lowest charges.
Contrast mining was used to discover the combinations of clinical attributes frequently
associated with high utilization and infrequently associated with low utilization.
The attributes found in these combinations were then tested by multiple logistic regression,
and the discrimination of the model was evaluated by the c-statistic.
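As an illustration only, the contrast-mining step can be sketched as a search for attribute combinations whose support is high among high-utilization patients and low among low-utilization patients. This is a minimal sketch, not the authors' implementation: the function names, the toy attribute labels, the brute-force enumeration, and the `max_low` threshold are all assumptions made for the example (only the 20% support threshold for the high-utilization group appears in this summary).

```python
from itertools import combinations


def support(itemset, patients):
    """Fraction of patients whose attribute set contains every item in itemset."""
    if not patients:
        return 0.0
    return sum(1 for p in patients if itemset <= p) / len(patients)


def contrast_itemsets(high, low, min_high=0.20, max_low=0.05, max_size=2):
    """Return itemsets frequent in `high` and infrequent in `low`.

    min_high mirrors the 20% support threshold in the summary;
    max_low and max_size are hypothetical values chosen for illustration.
    Brute-force enumeration is only practical for tiny examples.
    """
    items = sorted(set().union(*high))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, high) > min_high and support(candidate, low) <= max_low:
                found.append(candidate)
    return found


# Toy data: each patient is a set of (hypothetical) EHR attribute labels.
high = [{"chf", "ckd"}, {"chf", "ckd", "insulin"}, {"chf"}, {"copd"}]
low = [{"copd"}, {"flu_shot"}, {"flu_shot"}, set()]

patterns = contrast_itemsets(high, low)
# Attributes appearing in any contrast pattern become regression candidates.
candidates = sorted(set().union(*patterns))
print(candidates)
```

In this toy data, "copd" is excluded because it is equally common in both groups; the surviving attributes would then be passed to the logistic regression step as candidate predictors.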
Results: Of 19,014 potential EHR patient attributes, 67 were found in combinations frequently
associated with high utilization, but not with low utilization (support > 20%). Eleven
of these attributes were significantly associated with high utilization (p < 0.05).
A prediction model composed of these eleven attributes had a c-statistic of 0.84.
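For readers unfamiliar with the c-statistic, it can be read as the probability that a randomly chosen high-utilization patient receives a higher predicted score than a randomly chosen low-utilization patient, with ties counted as half. The sketch below computes it by direct pairwise comparison; the function name and the scores are invented for illustration and are not data from the study.

```python
def c_statistic(scores_pos, scores_neg):
    """Concordance (c-statistic) by exhaustive pairwise comparison.

    scores_pos: predicted scores for patients who had the outcome
    scores_neg: predicted scores for patients who did not
    """
    pairs = len(scores_pos) * len(scores_neg)
    if pairs == 0:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                concordant += 1.0   # model ranks the pair correctly
            elif p == n:
                concordant += 0.5   # ties count as half
    return concordant / pairs


# Hypothetical predicted probabilities of high utilization.
high_scores = [0.9, 0.8, 0.6, 0.4]
low_scores = [0.7, 0.3, 0.2, 0.1]

c = c_statistic(high_scores, low_scores)
print(c)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.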
Conclusions: EHR mining reduced an unusably high number of patient attributes to a manageable
set of potential healthcare utilization predictors, without conjecturing on which
attributes would be useful. Treating these results as hypotheses to be tested by conventional
methods yielded a highly accurate predictive model. This novel, two-step methodology
can help population health managers focus preventive and longitudinal care on
those patients who are at highest risk of increased utilization.
Citation: Sheets L, Petroski GF, Zhuang Y, Phinney MA, Ge B, Parker JC, Shyu C-R. Combining
contrast mining with logistic regression to predict healthcare Appl Clin Inform 2017;
8: 430–446 https://doi.org/10.4338/ACI-2016-05-RA-0078
Keywords
Data mining, prediction models, clinical decision support, data reuse, practice management