Summary
Objectives: Hospital discharge databases store records of hundreds of thousands of patients. These datasets
are typically used by health insurance companies to process claims from hospitals, but
they also represent a rich source of information about the patterns of medical care.
The proposed subgroup discovery method aims to improve the efficiency of detecting
interpretable subgroups in data.
Methods: Supervised descriptive rule discovery techniques can prove inefficient when
target class samples represent only an extremely small fraction of all available samples.
Our approach balances the number of samples in the target and control groups prior
to the subgroup discovery process. Additionally, we introduce improvements to an
existing subgroup discovery algorithm that enhance the user experience by making the
descriptive data mining process and the visualization of rules more user friendly.
Results: The instance-based subspace subgroup discovery method introduced in this paper is demonstrated
on hospital discharge data, with a focus on medical errors. In general, the number of
patients with a recorded diagnosis related to a medical error is relatively small
in comparison to the number of patients for whom no medical error occurred. We demonstrate
the ability of the proposed method to produce simple, comprehensible models with a high
degree of confidence, support, and predictive power.
Conclusions: This paper introduces a subspace subgroup discovery process that can be applied in
any setting where a large number of samples is available but only a relatively small number
of them belong to the target class. The proposed method is implemented in the Weka machine
learning environment and is available at http://ri.fzv.uni-mb.si/ssd.
Keywords
Subgroup discovery - data mining - descriptive trees