Discovering Subgroups Using Descriptive Models of Adverse Outcomes in Medical Care

G. Stiglic; P. Kokol

doi:10.3414/ME11-02-0040

Methods of Information in Medicine

Methods Inf Med 2012; 51(04): 348-352

Focus Theme – Original Articles

Schattauer GmbH


Authors

G. Stiglic¹,²

P. Kokol¹,²

¹Faculty of Health Sciences, University of Maribor, Slovenia

²Faculty of Electrical Engineering and Computer Science, University of Maribor, Slovenia

Abstract

Summary

Objectives: Hospital discharge databases store records for hundreds of thousands of patients. These datasets are usually used by health insurance companies to process claims from hospitals, but they are also a rich source of information about patterns of medical care. The proposed subgroup discovery method aims to improve the efficiency of detecting interpretable subgroups in such data.

Methods: Supervised descriptive rule discovery techniques can prove inefficient when target class samples represent only an extremely small fraction of all available samples. Our approach balances the number of samples in the target and control groups prior to the subgroup discovery process. Additionally, we introduce improvements to an existing subgroup discovery algorithm that enhance the user experience and make the descriptive data mining process and the visualization of rules more user-friendly.
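The balancing step described above can be illustrated with a minimal sketch. It uses plain random undersampling of the control group, which is a simple stand-in and not the instance-based selection proposed in the paper. The function name `balance_by_undersampling`, the `medical_error` field, and the toy records are all hypothetical.

```python
import random


def balance_by_undersampling(records, is_target, seed=0):
    """Keep every target record and draw an equally sized random sample of controls.

    Assumption: random undersampling is used here only for illustration;
    it is not the authors' instance-based method.
    """
    targets = [r for r in records if is_target(r)]
    controls = [r for r in records if not is_target(r)]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(len(targets), len(controls))
    return targets + rng.sample(controls, k)


# Hypothetical toy data: 5 target records among 1000 discharge records.
records = [{"id": i, "medical_error": i < 5} for i in range(1000)]
balanced = balance_by_undersampling(records, lambda r: r["medical_error"])
```

After balancing, the toy dataset holds the same number of target and control records, so a rule learner is no longer dominated by the majority class.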

Results: The instance-based subspace subgroup discovery introduced in this paper is demonstrated on hospital discharge data with a focus on medical errors. In general, the number of patients with a recorded diagnosis related to a medical error is small in comparison to the number of patients for whom no medical error occurred. We demonstrate that the proposed method can produce simple, comprehensible models with a high degree of confidence, support, and predictive power.
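As a reminder of what confidence and support measure for a single rule, the sketch below uses the common association-rule definitions: support is the fraction of all records that satisfy the rule condition and belong to the target class, and confidence is the fraction of records satisfying the condition that belong to the target class. These definitions are an assumption, since the abstract does not state the exact formulas used; the function name, field names, and toy data are hypothetical.

```python
def rule_support_confidence(records, condition, is_target):
    """Return (support, confidence) of the rule 'condition -> target'.

    Assumed definitions (common in association-rule mining):
      support    = |condition and target| / |all records|
      confidence = |condition and target| / |condition|
    """
    n = len(records)
    covered = [r for r in records if condition(r)]
    hits = sum(1 for r in covered if is_target(r))
    support = hits / n if n else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical toy data: rule "age >= 65 -> error".
toy = [
    {"age": 70, "error": True},
    {"age": 82, "error": True},
    {"age": 75, "error": False},
    {"age": 40, "error": False},
    {"age": 35, "error": False},
]
support, confidence = rule_support_confidence(
    toy, lambda r: r["age"] >= 65, lambda r: r["error"]
)
```

In this toy example the rule covers three records, two of which are in the target class, giving a support of 2/5 and a confidence of 2/3.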

Conclusions: This paper introduces a subspace subgroup discovery process that can be applied in any setting where a large number of samples contains a relatively small number of target class samples. The proposed method is implemented in the Weka machine learning environment and is available at http://ri.fzv.uni-mb.si/ssd.

Keywords

Subgroup discovery - data mining - descriptive trees

