Lessons Learned from Data Mining of WHO Mortality Database

W. Paoin

doi:10.3414/ME10-02-0019

Methods Inf Med 2011; 50(04): 380-385
DOI: 10.3414/ME10-02-0019

Special Topic – Original Articles

Schattauer GmbH

Authors

W. Paoin

¹Faculty of Medicine, Thammasat University, Pathumthani, Thailand

Publication History

received: 02 March 2010

accepted: 17 June 2010

Publication Date: 18 January 2018 (online)

Summary

Objectives: The objectives of this research were to test the ability of classification algorithms to predict the cause of death in mortality data with unknown causes, to find associations between common causes of death, to identify groups of countries based on their common causes of death, and to extract knowledge from data mining of the World Health Organization mortality database.

Methods: WEKA version 3.5.3 was used for classification, clustering and association analysis of the World Health Organization mortality database, which contained 1,109,537 records. Three major steps were performed: Step 1 – preprocessing of data to convert all records into a suitable format for each type of analysis algorithm; Step 2 – analysis of the data using the C4.5 decision tree and Naïve Bayes classification algorithms, the K-means clustering algorithm and the Apriori association analysis algorithm; Step 3 – interpretation of results and hypothesis testing after clustering analysis.
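To make the clustering step concrete, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not WEKA's implementation and not the study's pipeline: the function name `kmeans`, the fixed starting centroids, and the two-dimensional toy vectors (hypothetical shares of deaths from two cause groups per country-year) are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids.

    Returns (labels, centroids), where labels[i] is the index of the
    cluster assigned to points[i].
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Invented toy data: (share of cardiovascular deaths, share of infectious
# deaths) for four hypothetical country-years. Not taken from the WHO data.
points = [(0.50, 0.05), (0.48, 0.07), (0.10, 0.40), (0.12, 0.45)]
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
print(labels)
```

In this toy run the first two country-years fall into one cluster and the last two into the other. Real use would need one feature per cause of death and a choice of the number of clusters, which the study set to four.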

Results: Using a C4.5 decision tree classifier to predict cause of death, we obtained 440 leaf nodes that correctly classified death instances with an accuracy of 40.06%. The Naïve Bayes classification algorithm, which calculated the probability of death from each disease, correctly classified death instances with an accuracy of 28.13%. K-means clustering divided the data into four clusters with 189, 59, 65 and 144 country-years, respectively. A chi-square test was used to test the differences in diseases among the clusters, which had different diseases as predominant causes of death. Apriori association analysis produced association rules linking cancer of the lung, hypertension and cerebrovascular diseases. These were found in the top five leading causes of death with 99–100% confidence.
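For readers unfamiliar with association rules, the confidence of a rule A → B is the fraction of records containing A that also contain B. The sketch below computes support and confidence in plain Python. It assumes, for illustration only, that each record is the set of leading causes of death for one country-year; the helper names `support` and `confidence`, the cause labels, and the four toy records are hypothetical and do not come from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: count(A and B) / count(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(a <= t for t in transactions)
    if n_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return sum(both <= t for t in transactions) / n_a


# Invented toy records: leading causes of death per hypothetical country-year.
transactions = [
    frozenset({"lung_cancer", "hypertension", "cerebrovascular"}),
    frozenset({"lung_cancer", "hypertension", "cerebrovascular", "diabetes"}),
    frozenset({"cerebrovascular", "diabetes"}),
    frozenset({"ischaemic_heart", "diabetes"}),
]

rule_conf = confidence(transactions, {"lung_cancer", "hypertension"}, {"cerebrovascular"})
rule_supp = support(transactions, {"lung_cancer", "hypertension", "cerebrovascular"})
reverse_conf = confidence(transactions, {"cerebrovascular"}, {"lung_cancer"})
print(rule_conf, rule_supp, reverse_conf)
```

In the toy data the rule {lung_cancer, hypertension} → {cerebrovascular} has confidence 1.0 but support of only 0.5, while the reverse direction {cerebrovascular} → {lung_cancer} has lower confidence. This shows why a high-confidence rule should be read together with its support and its direction.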

Conclusion: Classification tools produced the poorest results in predicting cause of death. Given the inadequacy of the variables in the WHO database, creating a classification model to predict specific causes of death was impossible. Clustering and association tools yielded interesting results that could be used to identify new areas of interest in mortality data analysis. Such data mining analysis can help solve some quality problems in mortality data.

Keywords

Mortality statistics - data mining - classification - clustering - association analysis

