Methods Inf Med 2011; 50(06): 536-544
DOI: 10.3414/ME11-06-0002
Original Articles
Schattauer GmbH

Data Analysis and Data Mining: Current Issues in Biomedical Informatics

R. Bellazzi
1   Dipartimento di Informatica e Sistemistica, University of Pavia, Pavia, Italy
M. Diomidous
2   Department of Public Health, Faculty of Nursing, University of Athens, Athens, Greece
I. N. Sarkar
3   Center for Clinical and Translational Science, Department of Microbiology and Molecular Genetics, and Department of Computer Science, University of Vermont, Burlington, VT, USA
K. Takabayashi
4   Division of Medical Informatics and Management, Chiba University Hospital, Chiba, Japan
A. Ziegler
5   Institut für Medizinische Biometrie und Statistik, University of Luebeck, Luebeck, Germany
A. T. McCray
6   Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA

Publication History

Publication Date:
22 January 2018 (online)

Summary

Background: Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research.

Objectives: To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics.

Methods: On the occasion of the 50th year of Methods of Information in Medicine, a symposium was organized that reflected on the opportunities, challenges and priorities of organizing, representing and analysing data, information and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in biomedical data analysis were collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field.

Results: The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics, and bioinformatics aspects of genetic epidemiology.

Conclusions: Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers.

 