Yearb Med Inform 2016; 25(S 01): S117-S29
DOI: 10.15265/IYS-2016-s033
Original Article
Georg Thieme Verlag KG Stuttgart

Progress in Biomedical Knowledge Discovery: A 25-year Retrospective

L. Sacchi
1  Biomedical Informatics Laboratory “Mario Stefanelli”, Department of Electrical, Computer, and Biomedical Engineering, University of Pavia, Italy
J. H. Holmes
2  Institute for Biomedical Informatics, Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
› Author Affiliations
Further Information

Publication History

02 August 2016

Publication Date:
06 March 2018 (online)


Objectives: We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992, and then now, 25 years later, mainly focused on supervised learning.

Methods: We performed a rigorous systematic search of PubMed and latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery in and between time periods and compare these trends. We restricted the result set using a bracket of five years previous, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set between 2011 and 2015. This was to reflect the current literature available at the time to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated.

Results: A total 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992.

Conclusions: Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains.

Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources, the evolution of new algorithmic approaches to knowledge discovery, and we consider from legal, social, and political perspectives possible explanations of the growth of the field. Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods that could be, and are being now developed for the discovery of new knowledge in biomedical data.