Appl Clin Inform 2021; 12(02): 407-416
DOI: 10.1055/s-0041-1729752
Review Article

Rethinking PICO in the Machine Learning Era: ML-PICO

Xinran Liu
1   Division of Hospital Medicine, University of California, San Francisco, San Francisco, California, United States
2   University of California, San Francisco, San Francisco, California, United States
,
James Anstey
1   Division of Hospital Medicine, University of California, San Francisco, San Francisco, California, United States
,
Ron Li
3   Division of Hospital Medicine, Stanford University, Stanford, California, United States
,
Chethan Sarabu
4   doc.ai, Palo Alto, California, United States
5   Department of Pediatrics, Stanford University, Stanford, California, United States
,
Reiri Sono
2   University of California, San Francisco, San Francisco, California, United States
,
Atul J. Butte
6   Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, United States

Abstract

Background Machine learning (ML) has captured the attention of many clinicians who may not have formal training in this area but are otherwise increasingly exposed to ML literature that may be relevant to their clinical specialties. ML papers that follow an outcomes-based research format can be assessed using clinical research appraisal frameworks such as PICO (Population, Intervention, Comparison, Outcome). However, the PICO framework strains when applied to ML papers that create new ML models, which are akin to diagnostic tests. There is a need for a new framework to help assess such papers.

Objective We propose a new framework to help clinicians systematically read and evaluate medical ML papers whose aim is to create a new ML model: ML-PICO (Machine Learning, Population, Identification, Crosscheck, Outcomes). We describe how the ML-PICO framework can be applied toward appraising literature describing ML models for health care.

Conclusion The relevance of ML to practitioners of clinical medicine is steadily increasing with a growing body of literature. Therefore, it is increasingly important for clinicians to be familiar with how to assess and best utilize these tools. In this paper we have described ML-PICO, a practical framework for reading ML papers that create a new ML model (or diagnostic test). We hope that clinicians can use it to better evaluate the quality and utility of ML papers.

Protection of Human and Animal Subjects

Human and/or animal subjects were not included in this project.

Publication History

Received: 17 November 2020

Accepted: 24 March 2021

Article published online:
19 May 2021

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
