Appl Clin Inform 2022; 13(01): 056-066
DOI: 10.1055/s-0041-1740923
Research Article

A Graphical Toolkit for Longitudinal Dataset Maintenance and Predictive Model Training in Health Care

Eric Bai* (1), Sophia L. Song* (1), Hamish S. F. Fraser (2), Megan L. Ranney (3)

1   Warren Alpert Medical School, Brown University, Providence, Rhode Island, United States
2   Brown University Center for Biomedical Informatics, Providence, Rhode Island, United States
3   Brown-Lifespan Center for Digital Health, Providence, Rhode Island, United States
Funding This study was funded by an Advance-CTR grant (National Institutes of Health; grant no.: U54GM115677) and the 2020 Brown University Summer Research Assistantship Fund.

Abstract

Background Predictive analytic models, including machine learning (ML) models, are increasingly integrated into electronic health record (EHR)-based decision support tools for clinicians. These models have the potential to improve care, but are challenging to internally validate, implement, and maintain over the long term. Principles of ML operations (MLOps) may inform development of infrastructure to support the entire ML lifecycle, from feature selection to long-term model deployment and retraining.

Objectives This study aimed to present the conceptual prototypes for a novel predictive model management system and to evaluate the acceptability of the system among three groups of end users.

Methods Based on principles of user-centered software design, human-computer interaction, and ethical design, we created graphical prototypes of a web-based MLOps interface to support the construction, deployment, and maintenance of models using EHR data. To assess the acceptability of the interface, we conducted semistructured user interviews with three groups of users (health informaticians, clinical and data stakeholders, chief information officers) and evaluated preliminary usability using the System Usability Scale (SUS). We subsequently revised prototypes based on user input and developed user case studies.

Results Our prototypes include design frameworks for feature selection, model training, deployment, long-term maintenance, visualization over time, and cross-functional collaboration. Users were able to complete 71% of prompted tasks without assistance. The average SUS score of the initial prototype was 75.8 out of 100, translating to a percentile range of 70 to 79, a letter grade of B, and an adjective rating of “good.” We reviewed persona-based case studies that illustrate functionalities of this novel prototype.
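For readers unfamiliar with how a SUS score on the 0 to 100 scale is obtained, the standard scoring rule can be sketched as follows: each of the 10 items is answered on a 1 to 5 scale, odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5. The function name and the response data below are hypothetical and for illustration only; they are not the study's data or analysis code.

```python
from statistics import mean


def sus_score(responses):
    """Score one 10-item SUS questionnaire (responses on a 1-5 scale) as 0-100."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each SUS response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered (positively worded) items: response minus 1
            total += response - 1
        else:
            # Even-numbered (negatively worded) items: 5 minus response
            total += 5 - response
    # The summed contributions range from 0 to 40; scale to 0-100
    return total * 2.5


# Hypothetical responses from three participants (NOT data from this study)
hypothetical_responses = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]

scores = [sus_score(r) for r in hypothetical_responses]
average_score = mean(scores)
print(scores, average_score)
```

A study-level figure such as the average reported above is the mean of the per-participant scores computed this way.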

Conclusion The initial graphical prototypes of this MLOps system are preliminarily usable and demonstrate an unmet need within the clinical informatics landscape.

Protection of Human and Animal Subjects

This project was deemed exempt from IRB review according to federal and university regulations.


* These authors contributed equally to this work.




Publication History

Received: 26 May 2021

Accepted: 09 November 2021

Article published online: 16 February 2022

© 2022. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 