Developing an Analytical Pipeline to Classify Patient Safety Event Reports Using Optimized Predictive Algorithms

Asa Adadey; Robert Giannini; Lorraine B. Possanza

doi:10.1055/s-0041-1735620

Methods Inf Med 2021; 60(05/06): 147-161
DOI: 10.1055/s-0041-1735620

Original Article

Authors

Asa Adadey¹, Robert Giannini¹, Lorraine B. Possanza¹

¹Partnership for Health IT Patient Safety, ECRI, Plymouth Meeting, Pennsylvania, United States

Abstract

Background Patient safety event reports provide valuable insight into systemic safety issues, but deriving insights from these reports requires computational tools that can efficiently parse large volumes of qualitative data. Natural language processing (NLP) combined with predictive learning provides an automated approach to evaluating these data and supporting the work of patient safety analysts.

Objectives The objective of this study was to use NLP and machine learning techniques to develop a generalizable, scalable, and reliable approach to classifying event reports for the purpose of driving improvements in the safety and quality of patient care.

Methods Datasets for 14 different labels (themes) were vectorized using a bag-of-words, tf-idf, or document embeddings approach and then passed to a series of classification algorithms, each tuned via a hyperparameter grid search, to derive an optimized model for each label. Reports were also analyzed for terms strongly associated with each theme using an adjusted F-score calculation.
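As a rough illustration of the vectorize-then-grid-search idea described in the Methods, the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the toy reports, the grid values, and all names (tokenize, NaiveBayes, grid_search, TRAIN, VALID, GRID) are hypothetical, and a hand-written multinomial naive Bayes stands in for the full set of classifiers. The grid varies the smoothing strength and whether the bag-of-words is count-based or binary, and selects the configuration with the highest F₁ score on held-out reports.

```python
import math
from collections import Counter
from itertools import product


def tokenize(text, binary=False):
    """Lowercase and split on whitespace; binary=True keeps each term once."""
    tokens = text.lower().split()
    return sorted(set(tokens)) if binary else tokens


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words tokens (alpha must be > 0)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(tok for d in class_docs for tok in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, doc):
        def log_posterior(c):
            denom = self.totals[c] + self.alpha * len(self.vocab)
            return self.log_prior[c] + sum(
                math.log((self.counts[c][tok] + self.alpha) / denom)
                for tok in doc
                if tok in self.vocab  # ignore terms never seen in training
            )
        return max(self.classes, key=log_posterior)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(train, valid, grid):
    """Try every hyperparameter combination; return (best F1, best params)."""
    best_score, best_params = -1.0, None
    for alpha, binary in product(grid["alpha"], grid["binary"]):
        model = NaiveBayes(alpha).fit(
            [tokenize(text, binary) for text, _ in train],
            [label for _, label in train],
        )
        preds = [model.predict(tokenize(text, binary)) for text, _ in valid]
        score = f1_score([label for _, label in valid], preds)
        if score > best_score:
            best_score, best_params = score, {"alpha": alpha, "binary": binary}
    return best_score, best_params


# Invented toy reports for a single binary theme (1 = "Fall", 0 = other).
TRAIN = [
    ("patient found on floor after fall", 1),
    ("patient slipped and fell near bed", 1),
    ("fall from chair unwitnessed", 1),
    ("wrong dose of medication given", 0),
    ("lab specimen mislabeled", 0),
    ("medication order delayed", 0),
]
VALID = [("patient fall in bathroom", 1), ("medication dose omitted", 0)]
GRID = {"alpha": [0.1, 1.0], "binary": [False, True]}

best_score, best_params = grid_search(TRAIN, VALID, GRID)
print(best_score, best_params)
```

In practice one model would be fitted per label, and the grid would also span the vectorization approach (bag-of-words, tf-idf, document embeddings) and the classifier family; a library implementation with cross-validation would normally replace this hand-rolled loop.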

Results The F₁ score of the optimized models ranged from 0.951 (“Fall”) to 0.544 (“Environment”). The bag-of-words approach proved optimal for 12 of 14 labels. The naïve Bayes algorithm performed best for seven labels, linear support vector machine for three, and XGBoost for four. Labels with more distinctly associated terms performed better than less distinct themes, as shown by a Pearson's correlation coefficient of 0.634.
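The relationship between term distinctness and model performance is summarized by a Pearson's correlation coefficient. The short sketch below shows how such a coefficient can be computed from paired per-label values. The function name pearson_r and both input lists are hypothetical; the numbers are invented for illustration and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-label values: a term-distinctness measure and the F1 score
# of that label's optimized model (invented numbers, one pair per label).
distinctness = [0.9, 0.7, 0.6, 0.4, 0.2]
f1_scores = [0.92, 0.80, 0.85, 0.65, 0.60]

r = pearson_r(distinctness, f1_scores)
print(round(r, 3))
```

A coefficient near 1 would indicate that labels with more distinct vocabulary tend to be classified more accurately; a value such as the reported 0.634 indicates a moderate positive association.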

Conclusions We were able to demonstrate an analytical pipeline that broadly applies NLP and predictive modeling to categorize patient safety reports from multiple facilities. This pipeline allows analysts to more rapidly identify and structure information contained in patient safety data, which can enhance the evaluation and the use of this information over time.

Keywords

machine learning - natural language processing - patient safety - algorithms

Ethical Approval

No human and/or animal subjects were involved in this research.

Supplementary Material

Supplementary Material (PDF)

Publication History

Received: 12 November 2020

Accepted: 05 August 2021

Article published online: 31 October 2021

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

