DOI: 10.1055/a-2796-1975
Optimizing the Accuracy of Natural Language Processing Tools for Pulmonary Embolism Detection Through Integration with Claims Data: The PE-EHR+ Study
Funding Information
Dr. Bikdeli was supported by a Career Development Award from the American Heart Association and Vascular InterVentional Advances Physicians (#938814) for the PE-EHR+ study.

Abstract
Background
Rule-based natural language processing (NLP) tools can identify pulmonary embolism (PE) via radiology reports. However, their external validity remains uncertain.
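To make "rule-based" concrete, the following is a toy Python sketch of the general idea: keyword matching for PE mentions plus a crude sentence-level negation check. The patterns and the function names `sentence_is_positive` and `report_is_pe_positive` are hypothetical illustrations. They are not either of the two published NLP algorithms evaluated in this study.

```python
import re

# Hypothetical toy rules for illustration only; these are NOT the published
# algorithms evaluated in the study.
PE_TERM = re.compile(r"\bpulmonary (?:embolism|emboli|embolus)\b", re.IGNORECASE)
NEGATION_CUE = re.compile(r"\b(?:no|not|without|negative for|free of)\b", re.IGNORECASE)
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def sentence_is_positive(sentence: str) -> bool:
    """True if the sentence mentions PE and no negation cue precedes the mention."""
    match = PE_TERM.search(sentence)
    if match is None:
        return False
    # Only cues *before* the first PE mention count as negating it (a crude window).
    return NEGATION_CUE.search(sentence[: match.start()]) is None


def report_is_pe_positive(report: str) -> bool:
    """Flag a radiology report as PE-positive if any sentence asserts PE."""
    return any(sentence_is_positive(s) for s in SENTENCE_BREAK.split(report))


if __name__ == "__main__":
    print(report_is_pe_positive("No pleural effusion. Acute pulmonary emboli in the left lower lobe."))  # True
    print(report_is_pe_positive("No evidence of pulmonary embolism."))  # False
```

Rules this simple fail in predictable ways. For example, post-mention negation such as "pulmonary embolism is not seen" would be flagged positive. Such NLP false positives are what Approaches B and C aim to limit by restricting which patients the NLP output is applied to.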
Methods
In this cross-sectional study, 1,712 hospitalized patients (with and without PE) at Mass General Brigham (MGB) hospitals (2016–2021) were analyzed. Two previously published NLP algorithms were applied to radiology reports to identify PE, with chart review by two physicians as the reference standard. We tested three approaches: (A) NLP applied to all patients; (B) NLP limited to radiology reports of patients with principal or secondary International Classification of Diseases, 10th Revision (ICD-10) PE discharge codes; and (C) NLP applied to patients with PE discharge codes or a Present-on-Admission (POA) indicator of “Y” for PE. In Approaches B and C, all other patients were assumed PE-negative to minimize NLP false positives. Weighted estimates, derived from the MGB hospitalized cohort (n = 381,642), were used to calculate F1 scores (the harmonic mean of sensitivity and positive predictive value [PPV]).
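The three operationalization strategies reduce to simple Boolean rules applied per hospitalization. The Python sketch below assumes hypothetical per-patient flags (`nlp_positive`, `pe_discharge_code`, `pe_poa_yes`, `chart_review_pe`) and a per-patient sampling `weight`. The abstract does not specify the study's data model or weighting scheme, so treating each chart-reviewed patient's weight as the number of cohort patients it represents is an assumption. The toy cohort values are invented and are not study data.

```python
from dataclasses import dataclass


@dataclass
class Hospitalization:
    # Hypothetical fields; the study's actual data model is not described in the abstract.
    nlp_positive: bool       # NLP flags PE in this patient's radiology reports
    pe_discharge_code: bool  # principal or secondary ICD-10 PE discharge code
    pe_poa_yes: bool         # PE recorded with a Present-on-Admission indicator of "Y"
    chart_review_pe: bool    # reference standard: physician chart review
    weight: float = 1.0      # assumed sampling weight (cohort patients represented)


def predict(h: Hospitalization, approach: str) -> bool:
    """PE call for one hospitalization under approach "A", "B", or "C"."""
    if approach == "A":
        return h.nlp_positive  # NLP applied to everyone
    if approach == "B":
        # NLP only among coded patients; everyone else is assumed PE-negative
        return h.pe_discharge_code and h.nlp_positive
    if approach == "C":
        # NLP among patients with a PE discharge code or a POA "Y" flag for PE
        return (h.pe_discharge_code or h.pe_poa_yes) and h.nlp_positive
    raise ValueError(f"unknown approach: {approach!r}")


def weighted_metrics(cohort: list[Hospitalization], approach: str) -> dict[str, float]:
    """Weighted sensitivity, specificity, PPV, and F1 against chart review.

    Assumes at least one predicted positive, one true positive, and one true
    negative in the cohort (otherwise a denominator is zero).
    """
    tp = fp = fn = tn = 0.0
    for h in cohort:
        predicted, truth = predict(h, approach), h.chart_review_pe
        if predicted and truth:
            tp += h.weight
        elif predicted:
            fp += h.weight
        elif truth:
            fn += h.weight
        else:
            tn += h.weight
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "f1": 2 * sensitivity * ppv / (sensitivity + ppv),
    }


# Invented toy cohort (not study data); positional fields are
# nlp_positive, pe_discharge_code, pe_poa_yes, chart_review_pe.
TOY_COHORT = [
    Hospitalization(True, True, True, True, weight=10.0),       # coded PE, NLP-positive
    Hospitalization(True, False, True, True, weight=5.0),       # PE flagged only via POA "Y"
    Hospitalization(True, False, False, False, weight=50.0),    # NLP false positive, no codes
    Hospitalization(False, False, False, False, weight=935.0),  # no PE
]

for approach in ("A", "B", "C"):
    print(approach, weighted_metrics(TOY_COHORT, approach))
```

On this invented cohort, Approach A has full sensitivity but a PPV of 15/65 because of the NLP-positive patient without PE codes. Approach B misses the POA-only case (sensitivity 2/3), and Approach C recovers it while still excluding the false positive. This mirrors the qualitative pattern in the Results, but it is only an illustration.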
Results
In Approach A, both NLP tools showed high sensitivity (82.5%, 93.0%) and specificity (98.9%, 98.7%) but low PPV (60.3%, 59.6%). Approach B improved PPV (95.2%, 94.9%) but reduced sensitivity (74.1%, 76.2%), whereas Approach C retained the sensitivity of Approach A (82.5%, 93.0%) while achieving high PPV (95.6%, 95.8%). Approach C demonstrated the best performance, yielding significantly higher F1 scores for both NLP tools (88.6%, 94.4%) than Approach A (69.7%, 72.6%) and Approach B (83.3%, 84.5%) (P < 0.001).
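The F1 scores can be checked against the reported sensitivity and PPV, since F1 = 2 × sensitivity × PPV / (sensitivity + PPV). This short Python check recomputes them from the rounded percentages given in the abstract. The study's own estimates come from weighted counts, so agreement is only expected to one decimal place, and the P value is not reproduced here.

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    """F1 as the harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# (sensitivity %, PPV %) for the two NLP tools, as reported in the abstract.
REPORTED = {
    "A": [(82.5, 60.3), (93.0, 59.6)],
    "B": [(74.1, 95.2), (76.2, 94.9)],
    "C": [(82.5, 95.6), (93.0, 95.8)],
}

F1 = {
    approach: [round(f1_score(se, ppv), 1) for se, ppv in tools]
    for approach, tools in REPORTED.items()
}

for approach, scores in F1.items():
    print(approach, scores)
# A [69.7, 72.6]
# B [83.3, 84.5]
# C [88.6, 94.4]
```

All six recomputed values match the reported F1 scores to one decimal place.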
Conclusion
The accuracy of PE detection improves when rule-based NLP algorithms are operationalized using administrative claims data in addition to radiology reports.
Keywords
pulmonary embolism - natural language processing - international classification of diseases - electronic health record - accuracy
Data Availability Statement
The data for this study are available from the corresponding author upon reasonable request.
Publication History
Received: 20 August 2025
Accepted: 23 January 2026
Accepted Manuscript online: 28 January 2026
Article published online: 09 February 2026
© 2026. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany