Semin Thromb Hemost
DOI: 10.1055/a-2554-0043
Letter to the Editor

Use of Present-on-Admission Indicators to Improve Accuracy of Pulmonary Embolism Identification from Electronic Health Record Data

1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
,
Hannah Leyva
1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
,
Mariana B. Pfeferman
1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
,
Darsiya Krishnathasan
1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
,
Antoine Bejjani
1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
2   Department of Internal Medicine, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
,
Candrika D. Khairani
1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
,
Mehrdad Zarghami
3   Department of Medicine, Jamaica Hospital Medical Center, Queens, New York
,
David Jimenez
4   Respiratory Department, Hospital Ramón y Cajal, and Medicine Department, Universidad de Alcalá (Instituto de Ramón y Cajal de Investigación Sanitaria), Centro de Investigación Biomédica en Red de Enfermedades Respiratorias, Madrid, Spain
,
Alfonso Muriel
5   Department of Biostatistics, Hospital Ramón y Cajal, and Universidad de Alcalá (IRYCIS), Madrid, Spain
6   CIBER de Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
,
Samuel Z. Goldhaber
1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
7   Division of Cardiovascular Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
,
Liqin Wang
8   Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
,
Eric A. Secemsky
9   Department of Medicine, Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology, Beth Israel Deaconess Medical Center, Boston, Massachusetts
10   Harvard Medical School, Boston, Massachusetts
11   Division of Cardiology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
,
Gregory Piazza
1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
7   Division of Cardiovascular Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
,
Harlan M. Krumholz
12   YNHH/Yale Center for Outcomes Research and Evaluation (CORE), New Haven, Connecticut
13   Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
14   Department of Health Policy and Management, Yale School of Public Health, New Haven, Connecticut
,
Zhenqiu Lin
12   YNHH/Yale Center for Outcomes Research and Evaluation (CORE), New Haven, Connecticut
,
Behnood Bikdeli
1   Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
10   Harvard Medical School, Boston, Massachusetts
12   YNHH/Yale Center for Outcomes Research and Evaluation (CORE), New Haven, Connecticut

Funding: B.B. is supported by a Career Development Award from the American Heart Association and Vascular InterVentional Advances Physicians (#938814) for the PE-EHR+ study. G.P. received research grants from BMS/Pfizer, Janssen, Alexion, Bayer, Amgen, BSC, Esperion, Regeneron, and 1R01HL164717-01.

Validation of the accuracy of International Classification of Diseases (ICD)-10 codes for identifying pulmonary embolism (PE) in claims databases is essential for research purposes and quality improvement initiatives.[1] [2] While ICD-10 principal discharge diagnosis codes are highly specific, they lack sensitivity, missing 40% of cases.[1] Adding secondary discharge diagnosis codes to principal codes improves sensitivity but increases false positives, diminishing the positive predictive value (PPV) by more than 10%.[1] [3] Therefore, efforts are necessary to reduce false positive results from incorporating secondary codes.

We recently validated an algorithm combining ICD-10 principal codes or secondary codes plus imaging codes to identify patients with PE from electronic health record (EHR) data.[1] However, imaging codes may be unavailable or incomplete at the individual patient level in large claims databases or require substantive time for processing and linkage. The current report focuses on the evaluation of an alternative approach, combining present-on-admission (POA) indicators (i.e., claims codes distinguishing preexisting conditions from those arising during hospitalization[4]) and ICD-10 codes to enhance PE identification, compared with using ICD-10 principal or secondary codes alone.

The rationale and design features of this study (called PE-EHR+) were described previously.[3] Briefly, it included 1,712 adult patients hospitalized at Mass General Brigham Health System (MGB) between January 1, 2016, and December 31, 2021, to validate EHR-based tools for identifying patients with PE. Patients were selected in three equal-sized groups: those with a principal diagnosis code for PE, those with a secondary diagnosis code for PE, and those with no discharge diagnosis code for PE. ICD-10 discharge diagnosis codes for PE were I26 and its derivatives. Two physicians (A.B. and C.D.K.) independently reviewed medical charts using prespecified criteria as the reference standard.[3] [5] The review covered medical notes, vital signs, laboratory data, and imaging reports from computed tomography scans, high-probability ventilation/perfusion scans, ultrasound studies, and others as needed.[3] [5] Discrepancies were resolved by consulting with a third physician (B.B.).[3] [5] According to the prespecified criteria, patients were considered to have PE if a diagnosis of acute PE was mentioned in medical notes, such as discharge summaries, and verified by sufficient confirmatory findings for PE in radiology reports during the hospitalization (such as reports of a filling defect on computed tomography pulmonary angiography, a high-probability ventilation/perfusion scan, direct verification of pulmonary thrombi/emboli on invasive angiography, or the presence of new proximal deep vein thrombosis in conjunction with symptoms and signs of PE).[3] [5] The investigators evaluated the imaging reports to differentiate acute PE from chronic-appearing emboli. PE could be located in the subsegmental, segmental, lobar, and/or central pulmonary arteries.[3] [5] [6]

POA indicators denote whether a diagnosis was present at the time of admission or occurred during hospitalization.[4] [7] POA can be reported as “Y” (diagnosis present on admission), “N” (diagnosis absent on admission), “U” (inadequate documentation and timing of diagnosis cannot be determined), “W” (adequate documentation but the timing of diagnosis is unclear due to clinical uncertainty), and “1” (diagnosis code is exempted from POA reporting; not applicable to PE).[4] [7]

A hybrid approach was tested, incorporating ICD-10 principal codes for PE, or secondary codes plus POA indicators “Y” or “N” for PE (i.e., excluding “U” or “W”) plus the absence of ICD-10 principal or secondary discharge diagnosis codes for PE within 30 days before the index hospitalization ([Fig. 1A]). The rationale for considering the POA indicator “Y” with secondary codes was to minimize false positive findings, as we hypothesized that those with both secondary codes and POA indicator “Y” for PE were more likely to have acute PE than those with secondary codes alone. We considered the POA indicator “N” plus secondary codes for PE to account for hospital-acquired PE. To eliminate patients with a history of recent PE who did not have acute PE in the index presentation, in this subset, we excluded individuals with ICD-10 discharge diagnosis codes for PE in any position within 30 days before the index hospitalization.
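The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Encounter` record and its field names are hypothetical and do not correspond to any actual claims or EHR schema, and the sketch assumes that the POA indicator and the 30-day lookback have already been extracted for each hospitalization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encounter:
    """Hypothetical per-hospitalization record (field names are assumptions)."""
    pe_code_position: Optional[str]   # "principal", "secondary", or None (no I26 code)
    pe_poa: Optional[str]             # POA indicator on the PE code: "Y", "N", "U", "W", or None
    pe_code_in_prior_30_days: bool    # any-position PE discharge code within 30 days before admission


def hybrid_pe_positive(enc: Encounter) -> bool:
    """Return True if the hybrid approach would flag this encounter as PE."""
    if enc.pe_code_position == "principal":
        # Principal codes qualify on their own, regardless of POA or recent PE.
        return True
    if enc.pe_code_position == "secondary":
        # Secondary codes need POA "Y" or "N" (not "U" or "W")
        # and no PE discharge code in the preceding 30 days.
        return enc.pe_poa in {"Y", "N"} and not enc.pe_code_in_prior_30_days
    return False
```

For example, `hybrid_pe_positive(Encounter("secondary", "U", False))` evaluates to `False`, because the POA indicator is neither "Y" nor "N".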

Fig. 1 (A) The hybrid approach incorporating POA indicators into ICD-10 discharge diagnosis codes, and (B) its accuracy compared with discharge diagnosis codes alone for PE identification. CI, confidence interval; ICD, International Classification of Diseases; NPV, negative predictive value; PE, pulmonary embolism; POA, present-on-admission; PPV, positive predictive value.

We recognized that patients with ICD-10 codes for PE in either the principal or secondary discharge position are disproportionately represented in the unweighted sample compared with their actual prevalence in health care systems, as most patients do not have acute or prior PE.[3] To account for this and ensure an accurate estimation of diagnostic accuracy metrics, it was predetermined that the three equally sized groups—those with a principal diagnosis of PE, those with a secondary diagnosis of PE, and those without a PE diagnosis—should be appropriately weighted.[3] Therefore, weighted estimates were determined considering the total number of hospitalizations at MGB in the study period. From January 1, 2016, to December 31, 2021, there were 4,878 patients at MGB with principal codes for PE, 3,224 patients with secondary codes for PE, and 373,540 patients without codes for PE.[3] Sensitivity, specificity, PPV, and negative predictive values were ascertained via MedCalc.[8] F1 scores were calculated as a metric for the overall performance and compared using the chi-square test. F1 score combines sensitivity and PPV and is calculated using the following formulae: 2 × (PPV × sensitivity)/(PPV + sensitivity) or (2 × true positive)/(2 × true positive + false positive + false negative).[9]
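As a rough illustration of the weighting and of the two F1 formulae, the sketch below scales the unweighted chart-review counts for the "principal or secondary codes" approach up to the MGB stratum sizes. It assumes simple proportional scaling within each stratum, which is not necessarily the exact procedure the authors ran in MedCalc. The no-code stratum size of 576 is derived here as 1,712 − 568 − 568, and the helper name `scale` is hypothetical.

```python
def scale(sample_hits: int, sample_size: int, stratum_size: int) -> int:
    """Scale a count seen in a sampled stratum up to the full stratum (assumed method)."""
    return round(sample_hits / sample_size * stratum_size)


# Chart-confirmed PE in each sampled stratum (Table 1) and MGB stratum sizes (text above).
principal_tp = scale(523, 568, 4_878)    # confirmed PE among patients with principal codes
secondary_tp = scale(338, 568, 3_224)    # confirmed PE among patients with secondary codes
no_code_pe = scale(2, 576, 373_540)      # confirmed PE among patients with no PE code

# "Principal or secondary codes" approach: any PE code counts as test-positive.
tp = principal_tp + secondary_tp
fp = (4_878 - principal_tp) + (3_224 - secondary_tp)
fn = no_code_pe

sensitivity = tp / (tp + fn)
ppv = tp / (tp + fp)

# The two F1 formulae are algebraically equivalent.
f1_from_rates = 2 * (ppv * sensitivity) / (ppv + sensitivity)
f1_from_counts = (2 * tp) / (2 * tp + fp + fn)

print(tp, fp, fn)
print(round(sensitivity * 100, 1), round(ppv * 100, 1), round(f1_from_counts, 2))
```

Under these assumptions the scaled counts match the weighted "principal or secondary" column of Table 1 (6,411 true positives, 1,691 false positives, and 1,297 false negatives).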

In the unweighted sample of 1,712 patients (mean age 60.6 ± 17.8 years, 52.3% female), the hybrid approach combining POA indicators with ICD-10 discharge diagnosis codes resulted in a sensitivity of 97.7% and a specificity of 92.3% ([Fig. 1B] and [Table 1]). In weighted estimates, the hybrid approach incorporating POA indicators resulted in higher sensitivity (81.8% vs. 58.3%) and similar PPV (92.7% vs. 92.1%) compared with using only principal discharge diagnosis codes for PE. Compared with the method using principal or secondary discharge codes, the hybrid approach achieved comparable sensitivity (81.8% vs. 83.2%) and higher PPV (92.7% vs. 79.1%). The F1 score was significantly higher for the hybrid approach than for principal codes alone (0.87 vs. 0.71, p < 0.001) or for principal or secondary codes (0.87 vs. 0.81, p < 0.001), indicating its superior performance in identifying PE.

Table 1. Details related to the accuracy of ICD-10 discharge diagnosis codes with and without the incorporation of present-on-admission indicators for identification of pulmonary embolism

| Measure | Principal discharge diagnosis codes | Secondary discharge diagnosis codes | Principal or secondary discharge diagnosis codes | Hybrid approach of discharge diagnosis codes and POA indicators |
| --- | --- | --- | --- | --- |
| Unweighted sample | | | | |
| Overall, n | 568 | 568 | 1,136 | 908 |
| True positive, n | 523 | 338 | 861 | 843 |
| False negative, n | 340 | 525 | 2 | 20 |
| True negative, n | 804 | 619 | 574 | 784 |
| False positive, n | 45 | 230 | 275 | 65 |
| Total population | 1,712 | 1,712 | 1,712 | 1,712 |
| Sensitivity, % (95% CI) | 523/863 = 60.6 (57.3–63.9) | 338/863 = 39.1 (35.9–42.5) | 861/863 = 99.8 (99.2–99.9) | 843/863 = 97.7 (96.4–98.6) |
| Specificity, % (95% CI) | 804/849 = 94.7 (93.0–96.1) | 619/849 = 72.9 (69.8–75.9) | 574/849 = 67.6 (64.4–70.8) | 784/849 = 92.3 (90.4–94.0) |
| Weighted sample | | | | |
| Overall, n | 4,878 | 3,224 | 8,102 | 6,808 |
| True positive, n | 4,492 | 1,919 | 6,411 | 6,308 |
| False negative, n | 3,216 | 5,789 | 1,297 | 1,400 |
| True negative, n | 373,548 | 373,905 | 372,243 | 373,434 |
| False positive, n | 386 | 1,305 | 1,691 | 500 |
| Total population | 381,642 | 381,642 | 381,642 | 381,642 |
| Sensitivity, % (95% CI) | 4,492/7,708 = 58.3 (57.2–59.4) | 1,919/7,708 = 24.9 (23.9–25.9) | 6,411/7,708 = 83.2 (82.3–84.0) | 6,308/7,708 = 81.8 (81.0–82.7) |
| Specificity, % (95% CI) | 373,548/373,934 = 99.9 (99.9–99.9) | 373,905/375,210 = 99.7 (99.6–99.7) | 372,243/373,943 = 99.5 (99.5–99.6) | 373,434/373,934 = 99.9 (99.9–99.9) |
| PPV, % (95% CI) | 4,492/4,878 = 92.1 (91.3–92.8) | 1,919/3,224 = 59.5 (57.9–61.1) | 6,411/8,102 = 79.1 (78.3–79.9) | 6,308/6,808 = 92.7 (92.0–93.2) |
| NPV, % (95% CI) | 373,548/376,764 = 99.1 (99.1–99.2) | 373,905/378,418 = 98.5 (98.5–98.5) | 372,243/373,540 = 99.7 (99.6–99.7) | 373,434/374,834 = 99.6 (99.6–99.6) |
| F1 score | 0.71 | 0.35 | 0.81 | 0.87[a] |

Abbreviations: CI, confidence interval; ICD, International Classification of Diseases; NPV, negative predictive value; PE, pulmonary embolism; POA, present-on-admission; PPV, positive predictive value.


[a] The F1 score for the hybrid approach was significantly higher than that of each of the other approaches, compared using the chi-square test (p < 0.001).


This study demonstrated that using principal discharge codes or secondary discharge codes paired with POA indicators (“Y” or “N”) plus codes to verify no PE-related hospitalization in the past 30 days improved overall performance for PE identification (higher F1 scores) compared with methods using either principal discharge codes alone or a combination of principal or secondary codes.

Incorporating POA indicators into ICD-10 discharge diagnosis codes has been evaluated in cardiovascular diseases, such as myocardial infarction and heart failure.[10] However, their utility for identifying PE has not been widely studied. Prior studies mainly focused on hospital-acquired venous thromboembolism rather than all patients with acute PE.[4] [11]

A challenge in using ICD-10 secondary codes for PE identification is distinguishing a recent history of PE from acute PE during the hospitalization of interest.[1] In our unweighted sample, 15 of 17 patients with secondary codes for PE and a recent PE-related hospitalization did not have acute PE during the index hospitalization, according to chart review. Our hybrid approach therefore excluded these patients, improving the accuracy of PE identification. However, patients with principal discharge diagnosis codes (compared with those with only secondary discharge diagnosis codes) are more likely to have PE despite a recent PE-related hospitalization, as the PPV of principal codes is substantially higher than that of secondary codes (92.1% vs. 59.5%). Thus, patients with principal codes for PE and a recent PE-related hospitalization were kept in the hybrid approach, as they likely represent recurrent PE.

This study had some limitations. The data were derived from several centers in the United States within the MGB Health Care System. The Centers for Medicare and Medicaid Services requires that medical diagnoses in hospital discharge records be labeled with a POA indicator using similar approaches.[4] Consequently, POA indicators have been widely utilized across the United States for various diagnoses, including PE.[4] [10] [12] [13] Globally, the World Health Organization (WHO) has recommended using a diagnosis-timing flag to improve the ability of coded hospital data to support outcomes research and quality improvement initiatives.[14] While the exact “POA” terminology is not universally adopted, similar practices exist in several other health care systems, such as in the United Kingdom,[15] South Korea,[16] Australia,[14] [17] and Canada.[14] Variations may exist in the use of the POA indicator in other U.S. health systems[13] and, more importantly, in countries other than the United States. Future studies are warranted to evaluate the validity of the proposed hybrid approach in other health care systems. Additionally, the hybrid approach using discharge diagnosis codes paired with POA indicators cannot be applied to the minority of patients with low-risk PE who are managed as outpatients. Furthermore, the current investigation did not explore approaches to validate the detection of PE-related outcomes (e.g., recurrent PE or PE-related death), which should be pursued in future studies.

In conclusion, a hybrid approach comprising ICD-10 principal codes of PE or secondary codes plus POA indicators “Y” or “N” plus no recent PE-related hospitalization can reliably identify PE without compromising sensitivity or PPV, making it useful for future research and quality improvement efforts based on claims data.



Publication History

Received: 24 January 2025

Accepted: 07 March 2025

Article published online: 24 June 2025

© 2025. Thieme. All rights reserved.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA