DOI: 10.1055/a-2590-6456
Automated Information Extraction from Unstructured Hematopathology Reports to Support Response Assessment in Myeloproliferative Neoplasms
Funding: This study received support from New York-Presbyterian Hospital (NYPH) and Weill Cornell Medical College (WCMC), including the Clinical and Translational Science Center (CTSC) (UL1 TR000457) and Joint Clinical Trials Office (JCTO). This study was supported in part by the William and Judy Higgins Trust and the Johns Family Foundation of the Cancer Research and Treatment Fund, Inc., New York, New York, United States.
Abstract
Background
Assessing treatment response in patients with myeloproliferative neoplasms is difficult because the required data elements reside in unstructured bone marrow pathology (hematopathology) reports, which require specialized manual annotation and interpretation. Although natural language processing (NLP) has been successfully implemented for the extraction of features from solid tumor reports, little is known about its application to hematopathology.
Methods
An open-source NLP framework called Leo was implemented to parse document segments and extract concept phrases utilized for assessing responses in myeloproliferative neoplasms. A reference standard was generated through the manual review of hematopathology notes.
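The segment-then-extract idea described above can be illustrated with a minimal Python sketch. It does not use Leo and does not reproduce the authors' rules; the section headers, regular expressions, and sample report text are all hypothetical and chosen only to show how a report might be split into segments before a concept phrase (here, a myeloblast percentage) is pulled from one of them.

```python
import re
from typing import Dict, Optional

# Hypothetical section headers; real reports may follow other conventions.
SECTION_RE = re.compile(
    r"^(ASPIRATE SMEAR|CORE BIOPSY|FLOW CYTOMETRY):\s*", re.MULTILINE
)

# Hypothetical pattern for phrases such as "blasts 2%" or "myeloblasts: <1 %".
BLAST_RE = re.compile(
    r"(?:myelo)?blasts?\D{0,20}?(<?\s*\d+(?:\.\d+)?)\s*%", re.IGNORECASE
)


def split_sections(report: str) -> Dict[str, str]:
    """Map each recognized section header to the text that follows it."""
    matches = list(SECTION_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1)] = report[m.end():end].strip()
    return sections


def extract_blast_percent(section_text: str) -> Optional[str]:
    """Return the first blast percentage found in a section, or None."""
    m = BLAST_RE.search(section_text)
    return m.group(1).replace(" ", "") if m else None


# Invented sample text, for illustration only.
report = """ASPIRATE SMEAR: Myeloblasts account for 2% of nucleated cells.
CORE BIOPSY: Hypercellular marrow. Reticulin fibrosis grade MF-1.
FLOW CYTOMETRY: No increase in blasts detected."""

sections = split_sections(report)
aspirate_blasts = extract_blast_percent(sections["ASPIRATE SMEAR"])
flow_blasts = extract_blast_percent(sections["FLOW CYTOMETRY"])
```

Splitting by section first keeps a value reported in the aspirate from being attributed to the biopsy or flow cytometry segment, and a section that mentions blasts without a percentage simply yields no value.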
Results
Compared with a reference standard (n = 300 reports), our NLP method extracted features such as aspirate myeloblasts (F1 = 98%) and biopsy reticulin fibrosis (F1 = 93%) with high accuracy. However, extraction of other values, such as myeloblasts from the biopsy (F1 = 6%) and via flow cytometry (F1 = 8%), was limited by data sparsity that reflects reporting conventions. The four features with the highest clinical importance were extracted with F1 scores exceeding 90%. Whereas manual annotation of 300 reports required 30 hours of staff effort, automated NLP required 3.5 hours of runtime for 34,301 reports.
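For readers less familiar with the metric, the short Python sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, and works through the throughput arithmetic implied by the figures above. The true-positive, false-positive, and false-negative counts are hypothetical and are not taken from the study; only the report counts and hours come from the text.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not study data).
example_f1 = f1_score(tp=90, fp=5, fn=5)

# Throughput implied by the reported figures.
manual_reports_per_hour = 300 / 30        # manual annotation
nlp_reports_per_hour = 34301 / 3.5        # automated NLP runtime

print(f"Example F1: {example_f1:.3f}")
print(f"Manual: {manual_reports_per_hour:.0f} reports per staff hour")
print(f"NLP: {nlp_reports_per_hour:.0f} reports per runtime hour")
```

Under these figures, manual review proceeds at about 10 reports per staff hour, whereas the automated run processes roughly 9,800 reports per hour of runtime.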
Conclusion
To the best of our knowledge, this is among the first studies to demonstrate the application of NLP to hematopathology for clinical feature extraction. The approach may inform efforts at other institutions, and the code is available at https://github.com/wcmc-research-informatics/BmrExtractor.
* These authors contributed equally to the work.
Publication History
Received: 23 September 2024
Accepted: 09 April 2025
Accepted Manuscript online: 17 April 2025
Article published online: 09 May 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany