Methods Inf Med
DOI: 10.1055/s-0041-1740493
Original Article

Ambiguous and Incomplete: Natural Language Processing Reveals Problematic Reporting Styles in Thyroid Ultrasound Reports

Priya H. Dedhia (1), Kallie Chen (2), Yiqiang Song (3), Eric LaRose (4), Joseph R. Imbus (2), Peggy L. Peissig (4), Eneida A. Mendonca (3, 5), David F. Schneider (2)

1 Department of Surgery, Division of Surgical Oncology, Ohio State University Comprehensive Cancer Center and Ohio State University Wexner Medical Center, Columbus, Ohio, United States
2 Department of Surgery, Division of Endocrine Surgery, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, United States
3 Department of Biostatistics and Medical Informatics, Department of Pediatrics, University of Wisconsin-Madison, Madison, Wisconsin, United States
4 Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield Clinic Health System, Marshfield, Wisconsin, United States
5 Department of Pediatrics, Department of Biostatistics and Health Data Sciences, Indiana University, Indianapolis, Indiana, United States
Funding None.

Abstract

Objective Natural language processing (NLP) systems convert unstructured text into analyzable data. Here, we report how well NLP captures granular details on nodules from thyroid ultrasound (US) reports, and we describe critical issues with reporting language that this analysis revealed.

Methods We iteratively developed NLP tools using the clinical Text Analysis and Knowledge Extraction System (cTAKES) and thyroid US reports from 2007 to 2013. We incorporated nine nodule features for NLP extraction. We then evaluated the precision, recall, and accuracy of our NLP tools on a separate set of US reports from the same period, drawn from an academic medical center (A) and a regional health care system (B). Two physicians manually annotated each test-set report, and a third physician adjudicated discrepancies. The adjudicated “gold standard” was then used to evaluate NLP performance on the test set.
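The evaluation step can be pictured as a slot-by-slot comparison between NLP output and the adjudicated gold standard. This is a minimal sketch, not the authors' code: the `(report_id, element)` keying, the exact-match rule, the accuracy definition (agreement over every slot either side filled), and the simple percent-agreement measure for two annotators are all assumptions for illustration.

```python
from typing import Dict, Tuple

# Hypothetical slot key: (report_id, element_name), e.g. ("r1", "echogenicity").
Slot = Tuple[str, str]


def percent_agreement(a: Dict[Slot, str], b: Dict[Slot, str]) -> float:
    """Share of slots filled by either side on which both sides give the same value."""
    union = set(a) | set(b)
    same = sum(1 for k in union if a.get(k) == b.get(k))
    return same / len(union) if union else 0.0


def evaluate(gold: Dict[Slot, str], pred: Dict[Slot, str]) -> Dict[str, float]:
    """Score NLP output against the adjudicated gold standard (exact value match)."""
    correct = sum(1 for k, v in pred.items() if gold.get(k) == v)
    return {
        "precision": correct / len(pred) if pred else 0.0,  # correct / all NLP outputs
        "recall": correct / len(gold) if gold else 0.0,     # correct / all gold elements
        # Assumed definition: agreement over every slot either side filled.
        "accuracy": percent_agreement(gold, pred),
    }


gold = {
    ("r1", "laterality"): "left",
    ("r1", "size"): "1.2 cm",
    ("r1", "echogenicity"): "hypoechoic",
    ("r2", "laterality"): "right",
}
pred = {
    ("r1", "laterality"): "left",
    ("r1", "size"): "1.2 cm",
    ("r1", "echogenicity"): "isoechoic",  # wrong value
    ("r2", "calcifications"): "none",     # element absent from gold
}
print(evaluate(gold, pred))  # → {'precision': 0.5, 'recall': 0.5, 'accuracy': 0.4}
```

The same `percent_agreement` function can be applied to the two physicians' annotations to obtain a raw inter-annotator agreement figure; chance-corrected measures such as Cohen's kappa would require a different calculation.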

Results A total of 243 thyroid US reports contained 6,405 data elements. Inter-annotator agreement across all elements was 91.3%. Compared with the gold standard, the overall recall of the NLP tool was 90%. NLP recall for thyroid lobe or isthmus characteristics was 96% for laterality and 95% for size. NLP accuracy for nodule characteristics was 92% for laterality, 92% for size, 76% for calcifications, 65% for vascularity, 62% for echogenicity, 76% for contents, and 40% for borders. NLP recall for the presence or absence of lymphadenopathy was 61%. Reporting style accounted for 18% of errors. For example, the word “heterogeneous” was used interchangeably to describe either nodule contents or echogenicity. Although nodule dimensions and laterality were often described, US reports described contents, echogenicity, vascularity, calcifications, borders, and lymphadenopathy only 46, 41, 17, 15, 9, and 41% of the time, respectively. Most nodule characteristics were equally likely to be described at hospital A as at hospital B.
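The “heterogeneous” example shows why reporting style defeats term-based extraction: when one word can describe two features, a lookup cannot decide which slot to fill. The sketch below is hypothetical and is not the cTAKES pipeline used in the study; the toy lexicon, the tokenization, and the function names are assumptions. It flags such terms and computes a per-feature description rate, analogous to the description percentages reported above.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy lexicon: report term -> nodule feature(s) it may describe.
# "heterogeneous" maps to two features, mirroring the ambiguity found in the reports.
LEXICON: Dict[str, Set[str]] = {
    "solid": {"contents"},
    "cystic": {"contents"},
    "hypoechoic": {"echogenicity"},
    "isoechoic": {"echogenicity"},
    "heterogeneous": {"contents", "echogenicity"},
    "microcalcifications": {"calcifications"},
}


def map_terms(sentence: str) -> Dict[str, Set[str]]:
    """Return each lexicon term in the sentence with the features it could describe."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {t: LEXICON[t] for t in tokens if t in LEXICON}


def ambiguous_terms(sentence: str) -> List[str]:
    """Terms that map to more than one feature and cannot be resolved by lookup alone."""
    return sorted(t for t, feats in map_terms(sentence).items() if len(feats) > 1)


def description_rate(described: List[Set[str]], feature: str) -> float:
    """Fraction of nodules whose report mentions `feature` (input: features described per nodule)."""
    return sum(feature in d for d in described) / len(described) if described else 0.0


print(ambiguous_terms("Heterogeneous solid nodule in the left lobe."))  # → ['heterogeneous']
print(description_rate([{"contents"}, {"contents", "echogenicity"}, set()], "echogenicity"))
```

A real system would resolve such terms with context rules or statistical disambiguation; the point here is only that the ambiguity lives in the report text, not in the extractor.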

Conclusions NLP can automate extraction of critical information from thyroid US reports. However, ambiguous and incomplete reporting language hinders the performance of NLP systems regardless of institutional setting. Standardized or synoptic thyroid US reports could improve NLP performance.
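One way to picture the proposed remedy: in a synoptic report each feature gets its own field with a closed set of values, so a word like “heterogeneous” cannot be filed under the wrong feature, and an omitted feature is recorded explicitly rather than silently. The schema below is a hypothetical illustration; the class names, field names, and value sets are assumptions loosely based on common ultrasound descriptors, and only a few of the nine features are shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


# Hypothetical closed value sets; a real synoptic template would define its own.
class Side(Enum):
    RIGHT = "right"
    LEFT = "left"
    ISTHMUS = "isthmus"


class Contents(Enum):
    SOLID = "solid"
    MIXED = "mixed cystic and solid"
    CYSTIC = "cystic"
    NOT_REPORTED = "not reported"


class Echogenicity(Enum):
    HYPERECHOIC = "hyperechoic"
    ISOECHOIC = "isoechoic"
    HYPOECHOIC = "hypoechoic"
    NOT_REPORTED = "not reported"


@dataclass(frozen=True)
class SynopticNodule:
    """One nodule; contents and echogenicity are separate, mandatory fields."""
    side: Side
    max_dimension_cm: float
    contents: Contents
    echogenicity: Echogenicity

    def missing_features(self) -> List[str]:
        """Features the reporter explicitly marked as not reported."""
        fields = {"contents": self.contents, "echogenicity": self.echogenicity}
        return [name for name, value in fields.items() if value.name == "NOT_REPORTED"]


nodule = SynopticNodule(Side.LEFT, 1.4, Contents.SOLID, Echogenicity.NOT_REPORTED)
print(nodule.missing_features())  # → ['echogenicity']
```

Because the value sets are closed, `Contents("heterogeneous")` raises `ValueError` instead of producing an ambiguous entry, and incompleteness becomes directly countable.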

Supplementary Material



Publication History

Received: 15 June 2021

Accepted: 05 November 2021

Article published online: 06 January 2022

© 2022. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany