Appl Clin Inform 2019; 10(04): 655-669
DOI: 10.1055/s-0039-1695791
Research Article
Georg Thieme Verlag KG Stuttgart · New York

Interactive NLP in Clinical Care: Identifying Incidental Findings in Radiology Reports

Gaurav Trivedi
1   Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Esmaeel R. Dadashzadeh
2   Department of Surgery and Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Robert M. Handzel
3   Department of Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Wendy W. Chapman
4   Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States
Shyam Visweswaran
1   Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
5   Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Harry Hochheiser
1   Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
5   Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Funding The research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under award number R01LM012095 and a Provost’s Fellowship in Intelligent Systems at the University of Pittsburgh (awarded to G.T.). The content of the paper is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the University of Pittsburgh.

Publication History

Received: 25 April 2019
Accepted: 09 July 2019
Publication Date: 04 September 2019 (online)


Background Despite advances in natural language processing (NLP), extracting information from clinical text is expensive. Interactive tools that ease the construction, review, and revision of NLP models can reduce this cost and improve the utility of clinical reports for clinical and secondary use.

Objectives We present the design and implementation of an interactive NLP tool for identifying incidental findings in radiology reports, along with a user study evaluating the performance and usability of the tool.

Methods Expert reviewers provided gold standard annotations for 130 patient encounters (694 reports) at sentence, section, and report levels. We performed a user study with 15 physicians to evaluate the accuracy and usability of our tool. Participants reviewed encounters split into intervention (with predictions) and control (no predictions) conditions. We measured changes in model performance, the time spent, and the number of user actions needed. The System Usability Scale (SUS) and an open-ended questionnaire were used to assess usability.
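For readers unfamiliar with SUS, the sketch below shows the standard scoring rule for a single 10-item questionnaire (Brooke's original scheme). It is illustrative only and is not taken from the study's code; the function name `sus_score` and the list-of-integers input format are assumptions.

```python
def sus_score(responses):
    """Score one 10-item SUS questionnaire.

    `responses` is a hypothetical input format: ten integers from 1
    (strongly disagree) to 5 (strongly agree), in questionnaire order.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses, each an integer from 1 to 5")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute (response - 1).
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute (5 - response).
            total += 5 - response
    # The summed contributions (0 to 40) are rescaled to a 0 to 100 range.
    return total * 2.5
```

An overall SUS score for a study is typically the mean of the per-participant scores.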

Results Starting from bootstrapped models trained on 6 patient encounters, we observed an average increase in F1 score from 0.31 to 0.75 for reports, from 0.32 to 0.68 for sections, and from 0.22 to 0.60 for sentences on a held-out test data set over an hour-long study session. We found that the tool significantly reduced the time spent reviewing encounters (134.30 vs. 148.44 seconds in the intervention and control conditions, respectively) while maintaining the overall quality of labels as measured against the gold standard. The tool was well received by the study participants, with a very good overall SUS score of 78.67.
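The F1 scores reported above are the harmonic mean of precision and recall. As a minimal sketch of that definition, assuming binary labels (1 = contains an incidental finding, 0 = does not) at a single level such as reports, one could compute it as follows. The function name and the example labels are hypothetical and do not come from the study data.

```python
def f1_score(gold, predicted):
    """F1 for the positive class, given two equal-length lists of 0/1 labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == 1 and p == 1)
    false_pos = sum(1 for g, p in pairs if g == 0 and p == 1)
    false_neg = sum(1 for g, p in pairs if g == 1 and p == 0)
    if true_pos == 0:
        # No correct positive predictions: precision or recall is zero
        # (or undefined), so F1 is reported as 0 by convention.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for five reports, for illustration only.
gold_labels = [1, 1, 0, 0, 1]
model_labels = [1, 0, 0, 1, 1]
example_f1 = f1_score(gold_labels, model_labels)
```

In this made-up example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1 score is also 2/3.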

Conclusion The user study demonstrated successful use of the tool by physicians for identifying incidental findings. These results support the viability of adopting interactive NLP tools in clinical care settings for a wider range of clinical applications.

Protection of Human and Animal Subjects

Our data collection and user-study protocols were approved by the University of Pittsburgh's Institutional Review Board (PRO17030447 and PRO18070517).
