CC BY-NC-ND 4.0 · Appl Clin Inform
DOI: 10.1055/a-2282-4340
Research Article

Examining the Generalizability of Pretrained De-identification Transformer Models on Narrative Nursing Notes

Fangyi Chen
1   Biomedical Informatics, Columbia University, New York, United States (Ringgold ID: RIN5798)
Syed Mohtashim Abbas Bokhari
2   Biomedical Informatics, Columbia University, New York, United States (Ringgold ID: RIN5798)
Kenrick Cato
3   University of Pennsylvania School of Nursing, Philadelphia, United States (Ringgold ID: RIN16142)
Gamze Gürsoy
4   Biomedical Informatics, Columbia University, New York, United States (Ringgold ID: RIN5798)
Sarah Collins Rossetti
5   Department of Biomedical Informatics, Columbia University, New York, United States
6   School of Nursing, Columbia University, New York, United States
Supported by: National Institute of Nursing Research 1R01NR016941

Narrative nursing notes are a valuable resource in informatics research, carrying unique predictive signals about patient care. The open sharing of these data, however, is appropriately constrained by the rigorous privacy regulations of the Health Insurance Portability and Accountability Act (HIPAA). Several de-identification models have been developed and evaluated on the open-source i2b2 dataset, but the generalizability of these models to nursing notes remains understudied.

The study aims to understand the generalizability of pre-trained transformer models and to investigate how the distribution of personal protected health information (PHI) varies between discharge summaries and nursing notes, with the goal of informing the future design of model evaluation schemas.

Two pre-trained transformer models (RoBERTa, ClinicalBERT) fine-tuned on i2b2 2014 discharge summaries were evaluated on our inpatient nursing notes and compared with the baseline performance. Statistical testing was used to assess differences in PHI distribution between discharge summaries and nursing notes.

RoBERTa achieved the best performance when tested on an external source of data, with an F1 score of 0.887 across PHI categories and 0.932 in the PHI binary task. Overall, discharge summaries contained a higher number of PHI instances and categories of PHI compared to inpatient nursing notes.

The study investigated the applicability of two pre-trained transformers to inpatient nursing notes and examined how nursing notes and discharge summaries differ in their use of PHI. Discharge summaries presented a greater quantity of PHI instances and types than narrative nursing notes, but narrative nursing notes exhibited more diversity in the types of PHI present, with some pertaining to the patient's personal life. The insights obtained from the research help improve the design and selection of algorithms and contribute to the development of suitable performance thresholds for PHI.
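
To illustrate why the F1 score across PHI categories can be lower than the F1 score in the PHI binary task, the following is a minimal sketch and not the authors' evaluation code. It assumes token-level, micro-averaged scoring, which the abstract does not specify. The tag names (`NAME`, `DATE`, `LOCATION`, `O`), the function names, and the toy sequences are all hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def category_micro_f1(gold, pred, outside="O"):
    """Micro-averaged token-level F1 where the PHI category must match.

    Assumption: one tag per token, with `outside` marking non-PHI tokens.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p and g != outside:
            tp += 1  # correct PHI token with the correct category
        else:
            if p != outside:
                fp += 1  # predicted PHI that is wrong or spurious
            if g != outside:
                fn += 1  # gold PHI that was missed or mislabeled
    return f1(tp, fp, fn)


def binary_f1(gold, pred, outside="O"):
    """Token-level F1 after collapsing every PHI category into one label."""
    def collapse(tags):
        return ["PHI" if t != outside else outside for t in tags]
    return category_micro_f1(collapse(gold), collapse(pred), outside)


# Hypothetical toy sequences, not data from the study.
gold = ["O", "NAME", "NAME", "O", "DATE", "O", "LOCATION", "O"]
pred = ["O", "NAME", "NAME", "O", "DATE", "O", "NAME", "DATE"]

print(round(category_micro_f1(gold, pred), 3))
print(round(binary_f1(gold, pred), 3))
```

In this toy case the `LOCATION` token tagged as `NAME` counts as an error in the category setting but as a correct detection in the binary setting, so the binary score is higher. The same mechanism is one plausible reason a binary PHI score can exceed a per-category score.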

Publication History

Received: 17 October 2023

Accepted after revision: 08 January 2024

Accepted Manuscript online: 06 March 2024

© The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial-License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany