Summary
Objectives We examine recently published research on the extraction of information from textual
documents in the Electronic Health Record (EHR).
Methods Literature review of the research published after 1995, based on PubMed, conference
proceedings, and the ACM Digital Library, as well as on relevant publications referenced
in papers already included.
Results 174 publications were selected and are discussed in this review in terms of the methods
used; pre-processing of textual documents; detection and analysis of contextual features;
extraction of information in general; extraction of codes and of information for decision support
and enrichment of the EHR; information extraction for surveillance, research, automated
terminology management, and data mining; and de-identification of clinical text.
Conclusions The performance of information extraction systems on clinical text has improved since
the last systematic review in 1995, but these systems are still rarely applied outside the
laboratory in which they were developed. Competitive challenges for information extraction
from clinical text, along with the availability of annotated clinical text corpora,
and further improvements in system performance are important factors for stimulating
advances in this field and for increasing the acceptance and use of these systems in
concrete clinical and biomedical research contexts.
Keywords
Electronic health record - natural language processing - information extraction -
text mining - state-of-the-art review