Summary
Objectives: To review the latest scientific challenges organized in clinical Natural Language
Processing (NLP) by highlighting the tasks, the most effective methodologies used,
the data, and the sharing strategies.
Methods: We searched Google Scholar and PubMed Central to retrieve all shared tasks
organized since 2015 that address clinical NLP problems on English data.
Results: We surveyed 17 shared tasks. We grouped the data into four types (synthetic, drug
labels, social data, and clinical data), which correlate with the size and sensitivity of the data.
We found named entity recognition and classification to be the most common tasks.
Most of the methods used to tackle the shared tasks have been data-driven. There is
homogeneity in the methods used to tackle the named entity recognition tasks, while
more diverse solutions are investigated for relation extraction, multi-class classification,
and information retrieval problems.
Conclusions: There is a clear trend toward using data-driven methods to tackle problems in clinical
NLP. The availability of more and varied data from different institutions will undoubtedly
lead to bigger advances in the field, for the benefit of healthcare as a whole.
Keywords
Clinical natural language processing - shared tasks - scientific challenges - survey