Advancing the State of the Art in Clinical Natural Language Processing through Shared Tasks

Michele Filannino; Özlem Uzuner

doi:10.1055/s-0038-1667079

CC BY-NC-ND 4.0 · Yearb Med Inform 2018; 27(01): 184-192

Section 9: Natural Language Processing

Survey

Georg Thieme Verlag KG Stuttgart

Authors

Michele Filannino

¹George Mason University, Fairfax, VA, USA

²Massachusetts Institute of Technology, Cambridge, MA, USA

Özlem Uzuner

¹George Mason University, Fairfax, VA, USA

²Massachusetts Institute of Technology, Cambridge, MA, USA

Further Information

Publication History

Publication Date:
29 August 2018 (online)

Summary

Objectives: To review the latest scientific challenges organized in clinical Natural Language Processing (NLP) by highlighting the tasks, the most effective methodologies used, the data, and the sharing strategies.

Methods: We searched Google Scholar and PubMed Central to retrieve all shared tasks organized since 2015 on clinical NLP problems involving English data.

Results: We surveyed 17 shared tasks. We grouped the data into four types (synthetic, drug labels, social data, and clinical data), which correlate with the size and sensitivity of the data. We found named entity recognition and classification to be the most common tasks. Most of the methods used to tackle the shared tasks have been data-driven. The methods used for named entity recognition tasks are homogeneous, while more diverse solutions have been investigated for relation extraction, multi-class classification, and information retrieval problems.

Conclusions: There is a clear trend toward using data-driven methods to tackle problems in clinical NLP. The availability of more, and more varied, data from different institutions will undoubtedly lead to bigger advances in the field, for the benefit of healthcare as a whole.

Keywords

Clinical natural language processing - shared tasks - scientific challenges - survey
