Advancing the State of the Art in Clinical Natural Language Processing through Shared Tasks

Michele Filannino; Özlem Uzuner

doi:10.1055/s-0038-1667079

CC BY-NC-ND 4.0 · Yearb Med Inform 2018; 27(01): 184-192
DOI: 10.1055/s-0038-1667079

Section 9: Natural Language Processing

Survey

Georg Thieme Verlag KG Stuttgart

Authors

Michele Filannino¹ ²
Özlem Uzuner¹ ²

¹George Mason University, Fairfax, VA, USA
²Massachusetts Institute of Technology, Cambridge, MA, USA

Publication History

Publication Date: 29 August 2018 (online)

Summary

Objectives: To review the latest scientific challenges organized in clinical Natural Language Processing (NLP) by highlighting the tasks, the most effective methodologies used, the data, and the sharing strategies.

Methods: We searched Google Scholar and PubMed Central to retrieve all shared tasks organized since 2015 on clinical NLP problems involving English data.

Results: We surveyed 17 shared tasks. We grouped the data into four types (synthetic, drug labels, social data, and clinical data), which correlate with size and sensitivity. We found named entity recognition and classification to be the most common tasks. Most of the methods used to tackle the shared tasks have been data-driven. The methods used for named entity recognition are homogeneous, while more diverse solutions have been investigated for relation extraction, multi-class classification, and information retrieval problems.

Conclusions: There is a clear trend in using data-driven methods to tackle problems in clinical NLP. The availability of more and varied data from different institutions will undoubtedly lead to bigger advances in the field, for the benefit of healthcare as a whole.

Keywords

Clinical natural language processing - shared tasks - scientific challenges - survey

