Methods Inf Med 2018; 57(01/02): 43-53
DOI: 10.3414/ME17-01-0120
Original Articles
Schattauer GmbH

Validating UMLS Semantic Type Assignments Using SNOMED CT Semantic Tags

Huanying Gu (1), Zhe He (2), Duo Wei (3), Gai Elhanan (4, 5), Yan Chen (6)

1   Department of Computer Science, New York Institute of Technology, New York, NY, USA
2   School of Information, Florida State University, Tallahassee, FL, USA
3   Computer Science and Information Systems, Stockton University, Galloway, NJ, USA
4   Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
5   Desert Research Institute, Reno, NV, USA
6   Department of Computer Information Systems, Borough of Manhattan Community College, City University of New York, New York, NY, USA
Funding: Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health (NIH) under Award Number R01CA190779. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Publication History

Received: 02 November 2017
Accepted: 20 December 2017
Publication Date: 05 April 2018 (online)

Summary

Background: The UMLS assigns semantic types to all its integrated concepts. The semantic types are widely used in various natural language processing tasks in the biomedical domain, such as named entity recognition, semantic disambiguation, and semantic annotation. Due to the size of the UMLS, erroneous semantic type assignments are hard to detect. It is imperative to devise automated techniques to identify errors and inconsistencies in semantic type assignments.

Objectives: To design a methodology that programmatically detects semantic type assignment errors for UMLS concepts with one or more SNOMED CT terms, and to evaluate concepts in a selected set of SNOMED CT hierarchies in order to test our hypothesis that UMLS semantic type assignment errors may exist in concepts residing in semantically inconsistent groups.

Methods: Our methodology is a four-stage process: 1) partitioning the concepts in a SNOMED CT hierarchy into semantically uniform groups based on their assigned semantic tags; 2) partitioning the concepts in each group from stage 1 into disjoint sub-groups based on their semantic type assignments; 3) mapping each SNOMED CT semantic tag to one or more semantic types in the UMLS; 4) identifying semantically inconsistent groups, i.e., groups whose semantic tag and semantic type assignments disagree according to the mapping from stage 3, and providing the concepts in such groups to domain experts for review.
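
The four stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy concepts, and the tag-to-type mapping are hypothetical, and it assumes that a group counts as consistent when every semantic type assigned to it is among the types mapped from its semantic tag.

```python
import re
from collections import defaultdict


def semantic_tag(fsn):
    """Return the trailing parenthesized tag of a fully specified name, or None."""
    match = re.search(r"\(([^()]+)\)\s*$", fsn)
    return match.group(1) if match else None


def find_inconsistent_groups(concepts, tag_to_types):
    """Group concepts by (semantic tag, semantic type set) and flag mismatches.

    concepts: iterable of (fully specified name, set of semantic types).
    tag_to_types: stage 3 mapping from a semantic tag to its allowed types.
    """
    # Stages 1 and 2: partition by semantic tag, then by semantic type set.
    groups = defaultdict(list)
    for fsn, types in concepts:
        groups[(semantic_tag(fsn), frozenset(types))].append(fsn)

    # Stage 4 (assumed rule): a group is inconsistent when any of its
    # semantic types falls outside the types mapped from its tag.
    flagged = {}
    for (tag, types), members in groups.items():
        allowed = tag_to_types.get(tag, set())
        if not types <= allowed:
            flagged[(tag, types)] = members
    return flagged


# Hypothetical toy data; the mapping below is illustrative only.
concepts = [
    ("Gravity (physical force)", {"Natural Phenomenon or Process"}),
    ("Friction (physical force)", {"Natural Phenomenon or Process"}),
    ("Example concept A (physical force)", {"Finding"}),
    ("Example record (record artifact)", {"Intellectual Product"}),
]
tag_to_types = {
    "physical force": {"Natural Phenomenon or Process"},
    "record artifact": {"Intellectual Product"},
}

flagged = find_inconsistent_groups(concepts, tag_to_types)
for (tag, types), members in flagged.items():
    print(tag, sorted(types), members)
```

In this toy run only the group tagged "physical force" but typed "Finding" is flagged; in the methodology described above, the concepts in such a group would then go to domain experts for review.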

Results: We applied our method to the UMLS 2013AA release. Concepts in the semantically inconsistent groups of the PHYSICAL FORCE and RECORD ARTIFACT hierarchies have error rates of 33% and 62.5%, respectively, far higher than the error rates of 0.6% and 1% in the semantically consistent groups of the same two hierarchies.

Conclusion: Concepts in semantically inconsistent groups are more likely to contain semantic type assignment errors. Our methodology can make auditing more efficient by focusing auditing resources on the concepts of semantically inconsistent groups.

 