Methods Inf Med 2018; 57(01/02): 43-53
DOI: 10.3414/ME17-01-0120
Original Articles
Schattauer GmbH

Validating UMLS Semantic Type Assignments Using SNOMED CT Semantic Tags

Huanying Gu (1), Zhe He (2), Duo Wei (3), Gai Elhanan (4, 5), Yan Chen (6)

1 Department of Computer Science, New York Institute of Technology, New York, NY, USA
2 School of Information, Florida State University, Tallahassee, FL, USA
3 Computer Science and Information Systems, Stockton University, Galloway, NJ, USA
4 Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
5 Desert Research Institute, Reno, NV, USA
6 Department of Computer Information Systems, Borough of Manhattan Community College, City University of New York, New York, NY, USA

Funding: Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health (NIH) under Award Number R01CA190779. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Publication History

received: 02 November 2017

accepted: 20 December 2017

Publication date: 05 April 2018 (online)


Summary

Background: The UMLS assigns semantic types to all its integrated concepts. The semantic types are widely used in various natural language processing tasks in the biomedical domain, such as named entity recognition, semantic disambiguation, and semantic annotation. Due to the size of the UMLS, erroneous semantic type assignments are hard to detect. It is imperative to devise automated techniques to identify errors and inconsistencies in semantic type assignments.

Objectives: To design a methodology for programmatically detecting semantic type assignment errors in UMLS concepts that have one or more SNOMED CT terms, and to evaluate concepts in a selected set of SNOMED CT hierarchies to verify our hypothesis that UMLS semantic type assignment errors may exist in concepts residing in semantically inconsistent groups.

Methods: Our methodology is a four-stage process: 1) partitioning the concepts in a SNOMED CT hierarchy into semantically uniform groups based on their assigned semantic tags; 2) partitioning the concepts in each group from stage 1 into disjoint subgroups based on their semantic type assignments; 3) mapping each SNOMED CT semantic tag to one or more UMLS semantic types; 4) identifying semantically inconsistent groups, i.e., groups whose semantic tags and semantic types are inconsistent according to the mapping from stage 3, and providing the concepts in such groups to domain experts for review.
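The four stages can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Concept` record, the function names, the concept identifiers, and the tag-to-semantic-type mapping `TAG_TO_TYPES` are all hypothetical. The stage 4 consistency criterion used here (a subgroup is consistent only if each of its semantic types is allowed by the mapping of at least one of the group's tags) is also an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    cui: str                   # hypothetical concept identifier
    tags: frozenset[str]       # semantic tags of the concept's SNOMED CT terms
    sem_types: frozenset[str]  # UMLS semantic types assigned to the concept


def partition(concepts):
    """Stages 1 and 2: group by semantic tag set, then subgroup by semantic type set."""
    groups = defaultdict(lambda: defaultdict(list))
    for c in concepts:
        groups[c.tags][c.sem_types].append(c)
    return groups


def is_consistent(tags, sem_types, tag_to_types):
    """Stage 4 check (assumed criterion): the subgroup must have at least one
    semantic type, and every one must be allowed by the mapping of some tag."""
    allowed = set()
    for tag in tags:
        allowed |= tag_to_types.get(tag, set())
    return bool(sem_types) and sem_types <= allowed


def inconsistent_concepts(concepts, tag_to_types):
    """Collect the concepts of all semantically inconsistent subgroups for expert review."""
    flagged = []
    for tags, subgroups in partition(concepts).items():
        for sem_types, members in subgroups.items():
            if not is_consistent(tags, sem_types, tag_to_types):
                flagged.extend(members)
    return flagged


# Stage 3: a hypothetical tag-to-semantic-type mapping (illustrative only).
TAG_TO_TYPES = {
    "physical force": {"Natural Phenomenon or Process", "Phenomenon or Process"},
    "record artifact": {"Intellectual Product", "Manufactured Object"},
}

# Hypothetical concepts; the identifiers are not real UMLS CUIs.
CONCEPTS = [
    Concept("C_EX1", frozenset({"physical force"}), frozenset({"Natural Phenomenon or Process"})),
    Concept("C_EX2", frozenset({"physical force"}), frozenset({"Natural Phenomenon or Process"})),
    Concept("C_EX3", frozenset({"physical force"}), frozenset({"Intellectual Product"})),
    Concept("C_EX4", frozenset({"record artifact"}), frozenset({"Intellectual Product"})),
]

flagged = inconsistent_concepts(CONCEPTS, TAG_TO_TYPES)
print([c.cui for c in flagged])  # ['C_EX3']
```

Keying groups on frozen sets of tags and semantic types keeps a concept with several SNOMED CT terms or several semantic types in a group of its own, so each group is uniform by construction. Only the members of flagged subgroups would then be passed to reviewers.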

Results: We applied our method to the UMLS 2013AA release. Concepts in the semantically inconsistent groups of the PHYSICAL FORCE and RECORD ARTIFACT hierarchies have error rates of 33% and 62.5%, respectively, substantially higher than the error rates of 0.6% and 1% in the semantically consistent groups of the same two hierarchies.
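As a rough illustration computed only from the rounded percentages reported above, the error rate in the semantically inconsistent groups is about 55 times that of the consistent groups for PHYSICAL FORCE and 62.5 times for RECORD ARTIFACT:

```latex
\frac{33\%}{0.6\%} = 55, \qquad \frac{62.5\%}{1\%} = 62.5
```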

Conclusion: Concepts in semantically inconsistent groups are more likely to contain semantic type assignment errors. Our methodology can make auditing more efficient by focusing auditing resources on the concepts in semantically inconsistent groups.