Summary
Background: The UMLS assigns semantic types to all its integrated concepts. The semantic types
are widely used in various natural language processing tasks in the biomedical domain,
such as named entity recognition, semantic disambiguation, and semantic annotation.
Due to the size of the UMLS, erroneous semantic type assignments are hard to detect.
It is imperative to devise automated techniques to identify errors and inconsistencies
in semantic type assignments.
Objectives: To design a methodology that performs programmatic checks to detect semantic type
assignment errors for UMLS concepts with one or more SNOMED CT terms, and to evaluate
concepts in a selected set of SNOMED CT hierarchies to test our hypothesis that UMLS
semantic type assignment errors may exist in concepts residing in semantically inconsistent
groups.
Methods: Our methodology is a four-stage process: 1) partitioning concepts in a SNOMED CT
hierarchy into semantically uniform groups based on their assigned semantic tags;
2) partitioning the concepts in each group from 1) into disjoint sub-groups based
on their semantic type assignments; 3) mapping each SNOMED CT semantic tag to one
or more semantic types in the UMLS; 4) identifying semantically inconsistent groups,
whose semantic tags and semantic types disagree according to the mapping from 3),
and providing the concepts in such groups to domain experts for review.
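The four stages can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It assumes that each concept carries a single semantic tag, and that a sub-group counts as inconsistent when any of its semantic types falls outside the set mapped to its tag; the abstract does not spell out the exact criterion. The `Concept` class, the function name, the concept identifiers, and the tag-to-type mapping are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    cui: str                   # concept identifier (made-up values below)
    semantic_tag: str          # SNOMED CT semantic tag (one per concept, by assumption)
    semantic_types: frozenset  # UMLS semantic types assigned to the concept


def find_inconsistent_groups(concepts, tag_to_types):
    """Return the sub-groups whose semantic types disagree with the tag mapping."""
    # Stages 1 and 2: group by semantic tag, then by the exact set of
    # semantic types, so the resulting sub-groups are disjoint.
    groups = defaultdict(list)
    for concept in concepts:
        groups[(concept.semantic_tag, concept.semantic_types)].append(concept)

    # Stage 4: flag a sub-group when any of its semantic types is not among
    # the types mapped to its tag (assumed inconsistency criterion).
    inconsistent = {}
    for (tag, types), members in groups.items():
        allowed = tag_to_types.get(tag, frozenset())
        if not types <= allowed:
            inconsistent[(tag, types)] = members
    return inconsistent


# Stage 3: a hypothetical mapping from semantic tags to UMLS semantic types.
tag_to_types = {
    "physical force": frozenset({"Natural Phenomenon or Process"}),
}

# Made-up concepts for illustration only.
concepts = [
    Concept("C1", "physical force", frozenset({"Natural Phenomenon or Process"})),
    Concept("C2", "physical force", frozenset({"Natural Phenomenon or Process"})),
    Concept("C3", "physical force", frozenset({"Finding"})),
]

flagged = find_inconsistent_groups(concepts, tag_to_types)
for (tag, types), members in flagged.items():
    print(tag, sorted(types), [c.cui for c in members])
```

In this toy input only the sub-group containing `C3` is flagged, so a reviewer would inspect one concept rather than all three.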
Results: We applied our method to the UMLS 2013AA release. Concepts in the semantically inconsistent
groups of the PHYSICAL FORCE and RECORD ARTIFACT hierarchies have error rates of 33%
and 62.5%, respectively, far higher than the error rates of 0.6% and 1% in the semantically
consistent groups of the two hierarchies.
Conclusion: Concepts in semantically inconsistent groups are more likely to contain semantic
type assignment errors. Our methodology can make auditing more efficient by focusing
auditing resources on concepts in semantically inconsistent groups.
Keywords
Controlled medical terminology - quality assurance - SNOMED CT - UMLS