Summary
Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”.
Objectives: In previous work, we have defined methods for the extraction of lexical patterns
from labels as an initial step towards semi-automatic ontology enrichment methods.
Our previous findings revealed that many biomedical ontologies could benefit from
enrichment methods using lexical patterns as a starting point. Here, we aim to identify
which lexical patterns are appropriate for ontology enrichment, using metrics to drive
the analysis and to prioritise the patterns.
Methods: We propose metrics for suggesting which lexical regularities should be the starting
point to enrich complex ontologies. Our method determines the relevance of a lexical
pattern by measuring its locality in the ontology, that is, the distance between the
classes associated with the pattern, and the distribution of the pattern in a certain
module of the ontology. The methods have been applied to four significant biomedical
ontologies, including the Gene Ontology and SNOMED CT.
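As an illustration only, the two ideas behind the metrics (locality as distance between the classes associated with a pattern, and distribution of the pattern within a module) can be sketched on a toy hierarchy. Everything in this sketch is an assumption made for illustration and is not the formulation used in the article: the toy classes and labels, the choice of mean pairwise shortest-path distance over is-a links as the locality measure, the definition of a module as a class plus its descendants, and the function names.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy ontology: class id -> (label, list of parent class ids).
ONTOLOGY = {
    "C0": ("process", []),
    "C1": ("binding", ["C0"]),
    "C2": ("protein binding", ["C1"]),
    "C3": ("receptor binding", ["C2"]),
    "C4": ("transport", ["C0"]),
    "C5": ("protein transport", ["C4"]),
}


def classes_with_pattern(ontology, pattern):
    """Return the classes whose label contains the pattern as a token sequence."""
    tokens = pattern.split()
    n = len(tokens)
    hits = []
    for cls, (label, _) in ontology.items():
        words = label.split()
        if any(words[i:i + n] == tokens for i in range(len(words) - n + 1)):
            hits.append(cls)
    return hits


def undirected_neighbours(ontology):
    """Treat is-a links as undirected edges so distances can cross branches."""
    adj = {cls: set() for cls in ontology}
    for cls, (_, parents) in ontology.items():
        for parent in parents:
            adj[cls].add(parent)
            adj[parent].add(cls)
    return adj


def distance(adj, start, goal):
    """Breadth-first shortest-path length; None if the classes are unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def mean_pairwise_distance(ontology, pattern):
    """Assumed locality proxy: lower values mean the pattern is more localised."""
    hits = classes_with_pattern(ontology, pattern)
    adj = undirected_neighbours(ontology)
    dists = [distance(adj, a, b) for a, b in combinations(hits, 2)]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else None


def module(ontology, root):
    """Assumed module definition: the root class plus all of its descendants."""
    children = {cls: [] for cls in ontology}
    for cls, (_, parents) in ontology.items():
        for parent in parents:
            children[parent].append(cls)
    members, queue = {root}, deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in members:
                members.add(child)
                queue.append(child)
    return members


def coverage_in_module(ontology, pattern, root):
    """Assumed distribution proxy: share of module classes exhibiting the pattern."""
    members = module(ontology, root)
    hits = set(classes_with_pattern(ontology, pattern)) & members
    return len(hits) / len(members)
```

In this toy hierarchy, the pattern "binding" occurs in three classes that sit on one branch, so its mean pairwise distance is low and it covers the whole module rooted at "binding". The pattern "protein" occurs in two classes on different branches, so its distance is higher and its coverage of any single module is lower. Under the assumptions above, the first pattern would be considered more localised than the second.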
Results: The metrics provide information about the engineering of the ontologies and the relevance
of the patterns. Our method enables the suggestion of links between classes that are
not made explicit in the ontology. We propose a prioritisation of the lexical patterns
found in the analysed ontologies.
Conclusions: The locality and distribution of lexical patterns offer insights into the further
engineering of the ontology. Developers can use this information to improve the axiomatisation
of their ontologies.
Keywords
Biological ontologies - ontology enrichment - quality assurance - lexical patterns