Methods Inf Med 2011; 50(05): 397-407
DOI: 10.3414/ME10-01-0020
Original Articles
Schattauer GmbH

Effectiveness of Lexico-syntactic Pattern Matching for Ontology Enrichment with Clinical Documents

Authors

  • K. Liu

    1   Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  • W. W. Chapman

    1   Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
    2   Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
  • G. Savova

    3   Department of Health Services Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
  • C. G. Chute

    3   Department of Health Services Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
  • N. Sioutos

    4   Lockheed Martin Corporation, Fairfax, Virginia, USA
  • R. S. Crowley

    1   Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
    2   Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
    5   Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
Further Information

Publication History

received: 03 March 2010

accepted: 06 October 2010

Publication date:
18 January 2018 (online)

Summary

Objective: To evaluate the effectiveness of a lexico-syntactic pattern (LSP) matching method for ontology enrichment using clinical documents.

Methods: Two domains were studied separately using the same methodology. We used radiology documents to enrich RadLex and pathology documents to enrich the National Cancer Institute Thesaurus (NCIT). Several known LSPs were used for semantic knowledge extraction. We first retrieved all sentences that contained LSPs across two large clinical repositories and examined the frequency of the LSPs. From this set, we randomly sampled LSP instances, which were examined by human judges. We used a two-step method to determine the utility of these patterns for enrichment. In the first step, domain experts annotated medically meaningful terms (MMTs) from each sentence within the LSP. In the second step, RadLex and NCIT curators evaluated how many of these MMTs could be added to the resource. To quantify the utility of this LSP method, we defined two evaluation metrics: suggestion rate (SR) and acceptance rate (AR). We used these measures to estimate the yield of concepts and relationships for each of the two domains.
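The retrieval, frequency counting, and random sampling steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list the LSPs that were used, so the three patterns below ("such as", "and other", "including") are assumed stand-ins drawn from commonly cited lexico-syntactic patterns, the example sentences are made up rather than taken from the clinical repositories, and the function names are hypothetical.

```python
import random
import re
from collections import Counter

# Assumed pattern set: the abstract says only "several known LSPs",
# so these three are illustrative placeholders.
LSP_PATTERNS = {
    "such_as": re.compile(r"\bsuch as\b", re.IGNORECASE),
    "and_other": re.compile(r"\band other\b", re.IGNORECASE),
    "including": re.compile(r"\bincluding\b", re.IGNORECASE),
}


def find_lsp_instances(sentences):
    """Return (pattern_name, sentence) pairs for each LSP found in a sentence."""
    instances = []
    for sentence in sentences:
        for name, pattern in LSP_PATTERNS.items():
            if pattern.search(sentence):
                instances.append((name, sentence))
    return instances


def lsp_frequencies(instances):
    """Count how many retrieved instances belong to each pattern."""
    return Counter(name for name, _ in instances)


def sample_instances(instances, k, seed=0):
    """Draw up to k instances at random for manual review (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(instances, min(k, len(instances)))


# Made-up sentences for illustration only.
sentences = [
    "Findings such as pleural effusion and atelectasis are noted.",
    "No acute abnormality.",
    "Lymph nodes and other soft tissues are unremarkable.",
    "Benign lesions, including lipoma, were excluded.",
    "Features such as necrosis were absent.",
]

instances = find_lsp_instances(sentences)
frequencies = lsp_frequencies(instances)
sample = sample_instances(instances, k=2)
```

In this sketch the sampled instances would then go to human judges for MMT annotation. The annotation and curation steps, and the SR and AR calculations, are left out because the abstract does not give their exact definitions.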

Results: For NCIT, the concept SR was 24%, and the relationship SR was 65%. The concept AR was 21%, and the relationship AR was 14%. For RadLex, the concept SR was 37%, and the relationship SR was 55%. The concept AR was 11%, and the relationship AR was 44%.

Conclusion: The LSP matching method is an effective method for concept and concept relationship discovery in biomedical domains.