Effectiveness of Lexico-syntactic Pattern Matching for Ontology Enrichment with Clinical Documents

K. Liu; W. W. Chapman; G. Savova; C. G. Chute; N. Sioutos; R. S. Crowley

doi:10.3414/ME10-01-0020


Methods Inf Med 2011; 50(05): 397-407
DOI: 10.3414/ME10-01-0020

Original Articles

Schattauer GmbH


Authors

K. Liu

¹Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

W. W. Chapman

¹Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

²Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA

G. Savova

³Department of Health Services Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA

C. G. Chute

³Department of Health Services Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA

N. Sioutos

⁴Lockheed Martin Corporation, Fairfax, Virginia, USA

R. S. Crowley

¹Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

²Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA

⁵Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Further Information

Publication History

Received: 3 March 2010

Accepted: 6 October 2010

Publication date: 18 January 2018 (online)


Summary

Objective: To evaluate the effectiveness of a lexico-syntactic pattern (LSP) matching method for ontology enrichment using clinical documents.

Methods: Two domains were studied separately using the same methodology. We used radiology documents to enrich RadLex and pathology documents to enrich the National Cancer Institute Thesaurus (NCIT). Several known LSPs were used for semantic knowledge extraction. We first retrieved all sentences that contained LSPs across two large clinical repositories and examined the frequency of the LSPs. From this set, we randomly sampled LSP instances, which were examined by human judges. We used a two-step method to determine the utility of these patterns for enrichment. In the first step, domain experts annotated medically meaningful terms (MMTs) from each sentence within the LSP. In the second step, RadLex and NCIT curators evaluated how many of these MMTs could be added to the resource. To quantify the utility of this LSP method, we defined two evaluation metrics: suggestion rate (SR) and acceptance rate (AR). We used these measures to estimate the yield of concepts and relationships for each of the two domains.
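The retrieval, frequency-counting, and sampling steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue phrases ("such as", "and other", "including") are Hearst-style patterns assumed here only for illustration, the example sentences are invented, and the names `LSP_CUES`, `find_lsp_instances`, `pattern_frequencies`, and `sample_instances` are hypothetical.

```python
import random
import re
from collections import Counter

# Hypothetical cue inventory (assumption): the summary does not list the LSPs used.
LSP_CUES = {
    "such_as": re.compile(r"\bsuch as\b", re.IGNORECASE),
    "and_other": re.compile(r"\band other\b", re.IGNORECASE),
    "including": re.compile(r"\bincluding\b", re.IGNORECASE),
}


def find_lsp_instances(sentences):
    """Return (pattern_name, sentence) pairs for every sentence containing a cue."""
    instances = []
    for sentence in sentences:
        for name, cue in LSP_CUES.items():
            if cue.search(sentence):
                instances.append((name, sentence))
    return instances


def pattern_frequencies(instances):
    """Count how many retrieved instances each pattern produced."""
    return Counter(name for name, _ in instances)


def sample_instances(instances, k, seed=0):
    """Draw up to k instances at random (without replacement) for human review."""
    rng = random.Random(seed)
    return rng.sample(instances, min(k, len(instances)))


# Invented sentences, for illustration only (not drawn from any clinical repository).
sentences = [
    "Findings are consistent with infection such as pneumonia.",
    "No evidence of effusion, pneumothorax, and other abnormalities.",
    "The heart size is normal.",
]

instances = find_lsp_instances(sentences)
frequencies = pattern_frequencies(instances)
sample = sample_instances(instances, k=2)
```

In this sketch the sampled instances would then go to human annotators; the expert annotation of MMTs and the curator review are manual steps and are not modeled here.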

Results: For NCIT, the concept SR was 24%, and the relationship SR was 65%. The concept AR was 21%, and the relationship AR was 14%. For RadLex, the concept SR was 37%, and the relationship SR was 55%. The concept AR was 11%, and the relationship AR was 44%.

Conclusion: LSP matching is an effective method for discovering concepts and concept relationships in biomedical domains.

Keywords

Ontology learning from text - knowledge acquisition - ontology enrichment - natural language processing - lexico-syntactic pattern

