CC BY-NC-ND 4.0 · Yearb Med Inform 2020; 29(01): 159-162
DOI: 10.1055/s-0040-1701991
Section 6: Knowledge Representation and Management
Survey
Georg Thieme Verlag KG Stuttgart

Ontologies, Knowledge Representation, and Machine Learning for Translational Research: Recent Contributions

Peter N. Robinson
1   The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
2   Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
Melissa A. Haendel
3   Oregon Clinical & Translational Research Institute, Oregon Health & Science University, Portland, OR, USA
4   Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR, USA

Publication History

Publication Date:
21 August 2020 (online)

Summary

Objectives: To select, present, and summarize the most relevant papers published in 2018 and 2019 in the field of Ontologies and Knowledge Representation, with a particular focus on the intersection between Ontologies and Machine Learning.

Methods: A comprehensive review of the medical informatics literature was performed to select the most interesting papers published in 2018 and 2019 that document the utility of ontologies for computational analysis, including machine learning.

Results: Fifteen articles were selected for inclusion in this survey paper. The chosen articles belong to three major themes: (i) the identification of phenotypic abnormalities in electronic health record (EHR) data using the Human Phenotype Ontology; (ii) word and node embedding algorithms to supplement natural language processing (NLP) of EHRs and other medical texts; and (iii) hybrid ontology and NLP-based approaches to extracting structured and unstructured components of EHRs.

Conclusion: Unprecedented amounts of clinically relevant data are now available for clinical and research use. Machine learning is increasingly being applied to these data sources for predictive analytics, precision medicine, and differential diagnosis. Ontologies have become an essential component of software pipelines designed to extract, code, and analyze clinical information with machine learning algorithms. The intersection of machine learning and semantics is proving to be an innovative space in clinical research.

 