DOI: 10.1055/s-0044-1791487
Evolution of a Graph Model for the OMOP Common Data Model
Funding This work was supported by grant 5U19AI135964 from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health.
Abstract
Objective Graph databases for electronic health record (EHR) data have become a useful tool for clinical research in recent years, but there is a lack of published methods to transform relational databases to a graph database schema. We developed a graph model for the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) that can be reused across research institutions.
Methods We created and evaluated four models, representing two different strategies, for converting the standardized clinical and vocabulary tables of OMOP into a property graph model within the Neo4j graph database. Taking the Successful Clinical Response in Pneumonia Therapy (SCRIPT) and Collaborative Resource for Intensive care Translational science, Informatics, Comprehensive Analytics, and Learning (CRITICAL) cohorts as test datasets with different sizes, we compared two of the resulting graph models with respect to database performance including database building time, query complexity, and runtime for both cohorts.
Results Utilizing a graph schema that was optimized for storing critical information as topology rather than attributes resulted in a significant improvement in both data creation and querying. The graph database for our larger cohort, CRITICAL, can be built within 1 hour for 134,145 patients, with a total of 749,011,396 nodes and 1,703,560,910 edges.
Discussion To our knowledge, this is the first generalized solution to convert the OMOP CDM to a graph-optimized schema. Despite being developed for studies at a single institution, the modeling method can be applied to other OMOP CDM v5.x databases. Our evaluation with the SCRIPT and CRITICAL cohorts and comparison between the current and previous versions show advantages in code simplicity, database building, and query speed.
Conclusion We developed a method for converting OMOP CDM databases into graph databases. Our experiments revealed that the final model outperformed the initial relational-to-graph transformation in both code simplicity and query efficiency, particularly for complex queries.
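The contrast between storing information as attributes and storing it as topology can be illustrated with a toy sketch. This is not the authors' Neo4j schema or code: the rows, ids, node labels, and relationship names (`HAS_CONDITION`, `HAS_CONCEPT`) are hypothetical, and plain Python dictionaries stand in for the graph database. It only shows how moving a concept identifier from a node property to its own node lets a query start at that node and follow edges instead of filtering every occurrence.

```python
from collections import defaultdict

# Hypothetical, simplified OMOP-style rows; all ids are made up for illustration.
PERSONS = [{"person_id": 1}, {"person_id": 2}]
CONDITIONS = [
    {"condition_occurrence_id": 10, "person_id": 1, "condition_concept_id": 1001},
    {"condition_occurrence_id": 11, "person_id": 2, "condition_concept_id": 1001},
    {"condition_occurrence_id": 12, "person_id": 2, "condition_concept_id": 1002},
]


def build_attribute_graph(persons, conditions):
    """Direct row-to-node conversion: the concept id stays a node property."""
    nodes = {("Person", p["person_id"]): {} for p in persons}
    edges = []
    for c in conditions:
        occ = ("ConditionOccurrence", c["condition_occurrence_id"])
        nodes[occ] = {"condition_concept_id": c["condition_concept_id"]}
        edges.append((("Person", c["person_id"]), "HAS_CONDITION", occ))
    return nodes, edges


def build_topology_graph(persons, conditions):
    """Same rows, but each concept is its own node reached through an edge."""
    nodes = {("Person", p["person_id"]): {} for p in persons}
    edges = []
    for c in conditions:
        occ = ("ConditionOccurrence", c["condition_occurrence_id"])
        concept = ("Concept", c["condition_concept_id"])
        nodes[occ] = {}
        nodes.setdefault(concept, {})
        edges.append((("Person", c["person_id"]), "HAS_CONDITION", occ))
        edges.append((occ, "HAS_CONCEPT", concept))
    return nodes, edges


def persons_by_attribute(nodes, edges, concept_id):
    """Filter every node on its property, then look up the linked persons."""
    hits = {n for n, props in nodes.items()
            if props.get("condition_concept_id") == concept_id}
    return {src[1] for src, rel, dst in edges
            if rel == "HAS_CONDITION" and dst in hits}


def persons_by_topology(edges, concept_id):
    """Start at the concept node and walk incoming edges back to persons."""
    incoming = defaultdict(list)
    for src, rel, dst in edges:
        incoming[dst].append((rel, src))
    persons = set()
    for rel, occ in incoming[("Concept", concept_id)]:
        if rel != "HAS_CONCEPT":
            continue
        persons.update(src[1] for r, src in incoming[occ] if r == "HAS_CONDITION")
    return persons


attr_nodes, attr_edges = build_attribute_graph(PERSONS, CONDITIONS)
topo_nodes, topo_edges = build_topology_graph(PERSONS, CONDITIONS)
```

Both lookups return the same set of persons for a given concept id. The topology version creates more nodes and edges up front, but its query touches only the neighborhood of one concept node, whereas the attribute version inspects every node.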
Keywords
databases - general information systems and technologies in clinical settings - OMOP common data model - clinical data management - electronic health records and systems - clinical information systems
Protection of Human and Animal Subjects
This study was conducted in accordance with the ethical standards of the institutional review board (IRB). All procedures involving human participants were reviewed and approved by the IRB of Northwestern University (STU00204868 for SCRIPT study and STU00212016 for CRITICAL study).
Publication History
Received: 23 August 2022
Accepted: 27 August 2024
Article published online: 04 December 2024
© 2024. Thieme. All rights reserved.
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany