CC BY-NC-ND 4.0 · Yearb Med Inform 2021; 30(01): 283-289
DOI: 10.1055/s-0041-1726481
Research & Education

Drawing Reproducible Conclusions from Observational Clinical Data with OHDSI

George Hripcsak
1   Department of Biomedical Informatics, Columbia University, New York, New York, USA
2   Observational Health Data Sciences and Informatics, New York, New York, USA
Martijn J. Schuemie
2   Observational Health Data Sciences and Informatics, New York, New York, USA
3   Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey, USA
David Madigan
2   Observational Health Data Sciences and Informatics, New York, New York, USA
4   Northeastern University, Boston, Massachusetts, USA
Patrick B. Ryan
1   Department of Biomedical Informatics, Columbia University, New York, New York, USA
2   Observational Health Data Sciences and Informatics, New York, New York, USA
3   Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey, USA
Marc A. Suchard
2   Observational Health Data Sciences and Informatics, New York, New York, USA
5   Fielding School of Public Health, Department of Biostatistics, University of California, Los Angeles, Los Angeles, USA
6   David Geffen School of Medicine, Department of Biomathematics, University of California, Los Angeles, Los Angeles, USA


Objective: The current observational research literature shows extensive publication bias and contradiction. The Observational Health Data Sciences and Informatics (OHDSI) initiative seeks to improve research reproducibility through open science.

Methods: OHDSI has created an international federated data source of electronic health records and administrative claims that covers nearly 10% of the world's population. Because these data are converted to a common data model with a practical schema and extensive vocabulary mappings, data from around the world follow an identical format. OHDSI's research methods emphasize reproducibility through: a large-scale approach to addressing confounding, using propensity score adjustment with extensive diagnostics; negative and positive control hypotheses to test for residual systematic error; a variety of data sources to assess consistency and generalizability; a completely open approach, including protocol, software, models, parameters, and raw results, so that studies can be externally verified; and the study of many hypotheses in parallel so that the operating characteristics of the methods can be assessed.
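
The use of negative control hypotheses to test for residual systematic error can be illustrated with empirical calibration. Negative controls are exposure-outcome pairs assumed to have no causal effect (true relative risk of 1). Any spread or shift in their estimates therefore reflects systematic error. This error is summarized as an "empirical null" distribution, and a new estimate is judged against that null rather than against zero. The sketch below is a minimal Python illustration under stated assumptions. It is not a port of OHDSI's EmpiricalCalibration R package or its API. It assumes log-scale effect estimates with strictly positive standard errors, and it models each negative-control estimate as drawn from N(mu, sigma^2 + se_i^2). It fits mu and sigma by a simple grid search. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def _profile_mu(log_rr, se, sigma):
    """For a fixed sigma, the MLE of mu is an inverse-variance weighted mean."""
    # Assumes every standard error is > 0, so the weights are finite even at sigma = 0.
    weights = [1.0 / (sigma ** 2 + s ** 2) for s in se]
    return sum(w * x for w, x in zip(weights, log_rr)) / sum(weights)


def _log_likelihood(log_rr, se, mu, sigma):
    """Log-likelihood of the estimates when each is drawn from N(mu, sigma^2 + se_i^2)."""
    total = 0.0
    for x, s in zip(log_rr, se):
        var = sigma ** 2 + s ** 2
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def fit_empirical_null(log_rr, se, sigma_max=2.0, steps=2000):
    """Hypothetical helper: fit (mu, sigma) of the empirical null from negative controls.

    Profiles mu analytically and searches sigma on a grid over [0, sigma_max].
    """
    best = None
    for k in range(steps + 1):
        sigma = sigma_max * k / steps
        mu = _profile_mu(log_rr, se, sigma)
        ll = _log_likelihood(log_rr, se, mu, sigma)
        if best is None or ll > best[0]:
            best = (ll, mu, sigma)
    return best[1], best[2]


def calibrated_p_value(log_rr, se, mu, sigma):
    """Hypothetical helper: two-sided p-value against the empirical null, not N(0, se^2)."""
    z = (log_rr - mu) / math.sqrt(sigma ** 2 + se ** 2)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

For example, suppose ten negative controls all returned log relative risks of 0.5, each with a standard error of 0.1. The fitted null would then center on 0.5. A new estimate of 0.5 would receive a calibrated p-value near 1, even though its naive p-value against zero (z = 5) is far below 0.05. The grid search is used here only to keep the sketch free of third-party dependencies. A numerical optimizer would be the more usual choice.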

Results: OHDSI has already produced findings in areas such as hypertension treatment that are being incorporated into practice. It has also produced rigorous studies of COVID-19 that have aided government agencies in their treatment decisions, characterized the disease extensively, estimated the comparative effects of treatments, and predicted the likelihood of progressing to serious complications.

Conclusions: OHDSI practices open science and incorporates a series of methods to address reproducibility. It has produced important results in several areas, including hypertension therapy and COVID-19 research.

Publication History

Article published online:
21 April 2021

© 2021. IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
