CC BY-NC-ND 4.0 · Gesundheitswesen 2020; 82(S 02): S139-S150
DOI: 10.1055/a-1009-6634
Original Article
Owner and copyright © Georg Thieme Verlag KG 2019

A Comparison of Matching and Weighting Methods for Causal Inference Based on Routine Health Insurance Data, or: What to do If an RCT is Impossible

A comparison of "matching" and "weighting" procedures for causal analysis using routine data from health insurers, or: What to do when an RCT is not possible
Herbert Matschinger, Dirk Heider, Hans-Helmut König

Publication History

Publication Date:
17 February 2020 (online)

Abstract

For a multitude of reasons, randomized controlled trials (RCTs) cannot be conducted on the basis of so-called "routine data" provided by insurance companies. The estimation of "causal effects" of any kind of treatment is therefore hampered, since systematic bias due to specific selection processes must be suspected. The fundamental problem of the counterfactual is discussed: the causal effect is the difference between two potential outcomes for the same unit, but only one of them can ever be observed. The focus lies on comparing the performance of different approaches to control for systematic differences between treatment and control group. All of these strategies are based on propensity scores, namely matching or pruning, IPTW (inverse probability of treatment weighting), and entropy balancing. Methods to evaluate these strategies are presented. A logit model with 87 predictors is employed to estimate either the propensity score or the entropy balancing weights. All analyses are restricted to estimating the ATT (Average Treatment Effect for the Treated). Exemplary data come from a prospective controlled intervention study with two measurement occasions and comprise 35 857 chronically ill insured persons of one German sickness fund with diabetes, congestive heart failure, arteriosclerosis, coronary heart disease, or hypertension. The intervention group was offered individual telephone coaching to improve health behavior and slow down disease progression, while the control group received treatment as usual. Randomization took place before the insured persons' consent to participate was obtained, so the assumptions of an RCT are violated. A weighted mixture model (difference-in-differences) serves as the causal model of interest to estimate treatment effects on costs, distinguishing outpatient costs, medication costs, and total costs. Entropy balancing is shown to perform best at balancing treatment and control group at baseline with respect to the first three moments of all 87 predictors, and is therefore expected to yield the least biased estimates of the treatment effect.
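The ATT weighting logic summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' analysis: it assumes simulated data instead of the sickness-fund data, a logit model with two covariates in place of the 87 predictors, odds weights p/(1 − p) for the controls (the usual IPTW form for the ATT), standardized mean differences as the balance diagnostic, and, because there are two measurement occasions, a simple weighted difference-in-differences of mean changes instead of the weighted mixture model. All function and variable names (`fit_logit`, `smd`, `w_att`, ...) are hypothetical.

```python
import numpy as np

def fit_logit(X, t, n_iter=25):
    """Propensity scores from a maximum-likelihood logit model (Newton-Raphson)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xd @ beta))
        grad = Xd.T @ (t - p)
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1 / (1 + np.exp(-Xd @ beta))

def smd(x, t, w):
    """Standardized mean difference, treated minus (weighted) controls,
    scaled by the unweighted pooled standard deviation."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    sd = np.sqrt((x[t == 1].var(ddof=1) + x[t == 0].var(ddof=1)) / 2)
    return (m1 - m0) / sd

# Simulated (hypothetical) data: selection into treatment depends on X
rng = np.random.default_rng(0)
n, tau = 4000, -1.0  # tau: true effect on the change in the outcome
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.8 * X[:, 0] - 0.5 * X[:, 1]))))
y_pre = 2 + X[:, 0] + rng.normal(size=n)
# The time trend depends on X[:, 0], so an unweighted DiD is confounded
y_post = y_pre + 0.5 + 0.7 * X[:, 0] + tau * t + rng.normal(size=n)

# ATT weights: treated keep weight 1, controls get the odds p / (1 - p)
p = fit_logit(X, t)
w_att = np.where(t == 1, 1.0, p / (1 - p))
w_none = np.ones(n)

for j in range(X.shape[1]):
    print(f"x{j + 1}: SMD before {smd(X[:, j], t, w_none):+.3f}, "
          f"after {smd(X[:, j], t, w_att):+.3f}")

# Difference-in-differences: mean change in treated minus (weighted) mean change in controls
dy = y_post - y_pre
did_naive = dy[t == 1].mean() - dy[t == 0].mean()
did_att = dy[t == 1].mean() - np.average(dy[t == 0], weights=w_att[t == 0])
print(f"naive DiD {did_naive:+.3f}, ATT-weighted DiD {did_att:+.3f}, true {tau:+.3f}")
```

Because the simulated trend depends on x1, the naive difference-in-differences is biased, whereas the ATT-weighted version should land close to `tau`. With two occasions and a balanced panel, this difference of weighted mean changes equals the interaction coefficient of a weighted group × time OLS regression.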

Summary (German-language abstract)

For a variety of reasons, an RCT cannot be conducted on the basis of so-called "routine data" from insurance companies. The estimation of "causal" effects is therefore hampered, since systematic bias due to specific selection processes must be expected. The fundamental problems of the "counterfactual", i.e. assessing the difference between two potential outcomes for the same unit of observation, are discussed. This study focuses on comparing methodological approaches to controlling for differences between treatment and control group. All methods are based on the propensity score, namely "matching" or "pruning", "inverse probability weighting", and "entropy balancing". Methods for evaluating these strategies are presented. A logit model with 87 predictors is used for balancing and/or for estimating the propensity score. All analyses are restricted to estimating the ATT (Average Treatment Effect for the Treated). Data from a prospective controlled intervention study with two measurement occasions serve as an example. The data comprise 35 857 chronically ill insured persons with diabetes, heart failure, arteriosclerosis, coronary heart disease, or hypertension. The intervention group was offered individual telephone coaching to improve health behavior and slow disease progression, whereas the control group received conventional treatment. Randomization was carried out before consent to participate was obtained, which violates the assumptions of an RCT. A weighted mixture model (difference-in-differences) was used to estimate the treatment effect on costs, distinguishing between outpatient costs, medication costs, and total costs. Entropy balancing is shown to balance the distribution of the predictors at baseline best with respect to the first three moments, and thus presumably yields the least biased treatment effects.
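Entropy balancing (Hainmueller 2012) chooses control-group weights directly: among all weights that reproduce prespecified moments of the treated group exactly, it picks those closest to uniform weights in Kullback-Leibler divergence. The solution has the log-linear form w_j ∝ exp(λ'c_j), the same functional form as logit odds, which is one way to see its link to a propensity score model. The following is a minimal illustrative reimplementation, not the authors' software, assuming uniform base weights, simulated data with two covariates, and constraints on the first three raw moments of each covariate; it solves the convex dual problem by Newton's method with a backtracking line search. The function name `entropy_balance` is hypothetical.

```python
import numpy as np

def entropy_balance(C_ctrl, target, tol=1e-8, max_iter=100):
    """Control weights (summing to 1) whose weighted column means equal `target`.

    Minimizes the Kullback-Leibler divergence from uniform base weights subject
    to the moment constraints, via Newton's method on the convex dual.
    """
    sd = C_ctrl.std(axis=0)
    Z = (C_ctrl - target) / sd  # rescaled and centred on the target moments
    lam = np.zeros(Z.shape[1])

    def weights(l):
        a = Z @ l
        w = np.exp(a - a.max())  # shift by the max for numerical stability
        return w / w.sum()

    def dual(l):
        # log of the mean of exp(Z @ l), computed stably
        a = Z @ l
        return a.max() + np.log(np.mean(np.exp(a - a.max())))

    for _ in range(max_iter):
        w = weights(lam)
        grad = Z.T @ w  # weighted mean of Z; zero once balance is reached
        if np.max(np.abs(grad)) < tol:
            break
        Zc = Z - grad
        hess = Zc.T @ (Zc * w[:, None])  # weighted covariance of Z
        step = np.linalg.solve(hess, grad)
        s, f0 = 1.0, dual(lam)
        # Backtracking (Armijo) line search keeps the Newton iteration stable
        while dual(lam - s * step) > f0 - 1e-4 * s * (grad @ step) and s > 1e-12:
            s *= 0.5
        lam = lam - s * step
    return weights(lam)

# Simulated (hypothetical) data with selection on both covariates
rng = np.random.default_rng(1)
n = 3000
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 1 / (1 + np.exp(-(-0.3 + 0.7 * X[:, 0] - 0.4 * X[:, 1]))))

# Constraints on the first three raw moments of each covariate
C = np.column_stack([X, X**2, X**3])
target = C[t == 1].mean(axis=0)
w = entropy_balance(C[t == 0], target)

print("treated moments:         ", np.round(target, 3))
print("unweighted control:      ", np.round(C[t == 0].mean(axis=0), 3))
print("entropy-weighted control:", np.round(C[t == 0].T @ w, 3))
print("effective control n:     ", round(1 / np.sum(w**2)))
```

Because the weights sum to one, matching the raw moments x, x² and x³ also equalizes the variances and third central moments. The effective sample size 1/Σw² shows how strongly the weighting concentrates on a few controls.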

1 The original text uses D instead of T and P instead of ρ; these were changed here for notational consistency.

