The Utility of Imputed Matched Sets: Analyzing Probabilistically Linked Databases in a Low Information Setting
received: 19 August 2013
accepted: 18 February 2014
20 January 2018 (online)
Objective: To compare results from high probability matched sets versus imputed matched sets across differing levels of linkage information.
Methods: A series of linkages with varying amounts of available information were performed on two simulated datasets derived from multiyear motor vehicle crash (MVC) and hospital databases, where true matches were known. Distributions of high probability and imputed matched sets were compared against the true match population for occupant age, MVC county, and MVC hour. Regression models were fit to simulated log hospital charges and hospitalization status.
Results: In high information settings, the distributions of occupant age, MVC county, and MVC hour in both high probability and imputed matched sets were not significantly different from those of the true match population (p > 0.999). In low information settings, high probability matched sets differed significantly from the true match population for occupant age and MVC county (p < 0.002), whereas imputed matched sets did not (p > 0.493). In high information settings, the two methods showed no significant differences in inference for simulated log hospital charges and hospitalization status. In low information settings, both high probability and imputed matched sets were significantly different from the outcomes; however, imputed matched sets were more robust.
Conclusions: The level of information available to a linkage is an important consideration. High probability matched sets are suitable for high to moderate information settings and for situations involving case-specific analysis. Conversely, imputed matched sets are preferable for low information settings when conducting population-based analyses.
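
The contrast between the two kinds of matched sets can be illustrated with a toy simulation. The sketch below is not the linkage or imputation procedure used in the study. It assumes a hypothetical pool of candidate record pairs, each with a calibrated match probability, and a made-up two-level county variable in which urban pairs are harder to distinguish than rural ones. A high probability matched set keeps only pairs above a threshold, while each imputed matched set is formed by a simple Bernoulli draw on the match probability (a stand-in for a full Bayesian imputation). Both are compared against the true match population with a Pearson chi-square statistic. All names, probabilities, and thresholds are illustrative assumptions.

```python
import random
from collections import Counter


def chi_square_stat(observed, expected_props):
    """Pearson chi-square statistic of observed counts against expected proportions."""
    n = sum(observed.values())
    return sum(
        (observed.get(level, 0) - n * p) ** 2 / (n * p)
        for level, p in expected_props.items()
    )


def simulate(seed=0, n_pairs=20000, threshold=0.9, n_imputations=5):
    """Compare county distributions of matched sets against the true matches.

    Hypothetical setup: urban pairs carry less linkage information, so their
    match probabilities are lower on average than those of rural pairs.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        county = "urban" if rng.random() < 0.7 else "rural"
        if county == "urban":
            prob = rng.uniform(0.3, 0.95)  # assumed: weaker evidence
        else:
            prob = rng.uniform(0.6, 1.0)   # assumed: stronger evidence
        is_true_match = rng.random() < prob  # probabilities are calibrated
        pairs.append((county, prob, is_true_match))

    # County distribution among the true matches (known only in simulation).
    true_counts = Counter(c for c, _, is_true in pairs if is_true)
    n_true = sum(true_counts.values())
    true_props = {level: k / n_true for level, k in true_counts.items()}

    # High probability matched set: keep pairs at or above the threshold.
    high_counts = Counter(c for c, p, _ in pairs if p >= threshold)
    high_stat = chi_square_stat(high_counts, true_props)

    # Imputed matched sets: each pair is drawn as a match with its probability.
    imputed_stats = []
    for _ in range(n_imputations):
        imputed_counts = Counter(c for c, p, _ in pairs if rng.random() < p)
        imputed_stats.append(chi_square_stat(imputed_counts, true_props))

    return high_stat, imputed_stats


if __name__ == "__main__":
    high_stat, imputed_stats = simulate()
    print("high probability set chi-square:", round(high_stat, 1))
    print("imputed set chi-squares:", [round(s, 1) for s in imputed_stats])
```

Under these assumptions the thresholded set over-represents rural pairs, so its chi-square statistic is far larger than those of the imputed sets, which retain uncertain urban links in proportion to their match probability.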