Methods Inf Med 2014; 53(03): 186-194
DOI: 10.3414/ME13-01-0094
Original Articles
Schattauer GmbH

The Utility of Imputed Matched Sets

Analyzing Probabilistically Linked Databases in a Low Information Setting
A. M. Thomas¹, L. J. Cook¹, J. M. Dean¹, L. M. Olson¹
¹ The Intermountain Injury Control Research Center, University of Utah School of Medicine, Department of Pediatrics, Division of Critical Care, Salt Lake City, Utah, USA
Further Information

Publication History

received: 19 August 2013

accepted: 18 February 2014

Publication Date: 20 January 2018 (online)

Summary

Objective: To compare results from high probability matched sets versus imputed matched sets across differing levels of linkage information.

Methods: A series of linkages with varying amounts of available information were performed on two simulated datasets derived from multiyear motor vehicle crash (MVC) and hospital databases, where true matches were known. Distributions of high probability and imputed matched sets were compared against the true match population for occupant age, MVC county, and MVC hour. Regression models were fit to simulated log hospital charges and hospitalization status.

Results: In high information settings, the distributions of occupant age, MVC county, and MVC hour in high probability and imputed matched sets did not differ significantly from those of the true match population (p > 0.999). In low information settings, high probability matched sets differed significantly from the true match population for occupant age and MVC county (p < 0.002), but imputed matched sets did not (p > 0.493). In high information settings, inference on simulated log hospital charges and hospitalization status did not differ significantly between the two methods. In low information settings, results from both high probability and imputed matched sets differed significantly from the outcomes; however, imputed matched sets were more robust.

Conclusions: The level of information available to a linkage is an important consideration. High probability matched sets are suitable for high to moderate information settings and for situations involving case-specific analysis. Conversely, imputed matched sets are preferable for low information settings when conducting population-based analyses.
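
The distinction between the two kinds of matched sets can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes each candidate crash-hospital pair already carries a match probability, that a high probability matched set keeps pairs at or above a cutoff (the 0.90 used here is a hypothetical value), and that each imputed matched set includes a pair with a chance equal to its match probability. The record identifiers, probabilities, and function names are invented for illustration, and one-to-one matching constraints are ignored.

```python
import random


def high_probability_set(pairs, threshold=0.90):
    """Keep only candidate pairs whose match probability meets the cutoff.

    The 0.90 default is an assumed, illustrative threshold.
    """
    return [p for p in pairs if p["prob"] >= threshold]


def imputed_sets(pairs, m=5, seed=0):
    """Draw m matched sets; each pair enters a set with its match probability.

    Simplification: pairs are sampled independently of one another.
    """
    rng = random.Random(seed)
    return [[p for p in pairs if rng.random() < p["prob"]] for _ in range(m)]


# Hypothetical candidate pairs: crash record, hospital record, match probability.
pairs = [
    {"crash": 1, "hosp": "A", "prob": 1.00},
    {"crash": 2, "hosp": "B", "prob": 0.95},
    {"crash": 3, "hosp": "C", "prob": 0.60},
    {"crash": 4, "hosp": "D", "prob": 0.30},
    {"crash": 5, "hosp": "E", "prob": 0.00},
]

hp = high_probability_set(pairs)   # one set; uncertain pairs are discarded
imp = imputed_sets(pairs, m=5)     # several sets; uncertain pairs appear in some
```

Under these assumptions, lower probability pairs are never analyzed in the high probability set but contribute to some of the imputed sets, which is one way to see why the two approaches can diverge when linkage information is scarce and few pairs reach the cutoff.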