Introduction
There is increasing interest in the use of administrative data in health services
and clinical research. Administrative data are routinely collected during clinic,
hospital, laboratory, or pharmacy visits for administrative purposes.[1] Administrative
databases provide easy and inexpensive access to large numbers of patients over expansive
geographic regions. Although these databases were initially designed to help state and
national agencies reimburse health care services and track differences in the provision
and use of services, they are increasingly being used for epidemiological,
effectiveness, and safety outcomes research. However, they have several limitations
that must be considered, and critical appraisal of studies that use administrative
databases is important.
Publicly Available Databases
Several administrative databases are available. In the United States, the largest
publicly available all-payer inpatient care database is the Nationwide Inpatient Sample
(NIS). The NIS was developed for the Healthcare Cost and Utilization Project (HCUP)
through funding from the Agency for Healthcare Research and Quality (AHRQ). The NIS
contains hospital stay data starting in 1988, including diagnoses, admission and
discharge information, demographics, and outcomes, from a sample of approximately 20%
of patients admitted to community hospitals in the United States (5–8 million patients
per year).[2]
Table 1
Spectrum Research checklist for evaluating the quality of administrative database studies
Methodological principle
• Study design
  – Administrative database comparative study
  – Administrative database case–control study
  – Administrative database case series
• Why database was created clearly stated
• Description of database's inclusion/exclusion criteria
• Description of methods for reducing bias in database
• Codes and search algorithms reported
• Rationale for coding algorithm reported
• Code accuracy reported
• Code validity reported
• Clinical significance assessed
• Is the period of data consistent with the outcome data?
• Statement regarding whether data stem from single or multiple hospital admissions
• Statement regarding whether data stem from single or multiple procedures
• Accounting for clustering
• Number of criteria met (maximum: 12)
The HCUP also provides several other health care databases besides the NIS, including
the Kids' Inpatient Database (KID), the Nationwide Emergency Department Sample (NEDS),
as well as a variety of state databases. HCUP has software tools available that allow
users to access information from the databases. HCUP was created to provide a robust
source of health care data that could be used to further research, improve health
care, and inform decision making.
Another large database in the United States is the Centers for Medicare and Medicaid
Services (CMS) administrative data files, which contain information on approximately
98% of adults 65 years of age and older enrolled in Medicare (more than 45 million
people). Data from these administrative databases are useful in health care research
in that they provide clinical validity, information on population coverage, and linkage
to other data sets.[3]
Increasingly, commercially available services such as PearlDiver[4] and IMS Health Incorporated[5] are being used by universities, medical device manufacturers, and government agencies.
These companies use large health claims databases composed of records from private
insurers, government databases, pharmacy prescriptions, and manufacturers to provide
clinical effectiveness and health care management services.
Strengths of Administrative Data
Administrative data sets provide a readily available source of “real-world” health
care data on a large population of unselected patients.[6] Because of the sheer number
of patients included in databases such as the NIS, the data are considered to be
representative of the populations of interest.[7] Administrative databases can serve as
useful and inexpensive resources for data that are reliably reported with accepted coding
systems, such as procedure volumes and length of stay, as well as for reliably reported
outcomes such as death.[1] [7] Furthermore, administrative data can be used to evaluate
health care utilization, as well as outcomes that differ by patient demographics or
geographical locale.[6]
Limitations of Administrative Data
One limitation inherent in administrative data is the reason for their creation. Because
they are typically intended for financial and administrative management rather than
for research purposes, they may vary in their degree of detail and accuracy.[7] [8] [9]
For example, they may be less reliable sources of information for events that may not
result in a medical visit or the use of a diagnostic code, such as nausea. Furthermore,
the coding of administrative data can vary with how ICD-9 (International Classification
of Diseases, Ninth Revision) codes are applied and how physician records are interpreted
by the medical reviewer entering the codes.[6] [7] One recent report showed data
suggestive of underreporting of perioperative stroke occurring with carotid endarterectomy
and stenting in the NIS data set,[8] whereas another reported the difficulty of evaluating
national rates of mortality from pneumonia because of changing coding practices.[10]
Critical Appraisal of Administrative Data Studies
Guidelines to govern high-quality administrative database studies are presently under
development by the Reporting of Studies Conducted using Observational Routinely Collected
Data (RECORD) collaborative.[11] [12] However, criteria that constitute high-quality
administrative database studies have recently been proposed.[12] [13] Here, we have
summarized these proposed criteria for critical appraisal of administrative database
studies (Table 1). These criteria can serve as a checklist of things to consider if you
are planning a study using administrative data. As described in previous “Science in
Spine” articles, formulating a focused, answerable research question and applying the
PICOTS/PPOTS framework are important when planning your study.
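Because Table 1 ends with a count of criteria met, the checklist can be tallied mechanically when appraising several studies. The following is a minimal Python sketch, assuming the reviewer records each study's design and the set of criteria it satisfies; the function `score_study` and the shortened criterion labels are hypothetical illustrations, not part of any published tool.

```python
# Minimal sketch: tally the Table 1 checklist for one study.
# score_study and the shortened labels below are hypothetical illustrations.

STUDY_DESIGNS = (
    "comparative study",
    "case-control study",
    "case series",
)

# The 12 scored criteria; study design is recorded but not counted.
CRITERIA = (
    "why database was created",
    "inclusion/exclusion criteria",
    "methods for reducing bias",
    "codes and search algorithms",
    "rationale for coding algorithm",
    "code accuracy",
    "code validity",
    "clinical significance",
    "data period consistent with outcome",
    "single vs multiple admissions",
    "single vs multiple procedures",
    "clustering",
)


def score_study(design: str, criteria_met: set) -> int:
    """Return the number of Table 1 criteria met (0 to 12)."""
    if design not in STUDY_DESIGNS:
        raise ValueError(f"unknown study design: {design!r}")
    unknown = set(criteria_met) - set(CRITERIA)
    if unknown:
        # Guard against typos silently changing the score.
        raise ValueError(f"not a Table 1 criterion: {sorted(unknown)}")
    return len(set(criteria_met))


print(score_study("case series", {"codes and search algorithms", "clustering"}))
```

Rejecting unrecognized labels, rather than ignoring them, keeps a misspelled criterion from quietly lowering a study's score.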
Robust Descriptions of the Data Set
Clear descriptions should be provided regarding how and why the database was created.[12]
[13] The database's inclusion and exclusion criteria should also be clearly stated.
Readers can then use these descriptions to assess the potential for biased or missing
information as it relates to the study at hand.[13]
Code Accuracy
Because administrative data are coded, administrative database studies should clearly
state the diagnostic and/or procedural codes used in the search algorithm, as well
as the reason for selecting those codes. In addition, the accuracy of the codes for
identifying a particular diagnosis or outcome should be reported to provide an estimate
of the percentage of misclassified data. This information provides insight into how well
the code(s) represent the actual diagnosis, procedure, or outcome and allows the reader
to gauge the level of resulting bias. Code accuracy can be measured using several
different types of code validation studies,[13] the most reliable of which are “gold
standard” validation studies. These studies compare the code with a gold standard known
to provide accurate information, such as the laboratory test results required for
diagnosis. Ideally, code validity statistics will be reported in terms of the probability
that a patient identified with a code actually has the condition of interest (the
positive predictive value), although other measures such as sensitivity, specificity,
and the positive likelihood ratio may also be used.[13]
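The validity measures named above all come from the 2 × 2 table produced by a gold-standard validation study. The Python sketch below computes them from that table; the class name `CodeValidation` and the counts in the usage line are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CodeValidation:
    """Counts from a hypothetical gold-standard validation of one code."""
    true_pos: int   # code present, condition confirmed by gold standard
    false_pos: int  # code present, condition absent on gold standard
    false_neg: int  # code absent, condition present on gold standard
    true_neg: int   # code absent, condition absent on gold standard

    def ppv(self) -> float:
        """Probability that a patient identified with the code has the condition."""
        return self.true_pos / (self.true_pos + self.false_pos)

    def sensitivity(self) -> float:
        """Proportion of true cases that the code identifies."""
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        """Proportion of non-cases that the code correctly leaves unflagged."""
        return self.true_neg / (self.true_neg + self.false_pos)

    def positive_lr(self) -> float:
        """Positive likelihood ratio: sensitivity / (1 - specificity)."""
        if self.false_pos == 0:
            return float("inf")  # perfect specificity
        return self.sensitivity() / (1.0 - self.specificity())


v = CodeValidation(true_pos=90, false_pos=10, false_neg=30, true_neg=870)
print(f"PPV={v.ppv():.2f} sens={v.sensitivity():.2f} "
      f"spec={v.specificity():.3f} LR+={v.positive_lr():.1f}")
```

With these hypothetical counts, specificity is about 0.99, yet 10% of the patients flagged by the code do not have the condition (PPV 0.90) and a quarter of true cases are missed (sensitivity 0.75), which is why a single summary figure can overstate how well a code performs.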
Clinical Significance
Because very small differences between groups can be statistically significant in
large database studies, results should not be interpreted solely on the basis of
p values; such differences may not be clinically relevant. Instead, results should be
interpreted based on clinical relevance and on the absolute and relative differences
between treatment groups.[13]
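To illustrate why p values alone can mislead in very large data sets, the Python sketch below applies a standard pooled two-proportion z-test to hypothetical complication rates of 10.0% and 10.3% in two groups of 500,000 patients each. The counts are invented, and the z-test is one common choice for comparing proportions, not a method prescribed by the cited criteria.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-sided z-test for a difference between two proportions."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return p_a, p_b, z, p_value


# Hypothetical groups: 10.0% vs 10.3% complication rates, 500,000 patients each.
p_a, p_b, z, p_value = two_proportion_z_test(50_000, 500_000, 51_500, 500_000)
risk_difference = p_b - p_a   # absolute difference
relative_risk = p_b / p_a     # relative difference
print(f"z = {z:.2f}, p = {p_value:.1e}")
print(f"absolute difference = {risk_difference:.1%}, relative risk = {relative_risk:.2f}")
```

The difference is highly statistically significant (p < 0.001), yet the absolute difference is only 0.3 percentage points (roughly one additional event per 333 patients) and the relative risk is about 1.03; whether that matters clinically is a judgment the p value cannot make.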
Time-Dependent Bias
Time-dependent patient variables are those that can change during the period of
observation. If the values of such variables are unknown at baseline but are analyzed
as if they were known, the results may be affected by time-dependent bias.[13] Other
factors to consider include whether the period covered by the data set is consistent
with the length of follow-up required for the outcome data, whether the data set
includes data from the initial hospital admission only or also from repeat admissions,
and whether it includes data from the first procedure only or also from repeat
procedures.
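A common form of time-dependent bias arises when a treatment that starts partway through follow-up is analyzed as if patients had been treated from baseline (often called immortal time bias). The Python sketch below contrasts that naive "ever treated" classification with splitting each patient's person-time at the treatment start. The five-patient cohort, the `Patient` fields, and the function names are hypothetical, and the sketch assumes any outcome event occurs at the end of follow-up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Patient:
    followup_days: int          # days from cohort entry to outcome event or censoring
    treat_day: Optional[int]    # day treatment started (within follow-up), None if never
    event: bool                 # True if follow-up ended with the outcome event


Rates = Dict[str, Tuple[int, int]]  # group -> (events, person-days)


def naive_rates(cohort: List[Patient]) -> Rates:
    """Biased: classify by 'ever treated' and credit all follow-up to that group."""
    totals = {"treated": [0, 0], "untreated": [0, 0]}
    for p in cohort:
        group = "untreated" if p.treat_day is None else "treated"
        totals[group][0] += int(p.event)
        totals[group][1] += p.followup_days
    return {g: (e, t) for g, (e, t) in totals.items()}


def time_split_rates(cohort: List[Patient]) -> Rates:
    """Time-dependent exposure: days before treatment start count as untreated."""
    totals = {"treated": [0, 0], "untreated": [0, 0]}
    for p in cohort:
        if p.treat_day is None:
            totals["untreated"][0] += int(p.event)
            totals["untreated"][1] += p.followup_days
        else:
            totals["untreated"][1] += p.treat_day
            totals["treated"][1] += p.followup_days - p.treat_day
            # The event occurs at the end of follow-up, i.e., after treatment began.
            totals["treated"][0] += int(p.event)
    return {g: (e, t) for g, (e, t) in totals.items()}


cohort = [
    Patient(100, 50, False),
    Patient(100, 60, False),
    Patient(30, None, True),
    Patient(100, None, False),
    Patient(80, 40, True),
]
print(naive_rates(cohort))       # {'treated': (1, 280), 'untreated': (1, 130)}
print(time_split_rates(cohort))  # {'treated': (1, 130), 'untreated': (1, 280)}
```

In this toy cohort the naive analysis credits 280 person-days to the treated group, so treatment appears protective (1 event per 280 treated vs 1 per 130 untreated person-days). Splitting person-time at the treatment start reverses the comparison (1 per 130 treated vs 1 per 280 untreated person-days), even though total person-time is unchanged.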
Clustering
Because observations in administrative data sets are frequently clustered (for example,
patients treated at the same hospital tend to have more similar outcomes than patients
treated at different hospitals), a study should properly account for any clustering
present in the data set. One example of clustering is a specific diagnosis (e.g., acute
myocardial infarction) treated by emergency room physicians only within academic
hospitals. Regression models that account for clustering, such as multilevel
(hierarchical) models, can be used to control for clustering and avoid potentially
misleading conclusions.[13]
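The regression approaches mentioned above handle clustering within the model itself. As a simpler illustration of why ignoring clustering matters, the Python sketch below estimates the intraclass correlation coefficient (ICC) with the one-way ANOVA estimator for equal-sized clusters and converts it into the Kish design effect, the factor by which clustering inflates the variance of an estimate relative to independent observations. This is a swapped-in diagnostic, not the multilevel modeling the criteria call for; the function names, the toy data, and the assumed ICC of 0.02 are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def icc_oneway(clusters: List[Sequence[float]]) -> float:
    """One-way ANOVA estimator of the ICC; assumes equal cluster sizes.

    The estimate can be negative when within-cluster variation exceeds
    between-cluster variation.
    """
    k = len(clusters)        # number of clusters (e.g., hospitals)
    m = len(clusters[0])     # observations per cluster (e.g., patients)
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need >= 2 clusters of equal size >= 2")
    grand_mean = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]
    ss_between = m * sum((cm - grand_mean) ** 2 for cm in cluster_means)
    ss_within = sum((x - cm) ** 2 for c, cm in zip(clusters, cluster_means) for x in c)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (m - 1))
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


def design_effect(cluster_size: int, icc: float) -> float:
    """Kish design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


# Toy outcome scores for three hospitals with three patients each.
icc = icc_oneway([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(icc, 3), round(design_effect(3, icc), 2))
print(round(10_000 / design_effect(500, 0.02)))  # effective sample size
```

For example, with 20 hospitals contributing 500 patients each and an assumed ICC of 0.02, the design effect is about 10.98, so the 10,000 patients carry roughly the information of 911 independent patients, and standard errors that ignore clustering would be too small by a factor of about 3.3.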
Summary
Administrative data provide researchers with relatively inexpensive access to large
numbers of patients nationwide and are increasingly being used for epidemiological,
effectiveness, and safety outcomes studies. Publicly available databases such as the
NIS and CMS data files cover large proportions of medical visits in the United States
and provide a good source of reliably reported “real-world” health care data. However,
because administrative data are primarily gathered for billing purposes rather than
research purposes, several limitations must be considered, including the potential for
inaccuracy and bias. As with all study types, critical appraisal of administrative
database studies is essential to avoid arriving at inaccurate conclusions.