Key words
cross-national research - comparative health research - rare diseases - bias - health
insurance claims data
Key words (translated from German)
Rare diseases - secondary data analysis - cross-national research - comparative
health systems research - bias - routine data
Introduction
Cross-national comparisons of health systems follow at least three aims: learning
about health systems, learning why they are what they are and
learning from other countries [1]. The
European Union (EU) is a rich source of potential comparisons – a natural
laboratory [2] – as the Member States
employ diverse approaches to organizing health systems, with varying results. Based
on the Treaty mandate regarding health (Article 168 (2) TFEU), an important area of
EU institutions’ work is the identification of best practices in health
policy and their dissemination.
Digital transformation of health systems has created vast collections of health data:
from healthcare providers (electronic patient records), insurance companies
(insurance claims databases), and other parties [3]. For health researchers, these collections of data promise several
advantages: ready availability, low cost, large number of patients and time points
included and diversity in patients and settings [4]. Health insurance claims data have been commonly used in the USA and
Canada since the 1980s for epidemiological, health services and health economics
research [5]. European countries have recently
made it possible to use health insurance claims data for research purposes [6]. For all its promise, analysis of
administrative health data presents a novel set of challenges. Because the data
collection methods were designed with purposes other than research in mind, the data
might be biased [7] or limited in scope [8].
Comparative health research utilizing surveys has identified comparability of the
measurement methodologies employed as key to avoiding biased results [9]. The European Union [10], WHO Europe [11] and the OECD [12] have worked to harmonize health data
collection procedures, increasingly including morbidity and cost information, but
this work is ongoing and includes a limited scope of indicators. If issues of
comparability are not carefully considered by researchers, they can lead to
erroneous conclusions [13]. Biased or
erroneous conclusions in this context mean that the differences in health status or
healthcare provision between countries uncovered by the studies reflect differences
in measurement, data collection or other methodological issues, rather than real
differences between the respective populations’ health and health systems.
Germany and the Netherlands are two EU countries where insurance claims data are
routinely available and accessible to researchers. The two countries also organize
their health systems similarly: both have social health insurance, a mix of public
and private providers, and similar health governance structures [14] [15]. This
presumably makes Germany and the Netherlands a most-likely case for valid
comparisons in health and healthcare. If the comparability of insurance claims
datasets is limited even for this country pair, one can assume that cross-national
comparisons will be more biased for country pairs with less similar health systems.
When using insurance claims data generated in different jurisdictions with different
measurement, data collection and data processing procedures for comparative health
research, questions of comparability need to be considered [16]. Given the absence of a comprehensive framework
for cross-national research using secondary analysis of health insurance claims
data, we pursue two objectives:
- propose a framework of comparability in cross-national health services
  research using health insurance claims data and
- apply it to the German (DaTraV) and Dutch (Vektis) national insurance claims
  datasets.
Our research question is: “To what extent are the DaTraV and Vektis datasets
comparable and therefore useful in cross-national health research between Germany
and the Netherlands?”
Methods
Comparability in cross-national research
In this section, we propose a framework of comparability in cross-national
research using secondary analysis of insurance claims data. Spector et al.
provide a framework of methods-induced differences that bias results in
cross-national research using collection and analysis of primary data [17]. They identify three domains of
comparability that researchers need to pay attention to: samples, data
collection and the measurement instrument. Based on their work, we identify
three domains of health insurance claims datasets’ characteristics that
need to be comparable across countries to avoid inducing differences and biasing
results in cross-national research using secondary data analysis.
Spector et al. note that researchers need to ensure that the samples used are
representative of the setting. Translating this principle to secondary data
analysis, researchers need to understand the underlying populations that are
represented in the health insurance claims databases they draw data from and
carefully consider whether these populations are representative of the
populations that they aim to compare. In summary, the first dimension that needs
to be comparable between insurance claims datasets is the representation of
populations compared.
Spector et al. also point out that researchers should ideally use comparable
measurement and analytical procedures, which includes the comparability of
concepts and their operationalization in the measurement instrument. In
transferring this principle into the area of secondary data analysis, we can
lean on the experiences of other groups gained during the development of the
European Common Health Indicators (ECHIs). There, comparability of measures
across countries was ensured through harmonization of the definitions of the
underlying concepts, data sources, data collection and data processing
procedures (i. e., subsequent transformations or calculations) [10]. In summary, the second dimension of
health insurance claims datasets that needs to be comparable comprises the underlying
concepts, data sources, data collection and data processing. However, the
importance of the comparability of the underlying concepts depends largely on
the research question. While death is a clear concept with little cross-national
variation in its operationalization, a diabetes care trajectory may be defined
very differently between health systems. Therefore, while we include
comparability of underlying concepts in our framework for completeness, we will
not explicitly examine this aspect later in the paper.
Beyond the dimensions of the Spector et al. framework, our framework includes a
third dimension of comparability: the database contents and availability for
research purposes. To compare population health or healthcare provision in
different jurisdictions using health insurance claims data, both databases need
to contain the relevant information and be available to international research
groups. This dimension plays a fundamental role because it limits the possible
avenues of cross-national research in general and therefore biases the kinds of
research questions that can be asked.
[Figure 1] applies our framework to an
example of a study of the difference in incidence of disease X in two countries
that uses health insurance claims data. It highlights that the overall
difference in incidence rates found by the study can be separated into two
components. The first component represents the real difference while the
second component represents the difference induced by measurement and
data collection. The latter component is then further decomposed into the first
two domains of our framework: (1) difference due to incomparable representation
of the populations and (2) difference due to incomparable underlying concepts,
data sources, data collection or data processing procedures. Domain (3),
incomparable database contents and lack of research access, is portrayed as
preceding the study by influencing the feasibility of the comparison.
Fig. 1 Sources of difference in cross-national comparisons.
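The decomposition shown in Figure 1 can be made concrete with a small sketch. The following Python example assumes, purely for illustration, that the components combine additively; the class name, field names and numbers are hypothetical and are not taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidenceDifference:
    """Hypothetical additive decomposition of an observed incidence difference."""

    real: float                 # true difference between the two populations
    population_induced: float   # domain (1): incomparable population representation
    measurement_induced: float  # domain (2): incomparable concepts, data sources,
                                # data collection or data processing

    @property
    def induced(self) -> float:
        """Difference induced by measurement and data collection."""
        return self.population_induced + self.measurement_induced

    @property
    def observed(self) -> float:
        """Overall difference that a study would report."""
        return self.real + self.induced


# Illustrative numbers only (cases per 100 000 person-years).
example = IncidenceDifference(real=5.0, population_induced=2.0, measurement_induced=1.0)
print(example.observed, example.induced)
```

In practice only the observed difference is known; the framework serves to reason about how large the induced components might plausibly be.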
Data collection and analysis
The three domains in the framework described above are used as a conceptual lens
to analyze the comparability of the DaTraV and Vektis insurance claims
databases. The aim of the data collection step was not to systematically uncover
all available descriptions of the datasets, but rather to achieve saturation in
terms of a satisfactory description of the datasets in each of the three
domains.
To populate each of the domains we relied on publicly available information about
the datasets in official government documents and scientific publications that
used the databases. To identify the former group of documents, the DIMDI
(www.dimdi.de) and Vektis (www.vektis.nl) websites were hand searched for
relevant information. For the relevant scientific publications, we searched the
DIMDI and Vektis websites, as well as the MEDLINE and Social Sciences Citation
Index (SSCI) electronic databases using the keywords “DaTraV”
and “Vektis”.
An additional source of information was our first-hand experience with the
databases, which we gained while working on the Euregio Meuse-Rhine Rare
Diseases (EMRaDi) project. Within EMRaDi, national insurance claims databases
are to be used to estimate the number of rare disease patients and the costs of
rare diseases (20).
Results
This section provides a brief summary of the history and legal basis of the national
insurance claims datasets in Germany (DaTraV) and the Netherlands (Vektis) and
analyzes their comparability according to the three dimensions of our framework:
(1) representation of populations compared, (2) data sources and data processing and
(3) database contents and availability for research purposes.
DaTraV dataset
History and legal basis
Since 2014, an anonymized Germany-wide insurance claims database has been
available for research purposes. The name DaTraV (Datentransparenzverordnung)
stems from one of the two legal acts that brought the database into
existence. It is maintained by the German Institute for Medical
Documentation and Information (DIMDI), a part of the Federal Ministry
of Health. Changes to the Social Code Book V (SGB V) led to the
development of the Data Transparency Regulation
(Datentransparenzverordnung) in September 2012, which tasked
DIMDI with the secure storage of insurance claims data and the maintenance
of a database [18].
Access to the database is regulated in the SGB V and explicitly includes
research institutions. The research-relevant objectives stated in the
statute include: improving the quality of care, long-term analyses of
treatment processes, analysis of care provision processes to detect
undesirable developments (over-, under- and misuse) and to identify starting
points for reforms, support of political decision-making processes for the
further development of statutory health insurance, and analysis and
development of novel care provision approaches [18] [19].
DIMDI’s website currently (April 2019) lists thirteen successfully
completed projects that have used the DaTraV dataset for research purposes
[20].
Representation of the population
The DaTraV data cover persons insured in the statutory health insurance
(GKV), who make up the majority of the population: approximately 70 million
Germans, or roughly 90% of the population [21] [22]. This means that the
remaining part of the population, persons insured in the private health
insurance system, is excluded and that the database cannot be considered
representative of the German population, as there are important differences
between the populations included in the respective systems [23]. In addition,
the claims data of persons who died are missing in the year of death, an
estimated 750 000 cases per year [24]. This could introduce a risk of bias,
especially for acute conditions with a high short-term risk of death
(e. g. acute myocardial infarction).
Data sources, collection and processing
DIMDI receives insurance claims from the German Federal (Social) Insurance
Office (BVA) on an annual basis, which in turn receives the
information from individual insurers. The data are collected for the
morbidity-based risk equalization procedure among the insurers, which is the
task of the BVA. Before the annual transfer of data takes place, the BVA
verifies and corrects the source data for completeness and plausibility
according to a procedure agreed upon by DIMDI and the BVA (a description of
this procedure is not publicly available). DIMDI then pseudonymizes and
collates the claims data into a longitudinal database [19].
Database items and research access
The DaTraV currently (April 2019) contains information related to the insured
person (gender and age, insurance status, insurer), medical services
rendered in the ambulatory and stationary sectors (ICD-10 coded diagnoses),
medicines prescribed, and costs related to ambulatory medical services,
stationary medical services, dentists’ services, pharmacies, other
services and sickness compensation for the years 2009–2014. The postal
code of insured persons is also available for the years 2009 and 2010.
Access to the dataset for research purposes is possible after DIMDI
scrutinizes the request for eligibility and adequate protection of privacy.
Researchers can develop the analysis script (SQL) themselves based on the
example dataset available on the DIMDI website [25], they can request that DIMDI create
the script for them, or perform the analysis on-site. Researchers are
allowed to inspect and export an aggregated results table. A minimum number
of patients per cell in the exported table is set by DIMDI on a case-by-case
basis to prevent re-identification [26]. The basic charge for processing a request is €200,
with an additional cost of €300 per evaluated year. An additional
cost of €100 per personnel hour is charged for adjusting a
user-developed script or for DIMDI to develop the script, up to a maximum of
€400 or €700, respectively [26]. The legally mandated waiting time between receipt of a
request and processing is three months, with a possible extension of one
month in complex cases or for other justified reasons [18]. In our experience, the waiting
time can currently exceed 12 months. ([Table 1])
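The fee schedule described above lends itself to a short worked calculation. The following Python sketch encodes one reading of the charges as stated in the text; the function and parameter names are hypothetical, and details such as taxes or the rounding of personnel hours are not covered.

```python
BASE_FEE_EUR = 200      # basic charge per request
FEE_PER_YEAR_EUR = 300  # additional charge per evaluated data year
HOURLY_RATE_EUR = 100   # per personnel hour of script work by DIMDI

# Caps on script work as described in the text. "none" is an assumed label
# for the case where the researchers' own script needs no adjustment.
SCRIPT_CAPS_EUR = {"none": 0, "adjust": 400, "develop": 700}


def datrav_request_fee(years: int, script_mode: str = "none",
                       script_hours: float = 0.0) -> float:
    """Estimate the cost of a DaTraV request in euros (illustrative only)."""
    if years < 1:
        raise ValueError("at least one data year must be evaluated")
    if script_mode not in SCRIPT_CAPS_EUR:
        raise ValueError(f"unknown script_mode: {script_mode!r}")
    if script_hours < 0:
        raise ValueError("script_hours must not be negative")
    # Hourly charges apply only up to the cap for the chosen mode.
    script_cost = min(HOURLY_RATE_EUR * script_hours, SCRIPT_CAPS_EUR[script_mode])
    return BASE_FEE_EUR + FEE_PER_YEAR_EUR * years + script_cost
```

Under this reading, a request covering all six data years (2009–2014) with a ready-to-run script would come to 200 + 6 × 300 = €2000.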
Table 1 A summary of DaTraV and Vektis databases according to the domains of induced differences identified in [Figure 1].

| Domain | DaTraV (Germany) | Vektis (Netherlands) | Potential comparability |
|---|---|---|---|
| Population representation | Population covered by compulsory insurance (90% of residents). | Population covered by compulsory insurance (100% of residents). | The selective exclusion of privately insured individuals (high-income groups) from the DaTraV dataset could bias the comparison. |
| Data source | German Federal (Social) Insurance Office (Bundesversicherungsamt - BVA), which receives data from individual insurers. | Individual insurers. | The underlying sources of data in both cases are insurers. |
| Data collection | Annual transfer from the BVA to DIMDI in the context of the morbidity-based risk equalization procedure among the insurers, after pre-processing according to a procedure that is not publicly described. | Data collection is performed in the context of morbidity-based risk equalization. Data collection occurs every quarter. | Data were collected in both cases with the same overarching purpose. However, the lack of a public description of the pre-processing phase introduces some uncertainty. |
| Data processing | Pseudonymization and collation into a longitudinal dataset. | Collation into a longitudinal dataset. Pseudonymization when made available to researchers. | The key processing steps are the same in both databases, but details might differ. |
| Database contents and research access | A full description of the dataset and its variables is available (DIMDI, 2018b). The dataset includes information related to the insured person (gender and age, insurance status, insurer), medical services rendered in the ambulatory and stationary sectors (including diagnoses given according to ICD-10), medicines prescribed, and costs related to ambulatory medical services, stationary medical services, dentists’ services, pharmacies, other services and sickness compensation for the years 2009–2014, as well as location data for 2009 and 2010. Researchers request access in an application process. If successful, they can extract an aggregated results table, in which the minimum number of patients per cell must be large enough to make re-identification of patients impossible. Costs of access depend on the complexity of the request (flat fee per year of data included) and the amount of expert advice required from DIMDI. | No public in-depth description of the dataset is available. However, all of the claims types (all DRGs and their descriptions, types of consultations, etc.) are clearly defined and their descriptions are publicly available elsewhere. Based on descriptions in research and our experiences, the dataset includes data on claims for hospital and mental health care services based on a home-grown DRG system, consultation-based claims for GP care, and prescription-based claims for pharmaceuticals including the type of medication and the dosage. All claims data include the date on which the service was delivered or commenced, the organization and/or professional who delivered the service, and the price paid for the service. Researchers can freely access aggregated data per municipality and by 5-year age groups. Researchers can apply for access to individual cost data and clinical data, access to both of which is subject to approval. | Both databases offer access for research purposes. The architectures are similar in that they offer access to cost information. However, the DaTraV dataset offers patient selection based on ICD-10 diagnoses, while the Vektis system relies for disease identification on DRGs in the hospital and mental health sectors, broader consultation types in primary care, or pharmaceuticals prescribed. |
Vektis insurance claims database
History and legal basis
Since 2006, Vektis has managed a nation-wide database of claims covered by the
Health Insurance Act (HIA). The act obliges Dutch inhabitants to purchase a
basic statutory health insurance package from a private health insurer [14]. The package includes primary care,
maternity care, hospital care, mental health care, home nursing care,
prescription pharmaceuticals and some allied healthcare services [14]. The majority of Dutch citizens
furthermore purchase an additional, voluntary insurance package, which
typically covers the costs of services such as physical therapy, dentistry
and glasses. All insurers submit their claims data to Vektis, a subsidiary
of Health Insurers Netherlands, the umbrella organization of the private
health insurers in the country, for the purpose of risk equalization.
Furthermore, the database includes all claims made under the Long-term Care
Act (LCA), which was introduced in 2015 and covers residential and home
care; these claims are reimbursed by so-called care offices [14]. ([Table 2])
Table 2 Key data elements in DaTraV and Vektis databases.

| Category | DaTraV (Germany) | Vektis (Netherlands) |
|---|---|---|
| Demographic information | Year of birth | Year of birth |
| | Gender | Gender |
| | Death | Death |
| | Postal code | Postal code |
| Insurance information | Number of days insured | Years with the same insurer |
| | Insurer | Insurer |
| Medicines | Prescription date | Prescription date |
| | Medication (PZN) | Medication |
| | Quantity | Quantity |
| | | Prescriber |
| | | Supplier |
| Stationary sector | Discharge month | Start and end date of care product (DBC) |
| | ICD-10 code | Care products (DBC) |
| Ambulatory sector | Quarter | Month or Quarter |
| | ICD-10 code | Type of practitioner and type of treatment |
| | Number of treatments | Number of treatments |
| Costs | Year | Specific date of any procedure performed |
| | Doctor | General practitioner |
| | | Second-line and specialist mental health care |
| | Dentist | Dentist (split by adult or youth care, some common procedures) |
| | Medicines | Medicines |
| | Hospital | Specialist care |
| | Other | Medical devices |
| | | Physiotherapy and other paramedical services (e. g. speech therapy) |
| | | Maternal and obstetric care |
| | | Primary mental health care |
| | | Nursing and home care |
| | | Other |
| | Sickness benefit | |
Representation of the population
All (i. e., 100%) insured residents in the Netherlands are
included in the dataset. Despite the fact that purchasing basic insurance is
mandatory, approximately 25 000 inhabitants do not purchase a basic
package [14]. The data therefore
cover almost the entire Dutch population for services covered by the basic
insurance package. Approximately 85% of the Dutch population
purchases voluntary additional insurance, which is also included in the
dataset [14]. The LCA claims amount to
approximately 340 000.
Data sources, collection and processing
Insurers only reimburse claims which have undergone, and passed, an
(electronic) accuracy review [27].
For this purpose, they store data regarding their enrollees, healthcare
providers and health services provided [28]. Health insurers (24 insurers consolidated in 9 companies)
submit their data to Vektis electronically on a quarterly basis at an
individual level per claim. Vektis standardizes the incoming data to remove
any remaining discrepancies in storage formats between insurers [29].
Database items and research access
The data stored by Vektis include information on the service provided, the
organization where the service was provided (e. g. location and
type), the professional who provided the service (e. g. age,
specialization, affiliations) and the insured who received the service
(e. g. age, gender, postal code, and insurer). The structure of the
data regarding the service provided and the level of (clinical) detail that can
be retrieved from it vary across different types of services. Claims for
hospital and mental healthcare services, for example, are structured around
the Dutch DRG (i. e. DBC) system, which was substantially reformed in
2012, and contain (detailed) information regarding the condition and
treatment. On the other hand, claims for GP care are consultation-based and
contain little to no medical information, while claims for pharmaceuticals
are prescription-based and include the type of medication and the dosage.
Depending on the duration of the treatment, claims appear in Vektis’
database within approximately two years.
Researchers can access Vektis’ data for non-commercial projects which
‘aim to improve Dutch health care and have societal
relevance’ [30]. Besides the
data made publicly available by Vektis through www.zorgprismapubliek.nl,
which has been used in academic research (cf. [29]), researchers can access
Vektis’ data in two ways. An aggregated version of Vektis’
data can be accessed through Statistics Netherlands (CBS). These data
include the sum of the costs incurred in each of the cost categories covered
by the basic insurance package per inhabitant per year. Requests need to be
approved by Vektis and CBS. This route, including its costs and process, is
further specified in the CBS data catalog [31]. Data can be accessed remotely and linked to a range of other
sources stored within the CBS environment [32]. Researchers who require data at the individual level
including claims (i. e., clinical) details can submit a proposal to
Vektis, outlining the purpose and relevance of their research and the required
data. Vektis scrutinizes the proposals and presents them to the health
insurers (once every month), who also need to approve them. Individual-level
data need to remain in Vektis’ protected environment and be analyzed
on-site in Zeist, the Netherlands. Vektis charges a processing fee of €125
per hour (excluding VAT) in addition to a user fee of between €250 and
€5000 [30]. Vektis as well as
CBS operate under the general rule that datasets with n>10 per cell
can be provided to researchers externally.
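The n>10 rule amounts to a simple check on every cell of an aggregated results table before export. The following Python sketch illustrates such a check; the function name and the example table are hypothetical, and it is not known whether Vektis, CBS or DIMDI use this exact procedure. It performs primary suppression only and does not guard against recovering a suppressed cell from row or column totals.

```python
from typing import Dict, Optional


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 10
) -> Dict[str, Optional[int]]:
    """Keep cells with more than `threshold` patients; blank out the rest."""
    return {cell: (n if n > threshold else None) for cell, n in counts.items()}


# Hypothetical patient counts per age group.
table = {"0-4": 3, "5-9": 10, "10-14": 11, "15-19": 240}
safe_table = suppress_small_cells(table)
```

Because DIMDI sets its minimum cell size case by case, the threshold is kept as a parameter rather than hard-coded.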
Discussion
We explored the feasibility of cross-country comparisons between Germany and the
Netherlands using health insurance claims data based on comparability of the two
national insurance claims databases. We find that while the purpose, underlying data
sources and structures of both databases are highly similar, their implementation
may present important challenges to comparability.
Both databases collect information on all residents of the countries that are
included in the respective national mandatory insurance schemes. However, in
Germany, this excludes the approximately 10% of the population covered by the
private insurance system. Previous research [23] shows that this portion of the population is systematically different
from the remainder, which raises the question of whether differences uncovered between
the two databases could be influenced by selection bias in the DaTraV database. This
is particularly important in cross-national comparisons of disease burden
(e. g. incidence of cancer), which can be lower in the usually wealthier
population excluded from the DaTraV database. Researchers may begin to overcome this
limitation by being cognizant of this source of bias, and carefully considering
standard methods of adjusting the epidemiological estimates for differences in
socio-demographic characteristics of compared populations [33].
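One such standard method is direct standardization, in which stratum-specific rates from each dataset are weighted by a common reference population. The following Python sketch shows the calculation for two age strata; the function name, strata and all numbers are hypothetical, and adjustment of this kind is only possible for characteristics that both datasets record, such as age and sex.

```python
from typing import Dict


def directly_standardized_rate(
    cases: Dict[str, float],
    person_years: Dict[str, float],
    standard_population: Dict[str, float],
) -> float:
    """Weight stratum-specific rates by a shared standard population."""
    total = sum(standard_population.values())
    rate = 0.0
    for stratum, size in standard_population.items():
        stratum_rate = cases[stratum] / person_years[stratum]
        rate += (size / total) * stratum_rate
    return rate


# Constructed example: identical age-specific rates, different age structures.
standard = {"0-64": 80_000, "65+": 20_000}
rate_a = directly_standardized_rate(
    {"0-64": 90, "65+": 210}, {"0-64": 90_000, "65+": 10_000}, standard
)
rate_b = directly_standardized_rate(
    {"0-64": 70, "65+": 630}, {"0-64": 70_000, "65+": 30_000}, standard
)
```

In this constructed example the crude rates differ (300 versus 700 cases per 100 000 person-years) only because of the age structure, while the standardized rates coincide.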
DaTraV and Vektis databases are comparable in terms of sources of data (insurance
claims data) and have the same overarching purpose (morbidity-based risk
adjustment). However, the opacity in the data processing step makes a final
determination of comparability impossible. Potential differences in this step,
especially if they involve excluding certain individuals with supposedly
poor-quality data, could introduce bias. We urge both data holders to make available
detailed descriptions of all data processing steps, especially any rules for
excluding individuals or groups from the databases.
Both databases are available for research purposes after approval from respective
data management authorities. Both databases are comparable in terms of variables
included: demographic information (e. g. sex and age) of the insured persons
and the costs of healthcare services rendered broken down by healthcare sector and
condition. However, the way patient populations are identified differs. The DaTraV
dataset includes ICD-10 diagnoses, while Vektis relies on procedure codes
(e. g. DRGs). While this makes identification of the same patient
populations challenging, it is not impossible. Previous research in the Netherlands
that used Vektis data successfully identified diabetic and vascular disease
treatment pathways using carefully curated collections of procedure codes [28]. This suggests that cross-national
comparisons of wider disease groups using DaTraV and Vektis datasets are possible,
but that caution and significant effort are necessary. To make this type of research
more feasible, we urge both data holders to consider adding more diagnosis-related
descriptors to their datasets (e. g. ICD-10 or SNOMED CT).
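The difference between the two identification strategies can be sketched in a few lines. The following Python example contrasts selection by ICD-10 code prefix (here E10 to E14, the ICD-10 block for diabetes mellitus) with selection by a curated list of care product codes. The function names and the DBC codes are hypothetical placeholders, not actual fields or codes from DaTraV or Vektis.

```python
from typing import Iterable

# ICD-10 block E10-E14: diabetes mellitus.
DIABETES_ICD10_PREFIXES = ("E10", "E11", "E12", "E13", "E14")

# Placeholder identifiers only; a real study would need a clinically
# validated list of care product (DBC) codes.
CURATED_DBC_CODES = frozenset({"DBC-PLACEHOLDER-1", "DBC-PLACEHOLDER-2"})


def matches_icd10(diagnoses: Iterable[str]) -> bool:
    """True if any recorded ICD-10 diagnosis falls in the diabetes block."""
    return any(
        code.strip().upper().startswith(DIABETES_ICD10_PREFIXES)
        for code in diagnoses
    )


def matches_care_products(care_products: Iterable[str]) -> bool:
    """True if any claimed care product is on the curated code list."""
    return any(code in CURATED_DBC_CODES for code in care_products)
```

The diagnosis-based rule is fixed by the classification itself, whereas the procedure-based rule depends entirely on how the code list is curated, which is where the additional effort and risk of incomparability arise.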
Beyond a comparison of DaTraV and Vektis databases, this paper addresses a larger
challenge for comparative health researchers using routinely collected health data.
Valuable data remain locked away in difficult-to-access silos. Even when accessed,
the data might be of questionable comparability and therefore of limited value for
cross-national research. While the comparability framework we propose is focused on
health insurance claims datasets, we are confident it is just as relevant for other
routinely collected health data. Various European-level projects are currently
working to harmonize the databases of routinely collected health data throughout the
European Union [34], but these results will
not be available in the near term. However, we must not be dissuaded from trying to
make progress, working to harmonize routinely collected health databases
step-by-step and database-by-database. Only with access to comparable health data
can we encourage faster diffusion of best practices and make our health systems more
effective, more efficient and more responsive to the needs of patients and
citizens.
Conclusions
Analysis of administrative health data is a promising approach to comparative health
research due to its ready availability, low cost and a large number of diverse
patients and settings included. For the results of comparative studies to reflect
the realities on the ground, three domains of database characteristics need to be
considered: (1) representation of populations compared, (2) data sources and data
processing and (3) database contents and availability for research purposes. We
compared the German (DaTraV) and Dutch (Vektis) national insurance claims datasets
according to these domains and found them to be only partially comparable sources
for cross-national comparative health research. We suggest that using
standard adjustment techniques for socio-demographic differences, making available
more information about data collection and processing procedures and adding more
diagnosis-related descriptors offer ways to overcome the identified threats to
comparability.