CC BY-NC-ND 4.0 · Gesundheitswesen 2020; 82(S 01): S83-S90
DOI: 10.1055/a-1005-6792
Original Article
Owner and Copyright © Georg Thieme Verlag KG 2019

Comparability in Cross-National Health Research Using Insurance Claims Data: The Cases of Germany and The Netherlands

Rok Hrzic (1), Timo Clemens (2), Daan Westra (3), Helmut Brand (2)
1   CAPHRI School for Public Health and Primary Care, International Health, Maastricht University, Maastricht, Netherlands
2   Maastricht University, International Health, Maastricht, Netherlands
3   CAPHRI School for Public Health and Primary Care, Health Services Research, Maastricht University, Maastricht, Netherlands

Correspondence

Rok Hrzic
CAPHRI School for Public Health and Primary Care,
International Health,
Maastricht University
PO Box 616
6200MD Maastricht
Netherlands   

Publication History

Publication Date:
19 November 2019 (online)

 

Abstract

Objective Comparison is a key method in learning about what works in health and healthcare. We discuss the importance of comparability in cross-national health research using health insurance claims data, develop a framework to systematically assess threats to comparability and apply it to the German (DaTraV) and Dutch (Vektis) national-level insurance claims datasets.

Methods We propose a framework of threats to the comparability of health insurance claims databases, which includes three domains: (1) representation of populations compared, (2) data sources and data processing and (3) database contents and availability for research purposes. We apply the framework to analyze the comparability of the DaTraV and Vektis databases using publicly available information (organizations’ websites, scientific publications) and our experiences from an interregional project on rare diseases (EMRaDi).

Results Both databases were created for the same purpose (morbidity-based risk adjustment) and use the same underlying sources of data. Differences in population representation and uncertainty about data processing procedures represent potential sources of incomparability. Access for research purposes is feasible in both databases but may be subject to long processing time.

Conclusions We find important threats to the comparability of the Dutch and German national insurance claims databases and, by extension, to the validity of any comparative health studies that rely on them. Standard adjustment techniques, making more information available about data collection and processing procedures and adding more diagnosis-related descriptors offer ways to overcome the identified threats to comparability.



Introduction

Cross-national comparisons of health systems follow at least three aims: learning about health systems, learning why they are what they are and learning from other countries [1]. The European Union (EU) is a rich source of potential comparisons – a natural laboratory [2] – as the Member States employ diverse approaches to organizing health systems with varying results. Based on the Treaty mandate regarding health (Article 168 (2) TFEU) an important area of EU institutions’ work is the identification of best practices in health policy and their dissemination.

Digital transformation of health systems has created vast collections of health data: from healthcare providers (electronic patient records), insurance companies (insurance claims databases), and other parties [3]. For health researchers, these collections of data promise several advantages: ready availability, low cost, large number of patients and time points included and diversity in patients and settings [4]. Health insurance claims data have been commonly used in the USA and Canada since the 1980s for epidemiological, health services and health economics research [5]. European countries have recently made it possible to use health insurance claims data for research purposes [6]. For all its promise, analysis of administrative health data presents a novel set of challenges. Because the data collection methods were designed with purposes other than research in mind, the data might be biased [7] or limited in scope [8].

Comparative health research utilizing surveys has identified comparability in the measurement methodologies employed as key to avoiding biased results [9]. The European Union [10], WHO Europe [11] and the OECD [12] have worked to harmonize health data collection procedures, increasingly including morbidity and cost information, but this work is ongoing and includes a limited scope of indicators. If issues of comparability are not carefully considered by researchers, they can lead to erroneous conclusions [13]. Biased or erroneous conclusions in this context mean that the differences in health status or healthcare provision between countries uncovered by the studies reflect differences in measurement, data collection or other methodological issues, instead of the differences between the respective populations’ health and health systems that are really present.

Germany and the Netherlands are two EU countries where insurance claims data is routinely available and accessible for researchers. Both countries also share a similar organization of their health systems. For example, both countries have social health insurance, a mix of public and private providers and similar health governance structures [14] [15]. This presumably makes Germany and the Netherlands the most likely case of valid comparisons in health and healthcare. If comparability of insurance claims datasets is limited in the example of this country pair, one can assume that cross-national comparisons will be more biased for other examples with more different health systems.

When using insurance claims data generated in different jurisdictions with different measurement, data collection and data processing procedures for comparative health research, questions of comparability need to be considered [16]. Given the absence of a comprehensive framework for cross-national research using secondary analysis of health insurance claims data, we pursue two objectives:

  • propose a framework of comparability in cross-national health services research using health insurance claims data and

  • apply it to the German (DaTraV) and Dutch (Vektis) national insurance claims datasets.

Our research question is: “To what extent are the DaTraV and Vektis datasets comparable and therefore useful in cross-national health research between Germany and the Netherlands?”



Methods

Comparability in cross-national research

In this section, we propose a framework of comparability in cross-national research using secondary analysis of insurance claims data. Spector et al. provide a framework of methods-induced differences that bias results in cross-national research using collection and analysis of primary data [17]. They identify three domains of comparability that researchers need to pay attention to: samples, data collection and the measurement instrument. Based on their work, we identify three domains of health insurance claims datasets’ characteristics that need to be comparable across countries to avoid inducing differences and biasing results in cross-national research using secondary data analysis.

Spector et al. note that researchers need to ensure that the samples used are representative of the setting. Translating this principle to secondary data analysis, researchers need to understand the underlying populations that are represented in the health insurance claims databases they draw data from and carefully consider whether these populations are representative of the populations that they aim to compare. In summary, the first dimension that needs to be comparable between insurance claims datasets is the representation of populations compared.

Spector et al. also point out that researchers should ideally use comparable measurement and analytical procedures, which includes the comparability of concepts and their operationalization in the measurement instrument. In transferring this principle into the area of secondary data analysis, we can lean on the experiences of other groups gained during the development of the European Common Health Indicators (ECHIs). There, comparability of measures across countries was ensured through harmonization of the definitions of the underlying concepts, data sources, data collection and data processing procedures (i. e., subsequent transformations or calculations) [10]. In summary, the second dimension of health insurance claims datasets that needs to be comparable are the underlying concepts, data sources, data collection and data processing. However, the importance of the comparability of the underlying concepts depends largely on the research question. While death is a clear concept with little cross-national variation in its operationalization, a diabetes care trajectory may be defined very differently between health systems. Therefore, while we include comparability of underlying concepts in our framework for completeness, we will not explicitly examine this aspect later in the paper.

Beyond the dimensions of the Spector et al. framework, our framework includes a third dimension of comparability: the database contents and availability for research purposes. To compare population health or healthcare provision in different jurisdictions using health insurance claims data, both databases need to contain the relevant information and be available to international research groups. This dimension plays a fundamental role in limiting the possible avenues of cross-national research in general and therefore biases the kinds of research questions that can be asked.

[Figure 1] applies our framework to an example of a study of the difference in incidence of disease X in two countries that uses health insurance claims data. It highlights that the overall difference in incidence rates found by the study can be separated into two components. The first component represents the real difference while the second component represents the difference induced by measurement and data collection. The latter component is then further decomposed into the first two domains of our framework: (1) difference due to incomparable representation of the populations and (2) difference due to incomparable underlying concepts, data sources, data collection or data processing procedures. Domain (3), incomparable database contents and lack of research access, is portrayed as preceding the study by influencing the feasibility of the comparison.

Fig. 1 Sources of difference in cross-national comparisons.
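
The decomposition shown in Figure 1 can be written out schematically. The additive form and the symbols below are our own shorthand for illustration, not notation taken from the framework itself; in practice the components may interact rather than simply add up.

```latex
\begin{align}
\Delta_{\text{observed}} &= \Delta_{\text{real}} + \Delta_{\text{induced}} \\
\Delta_{\text{induced}}  &= \Delta_{\text{population}} + \Delta_{\text{data}}
\end{align}
```

Here $\Delta_{\text{population}}$ stands for domain (1), the difference due to incomparable representation of the populations, and $\Delta_{\text{data}}$ for domain (2), the difference due to incomparable concepts, data sources, data collection or data processing. A comparison is unbiased in this reading only when $\Delta_{\text{induced}}$ is close to zero.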


Data collection and analysis

The three domains in the framework described above are used as a conceptual lens to analyze the comparability of the DaTraV and Vektis insurance claims databases. The aim of the data collection step was not to systematically uncover all available descriptions of the datasets, but rather to achieve saturation in terms of a satisfactory description of the datasets in each of the three domains.

To populate each of the domains we relied on publicly available information about the datasets in official government documents and scientific publications that used the databases. To identify the former group of documents, the DIMDI (www.dimdi.de) and Vektis (www.vektis.nl) websites were hand searched for relevant information. For the relevant scientific publications, we searched the DIMDI and Vektis websites, as well as the MEDLINE and Social Sciences Citation Index (SSCI) electronic databases using the keywords “DaTraV” and “Vektis”.

An additional source of information was our first-hand experience with the databases, which we gathered while working on the Euregio Meuse-Rhine Rare Diseases (EMRaDi) project. Within EMRaDi, national insurance claims databases are to be used to estimate the number of rare disease patients and the costs of rare diseases (20).



Results

This section provides a brief summary of the history and legal basis of the national insurance claims datasets in Germany (DaTraV) and the Netherlands (Vektis) and analyzes their comparability according to the three dimensions from our framework: (1) representation of populations compared, (2) data sources and data processing and (3) database contents and availability for research purposes.

DaTraV dataset

History and legal basis

Since 2014, an anonymized Germany-wide insurance claims database has been available for research purposes. The name DaTraV (Datentransparenzverordnung) stems from one of the two legal acts that brought the database into existence. It is maintained by the German Institute for Medical Documentation and Information (DIMDI), a part of the Federal Ministry for Health. Changes to the Social Code Book V (SGB V) led to the development of the Data Transparency Regulation (Datentransparenzverordnung) in September 2012, which tasked DIMDI with the secure storage of insurance claims data and the maintenance of a database [18].

Access to the database is regulated in the SGB V and explicitly includes research institutions. The research-relevant objectives stated in the statute include: improving the quality of care, long-term analyses of treatment processes, analysis of supply processes to detect undesirable developments and for starting points for reforms (over-, under- and misuse), support of political decision-making processes for the further development of statutory health insurance, and analysis and development of novel care provision approaches [18] [19].

DIMDI’s website currently (April 2019) lists thirteen successfully completed projects that have used the DaTraV dataset for research purposes [20].



Representation of the population

Data in the DaTraV includes insured persons in the statutory health insurance (GKV), which covers the majority of the population: approximately 70 million Germans, or roughly 90% [21] [22]. This means that part of the population, namely persons insured in the private health insurance system, is excluded and that the database cannot be considered representative of the German population, as there are important differences between the populations included in the respective systems [23]. In addition, the claims data of persons who died are missing in the year of death, an estimated 750 000 cases per year [24]. This could introduce a risk of bias, especially for acute conditions with a high short-term risk of death (e. g. acute myocardial infarction).
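
To make the size of this potential bias concrete, the population-wide incidence can be written as a weighted average of the incidence in the two insurance systems. This is a back-of-the-envelope sketch: the 0.9 and 0.1 weights are the approximate coverage shares quoted above, and the symbols (with PKV denoting the privately insured group) are notation introduced here for illustration.

```latex
\begin{align}
I_{\text{total}} &= 0.9\, I_{\text{GKV}} + 0.1\, I_{\text{PKV}} \\
I_{\text{GKV}} - I_{\text{total}} &= 0.1\,\bigl(I_{\text{GKV}} - I_{\text{PKV}}\bigr)
\end{align}
```

Under these assumptions, an estimate based on DaTraV alone deviates from the population-wide value by one tenth of the incidence gap between the two groups. With purely hypothetical figures of 500 and 400 cases per 100 000 in the statutory and private systems, the DaTraV-based estimate would overstate the national incidence by 10 per 100 000.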



Data sources, collection and processing

DIMDI receives insurance claims from the German Federal (Social) Insurance Office (BVA) on an annual basis, which in turn receives the information from individual insurers. The purpose is the morbidity-based risk equalization procedure among the insurers, which is the task of the BVA. Before the annual transfer of data takes place, BVA verifies and corrects the source data for completeness and plausibility according to a procedure agreed upon by DIMDI and BVA (a description of this procedure is not publicly available). DIMDI then pseudonymizes and collates the claims data into a longitudinal database [19].



Database items and research access

The DaTraV currently (April 2019) contains information related to the insured person (gender and age, insurance status, insurer), medical services rendered in the ambulatory and stationary sectors (ICD-10 coded diagnoses), medicines prescribed, and costs related to ambulatory medical services, stationary medical services, dentists’ services, pharmacies, other services and sickness compensation for the years 2009–2014. The postal code of insured persons is also available for the years 2009 and 2010.

Access to the dataset for research purposes is possible after DIMDI scrutinizes the request for eligibility and adequate protection of privacy. Researchers can develop the analysis script (SQL) themselves based on the example dataset available on the DIMDI website [25], request that DIMDI create the script for them, or perform the analysis on-site. Researchers are allowed to inspect and export an aggregated results table. A minimum number of patients per cell in the exported table is set by DIMDI on a case-by-case basis to prevent re-identification [26]. The basic charge for processing a request is €200, with an additional cost of €300 per evaluated year. An additional €100 per personnel hour is charged for DIMDI to adjust a user-developed script or to develop the script itself, up to a maximum of €400 or €700, respectively [26]. The legally mandated waiting time between receipt of a request and processing is three months, with a possible extension of one month in complex cases or for other justified reasons [18]. In our experience, the waiting time can currently exceed 12 months. ([Table 1])
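
The fee schedule described above can be turned into a small calculator. This is a minimal sketch of one reading of the published charges, not an official DIMDI tool: the function name, its parameters and the three `script_mode` labels are hypothetical, and the actual invoice may depend on details not described here.

```python
def datrav_request_fee(years, script_hours=0, script_mode="own"):
    """Estimate the fee in euros for one DaTraV request.

    script_mode (hypothetical labels):
      "own"     - researcher supplies a script that needs no DIMDI work
      "adjust"  - DIMDI adjusts a user-developed script (capped at 400)
      "develop" - DIMDI develops the script (capped at 700)
    """
    base_fee = 200          # basic charge per request
    fee_per_year = 300      # charge per evaluated data year
    hourly_rate = 100       # charge per DIMDI personnel hour
    script_caps = {"own": 0, "adjust": 400, "develop": 700}

    if years < 1:
        raise ValueError("at least one evaluated year is required")
    if script_mode not in script_caps:
        raise ValueError(f"unknown script_mode: {script_mode!r}")

    # Hourly script work is billed only up to the cap for the chosen mode.
    script_fee = min(hourly_rate * script_hours, script_caps[script_mode])
    return base_fee + fee_per_year * years + script_fee


# All six data years (2009-2014), script fully developed by DIMDI in 10 hours:
print(datrav_request_fee(6, script_hours=10, script_mode="develop"))  # 2700
```

Under this reading, the number of evaluated years dominates the cost, while the script-related charge is bounded by the cap.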

Table 1 A summary of DaTraV and Vektis databases according to the domains of induced differences identified in [Figure 1].

Population representation

  • DaTraV (Germany): Population covered by compulsory insurance (90% of residents).

  • Vektis (Netherlands): Population covered by compulsory insurance (100% of residents).

  • Potential comparability: The selective exclusion of privately insured individuals (high-income groups) from the DaTraV dataset could bias the comparison.

Data source

  • DaTraV (Germany): German Federal (Social) Insurance Office (Bundesversicherungsamt - BVA), which receives data from individual insurers.

  • Vektis (Netherlands): Individual insurers.

  • Potential comparability: The underlying source of data in both cases is the insurers.

Data collection

  • DaTraV (Germany): Annual collection in the context of the morbidity-based risk equalization procedure among the insurers; data are transferred from BVA to DIMDI after pre-processing according to a procedure that is not publicly defined.

  • Vektis (Netherlands): Data collection is performed in the context of morbidity-based risk equalization. Data collection occurs every quarter.

  • Potential comparability: Data was collected in both cases with the same overarching purpose. However, the lack of a public description of the pre-processing phase introduces some uncertainty.

Data processing

  • DaTraV (Germany): Pseudonymization and collation into a longitudinal dataset.

  • Vektis (Netherlands): Collation into a longitudinal dataset. Pseudonymization when made available to researchers.

  • Potential comparability: The key processing steps are the same in both databases, but details might differ.

Database contents and research access

  • DaTraV (Germany): Full dataset and variable description is available (DIMDI, 2018b), and includes information related to the insured person (gender and age, insurance status, insurer), medical services rendered in the ambulatory and stationary sectors (including diagnoses given according to ICD-10), medicines prescribed, and costs related to ambulatory medical services, stationary medical services, dentists’ services, pharmacies, other services and sickness compensation for the years 2009–2014, as well as location data for 2009 and 2010. Researchers request access in an application process. If successful, they can extract an aggregated results table, where the minimum number of patients per cell needs to make re-identification of patients impossible. Costs of access depend on the complexity of the request (flat fee per year of data included) and the amount of expert advice required from DIMDI.

  • Vektis (Netherlands): No public in-depth description of the dataset is available. However, all of the claims types (all DRGs and their descriptions, types of consultations, etc.) are clearly defined and their descriptions publicly available elsewhere. Based on descriptions in research and our experiences, the dataset includes data on claims for hospital and mental health care services based on a home-grown DRG system, consultation-based claims for GP care, and prescription-based claims for pharmaceuticals including the type of medication and the dosage. All claims data include the date on which the service was delivered or commenced, the organization and/or professional who delivered the service, and the price paid for the service. Researchers can freely access aggregated data per municipality and by 5-year age groups. Researchers can apply for access to individual cost data and clinical data, access to both of which is subject to approval.

  • Potential comparability: Both databases offer access for research purposes. The architectures are similar in that they offer access to cost information. However, the DaTraV dataset offers patient selection based on ICD-10 diagnoses, while the Vektis system focuses either on DRGs in the hospital and mental health sectors, broader consultation types in primary care, or pharmaceuticals prescribed for disease identification.



Vektis insurance claims database

History and legal basis

Since 2006, Vektis has managed a nationwide database of claims covered by the Health Insurance Act (HIA). The act obliges Dutch inhabitants to purchase a basic statutory health insurance package from a private health insurer [14]. The package includes primary care, maternity care, hospital care, mental health care, home nursing care, prescription pharmaceuticals and some allied healthcare services [14]. The majority of Dutch citizens furthermore purchase an additional, voluntary insurance package, which typically covers the costs of services such as physical therapy, dentistry and glasses. All insurers submit their claims data to Vektis, a subsidiary of Health Insurers Netherlands, the umbrella organization of the private health insurers in the country, for the purpose of risk equalization. Furthermore, the database includes all claims made under the Long-term Care Act (LCA), which was introduced in 2015 and covers residential and home care; these claims are reimbursed by so-called care offices [14]. ([Table 2])

Table 2 Key data elements in DaTraV and Vektis databases.

Demographic information

  • DaTraV (Germany): Year of birth; Gender; Death; Postal code

  • Vektis (Netherlands): Year of birth; Gender; Death; Postal code

Insurance information

  • DaTraV (Germany): Number of days insured; Insurer

  • Vektis (Netherlands): Years with the same insurer; Insurer

Medicines

  • DaTraV (Germany): Prescription date; Medication (PZN); Quantity; Prescriber

  • Vektis (Netherlands): Prescription date; Medication; Quantity; Supplier

Stationary sector

  • DaTraV (Germany): Discharge month; ICD-10 code

  • Vektis (Netherlands): Start and end date of care product (DBC); Care products (DBC)

Ambulatory sector

  • DaTraV (Germany): Quarter; ICD-10 code; Number of treatments

  • Vektis (Netherlands): Month or Quarter; Type of practitioner and type of treatment; Number of treatments

Costs

  • DaTraV (Germany): Year; Doctor; Dentist; Medicines; Hospital; Other; Sickness benefit

  • Vektis (Netherlands): Specific date of any procedure performed; General practitioner; Second-line and specialist mental health care; Dentist (split by adult or youth care, some common procedures); Medicines; Specialist care; Medical devices; Physiotherapy and other paramedical services (e. g. speech therapy); Maternal and obstetric care; Primary mental health care; Nursing and home care; Other



Representation of the population

All (i. e., 100%) insured residents in the Netherlands are included in the dataset. Despite the fact that purchasing basic insurance is mandatory, approximately 25 000 inhabitants do not purchase a basic package [14]. The data therefore covers almost the entire Dutch population for services covered by the basic insurance package. Approximately 85% of the Dutch population purchases voluntary additional insurance, which is also included in the dataset [14]. The LCA claims amount to approximately 340 thousand.



Data sources, collection and processing

Insurers only reimburse claims that have undergone, and passed, an (electronic) accuracy review [27]. For this purpose, they store data regarding their enrollees, healthcare providers and health services provided [28]. Health insurers (24 insurers consolidated in 9 companies) submit their data to Vektis electronically on a quarterly basis at an individual level per claim. Vektis standardizes the incoming data to remove any remaining discrepancies in storage formats between insurers [29].



Database items and research access

The data stored by Vektis includes information on the service provided, the organization where the service was provided (e. g. location and type), the professional who provided the service (e. g. age, specialization, affiliations) and the insured person who received the service (e. g. age, gender, postal code, and insurer). The structure of the data regarding the service provided and the level of (clinical) detail that can be retrieved from it varies across different types of services. Claims for hospital and mental healthcare services, for example, are structured around the Dutch DRG (i. e. DBC) system, which was substantially reformed in 2012, and contain (detailed) information regarding the condition and treatment. On the other hand, claims for GP care are consultation-based and contain little to no medical information, while claims for pharmaceuticals are prescription-based and include the type of medication and the dosage. Depending on the duration of the treatment, claims appear in Vektis’ database within approximately two years.

Researchers can access Vektis’ data for non-commercial projects which ‘aim to improve Dutch health care and have societal relevance’ [30]. Besides the data made publicly available by Vektis through www.zorgprismapubliek.nl, which has been used in academic research (cf. [29]), researchers can access Vektis’ data in two ways. First, an aggregated version of Vektis’ data can be accessed through Statistics Netherlands (CBS). These data include the sum of the costs incurred in each of the cost categories covered by the basic insurance package per inhabitant per year. Requests need to be approved by Vektis and CBS. This route, including its costs and process, is further specified in the CBS data catalog [31]. Data can be accessed remotely and linked to a range of other sources stored within the CBS environment [32].

Second, researchers who require data at the individual level including claims (i. e., clinical) details can submit a proposal to Vektis, outlining the purpose and relevance of their research and the required data. Vektis scrutinizes the proposals and presents them to the health insurers (once every month), who also need to approve them. Individual-level data needs to remain in Vektis’ protected environment and be analyzed on-site in Zeist, the Netherlands. Vektis charges a processing fee of 125 Euro/hour (excluding VAT) in addition to a user fee between 250 and 5000 € [30]. Vektis as well as CBS operate under the general rule that datasets with n>10 per cell can be provided to researchers externally.
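
The minimum cell size rule can be illustrated with a short aggregation routine. This is a sketch under stated assumptions: the record layout, field names and the function `aggregate_with_suppression` are hypothetical, and the threshold follows the n>10 rule quoted above; neither Vektis nor CBS is known to use this exact procedure.

```python
from collections import Counter

MIN_CELL_SIZE = 10  # assumption: a cell is exportable only if n > 10


def aggregate_with_suppression(records, keys, min_cell_size=MIN_CELL_SIZE):
    """Count persons per cell and suppress cells that are too small.

    records: iterable of dicts, one per person (hypothetical layout)
    keys:    field names that define a cell, e.g. municipality and age group
    Returns a dict mapping each cell to its count, or to None if suppressed.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {
        cell: (n if n > min_cell_size else None)
        for cell, n in counts.items()
    }


# Hypothetical example: 12 persons in one cell, 3 in another.
records = (
    [{"municipality": "Maastricht", "age_group": "40-44"}] * 12
    + [{"municipality": "Zeist", "age_group": "40-44"}] * 3
)
table = aggregate_with_suppression(records, ["municipality", "age_group"])
print(table)
```

In this example the larger cell keeps its count of 12, while the cell with three persons is replaced by `None`. For rare diseases, many cells may fall below the threshold, which limits how finely results can be broken down.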



Discussion

We explored the feasibility of cross-country comparisons between Germany and the Netherlands using health insurance claims data based on comparability of the two national insurance claims databases. We find that while the purpose, underlying data sources and structures of both databases are highly similar, their implementation may present important challenges to comparability.

Both databases collect information on all residents of the countries that are included in the respective national mandatory insurance schemes. However, in Germany, this excludes approximately 10% of the population, who are covered by the private insurance system. Previous research [23] shows that this portion of the population is systematically different from the remainder, which raises the question of whether differences uncovered between the two databases could be influenced by selection bias in the DaTraV database. This is particularly important in cross-national comparisons of disease burden (e. g. incidence of cancer), which can be lower in the usually wealthier population excluded from the DaTraV database. Researchers may begin to overcome this limitation by being cognizant of this source of bias, and carefully considering standard methods of adjusting the epidemiological estimates for differences in socio-demographic characteristics of compared populations [33].
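
Direct standardization is one common adjustment method of this kind. The sketch below is illustrative only: the strata, case counts and standard weights are invented, the function names are hypothetical, and we do not claim that this particular method is the one recommended in [33]. It shows how two populations with identical stratum-specific rates can still differ in their crude rates when their composition differs.

```python
def crude_rate(cases, population):
    """Overall rate: all cases divided by the whole population."""
    return sum(cases.values()) / sum(population.values())


def directly_standardized_rate(cases, population, standard_weights):
    """Weight stratum-specific rates by a common standard population."""
    if not (cases.keys() == population.keys() == standard_weights.keys()):
        raise ValueError("all inputs must use the same strata")
    total_weight = sum(standard_weights.values())
    return sum(
        (standard_weights[s] / total_weight) * (cases[s] / population[s])
        for s in cases
    )


# Hypothetical data: same rates per age group, different age structure.
standard = {"0-39": 50, "40-64": 30, "65+": 20}

cases_a = {"0-39": 10, "40-64": 60, "65+": 150}
pop_a = {"0-39": 10_000, "40-64": 10_000, "65+": 5_000}

cases_b = {"0-39": 20, "40-64": 60, "65+": 60}
pop_b = {"0-39": 20_000, "40-64": 10_000, "65+": 2_000}

print(crude_rate(cases_a, pop_a), crude_rate(cases_b, pop_b))
print(directly_standardized_rate(cases_a, pop_a, standard),
      directly_standardized_rate(cases_b, pop_b, standard))
```

The crude rates differ by roughly a factor of two, while the standardized rates coincide. Standardization can only adjust for characteristics recorded in both databases, so it does not by itself recover information on persons who are missing from a database altogether.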

The DaTraV and Vektis databases are comparable in terms of sources of data (insurance claims data) and have the same overarching purpose (morbidity-based risk adjustment). However, the opacity in the data processing step makes a final determination of comparability impossible. Potential differences in this step, especially if they involve excluding certain individuals with supposedly poor quality data, could introduce bias. We urge both data holders to make available detailed descriptions of all data processing steps, especially any rules for excluding individuals or groups from the databases.

Both databases are available for research purposes after approval from the respective data management authorities. Both databases are comparable in terms of variables included: demographic information (e. g. sex and age) of the insured persons and the costs of healthcare services rendered, broken down by healthcare sector and condition. However, the way patient populations are identified differs. The DaTraV dataset includes ICD-10 diagnoses, while Vektis relies on procedure codes (e. g. DRGs). While this makes identification of the same patient populations challenging, it is not impossible. Previous research in the Netherlands that used Vektis data successfully identified diabetic and vascular disease treatment pathways using carefully curated collections of procedure codes [28]. This suggests that cross-national comparisons of wider disease groups using DaTraV and Vektis datasets are possible, but that caution and significant effort are necessary. To make this type of research more feasible, we urge both data holders to consider adding more diagnosis-related descriptors to their datasets (e. g. ICD-10, SNOMED CT, etc.).
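
The two identification strategies can be contrasted in a short sketch. Everything below is hypothetical: the record layouts, field names and especially the `DBC-EXAMPLE-…` strings are placeholders and not real DBC codes. Only the use of ICD-10 categories E10 and E11 for type 1 and type 2 diabetes mellitus reflects the actual classification.

```python
def select_by_icd10(claims, icd10_prefixes):
    """Diagnosis-based selection, as is possible with ICD-10 coded claims."""
    prefixes = tuple(icd10_prefixes)
    return {c["person_id"] for c in claims if c["icd10"].startswith(prefixes)}


def select_by_procedure_codes(claims, curated_codes):
    """Procedure-based selection from a curated list of care product codes."""
    codes = set(curated_codes)
    return {c["person_id"] for c in claims if c["dbc_code"] in codes}


# Hypothetical DaTraV-like claims with ICD-10 diagnoses.
icd_claims = [
    {"person_id": 1, "icd10": "E11.9"},
    {"person_id": 2, "icd10": "I10"},
    {"person_id": 3, "icd10": "E10.1"},
]

# Hypothetical Vektis-like claims with placeholder care product codes.
dbc_claims = [
    {"person_id": "a", "dbc_code": "DBC-EXAMPLE-1"},
    {"person_id": "b", "dbc_code": "DBC-EXAMPLE-9"},
]

diabetes_de = select_by_icd10(icd_claims, ["E10", "E11"])
diabetes_nl = select_by_procedure_codes(
    dbc_claims, ["DBC-EXAMPLE-1", "DBC-EXAMPLE-2"]
)
print(diabetes_de, diabetes_nl)
```

The diagnosis-based route needs only a code prefix, whereas the procedure-based route depends entirely on how well the curated code list captures the condition. That curation step is where most of the effort and most of the risk of incomparable patient populations would lie.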

Beyond a comparison of the DaTraV and Vektis databases, this paper addresses a larger challenge for comparative health researchers using routinely collected health data. Valuable data remains locked away in difficult-to-access silos. Even when accessed, the data might be of questionable comparability and therefore of limited value for cross-national research. While the comparability framework we propose is focused on health insurance claims datasets, we are confident it is just as relevant for other routinely collected health data. Various European-level projects are currently working to harmonize the databases of routinely collected health data throughout the European Union [34], but these results will not be available in the near term. However, we must not be dissuaded from trying to make progress, working to harmonize routinely collected health databases step-by-step and database-by-database. Only with access to comparable health data can we encourage faster diffusion of best practices and make our health systems more effective, more efficient and more responsive to the needs of patients and citizens.



Conclusions

Analysis of administrative health data is a promising approach to comparative health research due to its ready availability, low cost and a large number of diverse patients and settings included. For the results of comparative studies to reflect the realities on the ground, three domains of database characteristics need to be considered: (1) representation of populations compared, (2) data sources and data processing and (3) database contents and availability for research purposes. We compared the German (DaTraV) and Dutch (Vektis) national insurance claims datasets according to these domains and found them to be an incompletely comparable source for cross-national comparative health research. We suggest that using various standard adjustment techniques for socio-demographic differences, making available more information about data collection and processing procedures and adding more diagnosis-related descriptors offer ways to overcome the identified threats to comparability.



Conflict of Interest

The authors declare that they have no conflict of interest.

Acknowledgements

We would like to thank the anonymous reviewers for helpful comments. The EMRaDi project, undertaken via the Interreg V-A Euregio Meuse-Rhine programme, is supported by the European Union, the European Regional Development Fund and the regional authorities.

