Methods Inf Med 2015; 54(01): 41-44
DOI: 10.3414/ME13-02-0027
Focus Theme – Original Articles
Schattauer GmbH

An Eligibility Criteria Query Language for Heterogeneous Data Warehouses[*]

R. Bache (1, 2), A. Taweel (1, 2), S. Miles (1), B. C. Delaney (2)
1   Department of Informatics, King’s College London, London, UK
2   Department of Primary Care and Public Health Sciences, King’s College London, London, UK

Publication History

received: 15 June 2013

accepted: 07 May 2014

Publication Date: 22 January 2018 (online)

Summary

Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”.

Objectives: The increasing availability of electronic clinical data provides great potential for finding eligible patients for clinical research. However, data heterogeneity makes it difficult for clinical researchers to interrogate sources consistently, and existing standard query languages are often insufficient for querying across diverse representations. A higher-level domain language is therefore needed so that queries become agnostic to data representation. To this end, we define a clinician-readable computational language for querying whether patients meet eligibility criteria (ECs) from clinical trials. The language can express the temporal semantics required by many ECs and can be evaluated automatically on heterogeneous data sources.

Methods: By reference to standards and examples of existing ECs, a clinician-readable query language was developed. Using a model-based approach, it was implemented to transform captured ECs into queries that interrogate heterogeneous data warehouses. The query language was evaluated on two types of data sources, each different in structure and content.
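
To illustrate the idea of evaluating one captured EC against differently structured sources, the following is a minimal sketch and not the language or implementation described in this article. All names (Fact, Criterion, from_rows, from_documents, eligible), the two source layouts and the sample data are hypothetical and chosen only for illustration. The criterion carries a simple temporal constraint (an observation within a number of days before a reference date) and is evaluated unchanged on both sources once each is mapped to a common fact model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Iterator, Optional, Set


@dataclass(frozen=True)
class Fact:
    """Source-independent clinical fact (hypothetical common model)."""
    patient_id: str
    concept: str
    value: Optional[float]
    when: date


@dataclass(frozen=True)
class Criterion:
    """One eligibility criterion with a temporal window (hypothetical)."""
    concept: str
    within_days: int
    min_value: Optional[float] = None


def from_rows(rows: Iterable[tuple]) -> Iterator[Fact]:
    # Assumed source A: flat tuples with ISO-formatted date strings.
    for pid, code, val, day in rows:
        yield Fact(pid, code, val, date.fromisoformat(day))


def from_documents(docs: Iterable[dict]) -> Iterator[Fact]:
    # Assumed source B: one nested record per patient.
    for doc in docs:
        for obs in doc["observations"]:
            yield Fact(doc["id"], obs["concept"], obs.get("value"), obs["date"])


def eligible(criterion: Criterion, facts: Iterable[Fact], reference: date) -> Set[str]:
    """Return the ids of patients with at least one matching fact in the window."""
    earliest = reference - timedelta(days=criterion.within_days)
    matched: Set[str] = set()
    for fact in facts:
        if fact.concept != criterion.concept:
            continue
        if not (earliest <= fact.when <= reference):
            continue
        if criterion.min_value is not None and (
            fact.value is None or fact.value < criterion.min_value
        ):
            continue
        matched.add(fact.patient_id)
    return matched


# Invented sample data for the two assumed source layouts.
rows = [
    ("p1", "hba1c", 8.1, "2014-01-10"),
    ("p2", "hba1c", 6.0, "2014-01-12"),
    ("p3", "hba1c", 9.0, "2012-01-01"),
]
docs = [
    {"id": "q1", "observations": [
        {"concept": "hba1c", "value": 7.5, "date": date(2014, 2, 1)},
    ]},
]

criterion = Criterion("hba1c", within_days=365, min_value=7.0)
reference = date(2014, 3, 1)
result_a = eligible(criterion, from_rows(rows), reference)
result_b = eligible(criterion, from_documents(docs), reference)
```

In this sketch the criterion is written once, and only the two adapter functions know how each source is laid out.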

Results: The query language raises the level of abstraction so that researchers can construct their ECs without prior knowledge of the data sources. It was evaluated on two types of semantically and structurally diverse data warehouses. The query language is now used to express ECs in the EHR4CR project. In a survey, the majority of users perceived it to be useful, easy to understand and unambiguous.

Discussion: An EC-specific language enables clinical researchers to express their ECs as queries while being insulated from the complexities of heterogeneous clinical data sets. More generally, the approach demonstrates that a domain query language has potential for overcoming problems of semantic interoperability, and that it is applicable where the nature of the queries is well understood and the data are conceptually similar but held in different representations.

Conclusions: Our language provides a strong basis for use across different clinical domains for expressing ECs by overcoming the heterogeneous nature of electronic clinical data whilst maintaining semantic consistency. It is readily comprehensible by target users. This demonstrates that a domain query language can be both usable and interoperable.

* Supplementary material is published on our website: www.methods-online.com