CC BY-NC-ND 4.0 · Appl Clin Inform 2021; 12(04): 826-835
DOI: 10.1055/s-0041-1733847
Research Article

Linking a Consortium-Wide Data Quality Assessment Tool with the MIRACUM Metadata Repository

Lorenz A. Kapsner
1   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
2   Department of Radiology, Universitätsklinikum Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
Jonathan M. Mang
1   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
Sebastian Mate
1   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
Susanne A. Seuchter
1   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
Abishaa Vengadeswaran
3   Medical Informatics Group (MIG), Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt am Main, Germany
Franziska Bathelt
4   Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technical University Dresden, Dresden, Germany
Noemi Deppenwiese
1   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
Dennis Kadioglu
3   Medical Informatics Group (MIG), Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt am Main, Germany
5   Data Integration Center, University Hospital Frankfurt, Frankfurt am Main, Germany
Detlef Kraska
1   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
Hans-Ulrich Prokosch
1   Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
6   Department of Medical Informatics, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Erlangen, Germany
Funding This work was funded in part by the German Federal Ministry of Education and Research (BMBF) within the Medical Informatics Initiative (MIRACUM Consortium) under the Funding Numbers FKZ: 01ZZ1801A (Erlangen), 01ZZ1801C (Frankfurt), and 01ZZ1801L (Dresden).


Background Many research initiatives aim at using data from electronic health records (EHRs) in observational studies. Participating sites of the German Medical Informatics Initiative (MII) established data integration centers to integrate EHR data within research data repositories to support local and federated analyses. To address concerns regarding possible data quality (DQ) issues of hospital routine data compared with data specifically collected for scientific purposes, we have previously presented a data quality assessment (DQA) tool providing a standardized approach to assess DQ of the research data repositories at the MIRACUM consortium's partner sites.

Objectives Major limitations of the former approach included manual interpretation of the results and hard coding of analyses, making their expansion to new data elements and databases time-consuming and error-prone. Here we present an enhanced version of the DQA tool that links it to common data element definitions stored in a metadata repository (MDR) and adopts the harmonized DQA framework from Kahn et al., and we describe its application within the MIRACUM consortium.

Methods Data quality checks were consistently aligned to a harmonized DQA terminology. Database-specific information was systematically identified and represented in an MDR. Furthermore, a structured representation of logical relations between data elements was developed to model plausibility statements in the MDR.
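
To make the idea of metadata-driven checks more concrete, the following is a minimal sketch in Python. It is not the DQAstats implementation and does not reflect the actual MIRACUM MDR schema; the classes `DataElement` and `PlausibilityRule`, the field names, and the sample records are all hypothetical. It only illustrates how value constraints and a logical relation between two data elements could be stored as metadata and evaluated by generic code, so that adding a data element means adding metadata rather than new analysis code.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical metadata entry for one data element (not the real MDR schema).
@dataclass
class DataElement:
    name: str
    allowed_values: Optional[set] = None   # constraint for categorical elements
    value_range: Optional[tuple] = None    # (min, max) constraint for numeric elements

# Hypothetical structured plausibility statement:
# "if <if_element> equals <if_value>, then <then_element> must be in <then_allowed>".
@dataclass
class PlausibilityRule:
    description: str
    if_element: str
    if_value: Any
    then_element: str
    then_allowed: set

def conformance_violations(element, records):
    """Return indices of records whose value violates the element's constraints."""
    bad = []
    for i, rec in enumerate(records):
        value = rec.get(element.name)
        if value is None:
            continue  # missing values belong to completeness checks, not conformance
        if element.allowed_values is not None and value not in element.allowed_values:
            bad.append(i)
        elif element.value_range is not None:
            low, high = element.value_range
            if not (low <= value <= high):
                bad.append(i)
    return bad

def plausibility_violations(rule, records):
    """Return indices of records that match the rule's condition but break its consequence."""
    return [
        i for i, rec in enumerate(records)
        if rec.get(rule.if_element) == rule.if_value
        and rec.get(rule.then_element) not in rule.then_allowed
    ]

# Illustrative metadata and records (invented for this sketch).
sex = DataElement("sex", allowed_values={"f", "m", "other"})
age = DataElement("age", value_range=(0, 120))
delivery_rule = PlausibilityRule(
    description="A delivery diagnosis is only plausible for female patients",
    if_element="icd", if_value="O80",
    then_element="sex", then_allowed={"f"},
)
records = [
    {"sex": "f", "age": 34, "icd": "O80"},
    {"sex": "m", "age": 51, "icd": "O80"},   # breaks the plausibility rule
    {"sex": "x", "age": 200, "icd": "I10"},  # breaks both conformance constraints
]

sex_issues = conformance_violations(sex, records)
age_issues = conformance_violations(age, records)
rule_issues = plausibility_violations(delivery_rule, records)
```

Because the check functions only read the metadata objects, the same code could in principle be pointed at another database by supplying different element definitions, which is the scalability argument made in the abstract.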

Results The MIRACUM DQA tool was linked to data element definitions stored in a consortium-wide MDR. Additional databases used within MIRACUM were linked to the DQ checks by extending the respective data elements in the MDR with the required information. The evaluation of DQ checks was automated. An adaptable software implementation is provided with the R package DQAstats.

Conclusion The enhancements of the DQA tool facilitate the future integration of new data elements and make the tool scalable to other databases and data models. It has been provided to all ten MIRACUM partners and was successfully deployed and integrated into their respective data integration center infrastructure.

Protection of Human and Animal Subjects

Pseudonymized EHR data were used for developing and testing this software. No formal intervention was performed and no additional (patient-) data were collected. The authors declare that this research was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects.

Publication History

Received: 19 April 2021

Accepted: 27 June 2021

Article published online: 25 August 2021

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

References

  • 1 Helmer KG, Ambite JL, Ames J. et al; Biomedical Informatics Research Network. Enabling collaborative research using the Biomedical Informatics Research Network (BIRN). J Am Med Inform Assoc 2011; 18 (04) 416-422
  • 2 Holve E, Segal C, Lopez MH, Rein A, Johnson BH. The Electronic Data Methods (EDM) forum for comparative effectiveness research (CER). Med Care 2012; 50 (suppl): S7-S10
  • 3 McMurry AJ, Murphy SN, MacFadden D. et al. SHRINE: enabling nationally scalable multi-site disease studies. PLoS ONE 2013; 8 (Suppl. 03) e55811
  • 4 Hripcsak G, Duke JD, Shah NH. et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform 2015; 216 (216) 574-578
  • 5 Juárez D, Schmidt EE, Stahl-Toyota S, Ückert F, Lablans M. A generic method and implementation to evaluate and improve data quality in distributed research networks. Methods Inf Med 2019; 58 (2-03): 86-93
  • 6 Semler S, Wissing F, Heyder R. German medical informatics initiative: a national approach to integrating health data from patient care and medical research. Methods Inf Med 2018; 57 (01) e50-e56
  • 7 Kahn MG, Callahan TJ, Barnard J. et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. EGEMS (Wash DC) 2016; 4 (01) 1244
  • 8 Brennan PF, Stead WW. Assessing data quality: from concordance, through correctness and completeness, to valid manipulatable representations. J Am Med Inform Assoc 2000; 7 (01) 106-107
  • 9 Kahn MG, Mis BBE, Bathurst J. Quantifying clinical data quality using relative gold standards. Paper presented at: AMIA Annual Symposium proceedings AMIA Symposium. 2010: 356-360
  • 10 Hersh WR, Weiner MG, Embi PJ. et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care 2013; 51 (08, Suppl 3): S30-S37
  • 11 International Organization for Standardization (ISO). ISO/IEC 11179, Information Technology - Metadata Registries (MDR). Part 3: Registry Metamodel and Basic Attributes. 3rd ed. (published 2013-02-12). SC32 WG2 Metadata Standards Home Page; 2013
  • 12 Prokosch H-U, Acker T, Bernarding J. et al. MIRACUM: Medical Informatics in Research and Care in University Medicine: a large data sharing network to enhance translational research and medical care. Methods Inf Med 2018; 57 (01) 82-91
  • 13 Kadioglu D, Breil B, Knell C. et al. Samply.MDR—a metadata repository and its application in various research networks. Stud Health Technol Inform 2018; 253: 50-54
  • 14 MIRACUM Consortium. Miracum MDR. 2021. Accessed March 23, 2021 at:
  • 15 Kapsner LA, Kampf MO, Seuchter SA. et al. Moving towards an EHR data quality framework: the MIRACUM approach. Stud Health Technol Inform 2019; 267: 247-253
  • 16 Haverkamp C, Ganslandt T, Horki P. et al. Regional differences in thrombectomy rates: secondary use of billing codes in the MIRACUM (Medical Informatics for Research and Care in University Medicine) Consortium. Clin Neuroradiol 2018; 28 (02) 225-234
  • 17 Allaire JJ, Xie Y, McPherson J. et al. Rmarkdown: dynamic documents for R. R package version 2.7, 2021. Accessed July 20, 2021 at:
  • 18 Xie Y, Allaire JJ, Grolemund G. R Markdown: The Definitive Guide. Boca Raton, FL: Chapman and Hall/CRC; 2018
  • 19 Nasseh D, Nonnemacher M, Stausberg J. Datenqualität in der medizinischen Forschung: Leitlinie zum adaptiven Management von Datenqualität in Kohortenstudien und Registern. MWV Med Wissenschaftliche Verlagsgesellschaft; 2014
  • 20 Johnson SG, Speedie S, Simon G, Kumar V, Westra BL. A data quality ontology for the secondary use of EHR data. Paper presented at: AMIA Annual Symposium proceedings AMIA Symposium; 2015: 1937-1946
  • 21 Strong DM, Lee YW, Wang RY. Data quality in context. Commun ACM 1997; 40 (05) 103-110
  • 22 Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013; 20 (01) 144-151
  • 23 Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform 2013; 46 (05) 830-836
  • 24 Khare R, Utidjian L, Ruth BJ. et al. A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc 2017; 24 (06) 1072-1079
  • 25 Lee K, Weiskopf N, Pathak J. A Framework for Data Quality Assessment in Clinical Research Datasets. Paper presented at: AMIA Annual Symposium Proceedings; 2017: 1080-1089
  • 26 Callahan TJ, Bauck AE, Bertoch D. et al. A comparison of data quality assessment checks in six data sharing networks. EGEMS (Wash DC) 2017; 5 (01) 8
  • 27 Qualls LG, Phillips TA, Hammill BG. et al. Evaluating foundational data quality in the national Patient-Centered Clinical Research Network (PCORnet®). EGEMS (Wash DC) 2018; 6 (01) 3
  • 28 Observational Health Data Sciences and Informatics. The Book of OHDSI. Observational Health Data Sciences and Informatics. 2019. Accessed July 20, 2021 at:
  • 29 Lynch KE, Deppen SA, DuVall SL. et al. Incrementally transforming electronic medical records into the observational medical outcomes partnership common data model: a multidimensional quality assurance approach. Appl Clin Inform 2019; 10 (05) 794-803
  • 30 Wang Z, Talburt JR, Wu N, Dagtas S, Zozus MN. A rule-based data quality assessment system for electronic health record data. Appl Clin Inform 2020; 11 (04) 622-634
  • 31 Liaw S-T, Guo JGN, Ansari S. et al. Quality assessment of real-world data repositories across the data life cycle: A literature review. J Am Med Inform Assoc 2021; ocaa340
  • 32 R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2020
  • 33 Xie Y. Knitr: A Comprehensive Tool for Reproducible Research in R. In: Stodden V, Leisch F, Peng RD. eds. Implementing Reproducible Computational Research. Boca Raton, FL: Chapman and Hall/CRC; 2014
  • 34 Xie Y. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, FL: Chapman and Hall/CRC; 2015
  • 35 Xie Y. Knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.31, 2021. Accessed July 20, 2021 at:
  • 36 Merkel D. Docker: lightweight Linux containers for consistent development and deployment. Linux J 2014; 239: 2
  • 37 Gruendner J, Gulden C, Kampf M, Mate S, Prokosch H-U, Zierk J. A framework for criteria-based selection and processing of fast healthcare interoperability resources (FHIR) data for statistical analysis: design and implementation study. JMIR Med Inform 2021; 9 (04) e25645
  • 38 Murphy SN, Mendis M, Hackett K. et al. Architecture of the Open-source Clinical Research Chart from Informatics for Integrating Biology and the Bedside. Paper presented at: AMIA Annu Symp Proc; 2007: 5
  • 39 Ryan P, Schuemie M, Huser V, Knoll C, Londhe A, Abdul-Basser T. Achilles: Creates Descriptive Statistics Summary for an Entire OMOP CDM Instance. R package version 1.6.7, 2019. Accessed July 20, 2021 at:
  • 40 Maier C, Lang L, Storf H. et al. Towards Implementation of OMOP in a German University Hospital Consortium. Appl Clin Inform 2018; 9 (01) 54-61
  • 41 DataQualityDashboard. Published 2021. Accessed April 6, 2021 at:
  • 42 Schmidt CO, Struckmann S, Enzenbach C. et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Med Res Methodol 2021; 21 (01) 63
  • 43 Richter A, Schmidt CO, Krüger M, Struckmann S. DataquieR: assessment of data quality in epidemiological research. J Open Source Softw 2021; 6 (61) 3093
  • 44 Lablans M, Kadioglu D, Mate S, Leb I, Prokosch H-U, Ückert F. Strategien zur Vernetzung von Biobanken. Klassifizierung verschiedener Ansätze zur Probensuche und Ausblick auf die Zukunft in der BBMRI-ERIC. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2016; 59 (03) 373-378
  • 45 German Cancer Consortium. Available at:
  • 46 Souibgui M, Atigui F, Zammali S, Cherfi S, Yahia SB. Data quality in ETL process: a preliminary study. Procedia Comput Sci 2019; 159: 676-687
  • 47 Juran JM. ed. Juran's Quality Handbook. 5th ed. New York, NY: McGraw-Hill; 1999