Data Integration for Future Medicine (DIFUTURE)
An Architectural and Methodological Overview
The work of the DIFUTURE consortium during the conceptual phase was funded by the German Federal Ministry of Education and Research (BMBF) within the “Medical Informatics Funding Scheme” under reference numbers 01ZZ1603[A-D].
received: 01 December 2017
accepted: 17 April 2018
17 July 2018 (online)
Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Data and knowledge at comprehensive depth and breadth need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers.
Objectives: The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians with innovative views on integrated data as well as with decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level, with a specific focus on data integration and sharing.
Governance and Policies: Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies that safeguard data use and sharing through technical and organizational measures providing the highest levels of data protection. One of our central policies is that adequate methods of data sharing will be selected for each use case and project based on rigorous risk and threat analyses. Interdisciplinary groups have been established to manage change.
Architectural Framework and Methodology: The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing.
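The three-step approach described above can be pictured as a small pipeline. The following Python sketch is illustrative only and is not the DIFUTURE implementation: the `Record` structure, the function names and the lower-casing of field names (a stand-in for real harmonization) are all assumptions made for this example, and the final step groups records in an in-memory dictionary rather than loading them into a platform such as i2b2 or tranSMART.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Record:
    """One data item moving through the pipeline (hypothetical structure)."""
    source: str                      # e.g. "clinical" or "research"
    kind: str                        # e.g. "structured", "omics", "image"
    payload: Dict[str, Any]
    annotations: Dict[str, Any] = field(default_factory=dict)


def import_and_harmonize(raw_items: List[Dict[str, Any]]) -> List[Record]:
    """Step 1 (assumed shape): import raw items into one common technical format."""
    return [
        Record(source=item["source"], kind=item["kind"], payload=dict(item["data"]))
        for item in raw_items
    ]


def stage_and_enrich(records: List[Record]) -> List[Record]:
    """Step 2 (placeholder logic): preprocess and enrich in a staging area."""
    staged = []
    for rec in records:
        # Normalizing field names stands in for real transformation rules.
        payload = {key.strip().lower(): value for key, value in rec.payload.items()}
        annotations = dict(rec.annotations, staged=True)
        staged.append(Record(rec.source, rec.kind, payload, annotations))
    return staged


def load_for_analytics(records: List[Record]) -> Dict[str, List[Record]]:
    """Step 3 (placeholder target): group records by kind for later analysis."""
    store: Dict[str, List[Record]] = {}
    for rec in records:
        store.setdefault(rec.kind, []).append(rec)
    return store


def run_pipeline(raw_items: List[Dict[str, Any]]) -> Dict[str, List[Record]]:
    """Chain the three steps in the order given in the text."""
    return load_for_analytics(stage_and_enrich(import_and_harmonize(raw_items)))
```

Keeping each step as a separate function mirrors the separation between import, staging and analytics environments; in a real Data Integration Center each step would presumably be backed by dedicated systems and standards-based interfaces rather than in-memory lists.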
Use Cases: From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e. it follows the needs of physicians and researchers and aims at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools, with a focus on neurology, oncology and further disease entities. Our early use cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios.
Discussion: Our own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already implemented and tested prototypes of the most important components of our architecture.
Keywords: Health information systems - data warehousing - information dissemination - data sharing - privacy
* for the DIFUTURE Consortium