A Privacy-Preserving Distributed Analytics Platform for Health Care Data

Sascha Welten; Yongli Mou; Laurenz Neumann; Mehrshad Jaberansary; Yeliz Yediel Ucer; Toralf Kirsten; Stefan Decker; Oya Beyan

doi:10.1055/s-0041-1740564


CC BY-NC-ND 4.0 · Methods Inf Med 2022; 61(S 01): e1-e11

Original Article


Sascha Welten¹, Yongli Mou¹, Laurenz Neumann¹, Mehrshad Jaberansary¹, Yeliz Yediel Ucer², Toralf Kirsten³, Stefan Decker¹,², Oya Beyan²,⁴

¹Chair of Computer Science 5, RWTH Aachen University, Aachen, Germany
²Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Sankt Augustin, Germany
³Department of Medical Data Science, University Medical Center Leipzig, Leipzig, Germany
⁴Institute for Medical Informatics, Faculty of Medicine, University Hospital Cologne, University of Cologne, Cologne, Germany

Funding This work was supported by the German Ministry for Research and Education (BMBF) as part of the SMITH consortium (SW, LN, MJ, YUY, TK, SD, and OB, grant no. 01ZZ1803K). This work was conducted jointly by RWTH Aachen University and Fraunhofer FIT as part of the PHT and Go FAIR implementation network, which aims to develop a proof-of-concept information system to address current data reusability challenges occurring in the context of so-called data integration centres that are being established as part of ongoing German Medical Informatics BMBF projects.


Abstract

Background In recent years, data-driven medicine has gained increasing importance in terms of diagnosis, treatment, and research due to the exponential growth of health care data. However, data protection regulations prohibit data centralisation for analysis purposes because of potential privacy risks like the accidental disclosure of data to third parties. Therefore, alternative data usage policies, which comply with present privacy guidelines, are of particular interest.

Objective We aim to enable analyses on sensitive patient data while complying with local data protection regulations, using an approach called the Personal Health Train (PHT), a paradigm that utilises distributed analytics (DA) methods. The main principle of the PHT is that the analytical task is brought to the data provider and the data instances remain in their original location.

Methods In this work, we present our implementation of the PHT paradigm, which preserves the sovereignty and autonomy of the data providers and operates with a limited number of communication channels. We further conduct a DA use case on data stored at three distinct, distributed data providers.

Results We show that our infrastructure enables the training of data models based on distributed data sources.

Conclusion Our work presents the capabilities of DA infrastructures in the health care sector, which lower the regulatory obstacles to sharing patient data. We further demonstrate their ability to fuel medical science by making distributed data sets available for scientists or health care practitioners.
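The PHT principle stated in the Objective (the analytical task travels to the data, and the data instances stay where they are) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Station`, `Train`, and `dispatch` names, the toy numeric records, and the choice of a running mean as the analysis are all assumptions made for illustration. The sketch also leaves out everything a real platform needs, such as authentication, transport security, and access control.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Train:
    """Hypothetical analytical task; it carries only aggregate state."""
    count: int = 0
    total: float = 0.0
    route_log: List[str] = field(default_factory=list)

    def run(self, local_records: List[float]) -> None:
        # Executed at the data provider: only the count and the sum
        # are stored on the train, never the individual records.
        self.count += len(local_records)
        self.total += sum(local_records)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


@dataclass
class Station:
    """Hypothetical data provider; `records` stay inside this object."""
    name: str
    records: List[float]

    def execute(self, train: Train) -> None:
        # The station decides to run the visiting task on its own data.
        train.run(self.records)
        train.route_log.append(self.name)


def dispatch(train: Train, stations: List[Station]) -> Train:
    """Send the train to each station in turn (assumed sequential route)."""
    for station in stations:
        station.execute(train)
    return train


# Toy example with three data providers, mirroring the use case size.
stations = [
    Station("A", [1.0, 2.0, 3.0]),
    Station("B", [4.0, 5.0]),
    Station("C", [6.0]),
]
result = dispatch(Train(), stations)
print(result.route_log, result.count, result.mean)
```

In this sketch the requester only ever sees the aggregate carried by the train. Aggregates can still leak information in some settings, so the example shows the data-stays-local idea only and is not a privacy guarantee.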

Keywords

distributed analytics - Personal Health Train - FAIR


Publication History

Received: 30 March 2021

Accepted: 22 September 2021

Article published online: 17 January 2022

© 2022. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

