Methods Inf Med 2015; 54(04): 346-352
DOI: 10.3414/ME14-01-0137
Original Articles
Schattauer GmbH

Exploiting Distributed, Heterogeneous and Sensitive Data Stocks while Maintaining the Owner’s Data Sovereignty

M. Lablans, D. Kadioglu, M. Muscholl, F. Ückert
University Medical Center Mainz, Mainz, Germany

Publication History

Received: 10 December 2014
Accepted: 30 May 2015
Published online: 22 January 2018

Background: To achieve statistical significance in medical research, biological or data samples from several bio- or databanks often need to be complemented by those of other institutions. For that purpose, IT-based search services have been established to locate datasets matching a given set of criteria in databases distributed across several institutions. However, previous approaches require data owners to disclose information about their samples, which raises a barrier to their participation in the network.

Objective: To devise a method to search distributed databases for datasets matching a given set of criteria while fully maintaining their owner’s data sovereignty.

Methods: As a modification to traditional federated search services, we propose the decentral search, which allows the data owner a high degree of control. Relevant data are loaded into local bridgeheads, each under their owner’s sovereignty. Researchers can formulate criteria sets along with a project proposal using a central search broker, which then notifies the bridgeheads. The criteria are, however, treated as an inquiry rather than a query: Instead of responding with results, bridgeheads notify their owner and wait for his/her decision regarding whether and what to answer based on the criteria set, the matching datasets and the specific project proposal. Without the owner’s explicit consent, no data leaves his/her institution.
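
The inquiry-rather-than-query flow described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`Inquiry`, `Bridgehead`, `SearchBroker`), the exact-match criteria check and the `decide` callback standing in for the owner's manual review are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Inquiry:
    """A criteria set together with the project proposal (hypothetical shape)."""
    criteria: dict
    proposal: str


# The owner's decision: given the inquiry and the local matches,
# return a reply of the owner's choosing, or None to decline.
Decision = Callable[[Inquiry, list], Optional[dict]]


@dataclass
class Bridgehead:
    """Local store under the owner's control; it never answers on its own."""
    site: str
    datasets: list
    decide: Decision
    inbox: list = field(default_factory=list)

    def notify(self, inquiry: Inquiry) -> None:
        # Receiving an inquiry only queues it; nothing is sent back.
        self.inbox.append(inquiry)

    def review(self) -> list:
        # Run when the owner works through pending inquiries.
        replies = []
        for inquiry in self.inbox:
            matches = [
                d for d in self.datasets
                if all(d.get(k) == v for k, v in inquiry.criteria.items())
            ]
            reply = self.decide(inquiry, matches)
            if reply is not None:  # explicit consent is required
                replies.append(reply)
        self.inbox.clear()
        return replies


class SearchBroker:
    """Central broker: distributes inquiries but holds no datasets."""

    def __init__(self, bridgeheads: list):
        self.bridgeheads = bridgeheads

    def submit(self, inquiry: Inquiry) -> int:
        for bridgehead in self.bridgeheads:
            bridgehead.notify(inquiry)
        # Only the number of notified sites is returned, never results.
        return len(self.bridgeheads)
```

In this sketch, `submit` deliberately returns no matches; anything that leaves a site is whatever the owner's `decide` callback chooses to return from `review`, which mirrors the loss of real-time answers in exchange for owner control.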

Results: The decentral search has been deployed in one of the six German Centers for Health Research, which comprises eleven university hospitals. In the process, compliance with German data protection regulations has been confirmed. The decentral search also forms the centerpiece of an open source registry software toolbox aiming to build a national registry of rare diseases in Germany.

Conclusions: While the sacrifice of real-time answers impairs some use cases, it leads to several beneficial side effects: improved data protection due to data parsimony, tolerance for incomplete data schema mappings and flexibility with regard to patient consent. Most importantly, as no datasets ever leave their institution, owners can reject projects without facing potential peer pressure. With its lower barrier to participation, a decentral search service is likely to attract a larger number of partners and to bring a researcher into contact with the right potential partners.