Methods Inf Med 2021; 60(01/02): 021-031
DOI: 10.1055/s-0041-1731387
Original Article

MAGICPL: A Generic Process Description Language for Distributed Pseudonymization Scenarios

Galina Tremper (1, 2), Torben Brenner (1, 2), Florian Stampe (1), Andreas Borg (3), Martin Bialke (4), David Croft (1, 2), Esther Schmidt (1, 2), Martin Lablans (1, 2)

1   Federated Information Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
2   Complex Data Processing in Medical Informatics, University Medical Center Mannheim, Mannheim, Germany
3   Institute of Medical Biostatistics, Epidemiology and Informatics, Johannes Gutenberg-Universität Mainz, Universitätsmedizin, Mainz, Germany
4   Department Epidemiology of Health Care and Community Health, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
Funding The MAGIC consortium was supported by the Deutsche Forschungsgemeinschaft (DFG) under grant number LA 3859/1-1.

Abstract

Objectives Pseudonymization is an important aspect of projects dealing with sensitive patient data. Most projects build their own specialized, hard-coded solutions, although these overlap in many aspects of their functionality. Because every re-implementation ties up resources, we propose a solution that facilitates and encourages the reuse of existing components.

Methods We analyzed established data protection concepts to identify their common features and the ways in which their components are linked together. We found that we could represent these pseudonymization processes with a simple descriptive language, which we call MAGICPL, plus a relatively small set of components. We designed MAGICPL as an XML-based language to make it human-readable and accessible to nonprogrammers. We also wrote a prototype implementation of the components in Java. MAGICPL references components by their class names, which makes it easy to extend or exchange the component set. Furthermore, a simple HTTP application programming interface (API) runs the tasks and allows other systems to communicate with the pseudonymization process.
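
To illustrate the idea of referencing components by class name from an XML process description, the following is a minimal sketch and not the actual MAGICPL syntax or implementation. The element and attribute names (`process`, `step`, `class`), the `Step` interface, and the two toy components `Trim` and `UpperCase` are assumptions made only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ProcessSketch {

    /** Hypothetical component contract: each step transforms a string value. */
    public interface Step extends UnaryOperator<String> {}

    /** Toy component (assumption): strips surrounding whitespace. */
    public static class Trim implements Step {
        @Override
        public String apply(String in) {
            return in.trim();
        }
    }

    /** Toy component (assumption): converts the value to upper case. */
    public static class UpperCase implements Step {
        @Override
        public String apply(String in) {
            return in.toUpperCase(Locale.ROOT);
        }
    }

    /**
     * Parses the XML description, instantiates each referenced class by name
     * via reflection, and applies the steps in document order.
     */
    public static String run(String xml, String input) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList steps = doc.getElementsByTagName("step");
        String value = input;
        for (int i = 0; i < steps.getLength(); i++) {
            String className = ((Element) steps.item(i)).getAttribute("class");
            Step step = (Step) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            value = step.apply(value);
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        // The class names are looked up at runtime so the sketch works in any package.
        String xml = "<process>"
                + "<step class=\"" + Trim.class.getName() + "\"/>"
                + "<step class=\"" + UpperCase.class.getName() + "\"/>"
                + "</process>";
        System.out.println(run(xml, "  id-123  ")); // prints "ID-123"
    }
}
```

Because the description only names classes, exchanging a component in this sketch means editing one attribute in the XML rather than recompiling the code that runs the process.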

Results MAGICPL has been used in at least three projects: the re-implementation of the pseudonymization process of the German Cancer Consortium, the clinical data flows of a large-scale translational research network (National Network Genomic Medicine), and our own institute's pseudonymization service.

Conclusions Putting our solution into productive use at both our own institute and our partner sites reduced the time and effort required to build pseudonymization pipelines in medical research.

Note

The research reported in this article is of a purely technical nature. Neither human nor animal subjects were involved.




Publication History

Received: 22 September 2020

Accepted: 04 May 2021

Article published online: 05 July 2021

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany