Abstract
Background Tuberculosis (TB) is an infectious disease that ranks among the top 10 causes of death
worldwide, and Brazil is one of the 30 high TB burden countries. Data collection
is an essential practice in health studies, and the adoption of electronic data capture
(EDC) systems can improve data acquisition and analysis.
Data-sharing capabilities are also crucial for building efficient and effective
evidence-based decision-making tools for managerial and operational actions in TB
services. Data must be kept secure and traceable, as well as available and understandable
to authorized parties.
Objectives This work proposes a blockchain-based approach to building a reusable,
decentralized, and de-identified dataset of TB research data, while increasing the transparency,
accountability, availability, and integrity of raw data collected in EDC systems.
Methods Challenges and gaps were first identified, and a solution was then proposed to address them,
taking into account its relevance to TB studies. Data security issues are addressed by a blockchain
network and a lightweight, practical governance model. Research Electronic Data
Capture (REDCap) and KoBoToolbox are used as EDC systems in TB research. Mechanisms
to de-identify data and to attach semantic annotations to them are also provided.
Results A permissioned blockchain network was built using the Kaleido platform. An integration
engine connects the EDC systems to the blockchain network, de-identifying the data
and adding semantic annotations to them. A governance model addresses operational and legal
issues for the proper use of data. Finally, a management system facilitates the handling
of necessary metadata, and additional applications are available to explore the blockchain
and export data.
Conclusions Research data are an important asset not only for the research in which they were generated,
but also for underpinning study replication and supporting further investigations. The proposed
solution delivers de-identified databases built in real time: data, together with
their semantic annotations, are stored in transactions of a permissioned blockchain
network as they are collected in TB research. The governance model promotes the correct
use of the solution.
Keywords
electronic data capture - data management - blockchain - data sharing - tuberculosis