Appendix: Content Summaries of Selected Best Papers for the 2024 IMIA Yearbook, Section
Clinical Research Informatics
Gierend K, Waltemath D, Ganslandt T, Siegel F.
Traceable Research Data Sharing in a German Medical Data Integration Center With FAIR
(Findability, Accessibility, Interoperability, and Reusability)-Geared Provenance
Implementation: Proof-of-Concept Study.
JMIR Form Res 2023;7:e50027.
doi: 10.2196/50027.
The study aimed to enhance the reusability of clinical routine data within a medical
data integration center (DIC) for secondary use in clinical research and to establish
traceable provenance information to ensure data integrity, reliability, and trust.
This was achieved by developing a proof-of-concept provenance class that integrated
provenance traces at the data element level using the W3C PROV international standard.
The study followed a customized roadmap for a provenance framework, aligned records
with healthcare standards such as FHIR (Fast Healthcare Interoperability Resources),
and conducted a comprehensive assessment of provenance trace measurements. The results
demonstrated successful implementation of traceable provenance information within
a German medical DIC, marking the first such integration. The study showcased effective
data management practices enhanced by provenance metadata, with commendable execution
times and accuracy in processing clinical routine data. Provenance traces allowed
for a detailed and reliable presentation of data transformations and their lineage,
thereby supporting secondary use and research. The study concluded that the innovative
method of integrating provenance information into clinical data promotes effective
and reliable data management. This approach enhances trust and accountability in clinical
data used for research, with potential applications beyond the medical sector. The
traceable provenance information significantly improves the quality and reliability
of data. The system mitigates risks by ensuring that data analysis is informed by
knowledge of the origin and quality of all data elements, thus preventing ineffective
analyses based on compromised data. These principles, although developed for the medical
DIC use case, can be universally applied throughout the scientific domain, thereby
enhancing the reliability and safety of quality-assured patient data for secondary
use.
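To make the element-level provenance traces described above more concrete, the following
minimal sketch builds a comparable trace with the Python prov package, which implements
the W3C PROV data model. The namespace, identifiers, attributes, and the single ETL step
are hypothetical illustrations and are not taken from the authors' implementation.

    # Minimal W3C PROV sketch of one element-level provenance trace (hypothetical identifiers).
    from prov.model import ProvDocument

    doc = ProvDocument()
    doc.add_namespace("dic", "https://example.org/dic/")  # hypothetical DIC namespace

    # Source data element from the clinical source system and its transformed counterpart.
    doc.entity("dic:lab-result-raw-4711", {"dic:sourceSystem": "laboratory information system"})
    doc.entity("dic:lab-result-fhir-4711", {"dic:format": "FHIR Observation"})

    # The ETL step that mapped the raw value into the research data repository,
    # and the pipeline (software agent) responsible for it.
    doc.activity("dic:etl-run-2023-06-01")
    doc.agent("dic:etl-pipeline-v1.2")

    # Relations linking the transformed element back to its origin.
    doc.used("dic:etl-run-2023-06-01", "dic:lab-result-raw-4711")
    doc.wasGeneratedBy("dic:lab-result-fhir-4711", "dic:etl-run-2023-06-01")
    doc.wasDerivedFrom("dic:lab-result-fhir-4711", "dic:lab-result-raw-4711")
    doc.wasAssociatedWith("dic:etl-run-2023-06-01", "dic:etl-pipeline-v1.2")

    print(doc.get_provn())  # human-readable PROV-N rendering of the trace

Serialising such a document (for example with doc.serialize(format="json")) yields
machine-readable provenance metadata that can accompany each transformed data element.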
Hamidi B, Flume PA, Simpson KN, Alekseyenko AV.
Not all phenotypes are created equal: covariates of success in e-phenotype specification.
J Am Med Inform Assoc. 2023 Jan 18;30(2):213-221.
doi: 10.1093/jamia/ocac157.
The goal of the study was to understand the factors contributing to the successful
creation of electronic phenotypes (e-phenotypes) and to identify covariates associated
with success rates in e-phenotype validation. Specifically, it aimed to compare the
performance of “computer scientists” and “noninformaticists” in this task. Noninformaticist
experts (n=21) created e-phenotypes using the i2b2 platform with support from a data
broker and a project coordinator. Validation involved re-identifying patient and visit
sets and selecting a random sample of charts for experts to review, assessing their
match to the intended e-phenotypes. The study focused on characteristics of the queries
and the experts themselves.
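As a rough illustration of this validation step (not the authors' actual pipeline), the
sketch below computes per-phenotype chart-review match rates from hypothetical review
results; the phenotype names and reviewer judgements are invented.

    # Hypothetical chart-review results: one record per reviewed chart, with the
    # reviewer's judgement of whether it matches the intended e-phenotype.
    from collections import defaultdict

    reviews = [
        {"phenotype": "cystic_fibrosis", "matches": True},
        {"phenotype": "cystic_fibrosis", "matches": True},
        {"phenotype": "cystic_fibrosis", "matches": False},
        {"phenotype": "gout_flare", "matches": True},
        {"phenotype": "gout_flare", "matches": False},
    ]

    # Aggregate matched and reviewed chart counts per phenotype.
    counts = defaultdict(lambda: {"matched": 0, "reviewed": 0})
    for review in reviews:
        counts[review["phenotype"]]["reviewed"] += 1
        if review["matches"]:
            counts[review["phenotype"]]["matched"] += 1

    # Report the validation (match) rate for each e-phenotype.
    for phenotype, tally in sorted(counts.items()):
        rate = tally["matched"] / tally["reviewed"]
        print(f"{phenotype}: {tally['matched']}/{tally['reviewed']} charts matched ({rate:.0%})")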
Results showed significant variability in validation rates for e-phenotypes, largely
influenced by the domain of expertise and query characteristics. Domains such as infectious
diseases, rheumatic conditions, neonatal issues, and cancers demonstrated better performance.
Challenges included distinguishing between patient characteristics and clinical events
and configuring temporal constraints. In addition, inpatient-focused domains, which collect
more comprehensive data in electronic health records, had higher match rates compared
to outpatient-focused domains. The study highlighted that expert domain knowledge
and the design of queries are crucial for the success of e-phenotypes. It emphasized
the importance of specialized support in phenotype design to ensure high-quality,
reliable e-phenotype creation for diverse clinical and research applications. The
collaborative process between clinical domain experts and data brokers, referred to
as “biomedical query mediation,” played a significant role in achieving successful
phenotyping.
Ozonze O, Scott PJ, Hopgood AA.
Automating Electronic Health Record Data Quality Assessment.
J Med Syst. 2023 Feb 13;47(1):23.
doi: 10.1007/s10916-022-01892-2.
The topic of data quality has gained importance across the clinical and research spectrum,
and fuelled important research in recent years that has also been reflected in some
CRI best papers. However, to scale the process of assessing data quality, the definitions
of dimensions and of data-element-specific rules need to be progressively standardised,
and automated tooling is required. These aspects of standardisation and automation
are still emerging, and this paper was selected as one of the best papers because
it provides an up-to-date review of that automation process. The authors focus
on three recognised dimensions of data quality (completeness, correctness
and currency) and highlight the need to assess both univariate and multivariate
data quality issues. Their review covers 23 articles reporting data quality assessment
implementations over the past eight years. Their assessment covers which dimensions
of data quality each tool can assess, the technical nature of the tool that
was produced, and the points in the data quality assessment life cycle at which the
tools and any accompanying rules can be used. The paper proposes a comprehensive
concept model for data quality assessments that may help developers of future data
quality tools position their work consistently. The authors examined the functional characteristics
of the reported tools: the extent to which each tool could be configured for different
clinical domains, as well as its usability, performance and information security. They conclude that
there is still a lack of clarity about where data quality assessment tools sit within
an assessment workflow, and that the tooling needs to be better able to handle the
heterogeneity of EHR data repositories. They also note that there is currently no
culture of assessing the quality of the tools used for data quality assessment.
Greater maturity in data quality assessment methodologies and tools is vitally needed
to scale up the routine assessment of data quality and to sustain the momentum of
data quality improvement.
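To illustrate what automated checks for the three dimensions discussed above can look
like, the following minimal sketch computes completeness, correctness (plausibility),
and currency for a single EHR-like variable with pandas; the column names, plausibility
range, reference date, and currency threshold are hypothetical.

    # Minimal sketch of automated checks for completeness, correctness, and currency.
    import pandas as pd

    records = pd.DataFrame({
        "patient_id": [1, 2, 3, 4],
        "heart_rate": [72, None, 310, 88],  # beats per minute; 310 is implausible
        "recorded_at": pd.to_datetime(
            ["2023-05-01", "2023-05-02", "2020-01-15", "2023-05-03"]),
    })

    # Completeness: share of non-missing values in the variable.
    completeness = records["heart_rate"].notna().mean()

    # Correctness (plausibility): share of recorded values inside a clinically plausible range.
    plausible = records["heart_rate"].between(20, 250)
    correctness = plausible[records["heart_rate"].notna()].mean()

    # Currency: share of records captured within the last two years of the reference date.
    cutoff = pd.Timestamp("2023-05-03") - pd.DateOffset(years=2)
    currency = (records["recorded_at"] >= cutoff).mean()

    print(f"completeness={completeness:.2f}, correctness={correctness:.2f}, currency={currency:.2f}")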