Appendix: Content Summaries of Selected Best Papers for the 2024 IMIA Yearbook, Section
Bioinformatics and Translational Informatics
Bhattacharya A, Vo DD, Jops C, Kim M, Wen C, Hervoso JL, Pasaniuc B, Gandal MJ.
Isoform-level transcriptome-wide association uncovers genetic risk mechanisms for
neuropsychiatric disorders in the human brain.
Nat Genet. 2023 Dec;55(12):2117-2128.
doi: 10.1038/s41588-023-01560-2
This study introduces the isoform-level Transcriptome-Wide Association Study (isoTWAS),
a multivariate, stepwise approach that uses genetic data to impute isoform-level expression
and associate this with phenotypes. Unlike traditional gene-level methods, isoTWAS
leverages the unique transcriptional profiles of distinct transcript isoforms produced
by splicing in brain tissue, improving the prediction accuracy and power for discovering
trait associations within genetic loci identified by genome-wide association studies
(GWAS). The researchers demonstrated the efficacy of isoTWAS across 15 neuropsychiatric
traits. They showed that isoTWAS significantly outperforms gene-level models by providing
a more detailed understanding of transcriptomic mechanisms underlying genetic associations.
Notably, isoTWAS identified multiple associations undetectable at the gene level,
such as isoforms of the genes AKT3, CUL3, HSPD1, and PCLO, emphasizing the importance
of considering isoform-level variations in complex trait mapping. Key findings include
the improved prediction of isoform and gene expression, which directly correlates
with increased power for identifying trait associations. The isoTWAS framework adjusts
for multiple testing and local linkage disequilibrium structures, enhancing the robustness
of its genetic associations. This work underscores the value of incorporating isoform-level
resolution in genetic studies of brain-related traits, offering new avenues for understanding
the molecular basis of neuropsychiatric disorders and potentially guiding more targeted
therapeutic strategies.
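The association step summarized above can be illustrated with the weighted burden statistic commonly used in summary-statistic TWAS, z = w'z_GWAS / sqrt(w'Rw), where w holds the SNP weights of an expression model, z_GWAS the GWAS z-scores, and R the local LD matrix. This is a minimal sketch and not the isoTWAS implementation: it assumes per-isoform SNP weights have already been trained, and the function names, weights, z-scores, and LD values below are all hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Weighted burden z-score: w'z / sqrt(w' R w).

    weights : SNP weights of one imputation model (hypothetical values here)
    gwas_z  : GWAS z-scores for the same SNPs, in the same order
    ld      : LD correlation matrix R as a list of rows
    """
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w'Rw must be positive; check the weights and LD matrix")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical locus with three SNPs; the first two are in partial LD.
ld_matrix = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
gwas_z_scores = [2.0, 1.0, -3.0]

# Hypothetical weights for two isoforms of the same gene.
isoform_weights = {
    "isoform_a": [0.5, 0.5, 0.0],
    "isoform_b": [0.0, 0.0, 1.0],
}

# One association statistic per isoform rather than one per gene.
isoform_z = {
    name: twas_z(w, gwas_z_scores, ld_matrix) for name, w in isoform_weights.items()
}
isoform_p = {name: two_sided_p(z) for name, z in isoform_z.items()}
```

Computing one statistic per isoform, rather than one per gene, is what can let an isoform-specific signal surface when a single gene-level model would dilute it. The multiple-testing adjustment mentioned in the summary is not shown here.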
Li Y, Guo Z, Gao X, Wang G.
MMCL-CDR: enhancing cancer drug response prediction with multi-omics and morphology
images contrastive representation learning.
Bioinformatics. 2023 Dec 1;39(12):btad734.
doi: 10.1093/bioinformatics/btad734
Cancer is a highly heterogeneous and complex disease, which prevents a one-size-fits-all
approach for effective treatment. In this study, Li et al. developed Multimodal Contrastive
Learning for Cancer Drug Responses (MMCL-CDR), a machine learning model for predicting
drug resistance and sensitivity in cancer cell lines. MMCL-CDR leverages two state-of-the-art
approaches to learn a representation of the cancer cell (trained on multi-modal data,
including gene expression levels, copy-number variation, and cell morphology) and
the cancer drug (derived from a graph convolutional network trained on chemical structures).
These two representations are then used as input into a final multilayer perceptron
that predicts resistance or sensitivity for that cell-drug combination. MMCL-CDR outperforms
its competitors in both the area under the receiver operating characteristic curve (AUC
= 0.89) and the area under the precision-recall curve (AUC = 0.90), suggesting that the
model is better able to classify cell-drug pairs. Through a series of ablation studies, the
authors find that the multi-modal features significantly improve model performance, highlighting
the importance of integrating diverse features into multi-omic prediction models.
This work has the potential to improve our understanding of the features that lead
to cancer drug resistance and sensitivity, as well as identify novel anticancer drugs.
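To make the idea of contrastive representation learning across modalities concrete, the sketch below computes an InfoNCE-style loss between paired embeddings, where embeddings of the same cell line from two modalities are pulled together and those of different cell lines are pushed apart. The use of InfoNCE, the temperature value, and all names and toy vectors are assumptions made for illustration. The summary above does not specify the exact objective used in MMCL-CDR.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where anchors[i] and positives[i] form the matched pair.

    Every other row of `positives` acts as a negative for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    b = [l2_normalize(v) for v in positives]
    losses = []
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings for two cell lines (hypothetical values).
omics_embeddings = [[1.0, 0.0], [0.0, 1.0]]
morphology_aligned = [[1.0, 0.0], [0.0, 1.0]]   # same cell lines, same order
morphology_shuffled = [[0.0, 1.0], [1.0, 0.0]]  # pairs deliberately mismatched

loss_aligned = info_nce(omics_embeddings, morphology_aligned)
loss_shuffled = info_nce(omics_embeddings, morphology_shuffled)
```

In this toy setting the aligned pairs yield a much smaller loss than the mismatched ones, which is the signal a contrastive encoder would be trained to exploit before its cell representation is passed on to the downstream classifier.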
Theodoris CV, Xiao L, Chopra A, Chaffin MD, Al Sayed ZR, Hill MC, Mantineo H, Brydon
EM, Zeng Z, Liu XS, Ellinor PT.
Transfer learning enables predictions in network biology.
Nature. 2023 Jun;618(7965):616-624.
doi: 10.1038/s41586-023-06139-9
This study introduces “Geneformer”, a context-aware, attention-based deep learning
model, designed to enhance predictive accuracy in network biology, especially under
conditions of limited data availability. Built on a foundation of transfer learning,
Geneformer was pretrained on a substantial corpus of approximately 30 million human
single-cell transcriptomes, which represent a broad spectrum of human tissues. This
pretraining allowed the model to learn network dynamics from the data, knowledge
that it could then apply to various downstream tasks. Key insights were demonstrated
in the model's application to disease modeling, particularly cardiomyopathy, where
Geneformer identified novel candidate therapeutic targets. By fine-tuning the model
with limited disease-specific data, it predicted genes with potential therapeutic
implications, which were subsequently validated experimentally. For instance, inhibition
of certain predicted targets in induced pluripotent stem cell-derived cardiomyocytes
showed marked improvement in cellular function, underscoring the model's practical
utility in identifying actionable biological targets. The paper emphasizes the potential
of transfer learning in computational biology, showing how models pretrained on extensive
datasets can be fine-tuned to provide useful insights in specialized applications where
data are scarce.
Liao W-W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, et al.
A draft human pangenome reference.
Nature. 2023 May;617(7960):312-324.
doi: 10.1038/s41586-023-05896-x
In this study, the Human Pangenome Reference Consortium presents a human pangenome
draft assembled from 47 diverse individuals. Utilizing advanced long-read sequencing
from Pacific Biosciences and Oxford Nanopore, the team generated phased, diploid assemblies
that more accurately reflect human genetic diversity than a traditional single-reference
genome. Each of the 47 genomes was phased and assembled to high standards of accuracy,
covering more than 99% of the expected sequence with high fidelity at both the structural
and base-pair levels. Importantly,
this new reference includes 119 million base pairs of euchromatic polymorphic sequences
that are not present in the current GRCh38 reference genome and identifies 1,115 gene
duplications, significantly enriching our genomic reference materials. The application
of this pangenome in genetic analysis offers substantial improvements over the GRCh38
reference, with a 34% reduction in small variant discovery errors and a doubling in
the detection of structural variants per haplotype. These enhancements are pivotal
for advancing our understanding of genetic variations and their implications across
different human populations, laying a stronger foundation for future genomic research
and medical applications.