DOI: 10.1055/a-2599-3728
Big Data Analytics in Large Cohorts: Opportunities and Challenges for Research in Hepatology
- Introduction
- Opportunities in Big Data Analytics
- Advancing Precision Medicine and Drug Repurposing
- Applications of AI in Big Data Analytics
- Challenges in Big Data for Hepatology
- Future Directions
- Conclusion
- Main Concepts and Learning Points
- References
Abstract
Advances in big data analytics, precision medicine, and artificial intelligence are transforming hepatology, offering new insights into disease mechanisms, risk stratification, and therapeutic interventions. In this review, we explore how the integration of genetic studies, multi-omics data, and large-scale population cohorts has reshaped our understanding of liver disease, using steatotic liver disease as a prototype for data-driven discoveries in hepatology. We highlight the role of artificial intelligence in identifying patient subgroups, optimizing treatment strategies, and uncovering novel therapeutic targets. Furthermore, we discuss the importance of collaborative networks, open data initiatives, and implementation science in translating these findings into clinical practice. Although data-driven precision medicine holds great promise, its impact depends on structured approaches that ensure real-world adoption.
New technologies like big data, artificial intelligence, and precision medicine are changing how we understand and treat liver diseases. This review looks at how large research studies and genetic data help scientists learn more about conditions like steatotic liver disease. Artificial intelligence can sort patients into groups, suggest better treatments, and even help find new drugs. Collaboration between researchers and open data sharing are important to bring these discoveries into real-world healthcare. Although these advances have great potential, they need careful planning to make sure they truly help patients.
Introduction
In the current era of digital medicine, public databases are an indispensable resource for understanding liver disease. Population-based databases are digital collections of data that are available to researchers from all over the world,[1] often after submitting an application and usually for a fee. They are a crucial tool in modern medical research, providing access to information that is essential for hypothesis generation, testing, and validation.[2] These sources, which increasingly include multi-omic data, are of particular interest in hepatology because they offer researchers greater dimensionality and accessibility. Large databases support many types of studies, including population-level comparative studies, disease monitoring and phenotyping, predictive modeling to improve prognosis or risk stratification, and assessment of unmet public health needs.[3] The integration of large-scale cohorts with increasingly sophisticated computational methods has transformed the landscape of liver disease research, and the impact of these technologies is expected to grow in the coming years.[4] By exploring the role of big data analytics in hepatology, we aim to highlight the opportunities it presents for both clinicians and researchers while addressing the challenges that must be overcome to fully unlock its potential.
Ongoing global scientific efforts aim to refine the classification of liver diseases and establish well-characterized phenotypic subgroups based on shared traits or treatment responses. Large-scale databases play a crucial role in this process because they integrate diverse datasets, including patient demographics, lifestyle information such as social history (e.g., alcohol intake, nutrition, pack-years), genotypic information, detailed clinical data from collected specimens, results of imaging procedures (ultrasound, MRI, CT, or even FibroScan®), liver biopsy reports, microbiome data, and, increasingly often, omics data (lipidomics, metabolomics, proteomics, transcriptomics, etc.). A summary of these applications is depicted in [Fig. 1].


In population-based datasets, there are two types of data: unstructured and structured.[5] Structured data consist of quantitative metrics organized within predefined formats, such as electronic health records (EHRs), laboratory test results, and standardized clinical measurements. These datasets are readily accessible through databases and are easier to analyze using conventional statistical and machine learning approaches.
However, unstructured data account for nearly 80% of all data and include a wide range of complex, non-tabular information, such as physician notes, imaging reports, pathology slides, and genomic sequences. Unlike structured data, unstructured data are not directly searchable or analyzable within traditional data management systems. Extracting meaningful insights from these datasets requires advanced computational techniques, including natural language processing (NLP) for text mining[6] or deep learning for medical imaging to extract radiomic data.[7] Newer language-model technologies[8] are expected to make unstructured data easier to structure by generating accessible summaries of medical information. A recent study demonstrated that large language models (such as GPT-4) are highly accurate in identifying cirrhosis and its complications from discharge summaries, outperforming traditional code-based classification, which suggests that they could augment or replace labor-intensive chart reviews for cohort identification in cirrhosis research.[8] In hepatology, there is significant clinical utility in leveraging both structured and unstructured data because of the complexity of liver diseases and the need for comprehensive patient assessment.[9] Structured data, such as laboratory values (e.g., ALT, AST, bilirubin), imaging-derived biomarkers (e.g., liver stiffness from elastography), and standardized clinical scores (e.g., MELD, FIB-4), provide quantitative measures that facilitate disease classification, prognosis, and treatment decision-making.[10] Conversely, unstructured data, including physician notes, pathology reports, imaging scans, and endoscopic findings, contain valuable contextual information that structured data alone cannot capture.[11] Despite their clinical significance, unstructured data are often not systematically included in population-based datasets, which limits their potential for large-scale studies.
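To make the contrast between free text and structured fields concrete, the following is a minimal sketch of rule-based text mining on a clinical note. It is not the method of the cited studies: the note, the keyword patterns, the negation cues, and the function name `extract_complications` are all hypothetical, and real clinical NLP pipelines or language models handle far more linguistic variation than this.

```python
import re

# Hypothetical keyword patterns for three complications of cirrhosis.
COMPLICATIONS = {
    "ascites": r"\bascites\b",
    "hepatic encephalopathy": r"\bhepatic encephalopathy\b",
    "variceal bleeding": r"\bvariceal (?:bleed|bleeding|hemorrhage)\b",
}
# Very simple negation cue: a negating word earlier in the same sentence.
NEGATION = re.compile(r"\b(?:no|without|denies|negative for)\b[^.]*$", re.IGNORECASE)


def extract_complications(note):
    """Map each complication mentioned in the note to True (affirmed) or False (negated)."""
    found = {}
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for label, pattern in COMPLICATIONS.items():
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match:
                negated = bool(NEGATION.search(sentence[:match.start()]))
                found[label] = not negated
    return found


# Hypothetical discharge-summary excerpt (not real patient data).
note = (
    "Known cirrhosis with ascites requiring paracentesis. "
    "No hepatic encephalopathy. Admitted for variceal bleeding."
)
print(extract_complications(note))
```

Rules of this kind turn a sentence into a structured field, but they miss synonyms, abbreviations, and more complex negation, which is one reason language models are being explored for this task.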
There are various types of population-based databases relevant to liver research, as summarized in [Table 1]. Many of these databases have been collecting data over years or even decades, enabling longitudinal studies that provide valuable insights into disease progression and long-term outcomes.
Region | Biobank | Number of participants | Type of data | Special features | Relevance for hepatology
---|---|---|---|---|---
Europe | KORA[73] (Cooperative Health Research in the Augsburg Region) | Over 18,000 | Clinical, genetic, and environmental data | Focus on cardiovascular and environmental research | Data on liver disease and metabolic syndrome
Europe | LURIC[74] (Ludwigshafen Risk and Cardiovascular Health Study) | Over 3,300 | Genetic, metabolic, and inflammatory data | Focus on cardiovascular diseases | Information on interactions between cardiovascular and liver diseases
Europe | NAKO[75] (German National Cohort) | Approximately 200,000 | Clinical, genetic, and lifestyle data | One of the largest population-based studies in Germany | Comprehensive data on liver disease
Europe | UK Biobank[76] | Over 500,000 | Genetic, clinical, and lifestyle data | Extensive genetic and medical data | Genetics, liver MRI, “omics”
Europe | FinnGen[77] | Over 500,000 | Genetic and disease registry data | Combination of genetic and registry data | Genetics, “omics”
Europe | Lifelines[78] | Over 167,000 | Clinical, questionnaire, and biological data; microbiome | Longitudinal study on health and disease | Longitudinal data on the development of liver disease and microbiota
Europe | Biobank Graz[79] | Over 7.5 million samples | Biological samples, clinical data | One of the largest biobanks in Europe | Liver samples
Europe | DeCODE Genetics[80] | Over 160,000 | Genetic and clinical data | Focus on the population of Iceland with extensive genealogical information | Genetic predispositions to liver diseases
Europe | Our Future Health[81] | Target of 5 million | Genetic, clinical, and lifestyle data | Aims to be the UK's largest health research program | Potential for large-scale studies on liver disease prevention and treatment
United States | Penn Medicine Biobank[82] | 60,000 | Genetic and clinical data | Focus on personalized medicine and research | Data on genetic markers and personalized medicine in liver disease
United States | Mass General Brigham Biobank[83] | Over 145,000 | Genetic, clinical, and lifestyle data | Large-scale repository | Provides data for studies on liver disease risk factors and genetics
United States | Mayo Clinic Biobank[84] | Over 21,000 | Genetic, clinical, and lifestyle data | Comprehensive health data with a focus on individualized medicine | Resource for studies on genetic predispositions to liver diseases
United States | Kaiser Permanente Research Bank[85] | Over 300,000 | Genetic, clinical, and environmental data | Emphasis on health research across diverse populations | Enables studies on environmental and genetic factors in cancer
United States | All of Us[86] | Target of 1 million | Genetic, environmental, and lifestyle data | Diversity and inclusion of a broad population | Diverse data on environmental and lifestyle factors that influence liver disease
United States | Million Veteran Program[87] | Over 825,000 | Genetic and military exposure data | Study of genetic and environmental factors in veterans | Effects of environmental and lifestyle factors on liver disease
United States | NHANES[88] | Annually approximately 5,000 | Clinical, nutritional, and health data | Longitudinal study on health and nutrition in the USA | Diet and lifestyle data, FibroScan
United States | MyCode Geisinger Health System[89] | Over 200,000 | Genetic and clinical data | System-wide biobank linked to electronic health records | Enables studies on genetic factors and personalized medicine approaches in liver disease
Canada | CARTaGENE[90] | Approximately 40,000 | Genetic, clinical, environmental, and lifestyle data | Population-based biobank in Quebec focusing on chronic diseases | Provides data for research on environmental, nutrition, and genetic risk factors for liver diseases
Canada | Canadian Partnership for Tomorrow's Health (CanPath)[91] | Over 300,000 | Genetic, clinical, environmental, and lifestyle data | Pan-Canadian cohort integrating regional studies like CARTaGENE | Facilitates large-scale studies on liver disease risk factors across diverse Canadian populations
Asia | China Kadoorie Biobank[92] | Approximately 500,000 | Clinical and lifestyle data | Focus on chronic diseases in China | Data on common liver diseases in Asia and their risk factors
Asia | BioBank Japan[93] | Over 200,000 | Genetic and clinical data | Focus on genetic data of Japanese individuals | Specific genetic data on liver disease in the Japanese population
To maximize the utility of these datasets, it is often necessary to systematically integrate data from multiple sources, such as linking omics data with cohort studies through a process known as data fusion.[12] This approach facilitates an integrative analysis of liver disease mechanisms, risk factors, and treatment responses across different populations and healthcare systems. A uniform data structure, which does not yet exist, would allow for seamless integration and interpretation, making insights more actionable for both researchers and clinicians.[13] With well-organized and standardized datasets, big data holds great promise for improving clinical decision-making and diagnostic accuracy, enabling early risk prediction, and guiding personalized interventions for liver diseases.
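The linkage step of data fusion can be illustrated with a minimal sketch that joins clinical cohort records and omics measurements on a shared participant identifier. All identifiers, field names, and values below are hypothetical, and the helper `fuse` is an assumption made for illustration; real data fusion additionally requires harmonizing units, time points, and coding systems.

```python
# Hypothetical cohort records and omics measurements (not real data).
clinical = [
    {"participant_id": "P001", "age": 54, "alt_u_l": 62},
    {"participant_id": "P002", "age": 47, "alt_u_l": 31},
    {"participant_id": "P003", "age": 66, "alt_u_l": 88},
]
metabolomics = [
    {"participant_id": "P001", "metabolite_a": 1.8},
    {"participant_id": "P003", "metabolite_a": 2.4},
    {"participant_id": "P004", "metabolite_a": 0.9},
]


def fuse(left, right, key="participant_id"):
    """Inner join: keep only participants that appear in both sources."""
    right_by_id = {row[key]: row for row in right}
    return [
        {**row, **right_by_id[row[key]]}
        for row in left
        if row[key] in right_by_id
    ]


fused = fuse(clinical, metabolomics)

# Participants present in only one source are worth reporting, because
# silent loss of records during linkage can bias downstream analyses.
clinical_ids = {row["participant_id"] for row in clinical}
omics_ids = {row["participant_id"] for row in metabolomics}
unmatched = sorted(clinical_ids ^ omics_ids)

print(fused)
print("Unmatched participants:", unmatched)
```

An inner join is used here so that every fused record is complete; whether to keep partially observed participants instead is a study-design decision.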
Opportunities in Big Data Analytics
In clinical hepatology, big data and omics-based technologies facilitate real-world evidence generation, enabling better stratification of patients and personalized medicine.[3] Because disease entities in hepatology are often heterogeneous, with a wide range of clinical phenotypes that differ in age of onset, course of disease, and treatment responsiveness, analyzing big data can allow for refined subclassification of diseases through a concept called “phenomapping.”[14] This helps clinicians to better understand the pathogenesis of disease entities and to develop predictive biomarkers and models. [Fig. 2] summarizes how big data analytics could be applied to different forms of omics-based data, together with the clinical applications described below.


Biomarkers, Predictions, and Precision Therapy
Big data analytics enables the identification of biomarkers that can serve as diagnostic, prognostic, or therapeutic targets. The National Institutes of Health defines a biomarker as an objective measure and indicator of biologic function, pathophysiological processes, and response to treatment.[15] This definition frames the application of big data to liver disease biomarker discovery. Moreover, we want to emphasize how novel targeted therapies for MASLD (e.g., TR-β agonists, GLP-1 receptor agonists, FGF21 analogs, and SGLT2 inhibitors) might leverage biomarker databases to facilitate personalized treatment approaches.
For the identification of steatosis in the Kadoorie Biobank, glutathione disulfide and diacylglycerol were found to be useful.[16] A study in the UK Biobank has shown that lipidomic profiles, especially large HDL, may differentiate metabolic dysfunction-associated steatotic liver disease (MASLD) from combined metabolic dysfunction- and alcohol-related steatotic liver disease (MetALD).[17] As MASLD exhibits heterogeneity, a recent study used partitioning around medoids clustering to identify two distinct phenotypic clusters: a liver-specific type with genetic links and rapid progression of chronic liver disease but limited cardiovascular risk, and a cardiometabolic type associated with dysglycemia, hypertriglyceridemia, and increased cardiovascular and diabetes risk. These clusters, validated across multiple cohorts, demonstrated distinct liver transcriptomic signatures.[18] Moreover, 32 biomarkers, including SHBG and ApoB, were associated with an increased incidence of HCC.[19] Plasma GDF15 was proposed to identify patients at risk of steatohepatitis. With improved access to population-based health datasets, clinicians and researchers can also look up further protein biomarkers associated with liver disease in the new proteome-phenome atlas from the UK Biobank.[20] In addition, biomarker discovery can lead to the validation of newer targeted therapies such as thyroid receptor β agonists and FGF21 analogs.
Predictive models derived from big data are useful for stratifying patients based on their disease progression and treatment responses. By combining clinical parameters and sometimes omics data, these models can often predict outcomes such as fibrosis progression, HCC risk, or mortality with greater accuracy than traditional clinical scoring systems. Traditional liver injury markers such as transaminases, as well as laboratory tests of liver synthetic function including albumin and coagulation panels, have been critical in risk prediction of liver disease.[21] Traditional risk scores derived from large cohorts relevant in hepatology include the Model for End-stage Liver Disease (MELD)[22] score, the FIB-4 Index, the AST/ALT ratio, and the AST-to-Platelet Ratio Index (APRI).[23] However, the availability of big data has allowed researchers and clinicians to improve risk prediction by combining serum parameters and baseline demographics in newer scores such as the LiverRisk score[24] and the Chronic Liver Disease (CLivD) score.[25] In the general population, liver disease is difficult to diagnose because it is generally asymptomatic at presentation. Using age, sex, alcohol intake, waist–hip ratio (as opposed to BMI), diabetes, smoking, and GGT, the CLivD score was developed to estimate the 15-year risk of liver-related events.
Similarly, in the study by Njei et al, a machine learning model combining FAST scores, liver stiffness measurements, and aminotransferase levels outperformed traditional FIB-4 and APRI scores in predicting high-risk MASH.[26] For decompensation of cirrhosis, the Early Prediction of Decompensation (EPOD) score was developed in three population-based studies.[27] Liver cancer, one of the most common cancers in the world, poses a serious public health burden due to high morbidity at diagnosis. Liu et al, using the UK Biobank cohort, developed a risk prediction model for liver cancer in a subset of over 300,000 participants without previous diagnoses of cancer on the basis of sociodemographic factors, physical measurements, lifestyle behaviors, and personal medical/family history.[28]
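As an illustration of the clustering technique named above, the following is a minimal sketch of partitioning around medoids (PAM) on a precomputed distance matrix, with a greedy BUILD phase followed by a first-improvement SWAP phase. It is not the pipeline of the cited study, whose features, preprocessing, and distance measure are not described here; the six two-dimensional points stand in for hypothetical standardized patient features, and the function names are assumptions.

```python
import math
from itertools import product


def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Partition points into k clusters around medoids (actual data points)."""
    n = len(dist)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        best = min(
            (i for i in range(n) if i not in medoids),
            key=lambda i: total_cost(dist, medoids + [i]),
        )
        medoids.append(best)
    # SWAP: exchange a medoid for a non-medoid whenever that lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(dist, medoids)
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < current:
                medoids, current, improved = trial, cost, True
    # Assign every point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical standardized features for six patients (not real data).
points = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
dist = [[math.dist(p, q) for q in points] for p in points]

medoids, labels = pam(dist, k=2)
print("Medoid indices:", medoids)
print("Cluster labels:", labels)
```

Because each medoid is an actual patient rather than an average, every cluster comes with a representative individual, which can help when describing phenotypic subgroups.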
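The traditional scores mentioned above are simple arithmetic on routine laboratory values. The sketch below computes FIB-4, APRI (the AST-to-platelet ratio index), and the AST/ALT ratio from their standard published formulas; the patient values and the upper limit of normal for AST are hypothetical, and no diagnostic cut-offs are implied.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, ast_uln_u_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal of AST) / platelet count x 100."""
    return (ast_u_l / ast_uln_u_l) / platelets_10e9_l * 100


def ast_alt_ratio(ast_u_l, alt_u_l):
    """AST/ALT ratio."""
    return ast_u_l / alt_u_l


# Hypothetical patient (not real data): AST and ALT in U/L, platelets in 10^9/L.
patient = {"age": 60, "ast": 80, "alt": 64, "platelets": 150, "ast_uln": 40}

print("FIB-4:", round(fib4(patient["age"], patient["ast"], patient["alt"], patient["platelets"]), 2))
print("APRI:", round(apri(patient["ast"], patient["ast_uln"], patient["platelets"]), 2))
print("AST/ALT:", round(ast_alt_ratio(patient["ast"], patient["alt"]), 2))
```

The upper limit of normal for AST is laboratory-specific, so APRI values are only comparable when the same reference value is used.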
One of the most crucial developments in hepatology research has come from genome-wide association studies (GWAS), which have uncovered the genetic basis of complex traits associated with elevated liver enzymes[29] or steatotic liver disease.[30] As shown in [Table 1], many of the population-based studies include some type of genetic information, and some even include whole genome sequencing.
One prime example of how genetics has been used to advance hepatology is MASLD: Understanding genetic variants associated with MASLD has allowed us to better understand the functional mechanisms leading to hepatic steatosis, as these genes are often involved in hepatic lipid metabolism, oxidative stress pathways, and inflammation leading to hepatic ballooning.[31] [32] GWAS have identified several genetic risk loci for MASLD by leveraging novel phenotypes, including ALT proxies and magnetic resonance imaging proton density fat fraction, across various large cohorts.[30]
Whereas GWAS test many genetic variants for association with a single trait in large-scale cohorts, the reverse methodology, used in phenome-wide association studies (PheWAS) and genotype-first approaches, maps variants of interest within a single gene to multiple disease pathologies. In the setting of MASLD, this becomes relevant because its fundamental pathophysiological mechanisms arise from insulin resistance and dyslipidemia, which lead to various phenotypic manifestations. Several genome-first approaches have been employed for the study of hepatic steatosis, all of which incorporate data from large-scale biobanks, sometimes with in vitro validation in cell models.[33] [34] [35] [36] These advances highlight the importance of integrating genetic insights with clinical and multi-omics data to refine our understanding of liver disease pathophysiology. Although GWAS and PheWAS have significantly contributed to identifying risk loci and elucidating disease mechanisms, future studies will need to incorporate functional validation and mechanistic studies to translate these findings into clinical applications.
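The PheWAS idea of testing one variant against many phenotypes can be sketched in a few lines. The example below is a deliberately simplified illustration, not the method of the cited studies: it uses unadjusted 2x2 tables with a Pearson chi-square test and a Bonferroni threshold, whereas published analyses typically use regression models adjusted for covariates such as age, sex, and ancestry. All counts and the function names are hypothetical.

```python
import math


def chi2_test_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df) and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def phewas(counts, alpha=0.05):
    """Test one variant against every phenotype, with Bonferroni correction."""
    threshold = alpha / len(counts)
    results = {}
    for phenotype, (a, b, c, d) in counts.items():
        _, p_value = chi2_test_2x2(a, b, c, d)
        results[phenotype] = {
            "odds_ratio": (a * d) / (b * c),
            "p_value": p_value,
            "significant": p_value < threshold,
        }
    return results


# Hypothetical counts per phenotype, in the order: carriers with the phenotype,
# carriers without, non-carriers with, non-carriers without.
counts = {
    "hepatic steatosis": (120, 880, 600, 8400),
    "type 2 diabetes": (90, 910, 800, 8200),
    "asthma": (100, 900, 900, 8100),
}

for phenotype, result in phewas(counts).items():
    print(
        f"{phenotype}: OR={result['odds_ratio']:.2f}, "
        f"p={result['p_value']:.2g}, significant={result['significant']}"
    )
```

The multiple-testing correction matters because a phenome-wide scan can involve hundreds or thousands of phenotypes, so uncorrected p-values would produce many false-positive associations.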
Notably, this genetic knowledge has already paved the way for the development of targeted therapies, exemplified by the first siRNA therapies for PNPLA3,[37] the most well-known genetic variant associated with MASLD.[31]
Advancing Precision Medicine and Drug Repurposing
Big data analytics facilitates the identification of patient subgroups that may benefit from targeted therapies. In hepatology, this approach is particularly relevant for tailoring treatments such as GLP-1 receptor agonists, immunotherapies, and antifibrotic agents to specific patient profiles. By refining treatment strategies, precision medicine can improve outcomes and reduce healthcare costs.
A key example of this progress is the rapid evolution of clinical trials leveraging big data, which has recently culminated in the first FDA-approved treatment for advanced fibrosis in MASH.[38] When we began this review in 2024, there was no FDA-approved treatment for MASH, even though these patients are at higher risk of histologically proven liver fibrosis. However, resmetirom, a liver-specific thyroid hormone receptor β-selective agonist, was shown to achieve MASH resolution in almost 30% of patients who received 100 mg of the medication, as well as fibrosis improvement by at least one stage without worsening of the MASLD activity score. In addition, there is emerging evidence that fibroblast growth factor 21 (FGF21) plays a substantial role in lipogenesis and hepatic insulin sensitivity, with an additive effect on lipid profiles when combined with lifestyle changes such as diet.[39] As FGF21 also reflects liver fat accumulation, the exploration of transcriptomic data from mouse models of MASLD suggested that upregulated FGF21 levels were protective against lipotoxicity and endoplasmic reticulum (ER) stress. Accordingly, in a phase 2b multicenter clinical trial, the FGF21 analog pegozafermin improved fibrosis in biopsy-confirmed MASH.[40] New studies, which also include large cohorts, can now help pinpoint which patients might benefit most from these new drugs.
The increasing integration of precision medicine in hepatology extends beyond novel drug development to the potential repurposing of existing medications.[41] Because insulin resistance and dyslipidemia are central to the pathophysiology of hepatic steatosis, there is growing interest in leveraging anti-diabetic agents to improve liver-related outcomes in patients with MASLD. With MASLD increasing in prevalence, institutions around the world have used large-scale population databases to investigate how glucagon-like peptide 1 receptor agonists (GLP-1 RAs) could improve liver inflammation.[42] Multiple large-scale studies have suggested that GLP-1 RAs could be repurposed to lower the risk of cirrhosis progression as well as reduce liver fat accumulation among patients with concurrent MASLD and type 2 diabetes.[43] [44] Sodium-glucose cotransporter 2 (SGLT2) inhibitors have also been shown to reduce histological steatosis as well as hepatic ballooning in MASLD, likely by modulating inflammatory pathways associated with interleukins and AMP-activated protein kinase/mammalian target of rapamycin signaling.[45] [46] Similarly, such mechanistic studies have allowed the study of SGLT2 inhibitors in the context of hepatocellular carcinoma and diabetes, which showed improvements in lipid profiles and fibrosis.[47] However, there is not yet sufficient clinical evidence to suggest that SGLT2 inhibitors could be effectively repurposed. As for dyslipidemia and cardiovascular disease, studies in the UK Biobank as well as the TriNetX cohort have suggested that statins, aspirin, and omega-3 intake could reduce the incidence of liver disease as well as mortality irrespective of genetic risk.[41] [48] [49]
Although targeted personalized therapies and drug repurposing hold promise, further validation through well-powered clinical trials and mechanistic studies is essential. Moreover, there is a new player that is poised to further revolutionize precision hepatology.
Applications of AI in Big Data Analytics
The application of artificial intelligence (AI) is revolutionizing liver research,[50] especially in the analysis of extensive data from databases. AI technologies, such as machine learning and deep learning, make it possible to detect complex patterns in large datasets that are too complex for human analysts. By developing predictive models, AI can improve the prediction of disease progression, treatment responses, and potential risk factors.[51] AI tools can automate time-consuming tasks, such as data processing and analysis, thus increasing research efficiency.[52] This is where the application of AI to population-based datasets is interesting, with [Fig. 3] demonstrating a general schematic of different forms of data in hepatology that could benefit from AI processing.


Traditional statistical methods, such as regression models, have long served as foundational tools in hepatology research. Regression analysis remains a cornerstone for identifying associations between clinical variables, genetic risk factors, and disease outcomes, providing interpretable insights into MASLD progression and treatment response. Building on these statistical methods, machine learning techniques, including decision trees, support vector machines, and neural networks, offer more sophisticated ways to model complex interactions within large-scale datasets.[53] In a recent study by Yu et al, applications of deep learning algorithms such as spatio-temporal 3D convolution networks demonstrated robustness in accurately diagnosing HCC early based on triphasic CT liver scans.[54] Various methodologies stemming from AI models have allowed researchers to employ novel parameters such as anthropometric and body composition indices to accurately predict MASLD based on machine learning techniques applied to outcomes based on FibroScan classifications of steatosis and fibrosis.[55]
A recent study examined how machine learning algorithms have been applied in hepatology: random forest classifiers, followed by decision trees and support vector machines, exhibited the highest average accuracy across 58 studies, with elevations in liver enzymes (i.e., aminotransferases) being the most utilized feature to define the liver phenotype.[56] Examples of comparisons between ML-based and conventional algorithms include early detection and diagnosis of reversible stages of steatotic liver disease, fibrosis risk in chronic hepatitis B, and development of hepatocellular carcinoma in metabolic liver disease. In a study of patients with chronic viral hepatitis in Hong Kong, machine learning methods such as logistic regression, AdaBoost, random forest classifiers, and decision trees were compared for predicting HCC, building on already validated risk scores such as the HCC ridge score.[57]
A comprehensive review by Ghosh et al summarizes the key areas in which AI has been applied to study a wide range of liver diseases.[58] As AI continues to evolve, its integration into hepatology promises to refine risk stratification, enhance early diagnosis, and personalize treatment strategies for MASLD and related liver diseases. Although traditional regression models remain essential for hypothesis-driven research, advanced AI techniques offer the ability to uncover hidden patterns and optimize decision-making, but they also pose some challenges.
Challenges in Big Data for Hepatology
The increasing availability of diverse datasets and databases will revolutionize research in hepatology. Not only does it allow new hypotheses to be generated, but it also helps accelerate scientific discovery and innovation. However, challenges associated with the use of these databases—such as data quality, ethical concerns, privacy, and the need to distinguish between association and causality—must continue to be addressed with care and integrity.
Data Quality and Heterogeneity
Although public databases offer a wealth of opportunities for liver research, there are also several pitfalls that researchers should be aware of. These challenges can influence the interpretation and use of the data and must be carefully considered to achieve valid and reliable research results. Big data are often plagued by issues such as missing data, variability in clinical measurements, and biases related to population selection. These challenges can lead to misleading conclusions if not addressed through robust data cleaning and normalization techniques. One of the biggest pitfalls is the quality and completeness of the data. In some cases, records may have gaps, be inconsistent, or contain errors, especially if the data come from different sources or have been collected over long periods of time. Moreover, the fact that a parameter is missing in a cohort can itself be informative; imputation is therefore difficult and can lead to erroneous conclusions. A critical assessment of data quality is crucial to avoid such errors. Another problem is the distortion of the data (bias). For example, databases could contain a disproportionate amount of data from certain demographic groups or regions, which can bias research results. Moreover, loss to follow-up has to be accounted for. Researchers need to consider these potential biases and apply appropriate statistical methods to minimize them.[59] The results obtained from specific datasets may not be transferable to other populations or contexts. This is particularly relevant when research data come from a limited or special group of patients. Therefore, it is desirable to validate results in as many diverse datasets as possible. Correctly interpreting large and complex datasets also requires advanced statistical skills. Without proper methods and knowledge, researchers can draw misleading conclusions or miss important patterns and connections.
It is also often underestimated that an advanced IT infrastructure is needed to carry out analyses according to the current state of the art.
Computational and Technical Challenges
Despite promising research findings, the translation of big data insights into clinical practice remains challenging. Factors such as a lack of interoperability between systems, limited clinician training in data-driven tools, and resistance to change hinder the implementation of big data in routine hepatology care. Analyzing large datasets requires significant computational power. Scalability, efficient data storage, and the development of algorithms capable of handling high-dimensional data are critical technical hurdles in hepatology research. Similar to automated driving, the crucial question of responsibility and liability in the context of a decision based on the integration of AI algorithms in big data analytics is a point of contention. The challenges in hepatology are therefore also to enable a safe and evidence-based implementation of AI to support clinical decision-making. Various AI approaches have been used in hepatology, with a focus on predicting events,[60] histological analyses,[61] and the prediction of treatment response.[62] The inability of AI algorithms to take into account information from the direct interaction between patient and doctor is still an inherent limitation. AI algorithms will therefore not yet be able to replace direct interaction between doctor and patient. We see AI as a complementary tool to significantly improve patient–doctor interaction and patient care in general.
Future Directions
Collaborative Networks and Open Data
The concept of Open Science plays a central role in transforming hepatology research.[63] By disclosing research data and results, scientists worldwide can access and use the same resources to create high-quality and representative datasets.[64] It promotes the accessibility, transparency, and shareability of knowledge and data, which often applies to the population-based datasets. Moreover, medical collaborative networks aimed at sharing data sources from various different countries may provide a streamlined process to collaborate on these datasets.[65] Peng et al, in a bibliometric and social network analysis, suggested that at present, small academic groups are more popular in big data research with a broad range that makes interaction between different groups sparse.[66] Examples of collaborative initiatives that have aimed to accelerate the sharing of genomic and clinical data for the study of various disease processes include the Global Alliance for Genomics and Health (GA4GH) which aims to distribute access to central databases and allow researchers to reproduce their work by running established methods over the same underlying data.[67] In the broader scheme, other additional open sources including the European Health Data Evidence Network, the Observational Health Data Sciences and Informatics (ODHSI), and All of Us have paved way for establishing international networks of researchers with central databases of observational nature and establishing more streamlined pipelines to analyze data. With the ODHSI data, researchers have been able to identify metabolic risk factors in the cardiometabolic spectrum that may be relevant to patients with fatty liver compared with alcohol-related fatty liver disease.[68] In the United States, the All of Us program maintains large-scale population information to investigate various liver phenotypes, including drug-induced liver injury linked to antibiotics.[69]
Hypotheses versus Big Data
The growing emphasis on data sharing and collaborative networks has not only expanded access to biomedical datasets but has also influenced how research is conducted. Traditionally, biomedical research has been dominated by hypothesis-driven approaches, where studies were designed to test predefined mechanisms based on prior knowledge. However, the advent of big data has shifted the paradigm toward data-driven discovery, allowing researchers to uncover previously unknown patterns and relationships in hepatology without a predefined hypothesis. This shift has sparked a debate: Should research prioritize hypothesis testing or allow data to generate new hypotheses?
Big data approaches now enable the development of complex computational models integrating multi-omics datasets.[70] These models facilitate network-based disease characterization, allowing for a systems biology approach to understanding liver diseases. Rather than relying solely on predefined mechanistic pathways, machine learning algorithms can reveal hidden biomarker signatures and molecular interactions that may not have been considered in traditional hypothesis-driven research. Despite the power of data-driven approaches, the debate remains unresolved—hypothesis-driven research ensures biological interpretability and mechanistic understanding, while data-driven methods offer unparalleled discovery potential. The future of hepatology will likely depend on integrating both approaches, where data-driven insights inform hypothesis generation, leading to targeted experimental validation and ultimately improved clinical translation.
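The interplay described above, in which a data-driven screen generates hypotheses that are then tested in independent data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the cohorts are synthetic, the function names are hypothetical, the p-values use a Fisher z normal approximation, and Bonferroni correction is only one of several possible ways to control for multiple testing.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(x, y):
    """Two-sided p-value for r using the Fisher z-transform (normal approximation)."""
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_cohort(n, n_markers, causal, effect, rng):
    """Synthetic cohort: all markers are noise; only marker `causal` shifts the outcome."""
    markers = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_markers)]
    outcome = [effect * markers[causal][i] + rng.gauss(0, 1) for i in range(n)]
    return markers, outcome


def discover(markers, outcome, alpha=0.05):
    """Data-driven step: screen every marker, Bonferroni-corrected for all tests."""
    threshold = alpha / len(markers)
    return [j for j, m in enumerate(markers)
            if correlation_p_value(m, outcome) < threshold]


def validate(markers, outcome, candidates, alpha=0.05):
    """Hypothesis-driven step: test only the pre-specified candidates in new data."""
    if not candidates:
        return []
    threshold = alpha / len(candidates)
    return [j for j in candidates
            if correlation_p_value(markers[j], outcome) < threshold]


rng = random.Random(0)
# Two independent synthetic cohorts sharing the same (hypothetical) causal marker 7.
disc_markers, disc_outcome = simulate_cohort(500, 200, causal=7, effect=0.5, rng=rng)
val_markers, val_outcome = simulate_cohort(500, 200, causal=7, effect=0.5, rng=rng)

candidates = discover(disc_markers, disc_outcome)
confirmed = validate(val_markers, val_outcome, candidates)
print("candidates from discovery cohort:", candidates)
print("confirmed in validation cohort:", confirmed)
```

Keeping the two steps in separate cohorts is the design choice that matters here: the screen is free to examine every marker, while the validation step tests only what the screen proposed, so a chance finding in the first cohort is unlikely to survive the second.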
Focus on Implementation Science
Regardless of the approach, the ultimate goal remains the same: improving patient care by translating research insights into clinical practice. This is where implementation science plays a critical role. Although big data and AI have transformed our ability to generate knowledge, their true impact depends on how effectively new findings, risk models, and therapeutic strategies are integrated into healthcare systems. Implementation science provides the framework to bridge this gap, ensuring that advances in hepatology research lead to tangible improvements. It focuses on promoting the uptake of research into routine healthcare across clinical and organizational contexts. To improve the adoption of findings derived from large cohorts, implementation strategies must identify gaps in knowledge and practice and tailor interventions to a specific target audience. One example in hepatology is the Veterans Administration's National Hepatitis C Elimination Program, which successfully increased treatment rates by strengthening strategies to facilitate the transition to direct-acting antivirals from Year 1 to Year 2 while fostering stakeholder collaboration.[71] Another example is the EASL-Lancet Commission on Liver Disease in Europe, which has advocated for policy changes and public health interventions to address the growing burden of liver disease.[72] Through a combination of data-driven decision-making and implementation strategies, this initiative has helped shape guidelines for MASLD screening and prevention, emphasizing the role of healthcare infrastructure in bridging the gap between research and practice. Nevertheless, the implementation of results derived from large cohorts is not yet standard practice. Implementation therefore serves as a crucial bridge, ensuring that advances in precision medicine, big data, and AI are integrated into real-world settings to improve patient outcomes.
Conclusion
The use of large population-based cohorts will only increase in the coming years. However, the true impact of these advancements depends on their successful implementation in clinical practice. Collaborative networks, open data initiatives, replication studies, and implementation science are essential to bridging the gap between large-scale discovery and real-world application. Moving forward, the synergy between data-driven research, clinical translation, and healthcare policy will be critical in transforming hepatology and improving patient outcomes on a global scale.
Main Concepts and Learning Points
| Major concept | Key learning points | Role of large datasets |
|---|---|---|
| Big data and multi-omics in hepatology | Integration of genetic, multi-omic, and clinical data increases liver disease understanding. Integration of structured and unstructured data still holds some challenges. | Enable identification of disease patterns, risk factors, and biomarkers at a population level. Facilitate longitudinal studies to track disease evolution and treatment responses. |
| Artificial intelligence in liver disease research | AI enables the identification of patient subgroups and personalized risk stratification. Machine learning models help optimize treatment strategies and may uncover new therapeutic targets. | Provide high-dimensional data needed to train robust AI models. Improve predictive accuracy by capturing complex interactions between genetic, environmental, and clinical factors. |
| Translational research and implementation science | Bridging the gap between AI-driven discoveries and clinical applications is essential. Structured frameworks are needed to ensure real-world adoption of precision medicine. | Large datasets support validation of AI models across diverse patient populations. Help assess generalizability and effectiveness of predictive models in real-world settings. |
| Collaborative networks and open data initiatives | Data sharing accelerates innovation and reproducibility. Cross-disciplinary collaboration increases the development of effective precision medicine approaches. | Enable meta-analyses and validation studies across different cohorts. Support the development of standardized protocols for data integration and analysis. |
Conflict of Interest
None declared.
* Joint authorship.
References
- 1 Rico-Uribe LA, Morillo-Cuadrado D, Rodríguez-Laso Á. et al. Worldwide mapping of initiatives that integrate population cohorts. Front Public Health 2022; 10: 964086
- 2 Kinkorová J, Topolčan O. Biobanks in the era of big data: objectives, challenges, perspectives, and innovations for predictive, preventive, and personalised medicine. EPMA J 2020; 11 (03) 333-341
- 3 Mahmud N, Goldberg DS, Bittermann T. Best practices in large database clinical epidemiology research in hepatology: barriers and opportunities. Liver Transpl 2022; 28 (01) 113-122
- 4 Cheung K-S, Leung WK, Seto W-K. Application of big data analysis in gastrointestinal research. World J Gastroenterol 2019; 25 (24) 2990-3008
- 5 Kong H-J. Managing unstructured big data in healthcare system. Healthc Inform Res 2019; 25 (01) 1-2
- 6 Schneider CV, Li T, Zhang D. et al. Large-scale identification of undiagnosed hepatic steatosis using natural language processing. EClinicalMedicine 2023; 62: 102149
- 7 Huang T, Ma L, Zhang B, Liao H. Advances in deep learning: from diagnosis to treatment. Biosci Trends 2023; 17 (03) 190-192
- 8 Far AT, Bastani A, Lee A. et al. Evaluating the positive predictive value of code-based identification of cirrhosis and its complications utilizing GPT-4. Hepatology 2024; Epub ahead of print
- 9 Nam D, Chapiro J, Paradis V, Seraphin TP, Kather JN. Artificial intelligence in liver diseases: improving diagnostics, prognostics and response prediction. JHEP Rep Innov Hepatol 2022; 4 (04) 100443
- 10 Balsano C, Alisi A, Brunetto MR, Invernizzi P, Burra P, Piscaglia F. Special Interest Group (SIG) Artificial Intelligence and Liver Diseases; Italian Association for the Study of the Liver (AISF). The application of artificial intelligence in hepatology: a systematic review. Dig Liver Dis 2022; 54 (03) 299-308
- 11 Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA 2014; 311 (24) 2479-2480
- 12 Duan J, Xiong J, Li Y. et al. Deep learning based multimodal biomedical data fusion: an overview and comparative review. Inf Fusion 2024; 112: 102536
- 13 Brancato V, Esposito G, Coppola L. et al. Standardizing digital biobanks: integrating imaging, genomic, and clinical data for precision medicine. J Transl Med 2024; 22 (01) 136
- 14 Oikonomou EK, Thangaraj PM, Bhatt DL. et al. An explainable machine learning-based phenomapping strategy for adaptive predictive enrichment in randomized clinical trials. NPJ Digit Med 2023; 6 (01) 217
- 15 Aithal GP, Guha N, Fallowfield J, Castera L, Jackson AP. Biomarkers in liver disease: emerging methods and potential applications. Int J Hepatol 2012; 2012: 437508
- 16 Pang Y, Kartsonaki C, Lv J. et al. Adiposity, metabolomic biomarkers, and risk of nonalcoholic fatty liver disease: a case-cohort study. Am J Clin Nutr 2022; 115 (03) 799-810
- 17 Schneider KM, Cao F, Huang HYR. et al. The lipidomic profile discriminates between MASLD and MetALD. Aliment Pharmacol Ther 2025; 61 (08) 1357-1371
- 18 Raverdy V, Tavaglione F, Chatelain E. et al. Data-driven cluster analysis identifies distinct types of metabolic dysfunction-associated steatotic liver disease. Nat Med 2024; 30 (12) 3624-3633
- 19 Liu Z, Yuan H, Suo C. et al. Point-based risk score for the risk stratification and prediction of hepatocellular carcinoma: a population-based random survival forest modeling study. EClinicalMedicine 2024; 75: 102796
- 20 Deng Y-T, You J, He Y. et al. Atlas of the plasma proteome in health and disease in 53,026 adults. Cell 2025; 188 (01) 253-271.e7
- 21 Kjaergaard M, Lindvig KP, Thorhauge KH. et al. Using the ELF test, FIB-4 and NAFLD fibrosis score to screen the population for liver disease. J Hepatol 2023; 79 (02) 277-286
- 22 Reverter E, Tandon P, Augustin S. et al. A MELD-based model to determine risk of mortality among patients with acute variceal bleeding. Gastroenterology 2014; 146 (02) 412-19.e3
- 23 Loaeza-del-Castillo A, Paz-Pineda F, Oviedo-Cárdenas E, Sánchez-Avila F, Vargas-Vorácková F. AST to platelet ratio index (APRI) for the noninvasive evaluation of liver fibrosis. Ann Hepatol 2008; 7 (04) 350-357
- 24 Serra-Burriel M, Juanola A, Serra-Burriel F. et al; LiverScreen Consortium Investigators. Development, validation, and prognostic evaluation of a risk score for long-term liver-related outcomes in the general population: a multicohort study. Lancet 2023; 402 (10406): 988-996
- 25 Åberg F, Luukkonen PK, But A. et al. Development and validation of a model to predict incident chronic liver disease in the general population: The CLivD score. J Hepatol 2022; 77 (02) 302-311
- 26 Njei B, Osta E, Njei N, Al-Ajlouni YA, Lim JK. An explainable machine learning model for prediction of high-risk nonalcoholic steatohepatitis. Sci Rep 2024; 14 (01) 8589
- 27 Schneider ARP, Schneider CV, Schneider KM. et al. Early prediction of decompensation (EPOD) score: non-invasive determination of cirrhosis decompensation risk. Liver Int 2022; 42 (03) 640-650
- 28 Liu Y, Zhang J, Wang W, Li G. Development and validation of a risk prediction model for incident liver cancer. Front Public Health 2022; 10: 955287
- 29 Pazoki R, Vujkovic M, Elliott J. et al; Lifelines Cohort Study, VA Million Veteran Program. Genetic analysis in European ancestry individuals identifies 517 loci associated with liver enzymes. Nat Commun 2021; 12 (01) 2579
- 30 Vujkovic M, Ramdas S, Lorenz KM. et al; Regeneron Genetics Center, Geisinger-Regeneron DiscovEHR Collaboration, EPoS Consortium, VA Million Veteran Program. A multiancestry genome-wide association study of unexplained chronic ALT elevation as a proxy for nonalcoholic fatty liver disease with histological and radiological validation. Nat Genet 2022; 54 (06) 761-771
- 31 Buch S, Stickel F, Trépo E. et al. A genome-wide association study confirms PNPLA3 and identifies TM6SF2 and MBOAT7 as risk loci for alcohol-related cirrhosis. Nat Genet 2015; 47 (12) 1443-1448
- 32 Schneider CV, Fromme M, Schneider KM, Bruns T, Strnad P. Mortality in patients with genetic and environmental risk of liver disease. Am J Gastroenterol 2021; 116 (08) 1741-1745
- 33 Hehl L, Creasy KT, Vitali C. et al; Regeneron Genetics Center. A genome-first approach to variants in MLXIPL and their association with hepatic steatosis and plasma lipids. Hepatol Commun 2024; 8 (05) e0427
- 34 Rendel MD, Vitali C, Creasy KT. et al; Regeneron Center. The common p.Ile291Val variant of ERLIN1 enhances TM6SF2 function and is associated with protection against MASLD. Med (N Y) 2024; 5 (08) 963-980.e5
- 35 Scorletti E, Saiman Y, Jeon S. et al. A missense variant in human perilipin 2 (PLIN2 Ser251Pro) reduces hepatic steatosis in mice. JHEP Rep Innov Hepatol 2023; 6 (01) 100902
- 36 Huang HYR, Vitali C, Zhang D. et al; Regeneron Centre. Deep metabolic phenotyping of humans with protein-altering variants in TM6SF2 using a genome-first approach. JHEP Rep Innov Hepatol 2024; 7 (01) 101243
- 37 Fabbrini E, Rady B, Koshkina A. et al. Phase 1 trials of PNPLA3 siRNA in I148M homozygous patients with MAFLD. N Engl J Med 2024; 391 (05) 475-476
- 38 Harrison SA, Bedossa P, Guy CD. et al; MAESTRO-NASH Investigators. A phase 3, randomized, controlled trial of resmetirom in NASH with liver fibrosis. N Engl J Med 2024; 390 (06) 497-509
- 39 Xu K, He B-W, Yu J-L. et al. Clinical significance of serum FGF21 levels in diagnosing nonalcoholic fatty liver disease early. Sci Rep 2024; 14 (01) 25191
- 40 Loomba R, Sanyal AJ, Kowdley KV. et al. Randomized, controlled trial of the FGF21 analogue pegozafermin in NASH. N Engl J Med 2023; 389 (11) 998-1008
- 41 Vell MS, Loomba R, Krishnan A. et al. Association of statin use with risk of liver disease, hepatocellular carcinoma, and liver-related mortality. JAMA Netw Open 2023; 6 (06) e2320222
- 42 Krishnan A, Schneider CV, Hadi Y, Mukherjee D, AlShehri B, Alqahtani SA. Cardiovascular and mortality outcomes with GLP-1 receptor agonists vs other glucose-lowering drugs in individuals with NAFLD and type 2 diabetes: a large population-based matched cohort study. Diabetologia 2024; 67 (03) 483-493
- 43 Yen F-S, Hou M-C, Wei JC-C, Shih YH, Hwu CM, Hsu CC. Effects of glucagon-like peptide-1 receptor agonists on liver-related and cardiovascular mortality in patients with type 2 diabetes. BMC Med 2024; 22 (01) 8
- 44 Kanwal F, Kramer JR, Li L. et al. GLP-1 receptor agonists and risk for cirrhosis and related complications in patients with metabolic dysfunction-associated steatotic liver disease. JAMA Intern Med 2024; 184 (11) 1314-1323
- 45 Akuta N, Kawamura Y, Fujiyama S. et al. Favorable impact of long-term SGLT2 inhibitor for NAFLD complicated by diabetes mellitus: a 5-year follow-up study. Hepatol Commun 2022; 6 (09) 2286-2297
- 46 Androutsakos T, Nasiri-Ansari N, Bakasis A-D. et al. SGLT-2 inhibitors in NAFLD: expanding their role beyond diabetes and cardioprotection. Int J Mol Sci 2022; 23 (06) 3107
- 47 Jojima T, Wakamatsu S, Kase M. et al. The SGLT2 inhibitor canagliflozin prevents carcinogenesis in a mouse model of diabetes and non-alcoholic steatohepatitis-related hepatocarcinogenesis: association with SGLT2 expression in hepatocellular carcinoma. Int J Mol Sci 2019; 20 (20) 5237
- 48 Vell MS, Creasy KT, Scorletti E. et al. Omega-3 intake is associated with liver disease protection. Front Public Health 2023; 11: 1192099
- 49 Vell MS, Krishnan A, Wangensteen K. et al. Aspirin is associated with a reduced incidence of liver disease in men. Hepatol Commun 2023; 7 (10) e0268
- 50 Žigutytė L, Sorz-Nechay T, Clusmann J, Kather JN. Use of artificial intelligence for liver diseases: a survey from the EASL congress 2024. JHEP Rep Innov Hepatol 2024; 6 (12) 101209
- 51 Schattenberg JM, Chalasani N, Alkhouri N. Artificial intelligence applications in hepatology. Clin Gastroenterol Hepatol 2023; 21 (08) 2015-2025
- 52 Kalapala R, Rughwani H, Reddy DN. Artificial intelligence in hepatology- ready for the primetime. J Clin Exp Hepatol 2023; 13 (01) 149-161
- 53 Feng S, Wang J, Wang L. et al. Current status and analysis of machine learning in hepatocellular carcinoma. J Clin Transl Hepatol 2023; 11 (05) 1184-1191
- 54 Yu PLH, Chiu KW-H, Lu J. et al. Application of a deep learning algorithm for the diagnosis of HCC. JHEP Rep 2024; 7 (01) 101219
- 55 Razmpour F, Daryabeygi-Khotbehsara R, Soleimani D. et al. Application of machine learning in predicting non-alcoholic fatty liver disease using anthropometric and body composition indices. Sci Rep 2023; 13 (01) 4942
- 56 Rehman AU, Butt WH, Ali TM. et al. A machine learning-based framework for accurate and early diagnosis of liver diseases: a comprehensive study on feature selection, data imbalance, and algorithmic performance. Int J Intell Syst 2024; (01) 6111312
- 57 Wong GL-H, Hui VW-K, Tan Q. et al. Novel machine learning models outperform risk scores in predicting hepatocellular carcinoma in patients with chronic viral hepatitis. JHEP Rep Innov Hepatol 2022; 4 (03) 100441
- 58 Ghosh S, Zhao X, Alim M, Brudno M, Bhat M. Artificial intelligence applied to 'omics data in liver disease: towards a personalised approach for diagnosis, prognosis and treatment. Gut 2025; 74 (02) 295-311
- 59 Hu H, Galea S, Rosella L, Henry D. Big data and population health: focusing on the health impacts of the social, physical, and economic environment. Epidemiology 2017; 28 (06) 759-762
- 60 Bosch J, Chung C, Carrasco-Zevallos OM. et al. A machine learning approach to liver histological evaluation predicts clinically significant portal hypertension in NASH cirrhosis. Hepatology 2021; 74 (06) 3146-3160
- 61 Forlano R, Mullish BH, Giannakeas N. et al. High-throughput, machine learning-based quantification of steatosis, inflammation, ballooning, and fibrosis in biopsies from patients with nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol 2020; 18 (09) 2081-2090.e9
- 62 Saillard C, Schmauch B, Laifa O. et al. Predicting survival after hepatocellular carcinoma resection using deep learning on histological slides. Hepatology 2020; 72 (06) 2000-2013
- 63 Mandrekar P. Advancing hepatology research: excellence in open access. Hepatol Commun 2017; 1 (02) 83
- 64 Lohmöller J, Pennekamp J, Matzutt R. et al. The unresolved need for dependable guarantees on security, sovereignty, and trust in data ecosystems. Data Knowl Eng 2024; 151: 102301
- 65 Díaz-Faes AA, Llopis O, D'Este P, Molas-Gallart J. Assessing the variety of collaborative practices in translational research: an analysis of scientists' ego-networks. Res Eval 2023; 32 (02) 426-440
- 66 Peng Y, Shi J, Fantinato M, Chen J. A study on the author collaboration network in big data*. Inf Syst Front 2017; 19: 1329-1342
- 67 Rehm HL, Page AJH, Smith L. et al. GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genom 2021; 1 (02) 100029
- 68 Lim J, Sang H, Kim HI. Impact of metabolic risk factors on hepatic and cardiac outcomes in patients with alcohol- and non-alcohol-related fatty liver disease. JHEP Rep Innov Hepatol 2023; 5 (06) 100721
- 69 Gu S, Rajendiran G, Forest K. et al. Drug-induced liver injury with commonly used antibiotics in the all of us research program. Clin Pharmacol Ther 2023; 114 (02) 404-412
- 70 Khalifa A, Obeid JS, Erno J, Rockey DC. The role of artificial intelligence in hepatology research and practice. Curr Opin Gastroenterol 2023; 39 (03) 175-180
- 71 Rogal SS, Yakovchenko V, Waltz TJ. et al. Longitudinal assessment of the association between implementation strategy use and the uptake of hepatitis C treatment: year 2. Implement Sci 2019; 14 (01) 36
- 72 Karlsen TH, Sheron N, Zelber-Sagi S. et al. The EASL-Lancet Liver Commission: protecting the next generation of Europeans against liver disease complications and premature mortality. Lancet 2022; 399 (10319): 61-116
- 73 Holle R, Happich M, Löwel H, Wichmann HE. MONICA/KORA Study Group. KORA—a research platform for population based health research. Gesundheitswesen 2005; 67 (Suppl. 01) S19-S25
- 74 Winkelmann BR, März W, Boehm BO. et al; LURIC Study Group (LUdwigshafen RIsk and Cardiovascular Health). Rationale and design of the LURIC study—a resource for functional genomics, pharmacogenomics and long-term prognosis of cardiovascular disease. Pharmacogenomics 2001; 2 (1, Suppl 1): S1-S73
- 75 German National Cohort (GNC) Consortium. The German National Cohort: aims, study design and organization. Eur J Epidemiol 2014; 29 (05) 371-382
- 76 Bycroft C, Freeman C, Petkova D. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018; 562 (7726) 203-209
- 77 Kurki MI, Karjalainen J, Palta P. et al; FinnGen. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023; 613 (7944) 508-518
- 78 Sijtsma A, Rienks J, van der Harst P, Navis G, Rosmalen JGM, Dotinga A. Cohort Profile Update: lifelines, a three-generation cohort study and biobank. Int J Epidemiol 2022; 51 (05) e295-e302
- 79 Huppertz B, Bayer M, Macheiner T. et al. Biobank Graz: the hub for innovative biomedical research. Open J Bioresour 2016; 3: e3
- 80 Hakonarson H, Gulcher JR, Stefansson K. deCODE genetics, Inc. Pharmacogenomics 2003; 4 (02) 209-215
- 81 Cook MB, Sanderson SC, Deanfield JE. et al. Our future health: a unique global resource for discovery and translational research. Nat Med 2025; 31 (03) 728-730
- 82 Verma A, Damrauer SM, Naseer N. et al; For The Penn Medicine BioBank. The Penn Medicine BioBank: towards a genomics-enabled learning healthcare system to accelerate precision medicine in a diverse population. J Pers Med 2022; 12 (12) 1974
- 83 Boutin NT, Schecter SB, Perez EF. et al. The evolution of a large biobank at Mass General Brigham. J Pers Med 2022; 12 (08) 1323
- 84 Olson JE, Ryu E, Johnson KJ. et al. The Mayo Clinic Biobank: a building block for individualized medicine. Mayo Clin Proc 2013; 88 (09) 952-962
- 85 Feigelson HS, Clarke CL, Van Den Eeden SK. et al. The Kaiser Permanente Research Bank Cancer Cohort: a collaborative resource to improve cancer care and survivorship. BMC Cancer 2022; 22 (01) 209
- 86 Denny JC, Rutter JL, Goldstein DB. et al; All of Us Research Program Investigators. The “All of Us” Research Program. N Engl J Med 2019; 381 (07) 668-676
- 87 Gaziano JM, Concato J, Brophy M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 2016; 70: 214-223
- 88 Patel CJ, Pho N, McDuffie M. et al. A database of human exposomes and phenomes from the US National Health and Nutrition Examination Survey. Sci Data 2016; 3: 160096
- 89 Carey DJ, Fetterolf SN, Davis FD. et al. The Geisinger MyCode community health initiative: an electronic health record-linked biobank for precision medicine research. Genet Med 2016; 18 (09) 906-913
- 90 Ho V, Csizmadi I, Boucher BA. et al. Cohort profile: the CARTaGENE Cohort Nutrition Study (Quebec, Canada). BMJ Open 2024; 14 (08) e083425
- 91 Dummer TJB, Awadalla P, Boileau C. et al; with the CPTP Regional Cohort Consortium. The Canadian Partnership for Tomorrow Project: a pan-Canadian platform for research on chronic disease prevention. CMAJ 2018; 190 (23) E710-E717
- 92 Walters RG, Millwood IY, Lin K. et al; China Kadoorie Biobank Collaborative Group. Genotyping and population characteristics of the China Kadoorie Biobank. Cell Genom 2023; 3 (08) 100361
- 93 Nagai A, Hirata M, Kamatani Y. et al; BioBank Japan Cooperative Hospital Group. Overview of the BioBank Japan Project: study design and profile. J Epidemiol 2017; 27 (3S): S2-S8
Publication History
Article published online: 21 May 2025
© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA
-
References
- 1 Rico-Uribe LA, Morillo-Cuadrado D, Rodríguez-Laso Á. et al. Worldwide mapping of initiatives that integrate population cohorts. Front Public Health 2022; 10: 964086
- 2 Kinkorová J, Topolčan O. Biobanks in the era of big data: objectives, challenges, perspectives, and innovations for predictive, preventive, and personalised medicine. EPMA J 2020; 11 (03) 333-341
- 3 Mahmud N, Goldberg DS, Bittermann T. Best practices in large database clinical epidemiology research in hepatology: barriers and opportunities. Liver Transpl 2022; 28 (01) 113-122
- 4 Cheung K-S, Leung WK, Seto W-K. Application of big data analysis in gastrointestinal research. World J Gastroenterol 2019; 25 (24) 2990-3008
- 5 Kong H-J. Managing unstructured big data in healthcare system. Healthc Inform Res 2019; 25 (01) 1-2
- 6 Schneider CV, Li T, Zhang D. et al. Large-scale identification of undiagnosed hepatic steatosis using natural language processing. EClinicalMedicine 2023; 62: 102149
- 7 Huang T, Ma L, Zhang B, Liao H. Advances in deep learning: from diagnosis to treatment. Biosci Trends 2023; 17 (03) 190-192
- 8 Far AT, Bastani A, Lee A. et al. Evaluating the positive predictive value of code-based identification of cirrhosis and its complications utilizing GPT-4. Hepatology 2024; . Epub ahead of print
- 9 Nam D, Chapiro J, Paradis V, Seraphin TP, Kather JN. Artificial intelligence in liver diseases: improving diagnostics, prognostics and response prediction. JHEP Rep Innov Hepatol 2022; 4 (04) 100443
- 10 Balsano C, Alisi A, Brunetto MR, Invernizzi P, Burra P, Piscaglia F. Special Interest Group (SIG) Artificial Intelligence and Liver Diseases; Italian Association for the Study of the Liver (AISF). The application of artificial intelligence in hepatology: a systematic review. Dig Liver Dis 2022; 54 (03) 299-308
- 11 Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA 2014; 311 (24) 2479-2480
- 12 Duan J, Xiong J, Li Y. et al. Deep learning based multimodal biomedical data fusion: an overview and comparative review. Inf Fusion 2024; 112: 102536
- 13 Brancato V, Esposito G, Coppola L. et al. Standardizing digital biobanks: integrating imaging, genomic, and clinical data for precision medicine. J Transl Med 2024; 22 (01) 136
- 14 Oikonomou EK, Thangaraj PM, Bhatt DL. et al. An explainable machine learning-based phenomapping strategy for adaptive predictive enrichment in randomized clinical trials. NPJ Digit Med 2023; 6 (01) 217
- 15 Aithal GP, Guha N, Fallowfield J, Castera L, Jackson AP. Biomarkers in liver disease: emerging methods and potential applications. Int J Hepatol 2012; 2012: 437508
- 16 Pang Y, Kartsonaki C, Lv J. et al. Adiposity, metabolomic biomarkers, and risk of nonalcoholic fatty liver disease: a case-cohort study. Am J Clin Nutr 2022; 115 (03) 799-810
- 17 Schneider KM, Cao F, Huang HYR. et al. The lipidomic profile discriminates between MASLD and MetALD. Aliment Pharmacol Ther 2025; 61 (08) 1357-1371
- 18 Raverdy V, Tavaglione F, Chatelain E. et al. Data-driven cluster analysis identifies distinct types of metabolic dysfunction-associated steatotic liver disease. Nat Med 2024; 30 (12) 3624-3633
- 19 Liu Z, Yuan H, Suo C. et al. Point-based risk score for the risk stratification and prediction of hepatocellular carcinoma: a population-based random survival forest modeling study. EClinicalMedicine 2024; 75: 102796
- 20 Deng Y-T, You J, He Y. et al. Atlas of the plasma proteome in health and disease in 53,026 adults. Cell 2025; 188 (01) 253-271.e7
- 21 Kjaergaard M, Lindvig KP, Thorhauge KH. et al. Using the ELF test, FIB-4 and NAFLD fibrosis score to screen the population for liver disease. J Hepatol 2023; 79 (02) 277-286
- 22 Reverter E, Tandon P, Augustin S. et al. A MELD-based model to determine risk of mortality among patients with acute variceal bleeding. Gastroenterology 2014; 146 (02) 412-19.e3
- 23 Loaeza-del-Castillo A, Paz-Pineda F, Oviedo-Cárdenas E, Sánchez-Avila F, Vargas-Vorácková F. AST to platelet ratio index (APRI) for the noninvasive evaluation of liver fibrosis. Ann Hepatol 2008; 7 (04) 350-357
- 24 Serra-Burriel M, Juanola A, Serra-Burriel F. et al; LiverScreen Consortium Investigators. Development, validation, and prognostic evaluation of a risk score for long-term liver-related outcomes in the general population: a multicohort study. Lancet 2023; 402 (10406): 988-996
- 25 Åberg F, Luukkonen PK, But A. et al. Development and validation of a model to predict incident chronic liver disease in the general population: The CLivD score. J Hepatol 2022; 77 (02) 302-311
- 26 Njei B, Osta E, Njei N, Al-Ajlouni YA, Lim JK. An explainable machine learning model for prediction of high-risk nonalcoholic steatohepatitis. Sci Rep 2024; 14 (01) 8589
- 27 Schneider ARP, Schneider CV, Schneider KM. et al. Early prediction of decompensation (EPOD) score: non-invasive determination of cirrhosis decompensation risk. Liver Int 2022; 42 (03) 640-650
- 28 Liu Y, Zhang J, Wang W, Li G. Development and validation of a risk prediction model for incident liver cancer. Front Public Health 2022; 10: 955287
- 29 Pazoki R, Vujkovic M, Elliott J. et al; Lifelines Cohort Study, VA Million Veteran Program. Genetic analysis in European ancestry individuals identifies 517 loci associated with liver enzymes. Nat Commun 2021; 12 (01) 2579
- 30 Vujkovic M, Ramdas S, Lorenz KM. et al; Regeneron Genetics Center, Geisinger-Regeneron DiscovEHR Collaboration, EPoS Consortium, VA Million Veteran Program. A multiancestry genome-wide association study of unexplained chronic ALT elevation as a proxy for nonalcoholic fatty liver disease with histological and radiological validation. Nat Genet 2022; 54 (06) 761-771
- 31 Buch S, Stickel F, Trépo E. et al. A genome-wide association study confirms PNPLA3 and identifies TM6SF2 and MBOAT7 as risk loci for alcohol-related cirrhosis. Nat Genet 2015; 47 (12) 1443-1448
- 32 Schneider CV, Fromme M, Schneider KM, Bruns T, Strnad P. Mortality in patients with genetic and environmental risk of liver disease. Am J Gastroenterol 2021; 116 (08) 1741-1745
- 33 Hehl L, Creasy KT, Vitali C. et al; Regeneron Genetics Center. A genome-first approach to variants in MLXIPL and their association with hepatic steatosis and plasma lipids. Hepatol Commun 2024; 8 (05) e0427
- 34 Rendel MD, Vitali C, Creasy KT. et al; Regeneron Center. The common p.Ile291Val variant of ERLIN1 enhances TM6SF2 function and is associated with protection against MASLD. Med (N Y) 2024; 5 (08) 963-980.e5
- 35 Scorletti E, Saiman Y, Jeon S. et al. A missense variant in human perilipin 2 (PLIN2 Ser251Pro) reduces hepatic steatosis in mice. JHEP Rep Innov Hepatol 2023; 6 (01) 100902
- 36 Huang HYR, Vitali C, Zhang D. et al; Regeneron Centre. Deep metabolic phenotyping of humans with protein-altering variants in TM6SF2 using a genome-first approach. JHEP Rep Innov Hepatol 2024; 7 (01) 101243
- 37 Fabbrini E, Rady B, Koshkina A. et al. Phase 1 trials of PNPLA3 siRNA in I148M homozygous patients with MAFLD. N Engl J Med 2024; 391 (05) 475-476
- 38 Harrison SA, Bedossa P, Guy CD. et al; MAESTRO-NASH Investigators. A phase 3, randomized, controlled trial of resmetirom in NASH with liver fibrosis. N Engl J Med 2024; 390 (06) 497-509
- 39 Xu K, He B-W, Yu J-L. et al. Clinical significance of serum FGF21 levels in diagnosing nonalcoholic fatty liver disease early. Sci Rep 2024; 14 (01) 25191
- 40 Loomba R, Sanyal AJ, Kowdley KV. et al. Randomized, controlled trial of the FGF21 analogue pegozafermin in NASH. N Engl J Med 2023; 389 (11) 998-1008
- 41 Vell MS, Loomba R, Krishnan A. et al. Association of statin use with risk of liver disease, hepatocellular carcinoma, and liver-related mortality. JAMA Netw Open 2023; 6 (06) e2320222
- 42 Krishnan A, Schneider CV, Hadi Y, Mukherjee D, AlShehri B, Alqahtani SA. Cardiovascular and mortality outcomes with GLP-1 receptor agonists vs other glucose-lowering drugs in individuals with NAFLD and type 2 diabetes: a large population-based matched cohort study. Diabetologia 2024; 67 (03) 483-493
- 43 Yen F-S, Hou M-C, Wei JC-C, Shih YH, Hwu CM, Hsu CC. Effects of glucagon-like peptide-1 receptor agonists on liver-related and cardiovascular mortality in patients with type 2 diabetes. BMC Med 2024; 22 (01) 8
- 44 Kanwal F, Kramer JR, Li L. et al. GLP-1 receptor agonists and risk for cirrhosis and related complications in patients with metabolic dysfunction-associated steatotic liver disease. JAMA Intern Med 2024; 184 (11) 1314-1323
- 45 Akuta N, Kawamura Y, Fujiyama S. et al. Favorable impact of long-term SGLT2 inhibitor for NAFLD complicated by diabetes mellitus: a 5-year follow-up study. Hepatol Commun 2022; 6 (09) 2286-2297
- 46 Androutsakos T, Nasiri-Ansari N, Bakasis A-D. et al. SGLT-2 inhibitors in NAFLD: expanding their role beyond diabetes and cardioprotection. Int J Mol Sci 2022; 23 (06) 3107
- 47 Jojima T, Wakamatsu S, Kase M. et al. The SGLT2 inhibitor canagliflozin prevents carcinogenesis in a mouse model of diabetes and non-alcoholic steatohepatitis-related hepatocarcinogenesis: association with SGLT2 expression in hepatocellular carcinoma. Int J Mol Sci 2019; 20 (20) 5237
- 48 Vell MS, Creasy KT, Scorletti E. et al. Omega-3 intake is associated with liver disease protection. Front Public Health 2023; 11: 1192099
- 49 Vell MS, Krishnan A, Wangensteen K. et al. Aspirin is associated with a reduced incidence of liver disease in men. Hepatol Commun 2023; 7 (10) e0268
- 50 Žigutytė L, Sorz-Nechay T, Clusmann J, Kather JN. Use of artificial intelligence for liver diseases: a survey from the EASL congress 2024. JHEP Rep Innov Hepatol 2024; 6 (12) 101209
- 51 Schattenberg JM, Chalasani N, Alkhouri N. Artificial intelligence applications in hepatology. Clin Gastroenterol Hepatol 2023; 21 (08) 2015-2025
- 52 Kalapala R, Rughwani H, Reddy DN. Artificial intelligence in hepatology - ready for the primetime. J Clin Exp Hepatol 2023; 13 (01) 149-161
- 53 Feng S, Wang J, Wang L. et al. Current status and analysis of machine learning in hepatocellular carcinoma. J Clin Transl Hepatol 2023; 11 (05) 1184-1191
- 54 Yu PLH, Chiu KW-H, Lu J. et al. Application of a deep learning algorithm for the diagnosis of HCC. JHEP Rep Innov Hepatol 2024; 7 (01) 101219
- 55 Razmpour F, Daryabeygi-Khotbehsara R, Soleimani D. et al. Application of machine learning in predicting non-alcoholic fatty liver disease using anthropometric and body composition indices. Sci Rep 2023; 13 (01) 4942
- 56 Rehman AU, Butt WH, Ali TM. et al. A machine learning-based framework for accurate and early diagnosis of liver diseases: a comprehensive study on feature selection, data imbalance, and algorithmic performance. Int J Intell Syst 2024; (01) 6111312
- 57 Wong GL-H, Hui VW-K, Tan Q. et al. Novel machine learning models outperform risk scores in predicting hepatocellular carcinoma in patients with chronic viral hepatitis. JHEP Rep Innov Hepatol 2022; 4 (03) 100441
- 58 Ghosh S, Zhao X, Alim M, Brudno M, Bhat M. Artificial intelligence applied to 'omics data in liver disease: towards a personalised approach for diagnosis, prognosis and treatment. Gut 2025; 74 (02) 295-311
- 59 Hu H, Galea S, Rosella L, Henry D. Big data and population health: focusing on the health impacts of the social, physical, and economic environment. Epidemiology 2017; 28 (06) 759-762
- 60 Bosch J, Chung C, Carrasco-Zevallos OM. et al. A machine learning approach to liver histological evaluation predicts clinically significant portal hypertension in NASH cirrhosis. Hepatology 2021; 74 (06) 3146-3160
- 61 Forlano R, Mullish BH, Giannakeas N. et al. High-throughput, machine learning-based quantification of steatosis, inflammation, ballooning, and fibrosis in biopsies from patients with nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol 2020; 18 (09) 2081-2090.e9
- 62 Saillard C, Schmauch B, Laifa O. et al. Predicting survival after hepatocellular carcinoma resection using deep learning on histological slides. Hepatology 2020; 72 (06) 2000-2013
- 63 Mandrekar P. Advancing hepatology research: excellence in open access. Hepatol Commun 2017; 1 (02) 83
- 64 Lohmöller J, Pennekamp J, Matzutt R. et al. The unresolved need for dependable guarantees on security, sovereignty, and trust in data ecosystems. Data Knowl Eng 2024; 151: 102301
- 65 Díaz-Faes AA, Llopis O, D'Este P, Molas-Gallart J. Assessing the variety of collaborative practices in translational research: an analysis of scientists' ego-networks. Res Eval 2023; 32 (02) 426-440
- 66 Peng Y, Shi J, Fantinato M, Chen J. A study on the author collaboration network in big data. Inf Syst Front 2017; 19: 1329-1342
- 67 Rehm HL, Page AJH, Smith L. et al. GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genom 2021; 1 (02) 100029
- 68 Lim J, Sang H, Kim HI. Impact of metabolic risk factors on hepatic and cardiac outcomes in patients with alcohol- and non-alcohol-related fatty liver disease. JHEP Rep Innov Hepatol 2023; 5 (06) 100721
- 69 Gu S, Rajendiran G, Forest K. et al. Drug-induced liver injury with commonly used antibiotics in the All of Us Research Program. Clin Pharmacol Ther 2023; 114 (02) 404-412
- 70 Khalifa A, Obeid JS, Erno J, Rockey DC. The role of artificial intelligence in hepatology research and practice. Curr Opin Gastroenterol 2023; 39 (03) 175-180
- 71 Rogal SS, Yakovchenko V, Waltz TJ. et al. Longitudinal assessment of the association between implementation strategy use and the uptake of hepatitis C treatment: year 2. Implement Sci 2019; 14 (01) 36
- 72 Karlsen TH, Sheron N, Zelber-Sagi S. et al. The EASL-Lancet Liver Commission: protecting the next generation of Europeans against liver disease complications and premature mortality. Lancet 2022; 399 (10319) 61-116
- 73 Holle R, Happich M, Löwel H, Wichmann HE; MONICA/KORA Study Group. KORA - a research platform for population based health research. Gesundheitswesen 2005; 67 (Suppl. 01) S19-S25
- 74 Winkelmann BR, März W, Boehm BO. et al; LURIC Study Group (LUdwigshafen RIsk and Cardiovascular Health). Rationale and design of the LURIC study - a resource for functional genomics, pharmacogenomics and long-term prognosis of cardiovascular disease. Pharmacogenomics 2001; 2 (1, Suppl 1) S1-S73
- 75 German National Cohort (GNC) Consortium. The German National Cohort: aims, study design and organization. Eur J Epidemiol 2014; 29 (05) 371-382
- 76 Bycroft C, Freeman C, Petkova D. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018; 562 (7726) 203-209
- 77 Kurki MI, Karjalainen J, Palta P. et al; FinnGen. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023; 613 (7944) 508-518
- 78 Sijtsma A, Rienks J, van der Harst P, Navis G, Rosmalen JGM, Dotinga A. Cohort Profile Update: Lifelines, a three-generation cohort study and biobank. Int J Epidemiol 2022; 51 (05) e295-e302
- 79 Huppertz B, Bayer M, Macheiner T. et al. Biobank Graz: the hub for innovative biomedical research. Open J Bioresour 2016; 3: e3
- 80 Hakonarson H, Gulcher JR, Stefansson K. deCODE genetics, Inc. Pharmacogenomics 2003; 4 (02) 209-215
- 81 Cook MB, Sanderson SC, Deanfield JE. et al. Our Future Health: a unique global resource for discovery and translational research. Nat Med 2025; 31 (03) 728-730
- 82 Verma A, Damrauer SM, Naseer N. et al; For The Penn Medicine BioBank. The Penn Medicine BioBank: towards a genomics-enabled learning healthcare system to accelerate precision medicine in a diverse population. J Pers Med 2022; 12 (12) 1974
- 83 Boutin NT, Schecter SB, Perez EF. et al. The evolution of a large biobank at Mass General Brigham. J Pers Med 2022; 12 (08) 1323
- 84 Olson JE, Ryu E, Johnson KJ. et al. The Mayo Clinic Biobank: a building block for individualized medicine. Mayo Clin Proc 2013; 88 (09) 952-962
- 85 Feigelson HS, Clarke CL, Van Den Eeden SK. et al. The Kaiser Permanente Research Bank Cancer Cohort: a collaborative resource to improve cancer care and survivorship. BMC Cancer 2022; 22 (01) 209
- 86 Denny JC, Rutter JL, Goldstein DB. et al; All of Us Research Program Investigators. The “All of Us” Research Program. N Engl J Med 2019; 381 (07) 668-676
- 87 Gaziano JM, Concato J, Brophy M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 2016; 70: 214-223
- 88 Patel CJ, Pho N, McDuffie M. et al. A database of human exposomes and phenomes from the US National Health and Nutrition Examination Survey. Sci Data 2016; 3: 160096
- 89 Carey DJ, Fetterolf SN, Davis FD. et al. The Geisinger MyCode community health initiative: an electronic health record-linked biobank for precision medicine research. Genet Med 2016; 18 (09) 906-913
- 90 Ho V, Csizmadi I, Boucher BA. et al. Cohort profile: the CARTaGENE Cohort Nutrition Study (Quebec, Canada). BMJ Open 2024; 14 (08) e083425
- 91 Dummer TJB, Awadalla P, Boileau C. et al; with the CPTP Regional Cohort Consortium. The Canadian Partnership for Tomorrow Project: a pan-Canadian platform for research on chronic disease prevention. CMAJ 2018; 190 (23) E710-E717
- 92 Walters RG, Millwood IY, Lin K. et al; China Kadoorie Biobank Collaborative Group. Genotyping and population characteristics of the China Kadoorie Biobank. Cell Genom 2023; 3 (08) 100361
- 93 Nagai A, Hirata M, Kamatani Y. et al; BioBank Japan Cooperative Hospital Group. Overview of the BioBank Japan Project: study design and profile. J Epidemiol 2017; 27 (3S) S2-S8





