Keywords: genetics - PNPLA3 - TM6SF2 - STAT3 - GCKR - HSD17B13 - GWAS - NASH - systems biology - druggability
Nonalcoholic fatty liver disease (NAFLD) is a condition characterized by an abnormal
accumulation of fat in the liver, which may be accompanied by signs of hepatocyte injury and
chronic damage, such as those that characterize nonalcoholic steatohepatitis (NASH).[1 ] The disease can progress to severe clinical forms, including NASH-fibrosis, cirrhosis,
and even hepatocellular carcinoma (HCC).[1 ]
Epidemiological observations derived from population-based studies,[2 ]
[3 ] familial aggregation studies,[4 ]
[5 ] and twin studies[6 ]
[7 ] have long provided evidence that NAFLD is, at some level, a heritable trait. NAFLD
clusters in families: Schwimmer et al found that fatty
liver is significantly more common in siblings (59%) and parents (78%) of children
with NAFLD.[4 ] Heritability estimates of NAFLD range from 20 to 70%, depending on the study
design and the diagnostic approaches used to determine the liver phenotype.[2 ]
[3 ]
[4 ]
[5 ]
[6 ]
Variants of over 100 loci have been explored in candidate–gene association studies
(see [Table 1 ]). Findings yielded by these studies have generated plausible evidence indicating
that several loci are involved in the genetic susceptibility of NAFLD, including nuclear
receptors, transcription factors that regulate lipid- and carbohydrate-related biosynthetic
processes, inflammatory response, and fibrogenesis.[8 ]
[9 ]
[10 ] Nevertheless, authors of a large majority of candidate–gene studies on NAFLD and
NASH have failed to convincingly demonstrate a robust causal relationship between
the associated variant and the disease. This could be explained by the limited number
of functional mechanistic studies designed to test the hypotheses driving the investigations,
or simply by the lack of statistical power.
Table 1
Training gene list based on published evidence of the genetic component of NAFLD and
NASH
Gene symbol (gene description)
RPL13AP7 (ribosomal protein L13a pseudogene 7)
ABCB11 (ATP binding cassette subfamily B member 11)
ACSL4 (acyl-CoA synthetase long chain family member 4)
ACTR5 (ARP5 actin related protein 5 homolog)
ADIPOQ (adiponectin, C1Q and collagen domain containing)
ADIPOR1 (adiponectin receptor 1)
ADIPOR2 (adiponectin receptor 2)
ADRB2 (adrenoceptor β 2)
ADRB3 (adrenoceptor β 3)
AGTR1 (angiotensin II receptor type 1)
APOC3 (apolipoprotein C3)
APOE (apolipoprotein E)
ARHGEF40 (Rho guanine nucleotide exchange factor 40)
C1orf94 (chromosome 1 open reading frame 94)
CACNA2D1 (calcium voltage-gated channel auxiliary subunit α2delta 1)
CD14 (CD14 molecule)
CDH2 (cadherin 2)
CFTR (cystic fibrosis transmembrane conductance regulator)
CLOCK (clock circadian regulator)
CNTN5 (contactin 5)
COL13A1 (collagen type XIII α 1 chain)
CRACR2A (calcium release activated channel regulator 2A)
CYP2E1 (cytochrome P450 family 2 subfamily E member 1)
DCLK1 (doublecortin like kinase 1)
DGAT1 (diacylglycerol O-acyltransferase 1)
DGAT2 (diacylglycerol O-acyltransferase 2)
DYSF (dysferlin)
EHBP1L1 (EH domain binding protein 1 like 1)
ENPP1 (ectonucleotide pyrophosphatase/phosphodiesterase 1)
ETS1 (ETS proto-oncogene 1, transcription factor)
FABP2 (fatty acid binding protein 2)
FARP1 (FERM, ARH/RhoGEF and pleckstrin domain protein 1)
FDFT1 (farnesyl-diphosphate farnesyltransferase 1)
GATAD2A (GATA zinc finger domain containing 2A)
GC (GC, vitamin D binding protein)
GCKR (glucokinase regulator)
GCLC (glutamate-cysteine ligase catalytic subunit)
HFE (homeostatic iron regulator)
HS3ST1 (heparan sulfate-glucosamine 3-sulfotransferase 1)
HSD17B13 (hydroxysteroid 17-β dehydrogenase 13)
IL18RAP (interleukin 18 receptor accessory protein)
IL1B (interleukin 1 β)
IL6 (interleukin 6)
IRS1 (insulin receptor substrate 1)
KHDRBS3 (KH RNA binding domain containing, signal transduction associated 3)
KLF6 (Kruppel-like factor 6)
LCP1 (lymphocyte cytosolic protein 1)
LEPR (leptin receptor)
LINC00322 (long intergenic nonprotein coding RNA 322)
LIPC (lipase C, hepatic type)
PRG1 (p53-responsive gene 1)
LTBP3 (latent transforming growth factor β binding protein 3)
LYPLAL1 (lysophospholipase like 1)
MACROD2 (MACRO domain containing 2)
MBOAT7 (membrane bound O-acyltransferase domain containing 7)
MC4R (melanocortin 4 receptor)
MIF (macrophage migration inhibitory factor)
MTCYBP22 (mitochondrially encoded cytochrome b pseudogene 22)
MTHFR (methylenetetrahydrofolate reductase)
MTTP (microsomal triglyceride transfer protein)
MUM1 (melanoma associated antigen (mutated) 1)
NCAN (neurocan)
NFIC (nuclear factor I C)
NGF (nerve growth factor)
NR1I2 (nuclear receptor subfamily 1 group I member 2)
OTX2P1 (orthodenticle homeobox 2 pseudogene 1)
PALLD (palladin, cytoskeletal associated protein)
PARVB (parvin β)
PBX2P1 (PBX homeobox 2 pseudogene 1)
PDGFA (platelet derived growth factor subunit A)
PEMT (phosphatidylethanolamine N-methyltransferase)
PNPLA3 (patatin like phospholipase domain containing 3)
PPARA (peroxisome proliferator-activated receptor α)
PPARG (peroxisome proliferator-activated receptor gamma)
PPARGC1A (PPARG coactivator 1 α, PGC-1a)
PPP1R3B (protein phosphatase 1 regulatory subunit 3B)
PTGS2 (prostaglandin-endoperoxide synthase 2)
PTPRU (protein tyrosine phosphatase, receptor type U)
PZP (PZP, α-2-macroglobulin like)
RAB37 (RAB37, member RAS oncogene family)
SAMM50 (SAMM50 sorting and assembly machinery component)
SDK1 (sidekick cell adhesion molecule 1)
SEL1L3 (SEL1L family member 3)
SERPINA1 (serpin family A member 1)
SLC38A8 (solute carrier family 38 member 8)
SLC46A3 (solute carrier family 46 member 3)
SLC9A9 (solute carrier family 9 member A9)
SOD2 (superoxide dismutase 2)
SPINK1 (serine peptidase inhibitor, Kazal type 1)
ST8SIA1 (ST8 α-N-acetyl-neuraminide α-2,8-sialyltransferase 1)
STAT3 (signal transducer and activator of transcription 3)
TCF7L2 (transcription factor 7 like 2)
TEX36 (testis expressed 36)
TLR4 (toll like receptor 4)
TM6SF2 (transmembrane 6 superfamily member 2)
TMEM56 (transmembrane protein 56)
TNF (tumor necrosis factor)
TNFSF10 (TNF superfamily member 10)
TRAPPC9 (trafficking protein particle complex 9)
UCP1 (uncoupling protein 1)
UGT1A1 (UDP glucuronosyltransferase family 1 member A1)
YIPF1 (Yip1 domain family member 1)
ZNF512 (zinc finger protein 512)
ZP4 (zona pellucida glycoprotein 4)
Abbreviations: NAFLD, nonalcoholic fatty liver disease; NASH, nonalcoholic steatohepatitis.
Conversely, discoveries of variants of three genes (PNPLA3-rs738409, TM6SF2-rs58542926, and glucokinase regulator gene [GCKR]-rs780094 or GCKR-rs1260326) that regulate metabolic traits have been driven by genome-wide approaches,[2 ]
[3 ]
[11 ]
[12 ] including genome-wide association (GWAS) and exome-wide association (EWAS) studies.
The association of these gene variants with the risk of NAFLD in cohorts of diverse
ethnic backgrounds around the world has been demonstrated by several authors.[13 ]
[14 ]
[15 ] In addition, variants in these genes have been associated with the risk of NASH
and with histological features of disease severity, including liver fibrosis.[13 ]
[16 ]
[17 ]
[18 ]
Variants in additional loci, including a missense (p.Gly17Glu, rs641738 C/T) variant
in exon 1 of transmembrane channel-like 4 (TMC4)/intergenic-downstream of membrane-bound O-acyltransferase domain-containing 7 (MBOAT7), have been associated with a modest risk of NAFLD and NASH in an Italian population.[19 ] However, this association could not be replicated in populations of other ethnicities.[20 ]
[21 ]
[22 ]
More recently, a study that analyzed exome-sequence data coupled to
electronic health records of 46,455 patients taking part in a large collaborative
study revealed a loss-of-function variant in the hydroxysteroid 17-β dehydrogenase 13
(HSD17B13) gene that confers protection against chronic liver injury and mitigates progressive
NASH among European Americans.[23 ]
From the histopathologic point of view, NAFLD refers to potentially progressive lesions
ranging from isolated steatosis (NAFL) to NASH,[1 ] as explained above. It is therefore reasonable to hypothesize that NAFL, NASH, and NASH-fibrosis
share genetic modifiers. In fact, variants in loci influencing the risk of NAFLD,
including PNPLA3, TM6SF2, and GCKR, contribute to the risk of NASH as well. For example, the rs738409 variant has a significant
effect not only on liver fat accumulation (GG homozygous carriers show 73% higher
liver fat content than CC homozygous carriers) but also on the susceptibility
to more aggressive disease (GG homozygous carriers have a 3.24-fold greater risk of
higher inflammatory scores and a 3.2-fold greater risk of developing fibrosis than
CC homozygous carriers).[13 ]
While the single nucleotide polymorphisms (SNPs) currently known to be involved in the genetic
risk of NAFLD cannot distinguish between isolated steatosis and NASH, some of these
variants can indicate the likelihood of a more advanced disease. For example,
NASH is 3.5-fold more frequently observed in rs738409 GG homozygous than in CC homozygous carriers.[13 ]
The question arises as to why many of the genetic variants do not allow us to differentiate
NAFL from progressive NASH. Several explanations can be suggested. One is the assumption that there is no NASH without NAFL, so that
variants influencing the risk of NAFL directly or indirectly affect the predisposition
to NASH. A more pragmatic view questions the design of genetic studies,
including imprecise phenotyping and the use of controls of uncertain comparability.
Certainly, the evidence suggests that some factors, either genetic or environmental,
must affect the progression of NAFL to NASH if they are stages of a single disease.
These factors are likely related to the inflammatory response and the fibrogenic process.
The Missing Heritability of NAFLD and NASH
According to the available evidence, the effect of variants uncovered from GWAS or
EWAS[2 ]
[3 ]
[11 ]
[12 ]
[23 ] explains only a small portion of the disease variance. In fact, variants in the loci
mentioned above explain up to approximately 10% of the variance of NAFLD-related phenotypes.
Hence, much of the phenotypic variance of NAFLD and NASH, which
stems from the interaction between the genetic component and environmental sources,
remains unexplained, resulting in what is known as missing heritability.
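The size of this gap can be illustrated with simple arithmetic. The following is a minimal sketch, assuming that the heritability estimates quoted earlier (20 to 70%) and the variance explained by known loci (up to approximately 10%) are expressed on the same scale and can be subtracted directly; this is a simplification, and the function name `missing_heritability` is hypothetical.

```python
def missing_heritability(h2_total: float, h2_explained: float) -> float:
    """Return the heritable share of variance not yet explained by known loci.

    Both arguments are proportions of phenotypic variance (0 to 1).
    Assumption: both are measured on the same scale, so they can be subtracted.
    """
    if not 0.0 <= h2_explained <= h2_total <= 1.0:
        raise ValueError("expected 0 <= h2_explained <= h2_total <= 1")
    return h2_total - h2_explained


# Bounds of the heritability range quoted in the text, and the approximate
# upper bound of variance explained by the GWAS/EWAS loci.
for h2 in (0.20, 0.70):
    gap = missing_heritability(h2, 0.10)
    print(f"heritability {h2:.2f}: unexplained heritable variance {gap:.2f}")
```

Under these assumptions, between roughly one half and six sevenths of the estimated heritability would remain unaccounted for.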
The missing heritability of NAFLD and NASH, like that of many other common diseases, includes
a complex spectrum of factors that remain poorly explored; some of them are illustrated
in [Fig. 1 ]. Mapping the genetic component of NAFLD and NASH should include not only the search
for rare variants, which would probably have substantial effects on the phenotype,
but also the exploration of structural variation, for example, copy number variants. Given
the role of mitochondria in the pathophysiology of the disease, it is also worthwhile to
characterize the genetic diversity of the mitochondrial deoxyribonucleic acid (mtDNA).
Fig. 1 Missing heritability of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic
steatohepatitis (NASH). There is considerable disparity in the magnitude of heritability
estimates of NAFLD and NASH and the proportion of variance explained by single nucleotide
polymorphisms (SNPs) uncovered from genome-wide association study (GWAS), exome-wide
association study (EWAS), and candidate–gene association studies. A significant proportion
of the disease burden could be explained by the missing heritability, which covers
not only genetic and epigenetic modifiers but also the interaction with environmental
exposure as well as with a highly interconnected and dynamic network of factors, including
the microbiome.[115 ]
[116 ] The genetic component of NAFLD and NASH may be potentially explained by undiscovered
rare variants, structural variation, including copy number variation, variants in
micro-ribonucleic acids (miRNAs) and long noncoding RNAs (lncRNAs), and expression
quantitative trait loci (eQTLs). The allelic architecture of the human genome, which
varies substantially across ethnic groups, plays an important role.
Variation across populations might explain differences in the prevalence and severity
of the disease across different ethnic groups. Epigenetic factors include not only
deoxyribonucleic acid (DNA) and histone methylation but also chromatin remodeling
and nonprotein coding RNAs. Epigenetic inheritance also involves modifications of
the histone code, including histone acetyltransferases (HAT) and deacetylases (HDAC). Abbreviations: circRNA, circular RNA; linRNAs, long intergenic RNA; NAT, natural
antisense transcript; piRNA, PIWI-interacting RNA; snRNA, small nuclear RNA; snoRNA,
small nucleolar RNA.
Novel evidence on the variability of the mtDNA genome suggests a significant role
of mitochondrial genetics in the pathogenesis of NAFLD and the natural history of
the disease.[24 ] A comprehensive exploration of the complete liver mtDNA-mutation spectrum in patients
with NAFLD at different stages of the disease showed that NAFLD is associated
with an increased liver mtDNA mutational burden, including point mutations in genes of
the oxidative phosphorylation (OXPHOS) system.[24 ] In addition, patients with advanced fibrosis had an overall 1.4-fold higher
mutation rate than those in whom fibrosis was mild or absent.[24 ] The accumulation of liver mtDNA-polymorphic sites in OXPHOS subunits was
paralleled by the emergence of an OXPHOS-deficient phenotype.[24 ] Specifically, profiling of liver OXPHOS-gene and protein expression provided evidence
that the liver mtDNA mutational rate impacts mitochondrial function.[24 ] These observations are supported by several studies that highlighted the importance
of mitochondrial homeostasis in the pathogenesis of NAFLD and the disease progression
into NASH and NASH-fibrosis.[25 ]
[26 ]
[27 ]
It is expected that variants in micro-ribonucleic acids (miRNAs), which may affect
miRNA function, would account for a sizeable proportion of the disease risk and/or
the association of NAFLD with comorbidities. For instance, rs41318021, a miR-122-related
sequence variant in the 3′ untranslated region of the L-arginine transporter gene (SLC7A1), was associated with arterial hypertension in patients with NAFLD.[28 ] Across the disease spectrum, variants in long noncoding RNAs (lncRNAs) have been
shown to account for a portion of the genetic component as well.[29 ] A survey of genetic variation associated with lncRNA-genomic regions uncovered the
rs2829145 A/G variant located in a lncRNA (lnc-JAM2–6), which was associated with NAFLD and
the disease severity. Moreover, prediction of regulatory elements in lnc-JAM2–6 indicated
potential sequence-specific binding motifs of the oncogenes MAF bZIP transcription factor
K (MAFK) and JunD proto-oncogene, AP-1 transcription factor subunit (JUND), as well as of transcription factors involved in the inflammatory response.[29 ] Results from a pilot GWAS on NAFLD showed that intergenic or intronic variants with
predicted functionality in lncRNAs might be associated with steatosis, lobular inflammation,
and liver fibrosis.[30 ] SNPs in LYPLAL1 (rs12137855), PPP1R3B (rs4240624), and TRIB1 (rs2954021), whose predicted functional element in the corresponding loci is a
lincRNA, were associated with liver fat content.[3 ]
The concept of heritability is derived from a mathematical calculation involving three
sources of phenotypic variance: genetics (G), environment (E), and individual variation (noise).
Hence, a single number (G) represents the fraction of variation between individuals
in a population that is due to their genetic background. Unfortunately, the
G × E interaction in the biology of NAFLD and NASH remains largely unexplored
([Fig. 1 ]). There is, however, remarkable evidence of an interaction between adiposity and variants
in genes predisposing an individual to NAFLD, specifically variants in PNPLA3, TM6SF2, and GCKR.[31 ] The greatest effects were observed for the interaction between the rs738409-G risk allele
and obesity, which was found to affect the entire spectrum of NAFLD, from steatosis,
to steatohepatitis, to end-stage liver disease, as well as liver enzyme levels.[31 ] For example, in persons homozygous for the G allele compared with CC homozygotes,
the risk of progression to cirrhosis varies from 2.4-fold among lean (body mass index
[BMI] < 25 kg/m²) to 5.8-fold among obese (BMI > 35 kg/m²) subjects.[31 ] While interaction effects between obesity and variants of TM6SF2 (rs58542926) and GCKR (rs1260326) were also observed, the magnitude of the effect and the impact on the
disease spectrum are much smaller and are largely confined to the amount of
liver fat deposition.[31 ] These results suggest that excessive caloric intake has a greater impact in individuals
at genetic risk, as expected.
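One simple way to read the stratified figures above is to compare the genotype effect across BMI strata on a multiplicative scale. The sketch below is only an illustration: it uses the two fold-risk values quoted in the text, ignores confidence intervals, and does not reproduce the statistical test of the cited study; the function name `interaction_ratio` is hypothetical.

```python
def interaction_ratio(effect_exposed: float, effect_unexposed: float) -> float:
    """Ratio of a genotype's relative effect in two environmental strata.

    A ratio of 1 means the genotype effect is the same in both strata
    (no interaction on the multiplicative scale); a ratio above 1 means
    the effect is amplified in the exposed stratum.
    """
    if effect_exposed <= 0 or effect_unexposed <= 0:
        raise ValueError("relative effects must be positive")
    return effect_exposed / effect_unexposed


# Fold risk of cirrhosis for rs738409 GG versus CC, as quoted in the text:
# 5.8 among obese subjects and 2.4 among lean subjects.
ratio = interaction_ratio(5.8, 2.4)
print(f"genotype effect is {ratio:.2f} times larger in the obese stratum")
```

With the quoted values the ratio is about 2.4, which is consistent with the statement that adiposity amplifies the effect of the risk allele.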
It remains uncertain whether the gene–environment interactions mentioned above are
limited to populations of European ancestry. Hence, a note of caution must be added, as
the allelic architecture of the human genome varies substantially across
ethnic groups. Detailed population genetics data on variants
in loci of interest can be found in the International Genome Sample Resource (http://www.internationalgenome.org/), which includes data generated by the 1000 Genomes Project (African, American, East
Asian, European, and South Asian populations). An important consideration is that the
large majority of genetic studies of NAFLD and NASH remain limited to Caucasian
populations. Further studies in non-Europeans, who are clearly underrepresented in
the large studies currently available, might yield intriguing results, in particular
regarding extreme genotypes or rapidly progressive forms of the disease.[32 ]
[33 ]
Although the concept is beyond the scope of this review, it is worth noting that a noticeable
portion of the missing heritability of NAFLD may be explained by epigenetic factors,[25 ]
[34 ]
[35 ]
[36 ]
[37 ]
[38 ] which may change gene expression by modifying the accessibility of the transcription machinery
to chromatin. Among the most prominent epigenetic factors are DNA methylation and
histone covalent modifications. We found that DNA methylation of loci not only in the nuclear
but also in the mitochondrial genome is associated with NAFLD pathophenotypes.[25 ]
[37 ]
[39 ]
Most importantly, knowledge is lacking regarding a broad range of insufficiently
studied interactions, which include not only G × E, gene–gene (G × G), and genotype–phenotype
(G × P) interactions but also other interactions that could explain the variance of the disease ([Fig. 1 ]). It is plausible to presume that the NAFLD–NASH heritability gap might be explained
by the intricate relationship among genetic variance of the nuclear and mitochondrial
genomes, the phenotype, and the yet unexplored interactions with epigenetic and environmental
factors, including the microbiome. Future explorations of this interaction network
could help unravel the missing heritability of NAFLD.
Genetic Knowledge of NAFLD: Integrated Pathways of Disease Pathogenesis
Genes associated with a given disease often provide clues on its pathogenesis and
mechanisms of tissue-associated damage. For example, variants in PNPLA3
[40 ]
[41 ]
[42 ]
[43 ] and TM6SF2
[12 ]
[44 ]
[45 ] have been functionally profiled to confirm a putative relationship with, and a causal
effect on, the variability of liver fat content. When their findings are interpreted
jointly, the studies highlighted above yield insights into the significance of liver
fat composition and lipid droplet biology and dynamics, as well as patterns of liver
fat mobilization, in the pathogenesis of NAFLD.
Despite the wealth of knowledge from early hypothesis-driven genetic studies and genome-wide
investigations, the precise mechanisms that explain the variability of the NAFLD phenotype
are not fully understood. We also lack an understanding of the precise processes that
govern the disease progression, as well as the molecular mechanisms associated with
the degree of disease severity.
To offer a framework for overcoming these limitations, we used a tool that expands
annotation details of genes/proteins to perform an integrative analysis. Specifically,
we submitted the genes/loci discovered either via candidate–gene association studies
or genome-wide investigations to the Protein ANalysis THrough Evolutionary Relationships
(PANTHER) database (http://pantherdb.org). PANTHER contains comprehensive information on the evolution and function of protein-coding
genes from Homo sapiens and a wide range of other completely sequenced genomes. The training
set of genes is shown in [Table 1 ]. This list includes genes retrieved from the GWAS Catalog (https://www.ebi.ac.uk/gwas/) using the “Nonalcoholic fatty liver disease” search string, as well as genes that
have been associated with the genetic risk of NAFLD and NASH.[8 ]
[10 ]
[46 ]
[47 ]
[48 ] We used the Gene Ontology (GO) data set to infer and integrate information pertinent
to the molecular function of all genes listed in [Table 1 ]. The top-ranked GO molecular function terms were adipokinetic hormone receptor activity
(GO:0097003), adiponectin binding (GO:0055100), retinol O-fatty-acyltransferase activity
(GO:0050252), and β-adrenergic receptor activity (GO:0004939), each of which showed > 100-fold
enrichment (see [Table 2 ] for the complete list, which includes p-values and fold changes). Overrepresentation and enrichment tests based on Reactome
pathways highlighted signaling to signal transducer and activator of transcription
3 (STAT3) (R-HSA-198745) (> 100-fold enrichment), acyl chain remodeling of diacylglycerol (DAG)
and triacylglycerol (TAG) (R-HSA-1482883), adenosine monophosphate-activated protein
kinase (AMPK)-mediated inhibition of chREBP transcriptional activation and ligand-dependent caspase activation (R-HSA-163680
and R-HSA-140534, respectively), and chylomicron-mediated lipid transport (R-HSA-174800)
as significantly enriched (see [Table 2 ] for the complete list). As a result, we may infer that the pathogenesis of NAFLD
and NASH is heavily mediated not only by processes associated with TAG and DAG remodeling,
but also by the hepatocyte response to interleukins/cytokines/adipokines, cell-death
immune-mediated pathways, and acute-phase protein genes.
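The logic of an overrepresentation test can be sketched in a few lines. The example below is a minimal illustration and not the PANTHER implementation: it computes the fold enrichment of an annotation term in a gene list and a one-sided Fisher's exact p-value from the hypergeometric distribution. The function names and all counts are hypothetical and are not taken from the analysis reported here.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed hits divided by the hits expected by chance.

    k: genes in the list annotated to the term
    n: size of the gene list
    K: genes in the reference set annotated to the term
    N: size of the reference set
    """
    # expected hits = n * K / N, so the ratio k / expected equals k * N / (n * K)
    return (k * N) / (n * K)


def fisher_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact p-value: P(X >= k) for a hypergeometric X."""
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical counts: a term annotated to 10 of 20,000 reference genes,
# with 3 of those 10 present in a 100-gene list.
print(fold_enrichment(3, 100, 10, 20000))    # 60.0
print(fisher_upper_tail(3, 100, 10, 20000))  # small p-value
```

Terms annotated to very few reference genes can reach very large fold enrichments with only two or three hits, which is one reason to read the fold change together with the corrected p-value.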
Table 2
Integrated pathways of disease pathogenesis: Gene Ontology (GO) molecular function
and Reactome prediction of NAFLD-predisposing genes
Annotation data set | Fold enrichment | Raw p-value | FDR
GO molecular function (GO annotation number)
Adipokinetic hormone receptor activity (GO:0097003) | > 100 | 1.51E-04 | 3.91E-02
Adiponectin binding (GO:0055100) | > 100 | 2.51E-04 | 5.31E-02
Retinol O-fatty-acyltransferase activity (GO:0050252) | > 100 | 2.51E-04 | 5.08E-02
Beta-adrenergic receptor activity (GO:0004939) | > 100 | 2.51E-04 | 4.87E-02
Long-chain fatty acid binding (GO:0036041) | 45.38 | 6.72E-05 | 2.61E-02
Acylglycerol O-acyltransferase activity (GO:0016411) | 27.12 | 2.26E-05 | 1.75E-02
Fatty acid binding (GO:0005504) | 22.47 | 4.44E-05 | 2.30E-02
RNA polymerase II repressing transcription factor binding (GO:0001103) | 21.26 | 5.43E-05 | 2.53E-02
O-acyltransferase activity (GO:0008374) | 15.73 | 1.61E-04 | 3.95E-02
Nuclear receptor activity (GO:0004879) | 15.73 | 1.61E-04 | 3.75E-02
Transcription factor activity, direct ligand regulated sequence-specific DNA binding (GO:0098531) | 15.73 | 1.61E-04 | 3.57E-02
Monocarboxylic acid binding (GO:0033293) | 14.9 | 3.01E-05 | 1.75E-02
Carboxylic acid binding (GO:0031406) | 7.25 | 6.50E-05 | 2.75E-02
Organic acid binding (GO:0043177) | 6.85 | 9.14E-05 | 2.66E-02
Cytokine receptor binding (GO:0005126) | 5.6 | 1.12E-04 | 3.06E-02
Lipid binding (GO:0008289) | 3.92 | 1.52E-05 | 1.76E-02
Identical protein binding (GO:0042802) | 2.99 | 2.03E-07 | 9.42E-04
Protein dimerization activity (GO:0046983) | 2.91 | 2.70E-05 | 1.79E-02
Signaling receptor binding (GO:0005102) | 2.7 | 1.10E-05 | 1.70E-02
Molecular function regulator (GO:0098772) | 2.51 | 2.22E-05 | 2.07E-02
Enzyme binding (GO:0019899) | 2.45 | 6.17E-06 | 1.44E-02
Protein binding (GO:0005515) | 1.33 | 8.31E-05 | 2.58E-02
Molecular function (GO:0003674) | 1.15 | 6.75E-05 | 2.42E-02
Reactome pathway (identifier number)
Signaling to STAT3 (R-HSA-198745) | > 100 | 2.51E-04 | 4.54E-02
Acyl chain remodeling of DAG and TAG (R-HSA-1482883) | 84.28 | 1.47E-05 | 7.33E-03
AMPK inhibits chREBP transcriptional activation activity (R-HSA-163680) | 73.75 | 2.02E-05 | 6.69E-03
Ligand-dependent caspase activation (R-HSA-140534) | 42.14 | 8.13E-05 | 1.80E-02
Chylomicron-mediated lipid transport (R-HSA-174800) | 25.65 | 3.01E-04 | 4.28E-02
Caspase activation via extrinsic apoptotic signaling pathway (R-HSA-5357769) | 24.58 | 3.37E-04 | 4.19E-02
Transcriptional regulation of white adipocyte differentiation (R-HSA-381340) | 12.61 | 6.40E-05 | 1.59E-02
Lipid digestion, mobilization, and transport (R-HSA-73923) | 11.8 | 1.65E-05 | 6.56E-03
Glycerophospholipid biosynthesis (R-HSA-1483206) | 9.02 | 2.86E-04 | 4.74E-02
PPARA activates gene expression (R-HSA-1989781) | 8.94 | 2.98E-04 | 4.56E-02
Regulation of lipid metabolism by Peroxisome proliferator-activated receptor α (PPARalpha) (R-HSA-400206) | 8.78 | 3.23E-04 | 4.28E-02
Metabolism of vitamins and cofactors (R-HSA-196854) | 8.08 | 1.24E-04 | 2.47E-02
Fatty acid, triacylglycerol, and ketone body metabolism (R-HSA-535734) | 6.99 | 2.46E-05 | 6.99E-03
Metabolism of lipids and lipoproteins (R-HSA-556833) | 5.25 | 4.26E-09 | 4.24E-06
Metabolism (R-HSA-1430728) | 3.2 | 2.10E-09 | 4.17E-06
Abbreviations: DAG, diacylglycerol; FDR, false discovery rate; NAFLD, nonalcoholic
fatty liver disease; NASH, nonalcoholic steatohepatitis; TAG, triacylglycerol.
Note: Enrichment analysis was performed by the PANTHER software available at http://pantherdb.org/ ;[120 ] analysis type: PANTHER Overrepresentation Test (Released December 5, 2017). Annotation
version: PANTHER version 13.1 and Reactome version 58.
Statistical analysis: Fisher's exact test with false discovery rate (FDR) multiple
test correction. Analyzed list: training set was the list of genes associated with
NAFLD or NASH in candidate gene association studies or genome-wide approaches (see
[Table 1 ]).
Reference list: Homo sapiens (all genes in the database).
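The note above states that p-values were corrected for multiple testing with an FDR procedure but does not name it. The following is a minimal sketch, assuming the Benjamini-Hochberg step-up procedure; the function name and the p-values are hypothetical and are not taken from Table 2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is multiplied by m / rank (m tests, rank 1 = smallest),
    and a running minimum taken from the largest rank downward keeps the
    adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four annotation terms.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Because the adjustment depends on the total number of terms tested, the FDR column of a table cannot be recomputed from the displayed rows alone; the full list of tested terms would be needed.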
Shared Molecular Regulatory Pathways of Chronic Liver Damage
Chronic liver diseases, particularly NAFLD and alcoholic liver disease (ALD), share
pathogenic pathways and mechanisms.[49 ]
[50 ]
[51 ]
[52 ] There are also consistent similarities in the pathogenesis of complex cholestatic
disorders, including primary biliary cholangitis (formerly known as primary biliary
cirrhosis) and primary sclerosing cholangitis.[53 ]
[54 ]
[55 ] Furthermore, chronic liver damage is associated with conserved pathogenic mechanisms,
in particular hepatocyte cell death pathways, inflammatory processes that involve
immune response, and fibrogenesis.[52 ]
[56 ]
It is therefore biologically plausible to presume that the genetic predisposition to convergent
pathophenotypes, specifically liver inflammation and fibrosis, is similar across these diseases, as discussed
below.
An interesting example is the PNPLA3-rs738409 (I148M) variant, which was initially discovered in a GWAS of NAFLD. Subsequently,
evidence emerged of its involvement in the susceptibility to cirrhosis and end-stage liver
disease in patients with ALD, including the development of alcohol-related
cirrhosis[57 ]
[58 ]
[59 ] and HCC.[60 ]
[61 ] In addition, patients homozygous for the risk G allele of the rs738409 variant seem
to be more susceptible to developing severe alcoholic hepatitis and also have
a greater risk of poor survival.[62 ]
[63 ] Summarized evidence also suggests an involvement of the rs738409 variant in the
risk and severity of chronic hepatitis C.[64 ]
These remarkable observations suggest that the rs738409 variant (directly, through a cis or trans
eQTL effect on the PNPLA3 gene, and/or by encoding protein isoforms with diverse functions) might have a causal
role in inflammation, fibrosis, and hepatocarcinogenesis. In vitro studies showed
that PNPLA3 is required for hepatic stellate cell (HSC) activation and that the rs738409 G variant
potentiates the profibrogenic features of HSCs.[65 ] Although findings yielded by previous studies on PNPLA3 protein regulation indicate
that the adiponutrin protein-family exhibits phospholipase but not retinyl esterase
activity,[66 ]
[67 ] some evidence suggests that the rs738409 variant may be involved in retinol release.[68 ] On the other hand, recent in vitro studies showed that overexpression of the PNPLA3-Met148 variant is associated with a 1.75-fold increase in lactic acid, suggesting a
shift of the cellular response toward anaerobic metabolism and mitochondrial dysfunction.[69 ] This particular metabolic profile has also been observed in patients with NASH.[70 ] Furthermore, PNPLA3 silencing has been associated with global metabolic perturbations that resemble a
catabolic response associated with protein breakdown.[69 ] These metabolic changes may support the involvement of PNPLA3 in broader metabolic functions in the liver.
More recently, the splice variant rs72613567 in the HSD17B13 gene was found to protect patients with chronic liver disease, including NAFLD and
ALD, from severe and progressive damage, regardless of the etiology.[23 ] These findings were replicated in two recent studies.[71 ]
[72 ]
There are other liver-related traits, such as serum aminotransferase levels, whose
genetic variability is strongly influenced by the aforementioned variants,
irrespective of the underlying cause of liver disease.[23 ]
[73 ]
While shared biology and genetics might explain the pathogenesis of chronic liver
damage, the number of loci potentially involved in shared mechanisms
is unknown. Based on the available evidence, PNPLA3 and probably TM6SF2 could explain commonality in pathogenic pathways of metabolic liver disease. However,
some interesting observations suggest that other genes might also influence
the shared mechanisms of liver damage. For example, variants in nuclear receptor subfamily
1, group I, member 2 (NR1I2, which encodes the nuclear pregnane X receptor) have been associated with NAFLD predisposition[74 ] and with drug clearance and drug-induced liver injury.[75 ]
Variants/mutations in homeostatic iron regulator (HFE), which encodes a membrane protein that is
similar to major histocompatibility complex class I-type proteins and that is involved
in iron storage disorders, have been associated not only with hereditary hemochromatosis[76 ] but also with an increased risk of HCC in patients with alcoholic cirrhosis.[77 ] HFE variants have also been implicated in the susceptibility to NAFLD,[78 ]
[79 ] although findings yielded by a systematic review of available data do not support
this association.[47 ]
A missense (p.Glu366Lys, also known as PI*Z) variant in the SERPINA1 (serpin family A member 1) gene, which is known as a predisposing factor for
α-1-antitrypsin deficiency,[80 ] has been recently associated with the risk of cirrhosis in patients with NAFLD and in those with alcohol misuse.[81 ]
NAFLD Genes and Pleiotropy: Cross-Associations between NAFLD-Predisposing Genes and
Phenotypes of the Metabolic Syndrome
Genome-wide association studies of complex diseases have demonstrated that a large number
of SNPs are implicated in the susceptibility to multiple (not necessarily related) traits.
The effect of one gene on different phenotypes is known as pleiotropy.[82 ]
[83 ] While the concept of pleiotropy had been largely confined to the field of evolutionary
biology, its wider relevance became evident during the past 10 years, when genome-wide approaches revealed
cross-phenotype associations among a broad category of complex traits.[84 ] In fact, it is estimated that approximately 4.6% of SNPs discovered by GWAS show
pleiotropic effects,[84 ] and 44% of genes reported in the GWAS Catalog are associated with more than one
phenotype.[85 ]
A gene-based connectivity network built on gene/protein co-occurrence suggests genetic
commonality between NAFLD and features of the metabolic syndrome (MetS), specifically
obesity, type 2 diabetes, and arterial hypertension.[86 ] For example, a rare nonsense (rs149847328, p.Arg227Ter) mutation in GCKR was associated with a rapidly progressive clinical form of NASH, which might be the
first rare genetic form of the disease.[33 ] Interestingly, GCKR is considered a susceptibility gene for a form of maturity-onset diabetes of the
young.[87 ]
There are, however, paradoxical examples of alleles that impart risk of developing
NAFLD but are protective against phenotypes that are closely related with the disease,
including cardiovascular disease (CVD). For example, carriers of the minor T allele
(EK + KK) of the TM6SF2 E167K (rs58542926 C/T) variant are protected from CVD, including myocardial infarction,[88 ] and show low levels of total plasma cholesterol, low-density lipoprotein cholesterol,
and triglycerides.[12 ]
[14 ]
[18 ]
[44 ]
[89 ] At the same time, the minor T allele of rs58542926 is a risk factor for NAFLD and
NASH.[12 ]
[14 ]
[17 ]
[18 ]
[89 ]
Together, these observations highlight the concept of a shared genetic basis of diverse
phenotypes. This concept applies not only to biologically related conditions,
for example, immune-mediated and/or metabolic diseases, but also to traits/diseases that
a priori present a certain level of dissimilarity in their pathogenic mechanisms.
We explored the extent of pleiotropy of loci known to be associated with the genetic
risk of NAFLD and NASH. This exploration was performed by the literature-enrichment analysis
offered by the GeneSet2Diseases (GS2D) Web server (http://cbdm.uni-mainz.de/geneset2diseases ), a tool that computes associations of genes with diseases using biomedical literature
annotations.[90 ] The GS2D algorithm prioritizes all human genes according to their relation to a biomedical
topic using all available scientific abstracts and orthology information.[90 ]
As expected, we found that a high proportion of NAFLD-related genes (listed in [Table 1 ]) are also involved in the pathogenesis of phenotypes of the MetS ([Table 3 ]). Examples of shared NAFLD and MetS-related loci include PPARGC1A , a master regulator of carbohydrate and fat metabolism and mitochondrial function
that has been associated with NAFLD, insulin resistance, and liver mitochondrial copy
number,[25 ] as well as with cardiac development[91 ] and cardiac disease.[92 ] Another example is clock circadian regulator (CLOCK ), which has been linked to MetS in rodents[93 ] and in humans, specifically obesity[94 ]
[95 ] and NAFLD.[96 ]
Table 3
Extent of pleiotropy in NAFLD-predisposing genes
| Disease | Genes count | Fold change | p-Value | FDR | Genes# (number of relevant citations for each gene in the biomedical literature, in parentheses) |
|---|---|---|---|---|---|
| Metabolic syndrome | 21 | 10.15 | 0.000e+00 | 0.000e+00 | ADRB2 (9), ADRB3 (22), AGTR1 (10), APOC3 (20), APOE (35), FABP2 (13), GCKR (10), IRS1 (8), LEPR (20), LIPC (8), MC4R (9), MTTP (7), ENPP1 (11), PPARG (51), UCP1 (5), ADIPOQ (163), CLOCK (7), PPARGC1A (8), ADIPOR1 (7), ADIPOR2 (5), PNPLA3 (10) |
| Insulin resistance | 22 | 6.08 | 5.038e−12 | 9.236e−11 | ADRB3 (35), APOC3 (15), FABP2 (28), GCKR (8), IL6 (53), IRS1 (99), LEPR (30), LIPC (10), MC4R (8), MTTP (8), ENPP1 (43), PPARA (10), PPARG (141), TCF7L2 (48), TNF (101), UCP1 (9), DGAT1 (5), ADIPOQ (200), PPARGC1A (28), ADIPOR1 (31), ADIPOR2 (28), PNPLA3 (21) |
| Hypertriglyceridemia | 7 | 14.31 | 5.279e−07 | 5.807e−06 | APOC3 (38), APOE (36), FABP2 (6), GCKR (8), LIPC (8), PPARA (8), ADIPOQ (6) |
| Morbid obesity | 8 | 10.85 | 6.807e−07 | 6.240e−06 | ADRB3 (8), LEPR (13), MC4R (19), PPARG (18), UCP1 (7), ADIPOQ (24), PPARGC1A (9), PNPLA3 (6) |
| Dyslipidemias | 8 | 10.32 | 9.975e−07 | 7.837e−06 | ADRB2 (5), APOC3 (17), APOE (39), GCKR (6), LIPC (9), PPARA (13), PPARG (11), ADIPOQ (11) |
| Alcoholic liver diseases | 5 | 26.58 | 1.018e−06 | 6.999e−06 | CD14 (6), CYP2E1 (10), HFE (9), TNF (10), PNPLA3 (8) |
| Obesity | 21 | 3.08 | 2.624e−06 | 1.604e−05 | ADRB2 (81), ADRB3 (99), FABP2 (27), GCKR (11), IRS1 (31), LEPR (152), LIPC (21), MC4R (199), ENPP1 (38), PPARA (26), PPARG (174), TCF7L2 (43), UCP1 (46), DGAT1 (6), ADIPOQ (200), CLOCK (16), PPARGC1A (28), ADIPOR1 (21), ADIPOR2 (13), PNPLA3 (44), LYPLAL1 (8) |
| Chronic periodontitis | 6 | 13.99 | 4.096e−06 | 2.253e−05 | CD14 (7), IL1B (33), IL6 (21), PTGS2 (10), TLR4 (7), TNF (13) |
| Diabetes mellitus, type 2 | 25 | 2.52 | 9.088e−06 | 4.544e−05 | ADRB3 (37), AGTR1 (30), APOC3 (34), APOE (111), FABP2 (39), GC (9), GCKR (52), IL6 (98), IRS1 (85), LEPR (32), LIPC (25), MC4R (32), MTTP (8), ENPP1 (60), PPARA (28), PPARG (200), SOD2 (25), TCF7L2 (200), TNF (123), UCP1 (17), ADIPOQ (200), PPARGC1A (75), ADIPOR1 (26), ADIPOR2 (22), LYPLAL1 (5) |
| Polycystic ovary syndrome | 9 | 5.75 | 2.564e−05 | 1.175e−04 | IL6 (16), IRS1 (25), PPARG (29), TCF7L2 (16), TNF (18), ADIPOQ (52), PPARGC1A (5), ADIPOR1 (6), ADIPOR2 (5) |
| Overweight | 7 | 7.82 | 3.058e−05 | 1.294e−04 | ADRB2 (7), ADRB3 (7), LEPR (10), MC4R (10), ADIPOQ (48), CLOCK (6), PNPLA3 (11) |
| Hyperlipidemias | 6 | 9.38 | 4.170e−05 | 1.638e−04 | ADRB3 (5), APOC3 (8), APOE (67), FABP2 (6), LIPC (10), PPARA (11) |
| Atherosclerosis | 14 | 3.39 | 5.669e−05 | 2.079e−04 | AGTR1 (10), APOC3 (12), APOE (61), CD14 (11), GCKR (5), IL6 (38), LIPC (7), MIF (9), PPARA (12), PPARG (29), TLR4 (33), TNF (45), ADIPOQ (57), PPARGC1A (7) |
| Diabetes, gestational | 6 | 8.05 | 9.822e−05 | 3.376e−04 | IRS1 (7), PPARG (16), TCF7L2 (17), TNF (15), ADIPOQ (37), PPARGC1A (5) |
| Weight loss | 6 | 7.67 | 1.291e−04 | 4.177e−04 | ADRB3 (11), FABP2 (5), LEPR (11), MC4R (16), ADIPOQ (28), CLOCK (6) |
| Coronary artery disease | 14 | 2.81 | 4.014e−04 | 1.226e−03 | AGTR1 (23), APOC3 (34), APOE (85), CD14 (17), FABP2 (5), GCKR (9), IL6 (63), LIPC (26), MTHFR (89), PPARA (14), PPARG (33), ADIPOQ (82), PPARGC1A (9), ADIPOR1 (7) |
| Periodontitis | 5 | 7.64 | 4.909e−04 | 1.421e−03 | CD14 (15), IL1B (68), IL6 (28), TLR4 (24), TNF (33) |
| Diabetic nephropathies | 8 | 4.29 | 5.476e−04 | 1.506e−03 | AGTR1 (20), APOE (27), MTHFR (29), ENPP1 (9), PPARG (27), SOD2 (8), TCF7L2 (7), ADIPOQ (31) |
| Glucose intolerance | 5 | 6.71 | 8.854e−04 | 2.319e−03 | IRS1 (7), LEPR (6), PPARG (12), TCF7L2 (14), ADIPOQ (31) |
| Premature birth | 7 | 4.45 | 9.875e−04 | 2.469e−03 | ADRB2 (10), CD14 (7), IL1B (19), IL6 (33), MTHFR (17), TLR4 (13), TNF (29) |
| Nasal polyps | 5 | 5.54 | 2.085e−03 | 4.986e−03 | CFTR (7), IL1B (7), IL6 (9), PTGS2 (9), TNF (10) |
| Abortion, habitual | 5 | 5.07 | 3.053e−03 | 6.996e−03 | APOE (9), IL1B (12), IL6 (16), MTHFR (68), TNF (24) |
| Helicobacter infections | 6 | 3.69 | 5.725e−03 | 1.260e−02 | CD14 (10), IL1B (117), MIF (8), PTGS2 (33), TLR4 (37), TNF (57) |
| Hepatitis B, chronic | 5 | 4.18 | 6.921e−03 | 1.464e−02 | HFE (7), IL6 (20), MIF (7), TNF (44), PNPLA3 (8) |
| Sepsis | 5 | 3.36 | 1.673e−02 | 3.408e−02 | CD14 (41), IL6 (51), MIF (12), TLR4 (46), TNF (55) |
| Diabetes mellitus | 7 | 2.63 | 1.719e−02 | 3.377e−02 | PPARA (10), PPARG (35), TCF7L2 (20), UCP1 (6), PPARGC1A (10), ADIPOR1 (7), ADIPOR2 (6) |
| Colitis, ulcerative | 7 | 2.53 | 2.051e−02 | 3.890e−02 | CD14 (12), IL1B (20), MIF (8), STAT3 (18), TLR4 (28), TNF (55), NR1I2 (5) |
| Pulmonary disease, chronic obstructive | 7 | 2.47 | 2.336e−02 | 4.283e−02 | ADRB2 (23), CFTR (13), GC (16), GCLC (5), IL6 (32), SERPINA1 (45), TNF (55) |
Abbreviations: EWAS, exome-wide association study; FDR, false discovery rate; GWAS,
genome-wide association study; NAFLD, nonalcoholic fatty liver disease; NASH, nonalcoholic
steatohepatitis.
Note: The exploration was performed by the literature-enrichment analysis offered by the
GeneSet2Diseases (GS2D) Web server (http://cbdm.uni-mainz.de/geneset2diseases ), a tool that computes associations of genes with diseases using biomedical literature
annotations.[90 ] The training set consisted of a list of genes extracted from published gene associations
with NAFLD and NASH in candidate–gene association studies and genome-wide approaches
(GWAS and EWAS); the full list is shown in [Table 1 ].
Disease: Disease term from the MeSH vocabulary (based on biomedical references represented
by MEDLINE records).
Genes count: Number of input genes significantly associated with the disease. The search was
restricted using the following filters: for a gene set, minimum number of genes significantly
associated with a disease = 5 and minimum number of disease-related citations for a gene = 5.
Fold change: (number of input genes significantly associated with the disease in the
literature / number of input genes) / (total number of genes significantly associated
with the disease in the literature / total number of genes).
p-Value: Computed by Fisher's exact test. FDR: Computed by the Benjamini–Hochberg method.
Genes#: List of input genes (gene symbols) significantly associated with the disease,
each followed by the number of relevant citations in the literature.
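The statistics defined in the note above can be sketched in a few lines of Python. This is a minimal illustration, not the GS2D implementation: it assumes a one-sided (enrichment) Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution, and the counts in the example (size of the background gene universe and number of disease-associated genes) are hypothetical values chosen only to show the arithmetic. The symbols k, n, K, and N are our own shorthand for the four quantities in the fold-change definition.

```python
from math import comb


def fold_change(k, n, K, N):
    """Fold change as defined in the Table 3 note.

    k: input genes associated with the disease; n: input genes;
    K: all genes associated with the disease; N: all genes.
    """
    return (k / n) / (K / N)


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) under the hypergeometric model.

    Assumption: enrichment is tested one-sided; the server may do otherwise.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: 104 input genes (Table 1), 7 of them associated
    # with a disease; 500 disease genes in an assumed universe of 20,000.
    k, n, K, N = 7, 104, 500, 20000
    print("fold change:", round(fold_change(k, n, K, N), 2))
    print("p-value:", fisher_enrichment_p(k, n, K, N))
    print("FDR:", benjamini_hochberg([0.001, 0.02, 0.03, 0.5]))
```

Because the fold change divides by the background rate K/N, rarely studied diseases can reach high fold changes with only five input genes, which is one reason the citation filters in the note matter.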
It could be argued that this analysis is inflated by highly correlated traits and
outcomes, such as NAFLD, type 2 diabetes, insulin resistance, atherosclerosis, dyslipidemia,
etc. Nevertheless, the analysis offered some surprising findings as well. For example,
7 out of 104 input genes were significantly associated in the literature with ulcerative
colitis or premature birth ([Table 3 ]), and 5 out of 104 were associated with habitual abortion, sepsis, or nasal polyps ([Table 3 ]). These results, however, must be interpreted with caution, as further work on the
confirmation of causality and curation of data is needed. Still, it is expected that, if
confirmed, these results may open a window for therapeutic explorations, whereby drugs
can be designed to focus on pleiotropic loci or pleiotropic molecular targets that
cover multiple traits, even though those traits are not obviously associated.
Genetics of NAFLD and Precision Medicine
With advances in the genetic knowledge of NAFLD and NASH, it becomes possible
to use this information for clinical applications. Genetic data could be leveraged
to identify individuals at risk of NAFLD, or to estimate the risk of severe histological
outcomes, including NASH and NASH-fibrosis ([Fig. 2 ]). Genetic markers are already being used as tools for personalized clinical practice,
including treatment decisions ([Fig. 2 ]). Specifically, PNPLA3 -rs738409 was incorporated into combined screening algorithms that included clinical
and biochemical data. Nevertheless, the utility of the variant in NAFLD risk estimation
remains inferior to classical predictive or imaging approaches. For example, Kotronen
et al proposed the NAFLD liver fat score , which showed an area under the receiver operating characteristic curve (AUROC) of
0.872 (95% confidence interval [CI]: 0.84–0.91) in predicting liver fat content.[97 ] The addition of rs738409 to the score, which comprises the presence of type 2 diabetes
along with the levels of serum fasting insulin and aminotransferases, improved the
prediction accuracy by less than 1%.[97 ] A more sophisticated multipanel score, the NAFLD multicomponent score, which integrates omics-derived and clinical variables, rs738409, and proteomic
data, showed an AUROC of 0.932 for NAFLD risk identification at the population level.[98 ] Despite this high predictive value, this biomarker panel would be neither practical
nor cost-effective for large-scale population screening programs.
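For readers less familiar with the metric, the AUROC values quoted above can be read as the probability that a randomly chosen affected individual receives a higher score than a randomly chosen unaffected one. The sketch below computes it directly from that definition. The scores, labels, allele counts, and the weight given to the genotype are invented toy values used only to show the calculation; they are not taken from the cited studies and say nothing about the real gain from adding rs738409.

```python
def auroc(scores, labels):
    """AUROC as the fraction of case/control pairs ranked correctly.

    labels: 1 for cases, 0 for controls. Tied pairs count as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data (hypothetical): a clinical score, and the same score plus an
    # arbitrary weight of 0.1 per risk allele.
    labels = [1, 1, 1, 0, 0, 0]
    clinical = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
    risk_alleles = [2, 1, 1, 1, 0, 0]
    combined = [s + 0.1 * g for s, g in zip(clinical, risk_alleles)]
    print("clinical only:", auroc(clinical, labels))
    print("clinical + genotype:", auroc(combined, labels))
```

Comparing the two values on the same individuals is the simplest way to express how much a genetic marker adds to an existing clinical score.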
Fig. 2 Genetics of nonalcoholic fatty liver disease (NAFLD) and precision medicine. This
figure shows examples of the use of genetic markers in the clinical setting, as well
as potential yet unexplored applications.
Risk estimation of the disease severity and progression, including NASH and NASH-fibrosis,
offers greater opportunities of clinical translation. In fact, the use of genetic
testing might open a window for the development of gene-based strategies for the diagnosis
of NASH, thus moving the diagnosis of the disease severity from an invasive (liver
biopsy) toward a noninvasive approach. Unfortunately, there is still no evidence of
superiority in terms of efficacy and accuracy of rs738409—or other variants—in predicting
liver histology as compared with liver biopsy. For example, the combination of laboratory
tests (aspartate transaminase and fasting insulin), circulating metabolites, and rs738409
genotypes into the NASH Clinical Score and the NASH ClinLipMet score showed predictive value for NASH; the AUROC for NASH was 0.778 (95% CI: 0.709–0.846)
and 0.866 (95% CI: 0.820–0.913), respectively.[99 ] Similar explorations have been conducted in pediatric settings, in which a polygenic
risk score that included combinations of variants in four loci (PNPLA3-rs738409, SOD2-rs4880, KLF6-rs3750861, and LPIN1-rs13412852) showed an AUROC for NASH of 0.75 (95% CI: 0.67–0.82).[100 ] It must be emphasized that the more variants are included in a polygenic score, the lower
the frequency of individuals found to carry the full risk profile will be, and the problem worsens when
minor allele frequencies are low.
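The last point can be made concrete with a short calculation. As a rough sketch, assume that the loci are independent, that genotypes follow Hardy–Weinberg proportions, and that "at risk" means carrying at least one risk allele at every locus in the score; none of these assumptions comes from the cited studies, and the allele frequencies below are hypothetical. Under these assumptions the fraction of individuals with the full risk profile is the product of the per-locus carrier frequencies, so it can only shrink as loci are added.

```python
def carrier_frequency(risk_allele_freq):
    """Fraction of individuals with at least one risk allele (Hardy-Weinberg)."""
    return 1.0 - (1.0 - risk_allele_freq) ** 2


def fraction_with_all_risk_variants(risk_allele_freqs):
    """Fraction carrying a risk allele at every locus, assuming independent loci."""
    fraction = 1.0
    for p in risk_allele_freqs:
        fraction *= carrier_frequency(p)
    return fraction


if __name__ == "__main__":
    # Hypothetical risk-allele frequencies for a four-locus score.
    freqs = [0.25, 0.40, 0.10, 0.05]
    for n_loci in range(1, len(freqs) + 1):
        share = fraction_with_all_risk_variants(freqs[:n_loci])
        print(f"{n_loci} loci: {share:.4f} of individuals carry all risk variants")
```

Scores that sum weighted allele counts instead of requiring every variant avoid this collapse, but the top-risk stratum still becomes rarer as low-frequency variants are added.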
It should be noted that genetic assessments provide static information for the explored
phenotype or disease trait. However, genetic markers could be used to dynamically
predict the response to any therapeutic intervention, as shown in [Fig. 2 ]. For example, findings yielded by pilot studies indicate that information on the
homozygosity status of the rs738409 risk G allele was useful in predicting the absolute
change in liver fat content of patients enrolled in a program of hypocaloric low-carbohydrate
diet[101 ] or reduced caloric intake.[102 ] The variant in PNPLA3 also seems to be useful in predicting changes in body weight of morbidly obese patients with NAFLD
enrolled in a bariatric surgery program.[103 ]
Potential avenues for future research that would significantly affect prevention,
surveillance, and prognosis assessment of NAFLD and NASH are summarized in [Fig. 2 ]. Poorly explored but promising uses of genetic markers include, for example, surveillance
of HCC that could occur in cirrhotic and noncirrhotic patients with NASH, or assessment
of liver transplantation prognostic outcomes ([Fig. 2 ]). The potential interplay between the recipient and donor genotype of variants of
interest suggests an interesting yet poorly explored research avenue in the field
of precision medicine. Findings yielded by a small number of studies suggest that
the PNPLA3 -rs738409 G allele in either the donor or the recipient could be a risk factor for
NAFLD recurrence or appearance after liver transplantation.[104 ]
[105 ]
The potential value of using genetic markers for treatment decisions pertaining to
patients enrolled in NASH clinical trials remains largely unexploited. Nonetheless,
it is expected that this specific clinical application will be explored in the near
future as novel drugs for the treatment of NASH become available on the
market. Potential shortcomings and limitations of genetic markers in clinical decision
making are shown in [Fig. 2 ].
Finally, while it is known that NAFLD is a polygenic and complex disease, the value
of polygenic risk scores in NASH diagnosis and prognosis and their interaction with
environmental exposure remain largely unknown. Yet, the use of polygenic risk scores
in personalized NAFLD care should be tested and optimized to perform well in diverse
ethnic groups because the frequency of the risk alleles varies significantly among
populations.[2 ]
[11 ]
[12 ]
[13 ]
[14 ]
[23 ]
[106 ] Remarkable examples of allele frequency disparity among populations are PNPLA3 -rs738409, for which the frequency of the G-risk allele varies from 12% in African
populations to 48% in admixed American (Mexican, Colombian, Peruvian, and Puerto Rican)
populations (as shown in http://www.ensembl.org ), and HSD17B13 -rs72613567, for which the frequency of the A-protective insertion allele varies from
5% in African populations to 34% in East Asian populations (figures of population
genetics were extracted from the 1000 Genomes Project, http://www.internationalgenome.org/ ).
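To illustrate what such disparities imply at the genotype level, the allele frequencies quoted above can be converted into expected genotype frequencies. The sketch below assumes Hardy–Weinberg equilibrium within each population, which is our simplifying assumption and not a statement from the source data; only the two PNPLA3 -rs738409 G-allele frequencies (12% and 48%) are taken from the text.

```python
def hardy_weinberg(risk_allele_freq):
    """Expected genotype frequencies for a biallelic variant under Hardy-Weinberg."""
    p = risk_allele_freq
    q = 1.0 - p
    return {
        "no_risk_allele": q * q,
        "heterozygous": 2.0 * p * q,
        "homozygous_risk": p * p,
    }


if __name__ == "__main__":
    # G-allele frequencies of PNPLA3-rs738409 quoted in the text.
    for population, freq in [("lowest reported (12%)", 0.12), ("highest reported (48%)", 0.48)]:
        genotypes = hardy_weinberg(freq)
        summary = ", ".join(f"{name}={value:.3f}" for name, value in genotypes.items())
        print(f"{population}: {summary}")
```

Under this assumption a fourfold difference in allele frequency becomes a roughly 16-fold difference in the expected share of risk-allele homozygotes, which is why a score calibrated in one population may perform poorly in another.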
Nonalcoholic Steatohepatitis Treatment Inferred from Genetic Discoveries
There are currently no approved pharmacologic therapies for NASH. However, many novel
drugs are being tested for safety and efficacy.[107 ] Some of these drugs have been designed based on the available knowledge of NAFLD
pathogenesis and the underlying mechanisms of the disease progression, including metabolic
pathways, inflammatory cascades, and/or fibrogenesis.[107 ]
Patients with NAFLD and NASH currently receive lifestyle recommendations, and are
eventually medicated with known and relatively safe drugs, for example, α tocopherol
(vitamin E), ursodeoxycholic acid (UDCA), metformin, losartan, or the insulin sensitizer
pioglitazone,[107 ]
[108 ] which are already available on the market. These drugs are usually prescribed not
necessarily for the treatment of NASH but for the treatment of associated comorbidities,
for example, type 2 diabetes and arterial hypertension. Hence, their use in the treatment
of NASH is purely empirical and/or pragmatic, guided by the assumption of a putative
effect on the disease. Despite this limitation, some of the commonly prescribed drugs,
including vitamin E and pioglitazone, have been shown to lead to a partial improvement
in liver outcomes, such as liver enzymes.[108 ]
To answer the question of whether medications that patients receive in ordinary clinical
practice are in line with disease mechanisms inferred from genetic discoveries, we
analyzed text-mined chemical–gene–disease interactions using the Comparative Toxicogenomics
Database (CTD; http://ctdbase.org ). We specifically modeled the interaction network among genes associated with NAFLD
and NASH that were reported in previous studies ([Table 1 ]), genes associated with fibrosis and inflammation (mined from the curated gene–disease
associations that are established by both the CTD data set and Online Mendelian Inheritance
in Man), and drugs that have been used or are currently in use for the treatment of
NASH (α tocopherol-vitamin E, UDCA, metformin, pioglitazone, losartan, and liraglutide).
It is evident from the above that translating the information generated from NAFLD
genetic studies into new treatment drugs and/or clinical biomarkers remains a major challenge.
In fact, data generated from either candidate–gene association studies or genome-wide
surveys have not been exploited for drug discovery, even though some genes involved
are shared by NAFLD and general processes, such as inflammation and fibrosis ([Fig. 3 ]). Nevertheless, we obtained some remarkable results. For example, some of these
drugs—including vitamin E, pioglitazone, and even losartan—are predicted to target
genes associated with the genetic risk of NAFLD or NASH ([Fig. 3 ]); conversely, liraglutide seems not to match any genes discovered in genetic studies
([Fig. 3 ]).
Fig. 3 Nonalcoholic fatty liver disease (NAFLD) genes and drug interaction network. This
figure shows the shared genes associated with NAFLD and nonalcoholic steatohepatitis
(NASH) (see [Table 1 ] for the training set of genes, n = 104), genes associated with fibrosis (n = 49), and inflammation (n = 130) (the list of genes under these terms is automatically established from the
Comparative Toxicogenomics Database [CTD] data set) (A ), and drugs that have been used or are currently in use for the treatment of NASH:
α-tocopherol/vitamin E, ursodeoxycholic acid (UDCA) (B ), pioglitazone, liraglutide (C ), and metformin and losartan (D ). Numbers after terms in Venn graphs, including genes associated with the selected
drugs, indicate the number of genes stored in the CTD database for a given term: UDCA,
n = 299; vitamin E, n = 174; pioglitazone, n = 686; liraglutide, n = 3; losartan, n = 172; and metformin, n = 424. CTD integrates information on chemicals, including chemical structures, curated
interacting genes and proteins, curated and inferred disease relationships, and enriched
pathways and functional annotations, which were extracted from the U.S. National Library
of Medicine, the Online Mendelian Inheritance in Man (OMIM) database, and the gene
database at the National Center for Biotechnology Information (NCBI). The interaction
network was modeled by the CTD (http://ctdbase.org ).
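The Venn comparisons in Fig. 3 reduce to set intersections between gene lists. The following minimal sketch shows that operation only; it does not query the CTD, and the drug names and drug-associated gene sets are placeholders invented for the example (the disease gene symbols are a handful of those discussed in this article). It should not be read as a statement about which drug interacts with which gene.

```python
def shared_genes(disease_genes, drug_gene_sets):
    """Return, for each drug, the sorted genes it shares with the disease gene set."""
    return {
        drug: sorted(disease_genes & genes)
        for drug, genes in drug_gene_sets.items()
    }


if __name__ == "__main__":
    # A few NAFLD-associated symbols mentioned in the text.
    nafld_genes = {"PPARA", "PPARG", "PPARGC1A", "STAT3", "ADRB3", "TNF"}
    # Placeholder drugs and gene sets (hypothetical, for illustration only).
    drug_gene_sets = {
        "drug_X": {"PPARG", "TNF", "GENE_1"},
        "drug_Y": {"GENE_2", "GENE_3"},
    }
    for drug, genes in shared_genes(nafld_genes, drug_gene_sets).items():
        print(drug, "shares", len(genes), "genes:", genes)
```

A drug with an empty intersection, like the second placeholder here, corresponds to the situation described for a drug that matches no gene discovered in genetic studies.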
Particularly interesting are the following targets: peroxisome proliferator-activated
receptor alpha (PPARα ), peroxisome proliferator-activated receptor gamma (PPARγ ) and its coactivator PPARG coactivator 1 alpha (PGC1α ), STAT3 , adrenoceptor beta 3 (ADRB3 ), and tumor necrosis factor (TNF ).
Elafibranor (code name GFT505), a dual PPARα and PPARδ ligand that is currently in phase III, has been proven to consistently ameliorate
histological outcomes associated with the disease severity.[110 ] This pharmacological agent, which has been specifically designed to target PPARs,
represents a remarkable example of a drug with potentially pleiotropic and systemic
effects.[111 ]
NAFLD and NASH Genes and the Druggable Proteome
Variants associated with the greatest effects on NAFLD and NASH are indeed missense
SNPs (PNPLA3 -I148M and TM6SF2 -E167K) that not only explain modest changes in gene/protein expression levels but
also hardly represent “druggable” targets.[106 ] These two loci either present pleiotropic metabolic effects[69 ] or are associated with dual and opposite effects on critical phenotypes, particularly
the TM6SF2 -E167K variant, as already mentioned.[14 ] Hence, the potential use of these proteins as pharmacological targets by modulating
their protein and/or enzymatic activity is rather limited.[106 ]
As a proof-of-concept, we performed an in silico “druggability” prediction of known
NAFLD GWAS-discovered genes—including PNPLA3 , TM6SF2 , GCKR , and HSD17B13 —based on protein structural druggability, ligand-based druggability, and network-based
druggability implemented by the canSAR resource (http://cansar.icr.ac.uk/ ). This resource contains information on the whole human proteome, as well as 2,136
model organisms and 8,631 protein families.[112 ] Predictions are based on the premise that a protein is “druggable” if its activity
can be modulated by its binding to a drug-like small compound.[112 ] The results yielded by this analysis revealed that neither PNPLA3 nor TM6SF2 has
a “druggable” protein structure, is associated with any bioactive compound, or
is potentially druggable by any predicted ligand-based approach. Assessment of the
same parameters for GCKR and HSD17B13 shows a contrasting scenario, as both proteins
are potentially druggable targets based on their molecular target three-dimensional
structure (Protein Data Bank) and ligandability prediction, which were performed for
all identified pockets within each protein structure. Based on the homology of closest
druggable structure(s), which examines the structure of the protein and identifies
any cavities on the protein surface where a drug-like compound could bind, we found
that HSD17B13 and GCKR have a structural druggability of 66.67 and 100%, respectively.
Nevertheless, druggability prediction using different approaches, including tumor-tissue
and cell line expression, and mutational analysis indicated that overall druggability
percentile of GCKR is 44.08%, including druggability for cancer (46.36%) and other
therapeutics (17.32%).
Specific focused analysis by the canSAR resource on candidate genes previously associated
with NAFLD and the disease severity, for instance STAT3 ,[113 ] revealed that the protein coded by this gene presents a ligand-based druggability
score of 97%. This specific score indicates the likely druggability of the protein
based on the chemical properties of different compounds tested against the protein
itself and/or its homologs. STAT3 protein presents an overall druggability percentile
of 99.21%, and druggability for cancer therapeutics of 99.39%. Furthermore, structural
druggability of STAT3 is 100%.
Network-based druggability assessment for STAT3 and GCKR proteins, which examines
the structure of the protein–protein interaction network around the target, suggests that
STAT3 but not GCKR is a good drug target, as disrupting its activity would affect
different and relevant cellular processes ([Fig. 4 ]). In fact, STAT3 performs better than average targets of other therapeutic areas,
even cancer ([Fig. 4 ]).
Fig. 4 Signal transducer and activator of transcription 3 (STAT3 ) and glucokinase regulator gene (GCKR ) radar network-predicted druggability plots. Radar plots showing representative network
property profiles of STAT3 and GCKR as potential drug targets (blue plot). The predicted
network druggability is compared with the randomized network model of an average cancer
target (green plot) or an average target for a noncancer drug. Prediction was performed
by the canSAR resource available at https://cansar.icr.ac.uk . The network descriptors are divided into three categories: Substructures, Topological,
and Community-based. The substructures were obtained from Przulj.[119 ] The graphlets are labeled as G-n and the orbits as O-. Topological descriptors:
Betweenness centrality: A measure for quantifying the influence of one protein on
the communication between other proteins in a network. Closeness centrality: Measures
how many steps are required for a protein to reach every other protein—a lower number
of steps indicates faster communication. Burt's constraint: Burt's Structural Hole
and Ego Networks. Constraint is higher when a protein's neighbors are also connected,
making the protein more redundant. k-core: A k-core is a maximal connected subgraph
in which each protein has a degree of at least k. Kleinberg hub score: A measure of
how authoritative each protein is based on the principal eigenvector of the network's
adjacency matrix. Google PageRank: A measure of the relative importance of the protein
within the network; as the protein–protein network is an undirected graph, PageRank
is positively correlated with degree distribution. Clustering coefficient: The probability
that the neighbors are also connected to each other, calculated by the ratio of triangles
connected to the protein. Community-based descriptors: Community Size (Walktrap):
Based on hierarchical clustering and attempts to find densely connected subgraphs
via random walks across the network. Community size (Spinglass): Based on partitional
clustering, where the number of communities to detect is predefined. Intracommunity:
Ratio of inter- to intracommunity communication. A higher number indicates that the
protein's neighbors are in the same community. Spinglass inner: Number of interactions
within the community. Spinglass outer: Number of interactions between the community
and the rest of the network.
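Two of the topological descriptors defined in the legend of Fig. 4 are simple enough to compute by hand, which may help in reading the radar plots. The sketch below uses a small invented network (the node names P1 to P5 are hypothetical proteins, not canSAR data) and implements the clustering coefficient and the average number of steps from one protein to all others, the quantity that underlies closeness centrality. It is an illustration of the definitions, not of the canSAR pipeline.

```python
from collections import deque


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_steps_to_others(graph, node):
    """Mean shortest-path length from node to every reachable protein (BFS)."""
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    reachable = len(distance) - 1
    return sum(distance.values()) / reachable if reachable else 0.0


if __name__ == "__main__":
    # Hypothetical undirected protein-protein interaction network.
    network = {
        "P1": {"P2", "P3"},
        "P2": {"P1", "P3"},
        "P3": {"P1", "P2", "P4"},
        "P4": {"P3", "P5"},
        "P5": {"P4"},
    }
    for protein in sorted(network):
        print(
            protein,
            "clustering:", round(clustering_coefficient(network, protein), 3),
            "average steps:", average_steps_to_others(network, protein),
        )
```

In this toy network the central node reaches the others in the fewest steps, which is the sense in which a lower number of steps indicates faster communication.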
A recently published experimental study in mice in which the researchers used a novel
small STAT3 inhibitor molecule (C188–9) has demonstrated its beneficial effects on
liver-related outcomes.[114 ] C188–9 not only reduced tumor development but also improved liver steatosis, inflammation,
and pathological lesions of NASH in mice with hepatocyte-specific deletion of the Pten gene.[114 ] Further experimental and clinical evidence indicates that STAT3 is not only involved
in the regulatory circuit of liver fibrogenesis ([Fig. 3 ]),[115 ] but is also involved in NASH by exacerbating insulin resistance.[116 ]
In conclusion, as genetic studies of NAFLD and NASH continue to expand, they are likely
to provide insights into the mechanisms of disease pathogenesis and progression. Knowledge
on variants associated with the susceptibility of NASH offers an interesting opportunity
not only for individualized risk prediction and prognosis, but also for the individual
assessment of therapeutic response. Hence, future medicine in the field of NASH would
benefit from patient-optimized strategies, which rather than being implemented on
a wide scale may be tailored to the genetic makeup of each patient.
Main Concepts and Learning Points
Concepts
NAFLD is a polygenic complex disease
NAFLD gene-regulatory networks
Shared pathogenic mechanisms of chronic liver damage
NAFLD genes: pleiotropy or just biologically meaningful associations?
NAFLD and the druggable proteome
Learning Points
The genetic component of NAFLD and NASH is largely explained by variants in genes
that regulate glucose and fat homeostasis.
Integrated pathways of disease pathogenesis suggest > 100-fold change enrichment in
adiponectin and STAT3 activated-signaling pathways, retinol O-fatty-acyltransferase,
and β-adrenergic activity.
Convergent pathophenotypes, including liver inflammation and fibrosis, share molecular
regulatory pathways and disease-predisposing genes.
NAFLD-associated genes overlap with loci that were originally thought to play a role
in the metabolic syndrome-associated traits.
Data generated from candidate–gene association studies and genome-wide surveys can
be leveraged to identify therapeutic targets.