Materials and Methods
Our approach, atheMir, collects information from public databases and recent reviews,
and performs context-sensitive text mining on public PubMed abstracts. We use curated
sets of synonyms derived from domain-specific ontologies for genes, diseases, species,
cell types, experimental contexts, functional classes, and pathways. After detecting
cooccurrences of synonyms for objects from these ontologies, we apply natural language
processing techniques to analyze the sentence structure and to extract miRNA–gene
interactions, as well as the associated context. This way we produce a list of miRNA–gene
interactions which can afterwards easily be queried and restricted to specific diseases,
for example, atherosclerosis, and specific contexts, for example, endothelial cells
and processes in specific late atherosclerosis stages such as plaque destabilization.
The list of miRNA:target relations has been extracted from public miRNA databases
and from PubMed abstracts via text mining. Text mining has been performed on the complete
PubMed corpus (September 2018), which consists of 28,787,497 abstracts.
Our automated atheMir approach is evaluated via a detailed assessment using two reviews
of the field. After evaluating the general performance of our data-driven review process,
we focus on miRNA–gene interactions in specific cell types and specific stages of
atherosclerosis, thereby aiming at a more context-specific view of active miRNA regulation.
Gene and miRNA Vocabulary
For gene and miRNA vocabulary, controlled synonym lists are obtained from HUGO Gene
Nomenclature Committee (HGNC)[13] (human) and Mouse Genome Informatics (MGI)[14] (mouse). Because we apply a named entity recognition (NER) approach to find interactions
(using syngrep[15]), we rely on a good vocabulary. The vocabulary should be exact,
commonly used, easily maintainable, and mostly unambiguous. Both the human
HGNC and mouse MGI gene lists fulfill these criteria. Both lists provide approved
and previous gene symbols as well as gene names, common synonyms, and name synonyms.
For cross-species transfer, mouse gene identifiers are mapped on the human gene identifiers
by gene symbol using mapping lists created from homologous gene lists provided by
BioMart.[16]
The HGNC gene list contains 45,467 gene entries, of which 43,518 have synonyms.
On average, about six different synonyms are provided per gene, totaling
264,850 synonyms. For mouse, the corresponding gene file from MGI contains 66,933
entries, of which 16,470 have synonyms. On average, each of these gene
identifiers has about seven synonyms (113,187 synonyms in total).
Of all PubMed abstracts, 6,373,603 mention at least one human gene synonym and 7,578,721
at least one mouse gene synonym.
The miRNA vocabulary is derived from miRBase.[17] miRNAs are mentioned in 53,896 abstracts; of these, 40,312 also mention
human genes and 40,957 also mention mouse genes.
Context-Based Text Mining
Each abstract is categorized with respect to five dimensions: species (human, mouse),
disease (from disease ontology[18]), cell line (from Cellosaurus[19]), protein function (from National Cancer Institute Thesaurus[20]), and gene ontology.[21]
The context of a miRNA–gene interaction is defined by the document it is found in,
that is, by the list of recognized features per dimension in that document. A document
has a specific feature if a synonym of this feature (e.g., an ontology term) was found.
For miRNA identifiers, the organism prefix is also evaluated; that is, miRNAs hsa-miR-98
and mmu-miR-124 establish a human and mouse context, respectively. While this would
also be possible for human/mouse genes, we have refrained from doing so, because frequently
the only difference between human and mouse genes is the capitalization.
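The prefix rule described above can be sketched in a few lines. This is a minimal illustration, not the syngrep implementation: the regular expression and the function name `species_context` are assumptions, and only the two prefixes mentioned in the text (hsa, mmu) are handled.

```python
import re

# Assumed mapping from miRBase-style organism prefixes to species labels;
# only the two species discussed in the text are listed.
PREFIX_TO_SPECIES = {"hsa": "human", "mmu": "mouse"}

# Assumed pattern for prefixed identifiers such as "hsa-miR-98" or "mmu-let-7g".
MIRNA_PATTERN = re.compile(r"\b(hsa|mmu)-(?:miR|let)-\d+[a-z]?(?:-[35]p)?\b")


def species_context(text):
    """Return the set of species implied by prefixed miRNA identifiers."""
    return {PREFIX_TO_SPECIES[m.group(1)] for m in MIRNA_PATTERN.finditer(text)}


example = "Both hsa-miR-98 and mmu-miR-124 were measured."
contexts = species_context(example)
```

An identifier without a prefix (for example, miR-98) contributes no species context in this sketch, so the species dimension then has to come from other synonyms in the abstract.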
The second dimension of the context of a miRNA interaction is the disease. We extract
disease synonyms as controlled vocabulary from the disease ontology. The Gene Ontology
(GO) is used to categorize abstracts by molecular function, cellular compartment,
and biological process. Finally, a subset of the NCIT ontology that groups proteins
by function is used to search for protein classes (e.g., cytokines or chemokines).
For each vocabulary of a dimension, derived from ontologies, common (English) words
and manually curated words are excluded from the synonym lists. In general, taxonomic
identifiers, cell names, and disease names are excluded, if these are not relevant
for the specific dimension. For example, when creating the gene synonyms, taxonomic identifiers,
cell names, and disease names are excluded to avoid ambiguities. Common words leading
to ambiguities within the same dimensions are also excluded.
Aggregation of Text Mining Results
Using the controlled vocabularies, the PubMed abstracts are scanned for occurrences
of synonyms. To build the atheMir database, the text mining results have to be summarized
and aggregated.
With the custom gene and miRNA lists, we perform NER on these abstracts using an in-house
tool, syngrep.[15] Using a sentence-based cooccurrence of a miRNA and a gene object with a connecting
relation (interaction verb), we identify a total of 41,930 miRNA–gene interactions
within 36,505 PubMed abstracts (note that only 0.1% of all and 0.5% of human PubMed
abstracts contain a valid miRNA interaction).
For a valid interaction, an appropriate verb phrase has to be identified as target
verb connecting miRNA and gene. If both gene and miRNA are mentioned in an enumeration
(without a connecting verb), the interaction is discarded. To detect the relation,
we use spaCy[22] to analyze the grammatical structure of the sentence. spaCy has been shown to be
quite accurate and it is one of the fastest tools available.[23] As a complete analysis of PubMed, including text mining and detection of valid
interactions, takes less than 12 h (on a laptop computer), the text mining can be
updated regularly. spaCy has several pretrained models available for analyzing
sentences. Here, we make use of the en_core_web_lg neural-network model. Using such
models, spaCy can build dependency trees for full sentences. This allows us to use the
semantic structure of a sentence to accept or reject a miRNA–gene interaction ([Supplementary Fig. S1], available in the online version). For this dependency tree, we reconstruct the
path from the target word (gene or miRNA) up to the root element, which in this case
is also the connecting verb, confirmed. This path is called the stack and is further
analyzed. For KLF12, the stack is KLF12, confirmed. For miR-34a the stack is: miR-34a, 5p, of, targets, as, confirmed. This analysis also shows the problems of natural language processing. The 5p part
of miR-486–5p is detected as a separate noun and is thus returned as a separate element
within the stack. This stack ends at the root element of the sentence, too. Following
the stack construction, we perform three analyses.
Fig. 1 The chemokine–miRNA interactome identified by atheMir: for each chemokine (green),
all interacting miRNAs (red) are shown. The size of a node corresponds to the number
of found interactions (representing its degree). Interactions are taken from text
mining (PubMed abstracts), miRTarBase, and DIANA-TarBase.
Fig. 2 The chemokine–miRNA interactome identified by atheMir. We show the increment to the
original fig. 3 from Hartmann et al.[8] For each chemokine in the figure, a set of new interacting miRNAs is shown. The
respective blocks of miRNAs exhibit the massive growth of knowledge on miRNA interactions
in atherosclerosis since the Hartmann et al review in 2015 (original figure underlayed).
First, we compare whether the stack contains any connecting verbs by intersecting
the stacks of the gene and the miRNA. If a verb is found in the intersection, it is
ensured that the sentence structure either resembles subject (S)-verb (V)-object (O)
(SVO) or OVS, where gene and miRNA must be subject and object, or vice versa. A slightly
weaker criterion for detecting an interaction, but required due to the aforementioned
problems in natural language processing regarding the detection of the sentence structure,
particularly with miRNAs, is the following. For every verb in the whole sentence it
is checked whether the gene and miRNA are subject or object of the given verb. In
this case, the only verb detected which fulfills this condition is again confirmed. Finally, to exclude cases in which miRNA and gene are mentioned in an enumeration,
for example, in the sentence “It has been shown that macrophages can communicate with
endothelial cells via ICAM1 and miR-98” ([Supplementary Fig. S2], available in the online version), enumerations are also analyzed. We detect that
ICAM1 and miR-98 are contained both in the same enumeration and particularly note
that there is no connecting verb. Thus, a possible miR-98:ICAM1 interaction is
rejected here.
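The stack construction and two of the checks above can be illustrated on hand-written toy trees. This is a simplified sketch under stated assumptions: the trees below are typed in by hand to mirror the stacks quoted in the text and are not spaCy output, the function names are hypothetical, the dependency labels are illustrative, and the subject/object (SVO/OVS) role test is omitted.

```python
# Each token maps to (head, dependency label, is_verb); the root has head None.
# Hand-written toy tree that reproduces the two stacks quoted in the text.
TREE = {
    "confirmed": (None, "ROOT", True),
    "KLF12": ("confirmed", "nsubjpass", False),
    "as": ("confirmed", "prep", False),
    "targets": ("as", "pobj", False),
    "of": ("targets", "prep", False),
    "5p": ("of", "pobj", False),
    "miR-34a": ("5p", "compound", False),
}

# Toy fragment for the enumeration "... via ICAM1 and miR-98" (assumed labels).
ENUM_TREE = {
    "communicate": (None, "ROOT", True),
    "via": ("communicate", "prep", False),
    "ICAM1": ("via", "pobj", False),
    "miR-98": ("ICAM1", "conj", False),
}


def build_stack(tree, token):
    """Follow head pointers from a token up to the root of the sentence."""
    stack = [token]
    while tree[stack[-1]][0] is not None:
        stack.append(tree[stack[-1]][0])
    return stack


def connecting_verbs(tree, gene, mirna):
    """Verbs that occur in the intersection of both stacks."""
    shared = set(build_stack(tree, gene)) & set(build_stack(tree, mirna))
    return {tok for tok in shared if tree[tok][2]}


def in_same_enumeration(tree, a, b):
    """True if one entity hangs off the other through a chain of 'conj' edges."""
    def conj_chain(start):
        chain, tok = {start}, start
        while tree[tok][1] == "conj":
            tok = tree[tok][0]
            chain.add(tok)
        return chain
    return b in conj_chain(a) or a in conj_chain(b)


def accept(tree, gene, mirna):
    """Accept a pair only if a shared verb exists and it is not an enumeration."""
    return bool(connecting_verbs(tree, gene, mirna)) and not in_same_enumeration(tree, gene, mirna)
```

In this sketch, `accept(TREE, "KLF12", "miR-34a")` keeps the pair because both stacks end in the verb confirmed, whereas `accept(ENUM_TREE, "ICAM1", "miR-98")` rejects the pair because the two entities are joined only by a conjunction.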
In general, the remaining dimensions are aggregated within atheMir independently from
found interactions, because the abstract classification by features is independent
from interactions. Thus, dimensional information exists for a PubMed abstract even
if not all context features could be identified (e.g., missing disease). After combining
all dimensions, atheMir provides information in at least one dimension for 22,286,667
PubMed articles, of which 8,813,329 have annotated diseases and 36,955 have miRNA–gene
interactions.
Additional Databases
In addition to text mining results, further databases with experimental data have
been integrated into atheMir. These include interactions from miRTarBase,[24] miRecords,[25] and DIANA-TarBase.[26] Expression values for miRNAs in specific cells are provided by the FANTOM5 project.[27]
The Causal Biological Networks (CBN) database provides gene-regulatory networks for several biological processes.[12] It contains network models for different processes and diseases in
human, mouse, or rat. Among the 169 networks (version 1 + 2), 6 networks are specific
to cardiovascular diseases. These six CBNs model different stages of atherosclerosis
development: (I) Endothelial cell activation, (II) Endothelial cell-monocyte interaction,
(III) Foam cell formation, (IV) Plaque destabilization, (V) Platelet activation and
(VI) Smooth muscle cell activation.
Results and Discussion
Benchmarking and Assessment Using Atherosclerosis Reviews
To assess atheMir with respect to precision, sensitivity, and false discovery rate
(FDR), we use interactions from published reviews by Andreou et al[9] and Hartmann et al.[8] We restrict atheMir to context-specific miRNAs for several processes in atherosclerosis.
In particular, we compare the atheMir miRNA–gene interactions with those described
in endothelial cells (Andreou et al,[9] [Fig. 1]), those involved in plaque destabilization (Andreou et al,[9] [Fig. 2]), and miRNAs involved in the regulation of the initiation, progression, and thrombotic
complications (Andreou et al,[9] [Table 1]).
Table 1
Comparison of miRNA-gene interactions
Analysis of endothelial miRNA-gene interactions in atherosclerosis (Andreou et al,[9] fig. 1). Interactions mentioned in the Andreou et al review and found by atheMir
are counted in atheMir + Andreou. Those mentioned by Andreou et al, but not found
by atheMir (missed) are counted in the Andreou column and listed as Missed miRNAs.
Values in brackets represent manually curated true interactions found only by atheMir.
| Gene | atheMir + Andreou | atheMir | Andreou | Missed miRNAs |
| CXCL12 | 1 | 0 | 0 | |
| DLK1 | 2 | 0 | 0 | |
| ETS1 | 2 | 3 (3) | 1 | miR-126[a] |
| F11R | 1 | 0 | 1 | miR-143[a] |
| ICAM1 | 1 | 3 (3) | 1 | miR-17[a] |
| IRAK1 | 0 | 0 | 2 | miR-146a[a], miR-146b[a] |
| IRAK2 | 0 | 0 | 2 | miR-146a[a], miR-146b[a] |
| KLF2 | 2 | 0 | 2 | miR-145[a], miR-126[a] |
| KLF4 | 2 | 2 (2) | 0 | miR-663b |
| KPNA4 | 1 | 2 (1) | 0 | |
| NANOS3 | 1 | 10 (10) | 0 | |
| PPARA | 2 | 2 (0) | 0 | |
| SELE | 0 | 2 (2) | 1 | miR-31[a] |
| SIRT1 | 1 | 4 (4) | 0 | |
| SOCS5 | 1 | 0 | 0 | |
| TAB1 | 1 | 0 | 0 | miR-10[b] |
| TIMP3 | 1 | 0 | 0 | |
| TRAF6 | 0 | 0 | 2 | miR-146a[a], miR-146b[a] |
| VCAM1 | 1 | 6 (6) | 0 | |
Analysis of miRNA-gene interactions involved in initiation, progression, and thrombotic
complications of atherosclerosis (Andreou et al,[9] table 1)
| Gene | atheMir + Andreou | atheMir | Andreou | Missed miRNAs |
| ABCA1 | 2 | 29 (28) | 0 | |
| ABCG1 | 1 | 7 (6) | 0 | |
| AKT1 | 1 | 8 (8) | 0 (1) | |
| BCL6 | 2 | 1 (1) | 0 | |
| CPT1A | 1 | 1 (1) | 0 | |
| DLK1 | 2 | 0 | 0 | |
| KLF2 | 1 | 1 | 0 | |
| KLF4 | 2 | 3 (3) | 0 | |
| KPNA4 | 1 | 2 (1) | 0 | |
| MAP3K10 | 2 | 0 | 0 | |
| MT-TP | 1 | 0 | 0 | |
| RGS16 | 1 | 0 | 0 | |
| SOCS1 | 1 | 0 | 0 | |
| SOCS5 | 1 | 0 | 0 | |
| TIMP3 | 1 | 2 (1) | 0 | |
Abbreviation: miRNA, micro-ribonucleic acid.
a Missed also in PubMed search.
b miR-Xa found as additional.
Based on this assessment of atheMir for miRNAs in atherosclerosis, we more specifically
analyze the miRNA interactions with chemokines in atherosclerosis. For this, we compare
interactions identified herein with the review by Hartmann et al.[8]
We assess the sensitivity and precision of atheMir via a systematic comparison with
the standard of truth established by these reviews. We evaluate further whether the
text mining problems mentioned above hamper our approach and goals, and show that
our approach can recapitulate current knowledge and provide added value. The assessment
also indicates that the approach can be used in other contexts and may also be successfully
applied to other fields/diseases of interest.
For each stage from the CBNs, we identify the interacting miRNAs and the regulating
processes. Thus, for each stage we identify the important cell types and the relevant
miRNA–gene interactions in these cell types.
Ideally, we would see a sequential progression of the identified processes through
these six stages. However, the low resolution of the stages poses a problem here.
Also, only a few relevant key players have been identified in each stage so far, and
many miRNAs are involved in many stages. For many interactions, the necessary context
has not yet been established. And, lastly, the text mining methods are not perfect
and may produce both false positive and false negative hits even if relevant interactions
have been described in the literature. Particularly the low resolution is problematic,
since the six networks model both spatial (cell–cell migration and differentiation)
and temporal development of various cells and tissues over years. Thus, we cannot
expect a real time series of miRNA regulation in atherosclerosis from this resource.
atheMir Database
The atheMir database contains text mining interactions for 6,244 genes and 1,375 miRNAs.
A total of 26,428 interactions between these genes and miRNAs are recorded. Of these,
19,679 interactions are in a disease context. A total of 2,242 of these interactions
are associated with cardiovascular system diseases (DOID:1287) or atherosclerosis
(DOID:1936). In the atherosclerosis context, atheMir contains 643 miRNA–gene interactions
from text mining. The number of PubMed abstracts per dimension, including additional
databases, is listed in [Supplementary Table S1] (available in the online version).
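The counts above result from restricting interactions to a disease context. The following minimal sketch shows one way such a restriction could work; the record layout, the `restrict` function, and the placeholder records are assumptions for illustration and do not reflect the actual atheMir schema or data. Only the two Disease Ontology identifiers are taken from the text.

```python
from dataclasses import dataclass

# Disease Ontology identifiers quoted in the text.
CARDIOVASCULAR = "DOID:1287"
ATHEROSCLEROSIS = "DOID:1936"


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record layout: one miRNA-gene pair plus document context."""
    mirna: str
    gene: str
    diseases: frozenset = frozenset()
    cells: frozenset = frozenset()


def restrict(interactions, diseases=None, cells=None):
    """Keep interactions whose context overlaps every filter that is given."""
    kept = []
    for hit in interactions:
        if diseases is not None and not (hit.diseases & set(diseases)):
            continue
        if cells is not None and not (hit.cells & set(cells)):
            continue
        kept.append(hit)
    return kept


# Placeholder records for illustration only; they are not entries from atheMir.
records = [
    Interaction("miR-A", "GENE1", frozenset({ATHEROSCLEROSIS}), frozenset({"macrophage"})),
    Interaction("miR-B", "GENE2", frozenset({CARDIOVASCULAR}), frozenset({"endothelial cell"})),
    Interaction("miR-C", "GENE3"),
]

cardio = restrict(records, diseases={CARDIOVASCULAR, ATHEROSCLEROSIS})
athero_macrophage = restrict(records, diseases={ATHEROSCLEROSIS}, cells={"macrophage"})
```

A record without any disease annotation, such as the third placeholder, is kept only when no disease filter is applied.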
miRNAs Relevant in Atherosclerosis
For a first assessment of atheMir, all miRNAs involved in atherosclerosis-specific
processes and respective cell types (derived from the processes defined by Andreou
et al[9]) are analyzed ([Supplementary Table S8], available in the online version). Any miRNA–gene interaction must have been detected
in the atherosclerosis (DOID:1936) context ([Supplementary Table S8], search parameters in [Supplementary Table S6], available in the online version).
For SMCs (proliferation/migration), many missed miRNAs have been observed. Also for
the cell types in angiogenesis, monocyte differentiation/macrophage activation, and
cholesterol efflux, several miRNAs are missed. A miRNA is missed, if the gold standard
(here Andreou et al review[9]) lists this miRNA, but it is not found by atheMir. We investigate why these miRNAs
are missed, considering the wide search in our database. We used the keywords given
in the review[9] to perform a manual search in PubMed, for example, "atherosclerosis angiogenesis miR-378" for miR-378 in the angiogenesis
context. Similarly, we evaluated the missed miRNA
interactions of miR-125 in the T cell differentiation and activation context. For
most of the missed miRNAs, such manual searches returned no results.
The miRNAs found show that an automated approach can retrieve many
relevant miRNAs for specific processes in a disease context, but may miss some. This
might be because information is not easily accessible from PubMed abstracts: some
reported miRNAs might be involved in more general processes, which are not specific
to atherosclerosis, and therefore the atherosclerosis keyword is not included in the
specific abstracts.
Another reason for not finding relevant miRNAs in atheMir is its focus on miRNA–gene
interactions—if an article mentions a miRNA in a specific context without a gene,
it is not included in atheMir. In some cases, the vocabulary used may not be sufficient
to detect specific diseases or processes. Furthermore, the interactions could only
be present in the full text, which we currently do not analyze with text mining.
For the eight atherosclerotic processes defined by Andreou et al,[9] we provide an overview of the accepted, missed, and additional miRNAs ([Supplementary Table S8], available in the online version).
Endothelial miRNAs Implicated in Atherosclerosis[9]
Analyzing Fig. 1 from Andreou et al,[9] we augment the identified interactions with atheMir ([Table 1]). The search parameters for the atheMir query are outlined in [Supplementary Table S5] (available in the online version). For most interactions, corresponding literature
was found.
For all missed interactions (e.g., miR-146:IRAK2), we find an interaction without
the atherosclerosis context. For instance, the miR-31:SELE (E-selectin) interaction
is only reported in cancer,[34] yet mentioned in the review. Also, the missed interactions miR-126:ETS1 and miR-17:ICAM1
can be explained: the cited literature only refers to endothelial cells in general,
and does not mention any disease. Finally, the miR-146a/b:IRAK1/2 interactions are
not mentioned in the abstract of the cited literature. Among the accepted interactions
for KLF4 is also its interaction with miR-103.[35]
miRNAs Implicated in Atherosclerotic Plaque Destabilization[9]
Similar to the previous benchmark, we also checked Fig. 2 from the Andreou et al review.[9] The results are very similar ([Supplementary Table S2], available in the online version). Some interactions are missed, because they are
not (yet) reported in an atherosclerotic context. More specifically, the miR-29 and
MMP2/3/9/13/14 interactions are only found in rotator cuff tears,[36] but no atherosclerosis-specific interactions are found in PubMed.
Table 2
Systematic evaluation of atheMir Text mining results against facts mentioned in the
Andreou et al review[9]
Systematic evaluation for endothelial miRNAs in atherosclerosis ([Table 1])
| | | PubMed Corr. | | Andreou | |
| | | Cond. Pos. | Cond. Neg. | Cond. Pos. | Cond. Neg. |
| DB | Pred. (True) | 20 + 31 | 3 | 20 | 34 |
| | Not Pred. (False) | 0 | − | 12 | − |
Systematic evaluation for miRNAs in regulation of the initiation, progression, and
thrombotic complication ([Table 1])
| | | PubMed Corr. | | Andreou | |
| | | Cond. Pos. | Cond. Neg. | Cond. Pos. | Cond. Neg. |
| DB | Pred. (True) | 20 + 49 | 5 | 20 | 54 |
| | Not Pred. (False) | 0 | − | 0 | − |
Evaluation of statistical measures for the above results
| | Endothelial miRNA–gene interactions ([Table 1]) | | | Init/Progr/Thrombotic ([Table 1]) | | |
| Measure | DB / PubMed | DB / Andr. | Comb. | DB / PubMed | DB / Andr. | Comb. |
| Sensitivity | 1.0 | 0.625 | 0.625 | 1.0 | 1.0 | 1.0 |
| False Discovery Rate | 0.0556 | 0.6297 | | 0.0676 | 0.7297 | |
| Precision | 0.9444 | 0.3703 | 0.9444 | 0.9324 | 0.2703 | 0.9324 |
| F1 | 0.9714 | 0.4651 | 0.7522 | 0.9650 | 0.4255 | 0.965 |
Abbreviation: miRNA, micro-ribonucleic acid.
Regulation of the Initiation, Progression, and Thrombotic Complications of Atherosclerosis
by miRNAs in Mice[9]
In the same fashion as before, table 1 from Andreou et al[9] has been analyzed ([Table 1]). All interactions are found in the atherosclerosis context.
Systematic Evaluation
For the endothelial miRNAs implicated in atherosclerosis ([Table 1]) and regulation of the initiation, progression, and thrombotic complications of
atherosclerosis by miRNAs in mice ([Table 1]), we manually checked all interactions to systematically evaluate our text mining
method for sensitivity, $\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$, precision, $\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$, and FDR, $\mathrm{FP}/(\mathrm{TP}+\mathrm{FP})$, where TP, FP, and FN denote true positives, false positives, and false negatives. Furthermore, we calculate the F1 score as
$F_1 = 2 \cdot \mathrm{precision} \cdot \mathrm{sensitivity} / (\mathrm{precision} + \mathrm{sensitivity})$. The evaluations are presented in [Table 2]. Sensitivity and precision are best for the comparison of atheMir and PubMed.
This is hardly surprising, since our approach has the same input as the PubMed
search. Thus, the number of true predicted elements should be equal to the true positives.
This shows that our context-based search neither misses nor adds too many interactions.
The comparison between Andreou et al and atheMir mostly lacks precision. This is due
to atheMir finding many additional interactions. However, since these have been manually
checked, it allows us to combine the ground truth from the Andreou et al review regarding
sensitivity and the manual curation from PubMed regarding precision. The resulting
F1 scores of 0.75 and 0.97 for the combined analyses show that atheMir can reliably
be used for miRNA–gene interaction mining. For the endothelial miRNAs evaluation,
the low F1 scores originate from many miRNA–gene interactions which we could not find in atherosclerosis
using manual PubMed search.
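As a worked check, the measures for the endothelial evaluation can be recomputed from the counts in [Table 2]. The sketch below assumes the standard definitions of sensitivity, precision, FDR, and F1; the function names are illustrative, and the combined score pairs the sensitivity against the review with the precision against the manual PubMed check, as described above.

```python
def metrics(tp, fp, fn):
    """Standard confusion-matrix measures from true/false positives and false negatives."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "fdr": fp / (tp + fp),
        "f1": f1_score(precision, sensitivity),
    }


def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Endothelial evaluation, counts as listed in Table 2.
db_vs_pubmed = metrics(tp=20 + 31, fp=3, fn=0)
db_vs_andreou = metrics(tp=20, fp=34, fn=12)

# Combined: sensitivity from the review comparison, precision from the PubMed check.
combined_f1 = f1_score(db_vs_pubmed["precision"], db_vs_andreou["sensitivity"])

for name, result in (("DB / PubMed", db_vs_pubmed), ("DB / Andr.", db_vs_andreou)):
    print(name, {key: round(value, 4) for key, value in result.items()})
print("Comb. F1", round(combined_f1, 4))
```

The recomputed values agree with the endothelial columns of [Table 2] to about three decimal places; small differences in the last digit come from rounding versus truncation.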
Chemokine-Specific Networks
In the previous section, atheMir could replicate and increment the presented networks.
Here, we want to focus on the specific miRNA–chemokine interactions for endothelial
cells and macrophages. Using atheMir, we extract a network of all miRNA–chemokine
interactions ([Fig. 1], no other context) and increment the existing network from Hartmann et al ([Fig. 2]). There are 742 interactions, 234 of which are derived from DIANA-TarBase only.
Interestingly, only 20 interactions are recorded by both DIANA-TarBase and miRTarBase.
The intersections of the results from each experimental database (DIANA-TarBase, miRTarBase,
and miRecords) with PubMed are relatively small (37, 25, and 5 interactions, respectively).
Note that 261 miRNAs interact only with a single gene. In comparison to the original
article by Hartmann et al,[8] 6 (missed) interactions could not be found by our approach. For all but one of
these, a PubMed search does not return any results for the interactions or even for the
miRNAs (miR-1843/1935) themselves. The evidence for the miR-21:CXCR4 interaction is not
included in atheMir, because it occurs in an enumeration and is rejected according
to our text mining rules.
While this gives a general overview of the miRNA–chemokine landscape, there are
more specific processes relevant in atherosclerosis. Restricting the miRNA–gene interaction
search to only such interactions which are found in cardiovascular disease (DOID:1287
[cardiovascular system disease] or DOID:1936 [atherosclerosis]), it becomes apparent
that there are many miRNA–gene interactions not yet studied in an atherosclerotic context.
In contrast to the general interactions, more missed interactions are observed and
only a few chemokines have multiple interactions studied, like CCL2, CXCL10, CXCR4,
and CXCL12.
CCL2 Expression in Macrophages
Inspired by the review by Hartmann et al,[8] we investigate the miRNA-mediated CCL2 expression in macrophages. Searching only
the macrophages context ([Fig. 3], search parameters in [Supplementary Table S5], available in the online version), all but one interaction from the Hartmann et
al[8] review are found. The missed miR-150:CCL2 interaction can be explained by the fact
that Hartmann et al[8] show this regulation in their network but state that it is indirect
via KLF2 and miR-124a, which is found by atheMir.
Fig. 3 The incremented micro-ribonucleic acid (miRNA)-mediated CCL2 expression in macrophages.
The original interactions from the Hartmann et al review[8] (fig. 2) are underlayed.
Fig. 4 (A) A model for the regulation of CCL2 in macrophages via miR-146a and miR-125. Depending
on an over- or underexpression of miR-146a, Toll-like receptor 4 (TLR4) is either
repressed, or regularly expressed. In the first case, the nuclear factor kappa B (NF-κB)-mediated
pathway is reduced, but also less miR-124 represses CCL2. If TLR4 is expressed regularly,
CCL2 is expressed via the NF-κB pathway, but also repressed via miR-124. Assuming
that both paths are equally strong, the expression of TLR4 does not affect CCL2 expression.
This matches the observations made by del Monte et al.[28] (B) A model for the regulation of CCL2 in endothelial cells via miR-126, which is coregulated
via miR-155/221/222 and ETS1 (context information on edges). Literature reports three
paths of regulation for CCL2. ETS1 directly regulates CCL2,[29] but is also a transcription factor upregulating miR-126,[30] which can directly downregulate CCL2.[31] Additionally, miR-126 can also downregulate SIRT1.[32] SIRT1 is an inhibitor of NF-κB,[33] which helps to upregulate CCL2.
Fig. 5 For each miRNA, its number of gene interactions in the atherosclerosis context and
the number of corresponding PubMed evidences is shown. The middle and right block
show a black dot if a miRNA is within the associated pathway or found within the cell
type context.
Applying the atherosclerosis context (DOID:1936) to macrophages, fewer additional
interactions ([Supplementary Figure S8], available in the online version) are found in atheMir. Some interactions are not
found, such as miR-24:CHI3L1, which is only reported in vascular diseases, but not
explicitly in atherosclerosis. This also applies to the missed interaction of miR-146:IRAK1
which is, according to our database, only described in cardiac dysfunction.[37]
We will focus on the additional interactions in a cardiovascular disease or atherosclerosis
context. Some additional interactions are shown in [Supplementary Table S3] (available in the online version).[38]
[39]
[40] First, it is described that lipoprotein lipase (LPL) and CCL2 are both directly
targeted by miR-590.[41]
[42] Repressing CCL2 prevents lipid accumulation, diminishing atherosclerosis. Increased
LPL expression accelerates atherosclerosis by promoting lipid accumulation and inflammatory
response.[41] miR-125b is known to regulate tumor necrosis factor receptor-associated factor 6
(TRAF6)[43] and CCL2 (via LACTB[44]) in atherosclerosis. This miRNA is particularly interesting, since it is normally
depleted in leukocytes and monocytes.[27] Thus, an increase of miR-125b in these cell types could prevent atherosclerosis.
Toll-like receptor 4 (TLR4) can regulate CCL2 expression via nuclear factor kappa
B (NF-κB) as stated in the original review by Hartmann et al.[8] Another interesting path is formed by TLR4→miR-124:CCL2, because miR-124 is naturally
depleted in leukocytes.[27] It is known that TLR4 is repressed via miR-146a (enriched in leukocytes), while
TLR4 is also a regulator of miR-124[45] in cocaine-mediated inflammation. It is also known from the literature that CCL2
is regulated via a miR-124-dependent pathway.[46] Thus, the TLR4→miR-124:CCL2 path could be directly controlled via miR-146a such
that a higher miR-146a expression would lead to less repression of TLR4/miR-124/CCL2,
while the pathway via NF-κB would be repressed ([Fig. 4A]). In summary, a knockout of miR-146a would lead to more CCL2 via the NF-κB pathway,
while CCL2 is repressed via the miR-124 pathway. On the other hand, an overexpression of miR-146a
would lead to less CCL2 repression via TLR4→miR-124:CCL2. Under the assumption that
both pathways are similarly effective, miR-146a will likely not influence CCL2 expression
in atherosclerosis. This matches the observations made by del Monte et al,[28] that there is no change in CCL2 levels when disturbing miR-146a.
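The cancellation argument can be made explicit with signed edges. This is a minimal sketch, assuming that each edge is either an activation (+1) or a repression (−1) and that both paths are equally strong, which is the assumption stated in the text; the edge list follows the model in [Fig. 4A], and the names `EDGES`, `PATHS`, and `path_sign` are illustrative.

```python
# Signed edges: +1 for activation, -1 for repression (model from the text).
EDGES = {
    ("miR-146a", "TLR4"): -1,
    ("TLR4", "NF-kB"): +1,
    ("NF-kB", "CCL2"): +1,
    ("TLR4", "miR-124"): +1,
    ("miR-124", "CCL2"): -1,
}

# The two routes from miR-146a to CCL2 discussed in the text.
PATHS = [
    ["miR-146a", "TLR4", "NF-kB", "CCL2"],
    ["miR-146a", "TLR4", "miR-124", "CCL2"],
]


def path_sign(path):
    """Multiply the edge signs along a path to get its overall effect."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= EDGES[(source, target)]
    return sign


# With equal path strength (an assumption), the net effect is the sum of the signs.
net_effect = sum(path_sign(path) for path in PATHS)
```

The NF-κB route carries a negative overall sign and the miR-124 route a positive one, so under the equal-strength assumption the two cancel, which is the qualitative conclusion drawn above.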
miRNA-Mediated Inflammatory Response in Endothelial Cells
Additionally, we assessed our results regarding the miRNA-mediated inflammatory response
in endothelial cells ([Supplementary Fig. S7], search parameters in [Supplementary Table S5], available in the online version). Again, we first analyze the network of all known
miRNA–gene interactions and find that there are no interactions missed by atheMir.
Restricting the database search to the atherosclerosis context (DOID:1936, [Supplementary Fig. S9], available in the online version), 10 interactions are not reported in atheMir:
let-7g interacting with SIRT1, SMAD2, THBS1, and TGFBR1; miR-146 interacting with
HuR, TRAF6, and IRAK; miR-10a interacting with TRC and TAK1; and miR-181b:KPNA3.
Fig. 6 miRNA overlap between stages of atherosclerosis as proposed by the Causal Biological
Networks (CBNs) when restricting the miRNA–gene interaction search to atherosclerosis
and cardiovascular diseases.
Fig. 7 Number of interactions seen in Causal Biological Networks (CBNs)[12] and cell type context. Selected miRNAs appear in most CBNs (in a cardiovascular
disease/atherosclerosis context).
The miR-146:HuR interaction is reported for ELAVL1, the gene symbol for HuR. The miR-181b:KPNA3
interaction is found, but not within an explicit endothelial cell context. An interaction
let-7g:TGFBR1 is found, but not within the atherosclerosis context. Regarding the
other missed let-7g interactions, we checked the original reference.[47] First, SIRT1 is not recognized, because the uncommon symbol SIRT-1 is used, and to
avoid confusion, we require gene symbols to match exactly. SMAD2 and
THBS1 are both recognized, however, the abstract has not triggered a cardiovascular
disease context according to our vocabulary. For miR-146, it must be noted that the
original article[48] cited by Hartmann et al[8] does not mention TRAF6 and IRAK1 in the abstract, but only in the full text. There
exist other abstracts mentioning this interaction, but these do not focus on atherosclerosis.[49] Thus, with the current setup of text mining, the missed interactions are not found
without curation of the underlying ontologies.
We further analyze atheMir's interactions for disturbing CCL2 expression in endothelial
cells and have prepared selected additional interactions ([Supplementary Table S4], available in the online version).[50]
[51]
[52]
[53]
[54] First, we wanted to check results for miR-126 ([Fig. 4B]) as this miRNA targets several of the genes in the pathway explained by Hartmann
et al.[8] ETS1 is a transcription factor for miR-126,[30] and is controlled by miR-155, miR-221, and miR-222,[29] where the latter two are known to be enriched in endothelial cells.[27] If these miRNAs are downregulated, ETS1 expression is promoted. ETS1 itself influences
CCL2 expression in several ways. First, ETS1 can directly coregulate CCL2[29] in an atherosclerotic context. Second, miR-126 is reported to reduce CCL2 expression
in hCMEC/D3 (brain) cells[31] directly. Finally, miR-126 also targets SIRT1 in artery disease[32] which represses the NF-κB pathway and, thus, reduces CCL2 levels.[33] In addition, the THBS1/TGFBR1/SMAD2 path is affected due to miR-126 interactions
with THBS1 in ischemic hind limb.[55] While there is no reported evidence in atherosclerosis via this path, it is known
that miR-126 is important in atherosclerosis, influencing endothelial cell proliferation.[56]
Another interesting factor in endothelial cell activation is miR-98. In contrast to
miR-126, miR-98 has only CCL2 as target here, which was shown in the context of blood–brain
barrier disease.[57] From expression data[27] it is known that miR-98 typically is enriched in endothelial cells of the vascular
tree, reducing oxidized low-density lipoprotein (LDL) uptake and, thus, apoptosis.[58] But since it also represses CCL2, fewer macrophages are attracted to the endothelial
cells. An inhibition of miR-98 could thus increase atherosclerosis.
In conclusion, regarding the original pathways presented by Hartmann et al,[8] we could show that using available literature, combined with miRNA expression data,
most known interactions described by domain experts could be reproduced and linked
to the respective evidence. The miRNAs found for these pathways are summarized in an UpSet
plot[59]-like matrix ([Fig. 5]). Within the atherosclerosis context, many miRNAs have only a
single PubMed evidence. Many miRNAs have been detected in literature corresponding
to multiple cell types. For the endothelial cell pathway from Andreou et al[9] and the endothelial inflammatory response pathway from Hartmann et al,[8] a large overlap of miRNAs can be found. Furthermore, additional interactions have
been proposed which either have already been investigated or represent mechanisms
known in other disease contexts and could therefore yield new hypotheses for atherosclerosis.
Causal Networks
Similar to the specific chemokine networks, we analyze the six cardiovascular disease
networks from the CBN database.[12] The networks contain nodes of different types of entities. Here, we filter the CBNs
such that only nodes representing genes or proteins are contained.
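The filtering step can be sketched as follows. This is a minimal illustration, assuming a network is held in memory as a mapping from node name to entity type plus a list of directed edges; this representation, the type labels "gene" and "protein", and all node names are hypothetical and do not reflect the CBN file format.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]

def filter_network(
    node_types: Dict[str, str],
    edges: Iterable[Edge],
    keep_types: Iterable[str] = ("gene", "protein"),
) -> Tuple[Set[str], List[Edge]]:
    """Keep nodes whose type is in keep_types and the edges joining two kept nodes."""
    allowed = set(keep_types)
    kept_nodes = {node for node, ntype in node_types.items() if ntype in allowed}
    kept_edges = [
        (src, dst) for src, dst in edges if src in kept_nodes and dst in kept_nodes
    ]
    return kept_nodes, kept_edges

# Toy network, invented for illustration only (assumed type labels).
toy_types = {
    "NODE_A": "gene",
    "NODE_B": "protein",
    "NODE_C": "chemical",
    "NODE_D": "process",
}
toy_edges = [("NODE_A", "NODE_B"), ("NODE_B", "NODE_C"), ("NODE_D", "NODE_A")]

nodes, kept = filter_network(toy_types, toy_edges)
```

Edges touching a removed node are dropped rather than rewired, so indirect relations through non-gene entities are lost in this sketch.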
First, we match each network with the American Heart Association (AHA) atherosclerosis
stages and determine which cells are most active in the respective stage. This allows us to identify
the context keywords describing each stage and, thereby, to fine-tune the miRNA–gene interaction
search in atheMir. Matching the active cell types (and diseases) allows a context-specific
prediction of miRNA–gene interactions. We found that most AHA stages are represented
by exactly one causal network ([Supplementary Table S11], available in the online version), with the exception of the foam cell formation
network, which could be assigned to both stages 2 and 3 for proteolysis and apoptosis.
The analysis of the CBN yields two findings: first, each stage has stage-specific
genes ([Supplementary Fig. S4], available in the online version), and second, the miRNAs involved in each stage
overlap more between stages than the genes. Only a few stages have stage-specific
miRNAs ([Supplementary Fig. S3], available in the online version). Restricting the search to atherosclerosis and
cardiovascular disease does not increase the specificity of miRNAs for a stage ([Fig. 6]).
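The notion of stage specificity used here can be made concrete with a small sketch. It assumes that an item (gene or miRNA) counts as stage-specific when it occurs in exactly one stage; the function names and the toy sets are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Dict, Set

def stage_specific(items_per_stage: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, for each stage, the items found in that stage and in no other stage."""
    stage_counts = Counter(
        item for items in items_per_stage.values() for item in items
    )
    return {
        stage: {item for item in items if stage_counts[item] == 1}
        for stage, items in items_per_stage.items()
    }

def specific_fraction(items_per_stage: Dict[str, Set[str]]) -> float:
    """Fraction of all distinct items that are specific to exactly one stage."""
    all_items = set().union(*items_per_stage.values())
    if not all_items:
        return 0.0
    specific = set().union(*stage_specific(items_per_stage).values())
    return len(specific) / len(all_items)

# Toy input, invented for illustration only.
toy_genes = {
    "stage_1": {"G1", "G2"},
    "stage_2": {"G3", "G4"},
    "stage_3": {"G4", "G5"},
}
toy_mirnas = {
    "stage_1": {"miR-x", "miR-y"},
    "stage_2": {"miR-x", "miR-y"},
    "stage_3": {"miR-y", "miR-z"},
}

gene_fraction = specific_fraction(toy_genes)
mirna_fraction = specific_fraction(toy_mirnas)
```

In the toy input, the miRNA sets overlap more between stages than the gene sets, so the stage-specific fraction is lower for miRNAs than for genes.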
The number of identified genes, miRNAs, and augmented interactions is shown in [Supplementary Table S11] (available in the online version). The used search parameters are summarized in
[Supplementary Table S7] (available in the online version). In the following, we discuss our findings from
the six relevant CBNs: we present particularly interesting miRNAs and propose already
validated or hypothetical interactions, also from different disease contexts.
(I) Endothelial Cell Activation and (II) Endothelial Cell–Monocyte Interaction
The endothelial cell activation stage in atherosclerosis is mainly characterized by
inflamed endothelial cells ([Supplementary Fig. S10], available in the online version). According to Andreou et al,[9] several miRNAs play important atheroprotective roles, and others atherogenic roles.
Some of these are also among the most prominent miRNAs in our augmented network. For
instance, miRNAs-17/21/124/125/126/146a/155/221 are among the top 10 regulating miRNAs
in the first and second stage. Some of the miRNAs with a large number of interactions
we found for this stage are not mentioned by Andreou et al in any associated process.
Of these, miR-499, for instance, is of interest because it regulates CCL2 and
further genes directly associated with the attraction of monocytes, such as VCAM1,
ICAM1, and CXCL8.[60]
The interactions of miR-34a in atherosclerosis are also interesting. Some interactions
are already reported in atherosclerosis, such as the interaction with SIRT1. From
the literature it is known that miR-34 represses SIRT1 and thereby regulates apoptosis,[61] which is one of the main mechanisms during the second and third AHA stage. However,
it also interacts with PDGFRA/PDGFRB, MEK1, and CDK4/6, regulating rat mesangial cell
proliferation in glomerulonephritis.[62] Since these genes are also contained in the CBNs for atherosclerosis, these interactions
could be a promising target in atherosclerosis.
Combining the first two stages/networks, miR-155 (31), miR-126 (25), miR-21 (16),
miR-146a (13), and miR-124 (11) have the largest number of targets (in brackets) listed
in atheMir.
(III) Foam Cell Formation
During foam cell formation, other miRNAs play an important role ([Supplementary Fig. S11], available in the online version): miR-21 (24), miR-155 (6), and miR-34a (4) are
the most connected miRNAs.
With 24 found interactions, miR-21 appears to have a central role. Among its target
genes are AKT1, MAPK8, FASLG, PPARA, CXCL2, and PTEN. While Andreou et al do not mention
miR-199a in this stage (or a related process), atheMir predicts miR-199a interacting
with both EGR1 and CD14 in this stage. It has been shown that EGR1 is a strong positive
regulator of miR-199a.[63] In addition, CD14 is coexpressed with miR-199a and is known to regulate CXCL2, IL6,
TNFA, and NO production.[64] miR-370 targets KDR and FOXO1 according to our database. It is known that both genes
can block the AKT/FOXO1 signaling pathway in the context of cerebral aneurysm.[65] The inhibition of FOXO1 is atherogenic, as it leads to increased vascular calcification
in atherosclerosis.[66]
(IV) Smooth Muscle Cell Activation
In the fourth stage of atherosclerosis, SMCs are activated, forming the lipid core
and initiating fibrous cap formation. In addition to SMCs, endothelial cells are also
involved at this stage.
During SMC activation, the miRNAs with more than 20 targets are miR-126 (31), miR-146a
(30), miR-21 (27), and miR-155 (21). Indeed, for most genes in the enriched causal
network, regulation by many miRNAs has been reported.
The miR-152 interactions are of special interest, because miR-152 is known to be relevant
in atherosclerosis. In addition, we find further interactions in other contexts (diseases).
In our network, miR-152 targets, among others, ESR1, ADAM17, KDR, and VEGFA. It has
been shown that ESR1 expression is reduced via miR-152 repressing deoxyribonucleic
acid methyltransferase. A high level of ESR1 is reported to protect against atherosclerosis.[67] Another target of miR-152 is ADAM17. A high ADAM17 expression is known to decrease
lesion formation[68] and, thereby, functions in an atheroprotective manner. Thus, miR-152 is an interesting
target in atherosclerosis because it can reduce lesion formation via two different
pathways. Furthermore, it regulates apoptosis in brain microvascular endothelial cells
via PTEN and Bax.[69]
(V) Platelet Activation
The fifth stage of atherosclerosis is platelet activation with thrombus formation
from the necrotic core. Compared with the previous stage, the enriched miRNA network
is considerably smaller but nevertheless of interest ([Supplementary Fig. S14], available in the online version).
miR-20a regulates AKT1, PTEN, EDN1, VEGFA, and NANOS3. NANOS3 and OLR1 are also
targeted by let-7a/b. OLR1 is also targeted by miR-590, which also represses TP53
and BAX. Interestingly, there is a large gene overlap between the miR-20a and miR-590
targets and the miR-152 targets from the previous stage. In this stage of atherosclerosis,
the lipoprotein-related proteins ABCA1 and LDL receptor are regulated by miR-143. This implies that, by repressing
miR-143, lipoprotein uptake and necrotic core formation can be reduced.[70]
(VI) Plaque Destabilization
The final stage of atherosclerosis is plaque destabilization ([Supplementary Fig. S15], available in the online version). In this stage, the necrotic core breaks the artery
wall and can form a thrombus.
In this causal network, miR-21 (26) and miR-155 (16) have the most interactions. We want
to further investigate the lipid uptake and find that miR-33 interacts with both ABCA1
and ABCG1 in this stage. While the miR-33 interaction with ABCA1 is already known
in atherosclerosis, the interaction with ABCG1 has not yet been reported in atherosclerosis,
specifically. miR-33 contributes to the regulation of cholesterol homeostasis by targeting
both ABCA1 and ABCG1 directly.[71]
Having focused on chemokines earlier, we look at the CCL2 interactions in this stage.
In contrast to earlier stages, CCL2 can also be regulated by different miRNAs, namely
miR-494/495 and miR-10b. It could be shown that miR-494 induces inflammatory mediators,
including CCL2.[72] Additionally, miR-495 directly targets CCL2, affecting proliferation and apoptosis
of human umbilical vein endothelial cells.[73] An effect on CCL2 via miR-499 and the NF-κB signaling pathway has also been reported.[60] Moreover, miR-10b seems to affect CCL2 expression in the context of renal allograft
loss.[74] This could indicate a role of miR-10b in detecting foreign objects near endothelial
cells. Hypothetically, plaque formation could induce similar reactions.
Abundant miRNA Regulators per Stage
After looking at the context-specific roles of individual miRNAs in the six CBN stages,
we also investigate the most abundant miRNAs across all stages. The number of miRNA–gene
interactions for each miRNA is listed in [Supplementary Table S12] (available in the online version). We only consider interactions within the cardiovascular
disease and atherosclerosis context.
Summarizing all stages, we find that the following miRNAs have the largest number
of interactions (interaction counts and PubMed evidence counts in brackets) reported
by atheMir: miR-126 (15, 17), miR-21 (15, 15), miR-155 (14, 8), miR-146a (14, 8),
miR-125b (13, 7), miR-34a (9, 6), miR-499 (8, 2), miR-221 (7, 5), miR-370 (7, 4),
and miR-504 (6, 2).
These miRNAs have a high overlap with the miRNAs appearing in most CBN stages: miR-21
(5), miR-125b (4), miR-370 (4), miR-93 (3), miR-98 (3), miR-125a (3), miR-126 (3),
miR-146a (3), miR-155 (3), and miR-34a (3).
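The comparison of the two rankings can be sketched as follows. This is a minimal illustration, assuming each interaction is available as a (miRNA, target gene, stage) record and that an interaction count is the number of distinct target genes per miRNA; the record layout, the function name, and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (miRNA, target gene, CBN stage)

def rank_by_distinct(records: Iterable[Record], field: int, top_n: int) -> List[str]:
    """Rank miRNAs by the number of distinct values in the given record field
    (1 = target gene, 2 = stage); ties are broken alphabetically."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        seen[record[0]].add(record[field])
    ranked = sorted(seen, key=lambda mirna: (-len(seen[mirna]), mirna))
    return ranked[:top_n]

# Toy records, invented for illustration only.
toy_records = [
    ("miR-a", "G1", "I"),
    ("miR-a", "G2", "I"),
    ("miR-a", "G3", "II"),
    ("miR-b", "G1", "I"),
    ("miR-b", "G1", "II"),
    ("miR-b", "G1", "III"),
    ("miR-c", "G4", "IV"),
    ("miR-c", "G5", "IV"),
]

by_interactions = rank_by_distinct(toy_records, field=1, top_n=2)
by_stages = rank_by_distinct(toy_records, field=2, top_n=2)
shared = set(by_interactions) & set(by_stages)
```

A miRNA with many targets in a single stage and a miRNA with one target in many stages rank differently in the two lists, which is why the overlap between them is informative.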
For each of these miRNAs, we evaluate in which CBN stages and cell types it is
found and how many PubMed evidences support it. This has been visualized in a parallel set plot
([Fig. 7]). For the above miRNAs, the width of the connections shows the number of evidences
found for the miRNA interacting in the given stages and cell types. It can be seen
that miR-126 is well studied, particularly in endothelial cells and within stages
(I) and (IV). In the following stages, more and more miRNAs become relevant. The foam
cell formation stage has few reported miRNA interactions, yet it consists of miRNA
interactions in foam cells, macrophages, and SMCs. Likewise, the plaque destabilization
stage combines all cell types. In the platelet activation stage, none of the otherwise
frequently occurring miRNAs is important; only miR-98 is active. In particular, miR-21
is mostly active in stages (II) to (IV), while miR-125b seems to be involved in all stages.
On the other hand, miR-155 and miR-98 seem to be specifically relevant in SMC activation.
Prevalent Cell Types per Stage
Besides the miRNA interactions per CBN[12] stage, we are interested in the different cell types prevalent per stage. As an example,
we counted the occurrences of cell types for all miRNA–gene interactions in two
stages: endothelial cell/monocyte interaction ([Supplementary Fig. S5], available in the online version) and SMC activation ([Supplementary Fig. S6], available in the online version).
For the endothelial cell/monocyte interaction stage ([Supplementary Fig. S5], available in the online version), the most frequent cell types
are monocytes and vascular endothelial cells. Other cell types relevant
to atherosclerosis, such as SMCs, inflammatory macrophages, foam cells, natural killer
cells, neutrophils, and platelets, are also mentioned, as are some
cell types in the brain (brain microvascular endothelial cells, neoplastic cells).
Similarly, most miRNAs in the SMC activation stage occur in SMCs. However,
there are a considerable number of occurrences in other cells involved in atherosclerosis,
such as platelets, endothelial cells, monocytes, and foam cells.
Cell Type-Based miRNA Cooccurrences
miRNAs occurring in more than one cell type within the same stage could be interesting
targets, because they could hypothetically indicate a similar mechanism in the affected
cell types. We want to explore such cooccurrences again in the stages endothelial
cell/monocyte interaction and SMC activation.
In the cell type cooccurrence figure, two cell types are connected (by a miRNA), if
this miRNA is involved in both cell types. By definition, a miRNA defines a clique
(fully connected network) of cell types it is active in. For the endothelial cell/monocyte
interaction stage ([Supplementary Fig. S17], available in the online version), several such cliques are formed. For instance, miR-145
forms a large clique of mainly SMCs. Cliques are also formed for foam cells,
microvascular endothelial cells, and monocytes. miR-126 is mainly active within endothelial
cells, but is also detected in monocytes. miR-155 is reported in macrophages and monocytes.
Finally, miR-222 is described in both SMCs and endothelial cells.
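The clique construction described above can be sketched in a few lines. This is a minimal illustration, assuming the input is a mapping from each miRNA to the set of cell types in which it is reported; the function name and the toy mapping are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

def cooccurrence_edges(
    cell_types_per_mirna: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Map each unordered pair of cell types to the miRNAs reported in both.

    Every miRNA contributes all pairs of its cell types, i.e., a clique.
    """
    edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for mirna, cell_types in cell_types_per_mirna.items():
        # Sorting gives each unordered pair one canonical key.
        for first, second in combinations(sorted(cell_types), 2):
            edges[(first, second)].add(mirna)
    return dict(edges)

# Toy mapping, invented for illustration only.
toy_mapping = {
    "miR-a": {"SMC", "endothelial cell", "monocyte"},
    "miR-b": {"macrophage", "monocyte"},
    "miR-c": {"SMC"},
}

edges = cooccurrence_edges(toy_mapping)
```

A miRNA reported in k cell types contributes k(k-1)/2 edges, so a miRNA found in a single cell type (miR-c in the toy mapping) adds none.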
Switching to the SMC activation stage ([Supplementary Table S9], available in the online version), more and larger cliques can be found, indicating
that miRNAs are active in multiple cell types. Here, particularly the small cliques
with only a few cell types could be of interest, because these miRNAs are more cell-type
specific. For instance, miR-10a is mentioned in monocytes, endothelial cells, and
inflammatory cells. Monocytes and vascular SMCs have in common that, in both cell
types, miR-33/181 and miR-516a interactions are reported. Also, miR-98 is of interest
because it functions in monocytes, endothelial cells, and SMCs, and thus is involved
in cells relevant for early atherogenesis. In combination with the previous finding
in inflamed endothelial cells, it could be an interesting target for chemokine-mediated
processes in atherosclerosis.
Combining Stages and Processes of Atheroprogression
We focus again on rather broad contexts, namely the eight processes of atheroprogression
described by Andreou et al[9]: endothelial cell activation and inflammation, monocyte differentiation and macrophage
activation, foam cell formation, angiogenesis, vascular remodeling, T cell differentiation
and activation, cholesterol efflux, and SMC proliferation and migration.
We have refined queries for atheMir using cell types and GO terms to find both miRNAs
involved in these processes and their target genes ([Supplementary Table S13], available in the online version). For each process, we determined a range of 2
to 35 relevant miRNAs and 2 to 80 gene targets ([Supplementary Table S10], available in the online version). For T cell differentiation and activation, we
set the disease context to cardiovascular system disease. Additionally, we require
the GO classes for SMC migration and proliferation in the (IV) SMC activation stage,
to distinguish this stage from the endothelial cell stages (I) and (II) ([Supplementary Fig. S16], available in the online version).
The miRNAs occurring in at least one CBN stage[12] and their detected presence in the processes as well as cell types are summarized
in [Fig. 8]. In addition, the number of target genes found (interactors) and PubMed evidences
from the combined CBN stage and process analysis are shown. This figure allows
several interesting observations: Even though some miRNAs are associated with specific
stages, they are not associated with any defined processes of atheroprogression. Many
miRNA–gene interactions are only supported by one PubMed article. Most miRNAs are
associated with multiple cell types. Only a few miRNAs occur in a majority of the CBN
stages, and many miRNAs are relevant to only 2 or 3 CBN stages. The difference between
the recognized miRNAs in the stages and processes of atherosclerosis shows the limits
of the NER approach used: it relies on the quality and completeness of the synonym
lists, and on authors actually using that vocabulary.
Fig. 8 For each miRNA in the atherosclerosis context, its number of gene interactions, the
number of PubMed evidences, associated Causal Biological Networks (CBN)[12] stages, processes, as defined in [Supplementary Table S13], and cell types, are shown. Overall, there are 114 miRNAs in the CBN stages and
processes, of which 80 are shown here (min. 2 PubMed abstracts and must be in at least
one CBN stage). The top 10 miRNAs are the most interacting ones in most CBN stages
([Fig. 7]).
Thus, it is important not to rely on only one dimension (e.g., GO), because evidences
may be missed during classification. Further dimensions, such as disease, cell
type, or protein class, can be used to characterize the context of literature, as
has been shown in our evaluation. This underlines the importance of accessing the
underlying evidences. With atheMir, these options can be explored, and evidence can
be accessed, to make informed decisions on the found interactions.
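Why a single dimension can miss evidence may be illustrated with a small sketch. It assumes each evidence carries a set of recognized terms per context dimension and that a query matches when at least one of the selected dimensions shares a term with it; this matching rule, the dimension names, and the toy annotations are assumptions made for illustration and are not the query semantics of atheMir.

```python
from typing import Dict, Iterable, List, Set

Context = Dict[str, Set[str]]  # dimension name -> recognized terms

def matches(evidence: Context, query: Context, dimensions: Iterable[str]) -> bool:
    """True if evidence and query share a term in at least one given dimension."""
    return any(
        evidence.get(dim, set()) & query.get(dim, set()) for dim in dimensions
    )

def select(evidences: Dict[str, Context], query: Context,
           dimensions: Iterable[str]) -> List[str]:
    """Return the sorted identifiers of all evidences matching the query."""
    dims = list(dimensions)
    return sorted(
        key for key, context in evidences.items() if matches(context, query, dims)
    )

# Toy annotations, invented for illustration only.
toy_evidences = {
    "abstract_1": {"go": {"term_migration"}, "cell_type": {"SMC"}},
    "abstract_2": {"cell_type": {"SMC"}},  # no GO term was recognized here
    "abstract_3": {"go": {"term_other"}, "cell_type": {"platelet"}},
}
toy_query = {"go": {"term_migration"}, "cell_type": {"SMC"}}

go_only = select(toy_evidences, toy_query, ["go"])
go_and_cell_type = select(toy_evidences, toy_query, ["go", "cell_type"])
```

In the toy input, the second abstract is found only once the cell type dimension is considered, because no GO term was recognized in it.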
Finally, particularly those miRNAs occurring in few stages and processes could be
interesting targets for further research to determine their role in atherosclerosis,
and prove the specificity of these miRNAs to certain phases of the disease.