Keywords congenital coagulation and platelet disorders - LR-PCR - NGS - high throughput sequencing
Hemorrhagic diathesis can be caused by disorders of primary hemostasis (such as platelet disorders) and of secondary hemostasis (e.g. coagulation factor deficiencies). These conditions represent a heterogeneous group of bleeding disorders with clinical manifestations ranging from mild to severe.[1] The prevalence of each individual disease in the general population is low and strongly influenced by ethnicity and rate of consanguinity.
For many hereditary coagulation and platelet disorders, the diagnostic request focusses on only one specific gene (e.g. F7 or F10). Identifying the causative variant in an affected patient not only confirms the diagnosis but can also influence the therapeutic regimen. Furthermore, targeted testing of at-risk family members becomes feasible.
Over the past two decades, Sanger sequencing has been considered the “gold standard” for sequencing. The method provides a precise tool for routine molecular diagnostics, but its capacity for multiplexing and high-throughput analyses is limited. Furthermore, it is known to be costly and time-consuming. In July 2016, next-generation sequencing (NGS)-based molecular testing became remunerable for routine diagnostics of hereditary coagulation and platelet disorders according to the reimbursement catalogue of Germany's statutory health insurance system (EBM).
However, while NGS-based multigene panel analyses achieve high performance and are suitable for medium to large target regions (≥ 100 kb in settings of less well defined clinical phenotypes), they are not always well suited to the analysis of small genes in which mutations are known to be associated with a specific phenotype.[2] Thus, additional strategies for targeted enrichment[3] must be considered, especially when smaller target regions are of interest for a limited number of samples and time to diagnosis is critical.
Long-range PCR (LR-PCR) has the advantage that it does not require a customized design by commercial vendors. In this proof-of-principle study, we aimed to establish LR-PCR target enrichment for single gene analyses of F7, F10, F11, F12, GATA1, TUBB1, and WAS on a MiSeq platform. This approach proved to be reliable, highly flexible, and also appropriate for even larger target regions such as MYH9.
Material and Methods
DNA Isolation and Conventional Mutation Analyses
DNA samples of all 43 study probands were extracted with written informed consent according to the German Gene Diagnostic Act from peripheral blood lymphocytes using standard techniques. The coding regions and adjacent exon-intron boundaries (± 50 bp) of the following genes were amplified by PCR:
F7 (Locus Reference Genomic sequence LRG_554; transcript LRG_554t1; number of probands [n] = 6)
F10 (LRG_548, LRG_548t1; n = 5)
F11 (LRG_583, LRG_583t1; n = 3)
F12 (LRG_145, LRG_145t1; n = 5)
GATA1 (LRG_559, LRG_559t1; n = 5)
TUBB1 (LRG_581, LRG_581t1; n = 6)
WAS (LRG_125, LRG_125t1; n = 5)
MYH9 (LRG_567, LRG_567t1; n = 8)
Primer sequences are available upon request. Conventional sequencing was performed on an ABI 3130xl sequencer using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Carlsbad, CA, USA). Sequence data were analyzed with the SeqPilot software (Version 4.3.1, JSI Medical Systems, Ettenheim, Germany). SALSA MLPA Kits P207, P440 and P432 were used for detection of copy number variations in F7, F10, F11, and MYH9 (MRC-Holland, Amsterdam, The Netherlands).
Target Enrichment and High-Throughput Sequencing
Primers for LR-PCRs covering the entire genomic regions of F7, F10, F11, F12, GATA1, TUBB1, WAS, and MYH9 were designed with Primer3 (v4.0.0; http://bioinfo.ut.ee/primer3/; [Table S1]). PrimeSTAR® GXL DNA polymerase (Takara Bio Europe/Clontech, Saint-Germain-en-Laye, France) was used for PCR amplification according to the manufacturer's instructions. The upstream region of MYH9 was amplified with the GC-RICH PCR System (Hoffmann-La Roche, Basel, Switzerland). All LR-PCR products were purified with the Agencourt® AMPure® XP system (Beckman Coulter, Pasadena, USA) and quantified with the dsDNA HS Assay Kit on a Qubit® 2.0 (Thermo Fisher Scientific, Waltham, USA).
Equimolar amounts of the purified PCR amplicons were combined in 14 pools containing non-overlapping target regions of up to seven individual probands. The Nextera XT kit was used to prepare sequencing libraries with 1 ng of each pool (Illumina®, San Diego, USA). Individually barcoded DNA libraries were combined and sequenced on a MiSeq instrument with 2 × 150 or 2 × 250 cycles (Reagent Nano Kit v2 or Reagent Kit v3; Illumina®, San Diego, USA).
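The equimolar pooling step can be illustrated with a short calculation. The following Python sketch is not the protocol used in this study; the amplicon names, concentrations, and the amount of 10 fmol per amplicon are hypothetical, and the conversion assumes the common approximation of 660 g/mol per base pair of double-stranded DNA.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def nanomolar(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/µl) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volumes(amplicons: dict, fmol_each: float) -> dict:
    """Return the volume (µl) of each amplicon that contains `fmol_each` fmol.

    `amplicons` maps a name to a (concentration in ng/µl, length in bp) pair.
    """
    volumes = {}
    for name, (conc, length) in amplicons.items():
        # 1 nM equals 1 fmol/µl, so volume = amount / molarity.
        volumes[name] = fmol_each / nanomolar(conc, length)
    return volumes


# Hypothetical Qubit readings for two amplicons of different length.
example = {"amplicon_A": (20.0, 14900), "amplicon_B": (35.0, 7400)}
for name, volume in pooling_volumes(example, fmol_each=10.0).items():
    print(f"{name}: {volume:.2f} µl")
```

At the same mass concentration, a longer amplicon contains fewer molecules per microliter and therefore requires a proportionally larger volume.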
Bioinformatics Analyses
The MiSeq Reporter Software was used for demultiplexing and primary data analysis (MSR v2.5.1; Illumina®, San Diego, USA). The quality of sequencing reads was analyzed with the FastQC toolkit (http://www.bioinformatics.babraham.ac.uk/index.html) and sequencing coverage was visualized with the Gviz package for R (https://www.r-project.org/). The SeqNext module of the SeqPilot software (JSI Medical Systems) was used for read alignment against the human reference assembly GRCh37/hg19 and for variant calling. Only coding regions, exon-intron boundaries (± 50 bp), and known promoter regions of the genes were analyzed (diagnostic region of interest). A read depth of 30 × was defined as the minimum diagnostic sequencing depth.
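How a minimum depth criterion of 30 × translates into reportable sequencing gaps can be sketched as follows. This Python example is illustrative only and does not reproduce the SeqNext analysis; the function name and the input format (one depth value per consecutive base) are assumptions.

```python
def low_coverage_intervals(depths, first_pos=1, min_depth=30):
    """Return inclusive (start, end) intervals in which depth < min_depth.

    `depths` holds one read depth per consecutive base, starting at
    coordinate `first_pos`.
    """
    intervals = []
    run_start = None
    for offset, depth in enumerate(depths):
        pos = first_pos + offset
        if depth < min_depth:
            if run_start is None:
                run_start = pos  # a new low-coverage run begins here
        elif run_start is not None:
            intervals.append((run_start, pos - 1))  # the run ended one base earlier
            run_start = None
    if run_start is not None:
        # The last run extends to the end of the analyzed region.
        intervals.append((run_start, first_pos + len(depths) - 1))
    return intervals


# Hypothetical per-base depths for a five-base region starting at position 101.
print(low_coverage_intervals([100, 29, 10, 30, 5], first_pos=101))
```

A position with exactly 30 reads meets the criterion, so only depths strictly below the threshold are reported as gaps.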
Results
Fast and Comprehensive Enrichment of Various Genomic Target Regions with LR-PCR
With only minor optimization of the reaction conditions, the entire genomic loci of F7 (14.9 kb), F10 (26.7 kb), F11 (23.8 kb), F12 (14.4 kb), GATA1 (7.8 kb), TUBB1 (7.4 kb), and WAS (14.8 kb) could be successfully amplified by LR-PCR. Only one or two PCR amplicons were necessary for a specific and highly efficient enrichment of each gene ([Fig. 1], [Table S1]). Target enrichment, purification of the PCR products, pooling, and library preparation for 35 probands were completed in only two regular working days. Due to its large size of 106.8 kb, the entire genomic region of the MYH9 gene was independently amplified without any PCR dropouts for another eight probands with eight overlapping LR-PCR amplicons.
Fig. 1 Comprehensive coverage and characteristic, gene-specific patterns of sequencing depth for the entire genomic target regions. The genomic loci of F7, F10, F11, F12, GATA1, TUBB1, WAS, and MYH9 were amplified with 17 LR-PCR amplicons and separated by gel electrophoresis. The robust and specific amplification of all amplicons is shown for a healthy control (upper left subpanel). Representative gene-specific patterns of read depth across the entire genomic regions are shown as coverage plots (dark blue). Horizontal bold and broken lines indicate a read depth of 6000 × and 3000 ×, respectively. Chromosome ideograms with the cytogenetic locations of the genes (bold green lines) are depicted in the upper part of each subpanel. Exon-intron structures of their respective reference transcripts are shown in the lower parts. The size of the entire genomic region is given below each gene. kb = 1000 base pairs.
High-throughput sequencing of the pooled DNA libraries produced an average output of 1.9 × 10⁶ sequencing reads (SD: 0.3 × 10⁶) and consistently high mean coverages for the specific target regions (Min: 837 ×, Max: 3434 ×, SD: 578 ×; [Fig. 2]). Interestingly, characteristic patterns of sequencing depth were found for each gene after read alignment ([Fig. 1]). These unique patterns were recapitulated in all probands tested for the respective genetic loci ([Fig. S1]). The NGS data of the pooled DNA libraries were also checked for hints of PCR cross-contamination, but significant read depths (> 10 reads) were only seen in regions previously enriched by PCR.
Fig. 2 Quality parameters of NGS analysis. The numbers of mapped (green) and unmapped reads (red; left y-axis) are shown for all 14 DNA sequencing libraries (Pool 1–14) that comprise amplified target regions of up to seven individual probands. Mean region coverage depths are depicted as black rectangles (right y-axis). Libraries sequenced with 2 × 150 cycles (1–6) or with 2 × 250 cycles (7–14) are separated by a dashed line.
Broad Coverage of Coding Regions and Exon-Intron Boundaries
The mean coverages of the coding regions, canonical splice sites, exon-intron boundaries (± 50 bp), and known promoter regions were:
F7 : 1292 ×,
F10 : 1687 ×,
F11 : 2583 ×, and
F12 : 1455 ×.
The sequencing depths of thrombocytopenia-associated genes were also sufficiently high:
GATA1 : 2175×
TUBB1 : 2768×
WAS : 1928×
MYH9 : 2420×
Overall, 94 % of the cumulative target region was covered with a sequencing depth of more than 30 × (180.6 of 191.6 kb). NGS achieved complete coverage for the diagnostic target regions of F10, F11, F12, GATA1, and TUBB1. Only a single exonic gap in exon 10 of the WAS gene (coverage < 30 ×; LRG_125t1: c.932_1338) was consistently seen in all five probands sequenced for this genomic locus ([Fig. 1]). A minor sequencing problem with a mean coverage of 46 × was also found in exon 36 of MYH9 (LRG_567t1: c.5062_5150). Both regions have a high GC content of nearly 70 %. A complete sequencing dropout was observed for only one of the 107 LR-PCR products sequenced in this study (dropout rate = 0.9 %). This single PCR amplicon was included and completely covered in a second sequencing run.
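The two summary figures above follow directly from the reported counts. As a minimal arithmetic check (the variable and function names are chosen for illustration only):

```python
def percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Values as reported in the text.
covered_kb, target_kb = 180.6, 191.6       # target region covered at > 30 x
dropped_amplicons, all_amplicons = 1, 107  # LR-PCR products with a dropout

coverage_pct = percent(covered_kb, target_kb)
dropout_pct = percent(dropped_amplicons, all_amplicons)

print(f"Target region covered at > 30 x: {coverage_pct:.1f} %")
print(f"Amplicon dropout rate: {dropout_pct:.1f} %")
```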
Reliable, Sensitive and Specific Mutation Detection by NGS
The main purpose of this study was to establish a novel NGS-based screen for missense, nonsense, splice and small frameshift mutations. For an evaluation of the analytic sensitivity and specificity of this approach, DNA samples of 43 probands previously screened with conventional Sanger sequencing were re-analyzed with LR-PCR and NGS. All 25 known pathogenic or likely pathogenic variants ([Table 1]) and 128 polymorphisms in the eight analyzed genes were re-identified.
Table 1 Distinct pathogenic and likely pathogenic variants included in this study. Variants identified in more than one proband are only listed once.

| Gene | Nucleotide Change | Protein Change | Reference |
|---|---|---|---|
| F7 | c.911C>T | p.(Ala304Val) | Tamary et al. 1996[18] |
| F7 | c.1061C>T | p.(Ala354Val) | Bernardi et al. 1994[19] |
| F7 | c.1391delC | p.(Pro464Hisfs*32) | Arbini et al. 1994[20] |
| F7 | c.64+430_131-6delinsTCGTAA | p.? | Rath et al. 2015[21] |
| F10 | c.413A>T | p.(Gln138Leu) | Rath et al. 2015[21] |
| F10 | c.979C>T | p.(Arg327Trp) | Millar et al. 2000[22] |
| F10 | c.1097G>A | p.(Arg366His) | Rath et al. 2015[21] |
| F10 | c.1159C>T | p.(Arg387Cys) | Hermann et al. 2006[23] |
| F10 | c.1247A>C | p.(Gln416Pro) | unpublished |
| F10 | deletion of exon 6 [c.(502+1_503-1)_(747+1_748-1)] | p.? | Hainmann et al. 2009[24] |
| F11 | c.644_649delTCGACA | p.(Ile215_Asp216del) | Zadra et al. 2004[25] |
| F11 | c.803G>A | p.(Arg268His) | Duncan et al. 2008[26] |
| F12 | c.-57G>C | p.? | Hofferbert et al. 1996[27] |
| F12 | c.-62C>T | p.? | Lombardi et al. 2008[28] |
| F12 | c.1381G>A | p.(Asp461Asn) | Schloesser et al. 1997[29] |
| F12 | c.1668delC | p.(Asp557Metfs*107) | unpublished |
| F12 | c.1681-1G>A | p.? | Schloesser et al. 1995[30] |
| GATA1 | c.622G>A | p.(Gly208Arg) | Del Vecchio et al. 2005[31] |
| WAS | c.101delG | p.(Arg34Hisfs*11) | unpublished |
| WAS | c.256C>T | p.(Arg86Cys) | Kolluri et al. 1995[32] |
As all known variants were re-identified and no false-positive calls were detected in the regions of interest, the analytical sensitivity and specificity of this NGS approach were both calculated as 100 %.
Interestingly, two large deletions previously found using multiplex ligation-dependent probe amplification (MLPA) could be re-identified by analyses of the gene-specific coverage patterns. A homozygous deletion spanning exon 2 of the F7 gene (proband P5; [Fig. S1]) and a heterozygous deletion of exon 6 of the F10 gene (P7; [Fig. 3]) were consistently detected. NGS sequence analysis gave a rather precise estimate of the respective sizes of the F7 [4.35 kb; c.64+430_131-6delinsTCGTAA] and F10 [approximately 4.8 kb; c.(502+1_503-1)_(747+1_748-1)] deletions but could not fine-map the breakpoints down to single nucleotide level.
Fig. 3 Identification of a known exon-spanning heterozygous deletion in F10. Analysis of the F10 coverage pattern of proband P7 (A) compared with the characteristic control pattern (B) revealed a heterozygous deletion spanning approximately 4.8 kb (indicated by red broken lines), which includes exon 6 of F10 and part of the flanking introns. The exon-intron structure of the reference transcript is shown below. The result of the F10 MLPA analysis for P7 also shows the heterozygous deletion of exon 6 (red arrow). Green bars indicate normalized gene dosages of P7 compared with three healthy controls (grey bars). F10-specific probes are depicted on the left, reference probes on the right.
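The comparison of a proband's coverage pattern with the characteristic control pattern can be sketched computationally. The deletions in this study were identified by inspecting coverage plots; the Python example below is only one possible formalization, and the binning, the median normalization, and the ratio thresholds (0.7 and 0.1) are hypothetical choices that would need validation.

```python
from statistics import median


def normalized(depths):
    """Scale per-bin depths by the sample median so samples become comparable."""
    mid = median(depths)
    if mid == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [d / mid for d in depths]


def classify_bins(proband, control, het_max=0.7, hom_max=0.1):
    """Label each bin from the normalized proband/control depth ratio.

    Ratios near 0.5 suggest a heterozygous deletion, ratios near 0 a
    homozygous (or hemizygous) deletion. Thresholds are assumptions.
    """
    if len(proband) != len(control):
        raise ValueError("proband and control need the same number of bins")
    labels = []
    for p, c in zip(normalized(proband), normalized(control)):
        if c == 0:
            labels.append("no_data")  # control has no reads in this bin
        elif p / c <= hom_max:
            labels.append("hom_del")
        elif p / c <= het_max:
            labels.append("het_del")
        else:
            labels.append("normal")
    return labels


# Hypothetical mean depths per bin; the proband was sequenced twice as deep
# as the control but shows only half the expected depth in bins 4 and 5.
control = [1000, 2000, 1500, 1200, 1800, 1000, 1600, 1400, 1100, 1900]
proband = [2000, 4000, 3000, 2400, 1800, 1000, 3200, 2800, 2200, 3800]
print(classify_bins(proband, control))
```

Normalizing by the median keeps the comparison independent of the overall sequencing depth of each library, provided that the deletion affects fewer than half of the bins.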
Discussion
The implementation of NGS in most laboratories has already changed standard workflows. Nevertheless, NGS can also be challenging for diagnostic laboratories. Its high sequencing capacity often implies the economic need for parallel analyses of large target regions or a higher number of patients. These requirements can usually be best met with capture-based NGS multigene panels. Hybridization probes minimize the risk of allele dropouts, can be easily combined and are perfectly suitable for multiplexing. Technically, all genes tested in this study could also have been covered with comprehensive capture-based panels. However, if a diagnosis can most likely be confirmed by analysis of a single small disease gene such as F7 or F10, additional focussed strategies are more cost-effective.
As an alternative to time-consuming conventional Sanger sequencing, LR-PCR combined with NGS is advantageous in this context since PCR primers are less expensive than hybridization probes and PCR reactions can be performed with standard laboratory skills and equipment. Additionally, there is only a limited need for PCR optimization,[4] and the workflow contains several checkpoints at which a failed reaction can be repeated without extensive additional costs. Targeted single gene analyses by LR-PCR also reduce the risk of incidental findings and have higher enrichment specificities for targets with pseudogenes or repetitive regions compared to capture-based multigene panels. Analogous strategies for the enrichment of en bloc genomic target regions have been described not only for BRCA1- and BRCA2-associated hereditary cancer predisposition syndromes and HLA genotyping[5][6][7] but also for molecular analysis in hemophilia A.[8][9] Our study further expands the spectrum of genes that can be efficiently analyzed by LR-PCR amplification and NGS in a diagnostic context.
With an analytic sensitivity and specificity of 100 %, our approach proved to be highly reliable. Furthermore, it is also quite flexible as different target regions can be individually combined. The sequencing gaps and regions with a relatively low read depth that were observed in WAS and MYH9 are mainly restricted to regions with a high GC content. Such regions are known to be problematic in NGS.[10] Currently, they still need to be completed by Sanger sequencing.
Obviously, the specific assay for the eight genes studied here might not be suitable for all diagnostic laboratories. However, it can be individually adapted and serve as a valuable addition rather than as an alternative to existing NGS multigene panels. Even combined sequencing with DNA samples enriched with other panels is feasible and further decreases sequencing costs.
Of course, PCR-based enrichments are always hampered by the risk of allele dropout due to sequence variants in primer binding sites or large deletions that can result in a selective amplification of a single allele.[11][12] For example, we and others have noticed difficulties in conventional DNA sequencing of exon 2 of MYH9 with a biased amplification due to a poly-guanine repeat.[13] Thus, several optimization steps may be necessary for each PCR to find the best primer binding sites. Nonetheless, one has to keep in mind that allele dropout can never be completely excluded and is hard to detect by Sanger sequencing because the amplified regions are small and therefore often lack heterozygous SNPs. LR-PCR significantly increases the chance of finding a heterozygous SNP in the genomic target region that can be used to exclude an allele dropout. Therefore, a screen for heterozygous positions has to be an integral part of data analysis if PCR-based enrichment strategies are chosen.
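Such a screen for heterozygous positions could, for instance, count heterozygous variant calls per LR-PCR amplicon and flag amplicons without any. The Python sketch below is illustrative and not part of the published workflow; the data structures, the amplicon coordinates, and the allele fraction window of 0.3 to 0.7 used to call a position heterozygous are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    name: str
    start: int  # first covered position (inclusive)
    end: int    # last covered position (inclusive)


def heterozygous_counts(amplicons, calls, lo=0.3, hi=0.7):
    """Count heterozygous calls per amplicon.

    `calls` is an iterable of (position, variant allele fraction) pairs.
    A call inside the overlap of two amplicons is counted for both.
    """
    counts = {a.name: 0 for a in amplicons}
    for pos, vaf in calls:
        if lo <= vaf <= hi:
            for a in amplicons:
                if a.start <= pos <= a.end:
                    counts[a.name] += 1
    return counts


def amplicons_without_het(amplicons, calls, lo=0.3, hi=0.7):
    """Return the amplicons for which allele dropout cannot be excluded."""
    counts = heterozygous_counts(amplicons, calls, lo, hi)
    return [name for name, n in counts.items() if n == 0]


# Hypothetical amplicons and variant calls.
amps = [Amplicon("amp1", 1, 1000), Amplicon("amp2", 900, 2000),
        Amplicon("amp3", 3000, 4000)]
calls = [(500, 0.48), (950, 0.52), (1500, 0.99), (3500, 1.00)]
print(amplicons_without_het(amps, calls))
```

An amplicon without a heterozygous call is not necessarily affected by allele dropout; it merely lacks the internal evidence needed to exclude it.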
Multiplex ligation-dependent probe amplification (MLPA) analysis is still the method of choice for copy number detection in routine diagnostics. This technique requires high-quality DNA from patients as well as controls and a separate laboratory workflow.[14] The LR-PCR approach described here can replace this standard method and speed up the overall diagnostic process, in particular in male patients with X-linked diseases such as GATA1- and WAS-associated disorders.
A hemizygous intragenic deletion can easily be detected with NGS, especially in small genes that are covered with only one LR-PCR amplicon. In male patients, even breakpoint mapping to the approximate location within the amplicon is possible. In female carriers of X-linked diseases and in autosomal disorders, a heterozygous intragenic deletion can also be detected due to the unique gene-specific NGS pattern ([Fig. S1]). In these cases, LR-PCR might assess the size of a deletion more accurately than MLPA.
One further advantage of an LR-PCR strategy covering the entire genomic region of a specific gene of interest is the possibility to directly search for deep-intronic mutations in patients who have remained mutation-negative after sequencing of the coding regions and invariant splice sites of the respective disease-associated gene. However, this procedure is not remunerable in a diagnostic setting according to the reimbursement catalogue of Germany's statutory health insurance system (EBM) and can therefore be performed only in a research context.
International research collaborations such as the ThromboGenomics Consortium or the BRIDGE-BPD study have been formed to unravel the genetic basis in probands with congenital bleeding and platelet disorders. They apply comprehensive NGS multigene panels which work well for rather clear phenotypes.[15][16] For cases without a clear phenotypic lead, whole exome or genome sequencing might be more successful.[17] However, this requires a careful clinical characterization of family members, their participation after informed consent including the management of incidental findings, and a well-established bioinformatics pipeline.
Conclusion
Taken together, our LR-PCR approach combined with NGS resulted in a success rate of 100 % for the re-identification of variants in eight different exemplary genes involved in congenital coagulation and platelet disorders. In addition to multigene panels which are particularly useful for disorders with genetic heterogeneity, it appears to be a practical alternative method to Sanger sequencing for molecular diagnostics of any small gene to confirm a specific clinical diagnosis.
What is known about this topic?
Molecular testing of genes involved in congenital coagulation and platelet disorders has been routinely done by Sanger sequencing.
NGS-based molecular testing became remunerable for routine diagnostics of hereditary coagulation and platelet disorders according to the reimbursement catalogue of Germany's statutory health insurance system (EBM).
What does this paper add?
Here, we present a new method to transfer mutation analyses of genes involved in congenital coagulation and platelet disorders from Sanger to NGS-based sequencing.
The LR-PCR strategy covering the entire genomic region of a specific gene of interest raises the possibility to directly search for deep-intronic mutations in patients that had remained mutation-negative after sequencing of the coding regions.