Common and Rare Variants in Genes Associated with von Willebrand Factor Level Variation: No Accumulation of Rare Variants in Swedish von Willebrand Disease Patients

Genome-wide association studies (GWASs) have identified genes that affect plasma von Willebrand factor (VWF) levels. ABO showed a strong effect, whereas smaller effects were seen for VWF, STXBP5, STAB2, SCARA5, STX2, TC2N, and CLEC4M. This study screened comprehensively for both common and rare variants in these eight genes by resequencing their coding sequences in 104 Swedish von Willebrand disease (VWD) patients. The common variants previously associated with the VWF level were all accumulated in the VWD patients compared to three control populations. The strongest effect was detected for blood group O coded for by the ABO gene (71 vs. 38% of genotypes). The other seven VWF level associated alleles were enriched in the VWD population compared to control populations, but the differences were small and not significant. The sequencing detected a total of 146 variants in the eight genes. Excluding 70 variants in VWF, 76 variants remained. Of the 76 variants, 54 had allele frequencies > 0.5% and have therefore been investigated for their association with the VWF level in previous GWASs. The remaining 22 variants with frequencies < 0.5% are less likely to have been evaluated previously. PolyPhen2 classified 3 out of the 22 variants as probably or possibly damaging (two in STAB2 and one in STX2); the others were either synonymous or benign. No accumulation of low frequency (0.05–0.5%) or rare variants (<0.05%) in the VWD population compared to the gnomAD (Genome Aggregation Database) population was detected. Thus, rare variants in these genes do not contribute to the low VWF levels observed in VWD patients.


Introduction
The von Willebrand factor (VWF) is a multimeric glycoprotein that has a key role in hemostasis and shows widely varying plasma levels in the normal population.
Quantitative deficiencies in VWF arise from changes in its biosynthesis, secretion, and clearance. 1 Low VWF levels are associated with an abnormal bleeding phenotype, von Willebrand disease (VWD). Type 1 VWD is the least serious type, accounting for approximately 70% of diagnosed cases. It is defined as a partial deficiency of functionally normal VWF and in many cases shows dominant inheritance. Most severe cases of type 1 VWD are caused by mutations in the VWF gene, whereas patients with moderately reduced levels of VWF often do not have a mutation in the VWF gene. 2 A previously published genome-wide association study (GWAS) identified eight genes that contribute to plasma level variation of VWF. 3 The ABO gene showed the strongest effect, but smaller effects were seen for the VWF, STXBP5, STAB2, SCARA5, STX2, TC2N, and CLEC4M genes. Several exome-wide association studies and GWASs have since investigated the contribution of nucleotide variation to the plasma level of VWF and replicated the aforementioned results. 4–6 The study by Huffman et al 4 used chip-based genotyping of common and low-frequency variants and identified additional loci within previously reported genes that had effect sizes much larger than, and independent of, previously identified common variants. A rare STAB2 variant (rs141041254) had an effect size on the VWF level that was more than 10-fold larger than that of the previously reported common variant. 4 The study by Sabater-Lleal et al 5 identified an additional 11 gene associations with the VWF level, all in common variants and with modest effect sizes. Overall, the top variants for all loci independently associated with the level of VWF explained 21% of the variance in the VWF level.
Additional contributions to VWF level variation may come from rare variants that are not captured by the previously used marker arrays. To search for such variants, a population of VWD patients showing decreased levels of VWF was sequenced to screen comprehensively for genetic variation in the coding regions of the eight genes previously associated with VWF level variation. The underlying hypothesis is that these genes may harbor rare variants in addition to the common variants in some of these individuals with low VWF levels.

Study Population and von Willebrand Disease Phenotyping
The VWD study population was recruited from the Department for Coagulation Disorders, Malmö University Hospital (Malmö, Sweden). The population consists of consecutive patients and their relatives who attended the clinic between the years 1988 and 2005, corresponding to approximately 1,000 individuals belonging to 127 families. This population represents the majority of all the families diagnosed with VWD in Sweden during this time period. Clinical and laboratory data were recorded for each patient, and their bleeding phenotypes were classified. 7 We used archival VWF levels usually determined at the time of the original diagnostic work-up. There were no further analyses of VWF levels in this study. Consequently, different phenotyping methods were used over the years. VWF activity was measured with the traditional VWF:RCo (ristocetin cofactor) method based on aggregation of platelets or an automated VWF:RCo assay based on the BCS coagulation analyzer (Siemens Healthcare Diagnostics, Marburg, Germany) using the BC von Willebrand reagent. VWF antigen levels (VWF:Ag) were measured using electroimmunoassay (the Laurell method), IRMA (immunoradiometric assay), ELISA (enzyme-linked immunosorbent assay), or LIA (line immunoassay). Previous DNA sequencing identified type 2 VWD mutations in 20 patients among the 127 patients initially diagnosed with VWD. 8 Of the remaining 107 VWD patients, 104 were analyzed in this study. Of the 104 patients, 72 had a family history of bleeding in addition to their bleeding phenotypes. The remaining 32 patients were individual patients with bleeding and low VWF levels. In a strict sense, not all index cases fulfilled the modern definition of type 1 VWD, but at the time of diagnosis, their bleeding symptoms in combination with lowered VWF levels were interpreted as reflecting type 1 VWD.
Two Swedish control populations were also analyzed: control population 1 (C1), consisting of 192 individuals from the general population, 9 and control population 2 (C2), consisting of 288 unrelated male individuals with no history of bleeding from the general population. 10 This study was approved by the Ethics Committee of the Medical Faculty, Lund University, and the Swedish Data Inspection Board. Written informed consent was obtained from all patients. DNA from human whole blood was isolated using a Qiagen Blood DNA kit (Qiagen, Hilden, Germany), and DNA concentrations were determined by fluorometry using PicoGreen (Molecular Probes, Eugene, Oregon, United States).

Ion Torrent Sequencing
The primer sets were designed using Ion AmpliSeq Designer to include all exonic, 5′UTR (5′ untranslated region), and flanking intronic sequences (http://www.ampliseq.com, pipeline version 2.2.1). The multiplex primer pools were optimized such that the primers for systems with low read depths in a specific pool were added to the other pool, excluding overlapping systems. The library preparation was achieved using the Ion AmpliSeq Library Kit 2.0 (Thermo Fisher Scientific, Waltham, Massachusetts, United States) according to the manufacturer's protocol. The amplicons were barcoded using Ion Xpress Barcode Adapters (Thermo Fisher Scientific). Purification of the library was achieved using Agencourt AMPure XP reagent beads (Beckman Coulter Inc., Brea, California, United States). Library amplification was performed, and further purification steps were achieved using Agencourt AMPure XP reagent beads before elution of the final library. The library concentrations were determined by capillary electrophoresis using a Fragment Analyzer (Advanced Analytical Technologies, Ankeny, Iowa, United States) and a High Sensitivity NGS Fragment Analysis Kit (Advanced Analytical Technologies). The libraries were normalized to a concentration of approximately 50 pM before being pooled. An amplification reaction was prepared according to the manufacturer's protocol and transferred to an Ion PGM Hi-Q View reaction filter (Thermo Fisher Scientific) before emulsion polymerase chain reaction was performed on an Ion OneTouch 2 instrument (Thermo Fisher Scientific). Enrichment of the ion sphere particles was performed on an Ion OneTouch ES instrument using Dynabeads MyOne Streptavidin C1 beads (Thermo Fisher Scientific). The sequencing process was carried out on an Ion PGM sequencer (Thermo Fisher Scientific) using Ion 316 chip V2 (Thermo Fisher Scientific). This allowed simultaneous analysis of eight samples at the coverage presented in this study.
The loaded chip was sequenced using a 400-bp sequencing protocol with 850 flows of single nucleotides.
TH Open Vol. 4 No. 4/2020

Bioinformatic Analysis
The generated raw data were processed by the Ion Torrent Suite Software v5.0.5 (Thermo Fisher Scientific). The sequences were aligned to the Homo sapiens hg19 reference genome and stored as BAM files. The frequency parameter for variant calling was set to 0.25, generating VCF files for each library containing single-nucleotide variants (SNVs), small insertions and deletions (indels), and putative mutations. The generated VCF files were merged and multiallelic sites split to create a database containing all called variants. Annotation of the database VCF file was achieved through the Ensembl Variant Effect Predictor, and each variant was classified following the American College of Medical Genetics and Genomics guidelines using VarSome. Evaluation of detected variants of each library was performed in parallel using MuCor: Mutation Aggregation and Correlation. The non-Finnish European population of gnomAD (Genome Aggregation Database) and the SweGen population were used to compare allele frequencies.
The Genotype Tissue Expression (GTEx) database was analyzed for changes in expression relating to the common variants associated with VWF level variation.
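The frequency classes used in this study (variants above 0.5%, low-frequency variants at 0.05–0.5%, and rare variants below 0.05%) can be written as a small helper. The following is a minimal sketch, not the pipeline used by the authors: the function name and class labels are hypothetical, and the handling of a variant at exactly 0.05% or 0.5% is an assumption, since the text does not specify the boundaries.

```python
def frequency_class(allele_freq: float) -> str:
    """Assign a variant to a frequency class.

    allele_freq is a fraction (0.005 == 0.5%), e.g. taken from gnomAD.
    Boundary handling is an assumption; the study only states
    "<0.05%", "0.05-0.5%", and ">0.5%".
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if allele_freq < 0.0005:       # below 0.05%
        return "rare"
    if allele_freq <= 0.005:       # 0.05% up to and including 0.5%
        return "low_frequency"
    return "above_0.5_percent"     # likely covered by earlier GWAS arrays


# Illustrative frequencies only, not values from the study
for freq in (0.0001, 0.001, 0.2):
    print(f"{freq:.4%} -> {frequency_class(freq)}")
```

Keeping the thresholds in one function makes it easy to check how sensitive a downstream count of "rare" variants is to the chosen boundaries.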

Frequencies of Common Variants Associated with von Willebrand Factor Level
Several GWASs have identified and replicated a set of common variants associated with the VWF level in eight genes: ABO, VWF, STXBP5, STAB2, SCARA5, STX2, TC2N, and CLEC4M. 3,5,11 The ABO blood group gene showed by far the strongest effect on the VWF level in all previous studies. Since VWF levels of individuals with blood group O are reduced by 25% in comparison to non-O individuals, blood group O is a very strong contributor to low VWF levels and is also more common in type 1 VWD populations in comparison to type 2 VWD and normal populations. 12 In our VWD population, the null allele of ABO shows the strongest deviation from the expected allele frequency: 84% in the VWD population compared to 62, 60, and 63%, respectively, in the three control populations (►Table 1; ►Supplementary Table S1). This difference is highly significant, and it means that the O blood group is present in 71% of individuals compared to the Swedish national average of approximately 40% (refer to ►Supplementary Table S1 for details on all ABO alleles and genotypes). The VWF level associated alleles of the remaining seven genes were enriched by 2 to 7% in the VWD population compared to two local control populations of 192 and 288 individuals (C1 and C2 populations, respectively). They were also enriched compared to SweGen and gnomAD populations for six out of seven genes (►Table 1). However, the observed differences were not particularly large, with enrichments between 1% and 7% for the different combinations of populations, nor were they significant. Many of these variants are reported in the GTEx as being associated with changes in the expression level of the respective genes (►Table 1 and ►Supplementary Table S2).

[Table 1. Comparison of allele frequencies between VWD and local control (C1 and C2), SweGen, and the non-Finnish European gnomAD populations. Recoverable column headers: Gene; Variant; Risk allele frequency (%); SweGen; VWD-SweGen.]

The genotyping produced data for all variants in >98% of individuals.
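The link between the ABO null-allele frequency and the share of individuals with blood group O can be checked with a short calculation. The sketch below assumes Hardy-Weinberg proportions, under which the frequency of the O/O genotype is the square of the null-allele frequency. This is an assumption made for illustration; the text does not state how the phenotype percentages were derived, and the function name is hypothetical.

```python
def expected_homozygote_fraction(allele_freq: float) -> float:
    """Expected fraction of homozygotes under Hardy-Weinberg proportions."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return allele_freq ** 2


# Null-allele frequencies reported in the text
vwd_o = expected_homozygote_fraction(0.84)      # VWD population
control_o = [expected_homozygote_fraction(f) for f in (0.62, 0.60, 0.63)]

print(f"VWD: {vwd_o:.2f}")
print("controls:", ", ".join(f"{x:.2f}" for x in control_o))
```

A null-allele frequency of 84% gives an expected blood group O fraction of about 0.71, which agrees with the 71% reported for the VWD population. The control frequencies of 60 to 63% give about 0.36 to 0.40, in line with the national average of approximately 40%.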

Rare Variants in von Willebrand Factor Level Associated Genes
The coding sequences of ABO, VWF, STXBP5, STAB2, SCARA5, STX2, TC2N, and CLEC4M were screened for common and rare genetic variants by Ion Torrent sequencing of 104 patients from the historic VWD population. The patients in this population had bleeding symptoms in combination with lowered VWF levels, and we hypothesized that any rare variants with larger effects on the VWF level could be enriched in this population. The sequencing detected a total of 146 variants in the coding sequences of the eight genes. Excluding 70 variants in VWF (described in detail in the study by Manderstedt et al), 8 76 variants remained. There were a total of 19 variants in ABO: eight missense, two frameshift, and nine synonymous. Of the 19 variants, 15 had a minor allele frequency (MAF) of >5% in gnomAD, and the majority of these showed large allele frequency differences when comparing VWD and gnomAD populations, reflecting the large haplotype frequency differences present for ABO. Three of the four variants with MAFs < 5% in gnomAD were synonymous, and the fourth was a likely benign missense variant. Using PolyPhen2, three out of eight missense variants were predicted to be possibly or probably damaging (►Tables 2 and 3). They were all common variants with MAFs at ≥20%. The remaining six genes contained a total of 57 variants. The numbers of common and rare variants detected for the individual genes are shown in ►Table 2. Twenty-nine variants were detected in STAB2, a clearly higher number of variants compared to the other five genes, which had between 3 and 10 variants. Of the 57 variants, 19 had an MAF of >5% and 20 variants had an MAF of <0.5% in gnomAD. There were almost equal numbers of synonymous and missense variants (27 and 29, respectively).
Using VarSome to predict the function of the 57 variants, a majority (43) were found to be benign or likely benign, whereas 14 were predicted to be of uncertain significance, and none was predicted as pathogenic (►Table 3). When using PolyPhen2 instead, 7 out of 29 missense variants were predicted to be possibly or probably damaging. Of these, five were in STAB2 and two in STX2. No variants had significantly different MAFs when compared between VWD and gnomAD populations. None of the tests for accumulation of low frequency (0.05%-0.5%) or rare variants (<0.05%) in the VWD population compared to the gnomAD population was significant (►Table 4). To check for Swedish-specific allele frequencies, the SweGen and gnomAD populations were compared for their numbers of low frequency or rare variants; the difference was not significant (►Supplementary Table S3).
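A test for accumulation of rare variants compares the number of variant-carrying chromosomes in the patient group with that in a reference population. The text does not name the statistical test behind ►Table 4, so the following is only a sketch of one common choice, a two-sided Fisher's exact test on a 2x2 table of allele counts, implemented with the standard library. The function name and all counts in the example are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are populations, columns are (variant alleles, other alleles).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x variant alleles in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(col1, row1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts, for illustration only (not taken from Table 4)
vwd_variant, vwd_total = 4, 208      # 104 patients -> 208 chromosomes
ref_variant, ref_total = 40, 2000    # made-up reference sample
p_value = fisher_exact_two_sided(
    vwd_variant, vwd_total - vwd_variant,
    ref_variant, ref_total - ref_variant,
)
print(f"two-sided p = {p_value:.3f}")
```

An exact test is a natural fit here because the patient group contributes only 208 chromosomes, so the expected counts of rare alleles are small and large-sample approximations may be unreliable.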

Read Depth and Coverage
The eight genes were analyzed using an AmpliSeq panel targeting all exons and flanking intronic regions. The coverage was close to 100% for all genes except for ABO and STX2, where single exons were missing due to low-yielding primer systems, and TC2N, where a single exon was missing due to a failed primer design (►Supplementary Fig. S1). All variants with MAFs > 5% in gnomAD were detected in this study, except for single variants in each of STX2 and TC2N that were in the missing exons of those genes. The average read depth varied between 176 and 570, with a read depth of >300 for seven of the eight genes (►Supplementary Table S4; ►Supplementary Fig. S1). The average strand bias was <10% for all genes. All reported low frequency and rare variants were supported by >100 reads. The coding sequences of exons 1 to 3 and exons 5 to 7 of CLEC4M had been sequenced previously using Sanger sequencing, 13 as had all exons of STXBP5. 14 There was complete concordance between the sets of variants detected by Sanger and Ion Torrent sequencing for all individuals.

Discussion
This study investigated 104 patients from a historic VWD population with lowered VWF levels for common and rare variants in ABO, VWF, STXBP5, STAB2, SCARA5, STX2, TC2N, and CLEC4M genes. Common variants in these genes had initially been reported by Smith et al 3 to be associated with the VWF level in a large GWAS and we hypothesized that the VWD patients in this study would show the same associations. This was indeed the case, as all eight VWF level associated alleles were enriched in the VWD population compared to two local control populations and for seven out of eight genes compared to SweGen and gnomAD populations. The observed allele frequency enrichments for the three combinations of populations were largest for the ABO variant, with an enrichment of approximately 20%. The enrichments detected for the remaining seven genes were considerably smaller, varying between 2 and 7% for the local control populations. The allele frequency difference detected for the ABO variant was significant, whereas differences detected for the other variants were not. These results are comparable with the ones reported by Sanders et al, 15 who found a similar pattern of enrichment in their VWD population. This indicates that there is a general enrichment of these GWAS-defined VWF level associated alleles among type 1 VWD patients or low VWF individuals and that our VWD population also shows this pattern. The Ion Torrent sequencing of ABO, VWF, STXBP5, STAB2, SCARA5, STX2, TC2N, and CLEC4M in 104 patients from our historic VWD population detected a total of 146 variants in the coding sequences of the eight genes. VWF harbored 70 of these, of which many were mutations and have been described in detail previously. 8 Of the remaining 76 variants, 54 had allele frequencies > 0.5% and a majority of these have therefore likely already been investigated for their association with the VWF level in later GWASs also investigating low-frequency variants in these and other genes. 4,6
The rs141041254 STAB2 variant associated with VWF level variation and identified by Huffman et al 4 was not found in this study, but it has been further investigated along with common variants in STAB2. 16 The 22 variants with frequencies < 0.5% are less likely to have been previously evaluated in the present context, even though all but one are present in gnomAD. According to VarSome, 12 of these had an uncertain significance and 10 were benign or likely benign. PolyPhen2 classified 3 out of the 22 variants as probably or possibly damaging (two in STAB2 and one in STX2); the others were either synonymous or denoted benign. In comparison, MutationTaster classified 7 out of the 22 variants as deleterious. In addition, the testing for accumulation of low frequency or rare variants in the VWD population compared to gnomAD was not significant for any of the seven genes. We have therefore rejected our hypothesis that rare variants with larger effects on the VWF level are accumulated in these genes, and even if a few of the rare variants detected in this study really have an effect on the VWF level, their overall contribution to disease in the VWD population must be very limited.
Our study should be interpreted within the context of its limitations. First, our study population is rather small, thereby limiting the absolute number of chromosomes interrogated for rare variants and consequently also limiting our ability to detect rare variants affecting the VWF level. Our population consists of a mixture of type 1 VWD patients and patients with a low VWF phenotype who do not fulfill modern criteria for type 1 VWD. However, the selection of individuals without causal VWF mutations, but with low VWF levels, may rather increase the likelihood of finding rare variants affecting VWF level in those individuals. Second, the absence of detected rare variants in this study may be caused by an inability of our sequencing methodology to detect variants that are present in the population. We consider this unlikely since the average read depths were >300 for seven of the eight genes, the coverage was high with only very few regions without coverage, and there was a complete concordance between the sets of variants detected by Sanger and Ion Torrent sequencing in CLEC4M and STXBP5 for all individuals. 13,14 Thus, we believe that the lack of accumulation of rare variants and the almost total absence of rare variants predicted to affect protein function in our VWD population are genuine findings. None of the genes, except the VWF gene itself, harbor large numbers of rare variants affecting the amount of VWF, ultimately giving rise to VWD. We cannot completely exclude the possibility that single rare variants exist in any of these genes that can have a major effect on the VWF level, but we think it is reasonable to argue that a large number of such variants do not exist.
This leaves us with the following explanatory model for the VWF level variation achieved by genetic variants in the eight genes. The VWF gene itself harbors both a large number of bona fide mutations and a common haplotype containing SNVs c.2365A > G and c.2385T > C affecting VWF biosynthesis and clearance. 2,17 The rs1063857 (c.2385T > C) variant analyzed in this study and the rs1063856 (c.2365A > G) are reported to be in strong linkage disequilibrium and influence VWF levels independently. 17 The low VWF allele of rs1063857 was enriched in our VWD population by approximately 5% compared to all three control populations, which is in agreement 18 In this study, the null allele of ABO showed the strongest deviation from the expected allele frequency: 84% in the VWD population compared to 62, 60, and 63%, respectively, in the three control populations. No contribution from rare variants could be detected for ABO in our VWD population since only two synonymous variants were detected with MAFs < 0.5%. STXBP5, STX2, and TC2N are most likely involved in the synthesis and exocytosis of VWF from endothelial cells. Thus, it is likely that common variants associated with a decreased VWF level are either associated with lower expression levels giving rise to fewer of these protein molecules or else they hamper the function of the proteins. In addition, the GTEx database reports that the VWF-associated common alleles in these genes are indeed associated with decreased expression levels of all three of the investigated genes, which is compatible with our observations in the VWD population. It is likely that common and rare variants would work in the same direction in these genes. Rare loss-of-function alleles would therefore be accumulated among patients with a bleeding phenotype. Previous analysis of STXBP5 revealed two benign rare variants and a common missense variant (rs1039084) with a slightly increased allele frequency compared to controls. 14
STX2 had two rare missense variants, which were possibly damaging, and one common missense variant, which was benign. TC2N had four benign missense variants with the same allele frequencies as the control populations. Thus, none of these genes show accumulation of rare variants with damaging effects.
STAB2, SCARA5, and CLEC4M are receptors that operate through binding and eliminating VWF in the bloodstream. It is likely that the more of these molecules there are and the more effectively they bind VWF, the lower the VWF levels will be. This means that common variants associated with the VWF level likely function through more effective binding or higher expression levels giving rise to more of these proteins. Rare mutations within these genes would have to be of a gain-of-function type to produce the phenotype typical of VWD. Since this type of mutation appears much less frequently than loss-of-function type mutations, it is highly unlikely that there would be an accumulation of such mutations. The VWF level association observed for CLEC4M was speculated to depend either on a missense mutation (rs2277998) or on the existence of heterozygosity for the tandemly repeated sequence present in exon 4. 13 Two rare variants were observed in the VWD population: one synonymous and one benign missense variant. Neither is a likely contributor to a low VWF phenotype. The common variant of SCARA5 reported by Smith et al 3 was only slightly enriched in our VWD population compared to two of the control populations. This gene had two benign missense variants and one synonymous rare variant and showed no allele frequency differences for the remaining seven variants. Excluding VWF, STAB2 was the most variable of the genes investigated in this study. It had five probably or possibly damaging rare missense variants but did not show any accumulation of rare variants compared to gnomAD. Since this gene shows a lot of variation and seems to be tolerant to changes in its protein sequence, these missense variants are also less likely to contribute to changes in VWF level and ultimately to VWD. In conclusion, we do not find any support for our hypothesis that rare variants are accumulated in the VWD population and contribute to the disease phenotype.
In addition, the few rare variants detected in these genes are not likely to affect protein function. The common variants previously associated with the VWF level all show the expected pattern in our VWD population, indicating that the population as such is valid to use for the purpose of rare variant analysis, as described previously.