Summary
Background: Genome-wide association studies (GWAS) have been highly successful in identifying new
susceptibility loci of complex traits. Such studies usually start with genotyping
fixed arrays of genetic markers in an initial sample. A subset of these markers is then
selected for further genotyping in independent samples. Due to the very low
a priori probability of a true positive association, the vast majority of marker
signals will turn out to be false positives. Several methods for ranking marker data
have therefore been proposed; we evaluate them here.
Objectives: We compared statistical properties of ranking by p-values, q-values, the False Positive
Report Probability (FPRP) and the Bayesian False-Discovery Probability (BFDP).
Methods: We performed simulation studies for a genomic region derived from GWAS data sets
and calculated descriptive statistics as well as mean square errors with regard to
the true marker ranking. Additionally, we applied all measures to a GWAS for
early-onset extreme obesity, superimposing a priori information on candidate genes.
Results: Although traditional p-values are known to yield more extreme probability values, we
observed that both p-values and the BFDP were more precise in reconstructing the “true”
order of the markers in a region. In addition, the BFDP was useful to attenuate unexpected
effects at a genome-wide scale.
Conclusions: For the purpose of selecting markers from an initial GWAS and within the limits of
this study, we recommend either ranking by p-values or the application of a full Bayesian
approach for which the BFDP is a first approximation.
Keywords
Genome-wide association study - p-value - q-value - FPRP - BFDP
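
For orientation, two of the ranking measures compared in this study can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study. It assumes the commonly cited definitions: the FPRP computed from the observed p-value, the prior probability of a true association and the statistical power, and the BFDP computed from an approximate Bayes factor under a normal prior on the effect size. All function names, parameter names and the example markers are hypothetical.

```python
import math


def fprp(p_value, prior, power):
    """False Positive Report Probability for a single marker.

    Assumption: the observed p-value is used as the significance level.
    prior = a priori probability of a true association.
    """
    false_pos = p_value * (1.0 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


def bfdp(beta_hat, se, prior, prior_var):
    """Bayesian False-Discovery Probability from an approximate Bayes factor.

    Assumption: the true effect has a normal prior with mean 0 and
    variance prior_var; beta_hat and se are the estimated effect
    (e.g. a log odds ratio) and its standard error.
    """
    v = se ** 2
    z = beta_hat / se
    shrink = prior_var / (v + prior_var)
    # Approximate Bayes factor of the null against the alternative.
    abf = math.sqrt((v + prior_var) / v) * math.exp(-0.5 * z * z * shrink)
    prior_odds_null = (1.0 - prior) / prior
    return abf * prior_odds_null / (abf * prior_odds_null + 1.0)


# Hypothetical markers: (name, estimated effect, standard error).
markers = [("m1", 0.10, 0.10), ("m2", 0.50, 0.10), ("m3", 0.30, 0.10)]

# Rank markers by BFDP, smallest (most promising) first.
ranked = sorted(markers, key=lambda m: bfdp(m[1], m[2], prior=1e-4, prior_var=0.04))
```

Both measures depend on the chosen prior probability, and the BFDP additionally on the prior variance of the effect, so a ranking obtained this way can differ from a ranking by p-values whenever standard errors differ between markers.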