Summary
Background: Until recently, genotype studies were limited to the investigation of single-SNP
effects due to the computational burden incurred when studying pairwise interactions
of SNPs. However, some genetic effects as simple as coloring (in plants and animals)
cannot be ascribed to a single locus but only understood when epistasis is taken into
account [1]. It is expected that such effects are also found in complex diseases where
many genes contribute to the clinical outcome of affected individuals. Only recently
have such problems become feasible computationally.
Objectives: The inherently parallel structure of the problem makes it a perfect candidate for
massive parallelization on either grid or cloud architectures. Because we are dealing
with confidential patient data, a cloud-based solution was not an option. The data
had to be processed in-house, so we aimed to build a local GPU-based
grid structure.
Methods: Sequential epistasis calculations were ported to the GPU using CUDA at various levels.
CPU-based parallelization was compared with the corresponding GPU implementations with regard
to performance and cost.
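The independence of the pairwise tests is what makes such a port straightforward: each SNP pair can be assigned to its own worker (for example, one GPU thread). The following is a minimal sketch in Python, not the implementation used in this work. It assumes a flat work index k that is mapped to a pair (i, j) with i < j; the function name pair_from_index and the indexing scheme are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import isqrt


def pair_from_index(k, n):
    """Map a flat work index k to the k-th SNP pair (i, j), i < j.

    Hypothetical helper: pairs are ordered as (0,1), (0,2), ..., (n-2,n-1),
    so each of the n*(n-1)/2 workers can derive its own pair from its index
    without communicating with any other worker.
    """
    total = n * (n - 1) // 2
    if not 0 <= k < total:
        raise ValueError("k out of range")
    # Count from the end: the last row holds 1 pair, the one before it 2, ...
    r = total - 1 - k
    m = (isqrt(8 * r + 1) - 1) // 2      # largest m with m*(m+1)/2 <= r
    i = n - 2 - m
    offset = i * (2 * n - i - 1) // 2    # number of pairs in rows before row i
    j = k - offset + i + 1
    return i, j


# Sanity check against the sequential enumeration for a small SNP count.
n_snps = 6
sequential = list(combinations(range(n_snps), 2))
parallel = [pair_from_index(k, n_snps) for k in range(len(sequential))]
assert parallel == sequential
```

In a GPU setting the loop over k would be replaced by the thread index, and each thread would compute the interaction statistic for its own pair; the statistic itself is deliberately left out of this sketch.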
Results: A cost-effective solution was created by combining custom-built nodes, each equipped with
relatively inexpensive consumer-level graphics cards carrying highly parallel GPUs, into
a local grid. The GPU method outperforms current cluster-based systems on a price/performance
criterion, as a single GPU achieves speed comparable to that of up to 200 CPU cores.
Conclusion: The outlined approach will work for problems that easily lend themselves to massive
parallelization. Code for various tasks has been made available and ongoing development
of tools will further ease the transition from sequential to parallel algorithms.
Keywords
Epistasis - GPU - grid - computing