Methods Inf Med 2004; 43(05): 434-438
DOI: 10.1055/s-0038-1633893
Original Article
Schattauer GmbH

Comparison of Preprocessing Procedures for Oligo-nucleotide Micro-arrays by Parametric Bootstrap Simulation of Spike-in Experiments

J. Freudenberg (1), H. Boriss (1), D. Hasenclever (2)

1   Interdisciplinary Center of Bioinformatics (IZBI), University of Leipzig, Germany
2   Institute of Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig, Germany

Publication Date: 05 February 2018 (online)

Summary

Objective: Due to scarcity of calibration data for micro-array experiments, simulation methods are employed to assess preprocessing procedures. Here we analyze several procedures’ robustness against increasing numbers of differentially expressed genes and varying proportions of up-regulation.

Methods: Raw probe data from oligo-nucleotide micro-arrays are assumed to be approximately multivariate normally distributed on the log scale. Chips can be simulated from a multivariate normal distribution with mean and variance-covariance matrix estimated from a real raw data set.
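The sampling step described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `simulate_chips` and the toy input are hypothetical, and the use of the centered data matrix as a square-root factor of the sample covariance is a choice made here to avoid forming the full probe-by-probe matrix, which is singular when there are far more probes than chips.

```python
import numpy as np


def simulate_chips(log_raw, n_sim, rng):
    """Draw n_sim simulated chips from N(mean, S).

    log_raw: array of shape (n_chips, n_probes) with log-scale raw intensities.
    mean and S are the sample mean and sample covariance of log_raw.
    """
    n_chips = log_raw.shape[0]
    mean = log_raw.mean(axis=0)
    centered = log_raw - mean
    # S = centered.T @ centered / (n_chips - 1), so z @ centered / sqrt(n_chips - 1)
    # has covariance S when z has independent standard normal entries.
    z = rng.standard_normal((n_sim, n_chips))
    return mean + z @ centered / np.sqrt(n_chips - 1)


# Toy stand-in for a real raw data set (synthetic values, for illustration only).
rng = np.random.default_rng(1)
toy_log_raw = rng.normal(8.0, 1.0, size=(6, 20))
simulated = simulate_chips(toy_log_raw, n_sim=10, rng=rng)
```

Because the simulated chips are linear combinations of the centered real chips, they inherit whatever correlation structure is present in the estimated covariance.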

A chip effect induces strong positive correlations. Conversely, sampling from a multivariate normal distribution whose variance-covariance matrix contains strong positive correlations generates data that exhibit a chip effect, so no explicit model of the chip effect is needed. Differences can be artificially spiked in according to a given distribution of effect sizes.
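Spiking in differences can be illustrated as follows. The sketch assumes log-scale data with one column per gene or probe, and it draws effect sizes from a uniform distribution on [0.5, 2.0); that distribution, the function name `spike_in` and its parameters are assumptions made for illustration, since the summary does not specify the effect-size distribution used.

```python
import numpy as np


def spike_in(chips, frac_de, frac_up, rng, effect_low=0.5, effect_high=2.0):
    """Add artificial log-scale differences to simulated chips.

    chips:    array of shape (n_chips, n_features), log scale.
    frac_de:  fraction of features made differentially expressed.
    frac_up:  fraction of those features that are up-regulated.
    Returns the spiked chips and the vector of true effects.
    """
    n_features = chips.shape[1]
    n_de = int(round(frac_de * n_features))
    n_up = int(round(frac_up * n_de))
    # Pick which features are differentially expressed, in random order.
    de_idx = rng.choice(n_features, size=n_de, replace=False)
    sizes = rng.uniform(effect_low, effect_high, size=n_de)
    signs = np.ones(n_de)
    signs[n_up:] = -1.0  # the remaining DE features are down-regulated
    effects = np.zeros(n_features)
    effects[de_idx] = signs * sizes
    return chips + effects, effects


rng = np.random.default_rng(2)
treatment_chips = rng.normal(8.0, 1.0, size=(4, 100))  # synthetic stand-in
spiked, true_effects = spike_in(treatment_chips, frac_de=0.5, frac_up=0.5, rng=rng)
```

Returning the true effects alongside the spiked data is what makes bias, variance and error rates of a preprocessing procedure measurable in a simulation.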

Thirty preprocessing procedures combining background correction, normalization, perfect match correction and summarization methods available from the BioConductor project were compared.

Results: In the symmetrical setting (50% differentially expressed genes, 50% of which are up-regulated), background correction reduces bias but inflates the variance of low-intensity probes as well as the mean squared error of the estimates. Any normalization reduces variance and increases sensitivity, with no clear winner. Asymmetry between up- and down-regulation biases the effect-size estimates of non-differentially expressed genes, which markedly inflates the false positive discovery rates. Variance stabilizing normalization (VSN) behaved best.
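Why asymmetry biases the estimates for unchanged genes can be seen in a toy calculation. The sketch below uses global mean-centering on the log scale as a deliberately simple stand-in for normalization; it is not one of the BioConductor procedures compared in the study, and the function name `null_gene_bias`, the fixed effect size and the noise levels are all assumptions chosen for illustration.

```python
import numpy as np


def null_gene_bias(frac_de, frac_up, n_genes=10000, effect=1.0, seed=0):
    """Mean estimated effect of non-differentially expressed genes
    after mean-centering one control and one treated chip (log scale)."""
    rng = np.random.default_rng(seed)
    n_de = int(round(frac_de * n_genes))
    n_up = int(round(frac_up * n_de))
    true_effect = np.zeros(n_genes)
    true_effect[:n_up] = effect        # up-regulated genes
    true_effect[n_up:n_de] = -effect   # down-regulated genes
    control = rng.normal(8.0, 1.0, n_genes)
    treated = control + true_effect + rng.normal(0.0, 0.1, n_genes)
    # "Normalize" each chip by subtracting its own mean.
    estimate = (treated - treated.mean()) - (control - control.mean())
    return estimate[n_de:].mean()      # genes with no true effect


symmetric_bias = null_gene_bias(frac_de=0.5, frac_up=0.5)
asymmetric_bias = null_gene_bias(frac_de=0.5, frac_up=1.0)
```

In this toy setting the centering step subtracts the average true effect from every gene. With balanced up- and down-regulation that average is near zero, so unchanged genes stay near zero; when all spiked genes are up-regulated, unchanged genes are shifted by roughly minus half the effect size and can appear down-regulated.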

Conclusion: A simple parametric bootstrap was used to simulate oligo-nucleotide micro-array raw data. Current normalization methods inflate the false positive rate when many genes show an effect in the same direction.
