Methods Inf Med 2012; 51(02): 152-161
DOI: 10.3414/ME11-02-0019
Focus Theme – Original Articles
Schattauer GmbH

Identification of Breast Cancer Prognosis Markers Using Integrative Sparse Boosting

S. Ma
1   School of Public Health, Yale University, New Haven, Connecticut, USA
J. Huang
2   Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, USA
Y. Xie
3   Department of Clinical Sciences, UT Southwestern Medical Center, Dallas, Texas, USA
N. Yi
4   Department of Biostatistics, Section on Statistical Genetics, University of Alabama, Birmingham, Alabama, USA
Further Information

Publication History

Received: 07 June 2011

Accepted: 08 February 2011

Publication Date: 19 January 2018 (online)

Summary

Objectives: In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted to search for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies draws on a larger sample size and may provide a cost-effective solution.

Methods: We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects.
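
To make the idea of selecting genes with consistent effects across studies more concrete, the following is a minimal sketch only, not the authors' implementation. It assumes uncensored log survival times, a fixed number of iterations in place of a sparsity-driven stopping rule, and simulated data; the function name `integrative_boost` and all parameters are hypothetical. At each step one gene is chosen jointly for all studies, while each study keeps its own coefficient, which is one simple way to allow heterogeneity.

```python
import numpy as np

def integrative_boost(studies, n_iter=100, step=0.1):
    """Componentwise L2 boosting over several studies that share the same genes.

    studies: list of (X, y) pairs; X is (n_m, p), y is (n_m,) log survival time.
    Assumption: no censoring, so plain least squares is used in each study.
    Returns a (p, M) coefficient matrix; a gene counts as selected if its row is nonzero.
    """
    p = studies[0][0].shape[1]
    beta = np.zeros((p, len(studies)))
    # Centre each study separately, which absorbs study-specific intercepts.
    data = [(X - X.mean(axis=0), y - y.mean()) for X, y in studies]
    resid = [y.copy() for _, y in data]
    for _ in range(n_iter):
        gain = np.zeros(p)
        fits = []
        for (X, _), r in zip(data, resid):
            ss = (X ** 2).sum(axis=0)
            b = X.T @ r / ss        # per-gene least-squares slope in this study
            gain += b ** 2 * ss     # reduction in residual sum of squares
            fits.append(b)
        j = int(np.argmax(gain))    # one gene is chosen for all studies at once
        for m, ((X, _), b) in enumerate(zip(data, fits)):
            beta[j, m] += step * b[j]                  # study-specific update
            resid[m] = resid[m] - step * b[j] * X[:, j]
    return beta

# Simulated example: four studies, 50 genes, genes 3 and 7 carry signal
# with effect sizes that differ across studies.
rng = np.random.default_rng(0)
studies = []
for m in range(4):
    X = rng.normal(size=(60, 50))
    y = (1.0 + 0.2 * m) * X[:, 3] - 1.5 * X[:, 7] + 0.3 * rng.normal(size=60)
    studies.append((X, y))

beta = integrative_boost(studies)
selected = np.flatnonzero(np.abs(beta).sum(axis=1) > 0)
```

Summing the gain over studies before choosing a gene favours markers that reduce the residual error in every dataset, which is the intuition behind integrative selection. Handling censored data under the AFT model would require additional steps that this sketch omits.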

Results: A simulation study shows that the proposed approach outperforms alternatives, including meta-analysis and intensity approaches, by identifying the majority or all of the true positives while having a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those obtained using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance.

Conclusions: Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.

 