Methods Inf Med 2005; 44(05): 704-711
DOI: 10.1055/s-0038-1634028
Original Article
Schattauer GmbH

Improving Model Robustness with Bootstrapping

Application to Optimal Discriminant Analysis for Ordinal Responses (ODAO)
G. Le Teuff (1), C. Quantin (1), A. Venot (2), E. Walter (3), J. Coste (2)
1   Department of Biostatistics and Medical Informatics, Dijon University Hospital, France
2   Department of Biostatistics and Medical Informatics, Cochin-Port Royal University Hospital, Paris, France
3   Laboratoire des Signaux et Systèmes, CNRS, Supélec, Université Paris-Sud, France

Publication History

Received: 20 April 2004

Accepted: 13 February 2005

Publication Date: 07 February 2018 (online)

Summary

Objective: Recent results published by Coste et al. on discriminant analysis with ordinal responses showed that optimal discriminant analysis for ordinal responses (ODAO) was superior to classic methods (Fisher’s discrimination, logistic regression), both in classification performance and in simplicity of implementation, when applied to medical data (prognosis of burns) and to simulated data. Nevertheless, the solutions obtained by ODAO may be sensitive to resampling (i.e., the coefficients estimated by ODAO may show excessive sensitivity to the training sample). This study proposes solutions to control sampling fluctuations and to ensure model stability.

Methods: We used computer-intensive methods and bootstrapping at the outset of model building to reduce the sampling variability of the estimated coefficients. The coefficients were thus estimated by minimizing not a classification criterion computed on the training sample, but an aggregate criterion computed over bootstrap replications of that classification criterion. Five aggregate criteria were studied.
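The idea of minimizing an aggregate of bootstrapped criteria can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ordinal classifier (a linear score with two cut-points), the finite list of candidate parameters searched exhaustively, the function names, and the choice of mean, median and maximum as aggregates are all assumptions made for illustration. The summary does not name the five aggregate criteria that were studied.

```python
import random
import statistics

def predict(theta, x):
    """Toy ordinal rule (assumed): class 0, 1 or 2 from a linear score and two cut-points."""
    w, c1, c2 = theta
    score = x[0] + w * x[1]
    return 0 if score < c1 else (1 if score < c2 else 2)

def error_rate(theta, sample):
    """Classification criterion: fraction of misclassified observations."""
    return sum(predict(theta, x) != y for x, y in sample) / len(sample)

def bootstrap_samples(sample, n_boot, rng):
    """Draw n_boot resamples of the same size as the sample, with replacement."""
    n = len(sample)
    return [[sample[rng.randrange(n)] for _ in range(n)] for _ in range(n_boot)]

def fit(sample, candidates, aggregate=statistics.mean, n_boot=50, seed=0):
    """Return the candidate minimizing the aggregate of bootstrapped error rates."""
    rng = random.Random(seed)
    resamples = bootstrap_samples(sample, n_boot, rng)

    def aggregated(theta):
        return aggregate([error_rate(theta, r) for r in resamples])

    return min(candidates, key=aggregated)

# Hypothetical training data: 30 observations, ordinal classes 0 < 1 < 2.
sample = [((float(i), 0.0), 0 if i < 10 else (1 if i < 20 else 2)) for i in range(30)]

# Hypothetical candidate parameters (w, c1, c2); a real optimizer would search a continuum.
candidates = [(0.0, 10.0, 20.0), (0.0, 5.0, 25.0), (0.0, 15.0, 18.0)]

best_mean = fit(sample, candidates, aggregate=statistics.mean)
best_median = fit(sample, candidates, aggregate=statistics.median)
best_worst = fit(sample, candidates, aggregate=max)
print(best_mean, best_median, best_worst)
```

Swapping the `aggregate` argument changes how the bootstrap replications are combined: the mean favours parameters that do well on average, while the maximum penalizes parameters that fail badly on any single resample.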

Results: Robustness improved in 30% of the test cases with moderate training sample size and in 55% of those with small training sample size.

Conclusion: Simulated test cases showed that bootstrapping can help construct more robust models in difficult classification situations and with small training samples, which are particularly frequent.

 
References

  • 1 Diaconis P, Efron B. Computer-Intensive Methods in Statistics. Scientific American 1983: 96-108.
  • 2 Efron B. Bootstrap methods: Another look at the jackknife. The Annals of Statistics 1979; 7 (01) 1-26.
  • 3 Efron B, Tibshirani R. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1986; 1 (01) 54-77.
  • 4 Ganeshanandam S, Krzanowski WJ. Error-rate estimation in two-group discriminant analysis using the linear discriminant function. Journal of Statistical Computation and Simulation 1989; 36: 157-75.
  • 5 Altman DG, Andersen PK. Bootstrap investigation of the stability of a Cox regression model. Statistics in Medicine 1989; 8: 771-83.
  • 6 McGuinness D, Bennett S, Riley E. Statistical analysis of highly skewed immune response data. Journal of Immunological Methods 1997; 201 (01) 99-114.
  • 7 Breiman L. Bagging predictors. Machine Learning 1996; 24: 123-40.
  • 8 Zhang J. Inferential estimation of polymer quality using bootstrap aggregated neural networks. Neural Networks 1999; 12 (06) 927-38.
  • 9 Adler W, Hothorn T, Lausen B. Simulation based analysis of automated classification of medical images. Methods of Information in Medicine 2004; 43 (02) 150-5.
  • 10 Shao J. Bootstrap model selection. Journal of the American Statistical Association 1996; 91: 655-65.
  • 11 Sauerbrei W. The use of resampling methods to simplify regression models in medical statistics. Applied Statistics 1999; 48 (03) 313-29.
  • 12 Schumacher M, Holländer N, Sauerbrei W. Resampling and cross-validation techniques: a tool to reduce bias caused by model building. Statistics in Medicine 1997; 16: 2813-27.
  • 13 Ananth CV, Kleinbaum DG. Regression models for ordinal response: a review of methods and applications. International Journal of Epidemiology 1997; 26: 1323-33.
  • 14 McCullagh P. Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B 1980; 42: 109-42.
  • 15 Agresti A. Modelling ordered categorical data: recent advances and future challenges. Statistics in Medicine 1999; 18: 2191-207.
  • 16 Feldmann U, Konig J. Ordinal classification in medical prognosis. Methods of Information in Medicine 2002; 41 (02) 154-9.
  • 17 Coste J, Walter E, Wasserman D, Venot A. Optimal discriminant analysis for ordinal responses. Statistics in Medicine 1997; 16: 561-9.
  • 18 Lachenbruch PA, Clarke WR. Discriminant analysis and its applications in epidemiology. Methods Inf Med 1980; 19 (04) 220-6.
  • 19 Huber PJ. Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics 1973; 1 (05) 779-821.
  • 20 Liestol K. ‘Robust’ statistical methods. Scandinavian Journal of Clinical and Laboratory Investigation 1984; 44 (03) 177-81.
  • 21 Morineau A. Régressions robustes: Méthodes d’ajustement et de validation. Revue de Statistique Appliquée 1978; 26 (03) 5-28.
  • 22 Endrenyi L, Tang HY. Robust parameter estimation for a simple kinetic model. Computers and Biomedical Research 1980; 13 (05) 430-6.
  • 23 Greenman RM, Stepniewski SW. Designing compact feedforward neural models with small training data sets. Journal of Aircraft 2002; 39 (03) 452-9.
  • 24 Heckerling PS, Conant RC, Tape TG, Wigton RS. Discrimination and reproducibility of an information maximizing multivariable model. Methods Inf Med 1993; 32 (02) 131-6.
  • 25 Masri SF, Bekey GA, Safford FB. An adaptive random search method for identification of large-scale non-linear systems. Proc 4th IFAC Symposium on Identification and Systems Parameter Estimation. 1976: 246-55.
  • 26 Pronzato L, Walter E, Venot A, Lebruchec JF. A general purpose global optimizer: implementation and applications. Mathematics and Computers in Simulation 1984; 26: 412-22.
  • 27 Campbell K, Donner A, Webster K. Are ordinal models useful for classification? Statistics in Medicine 1991; 10: 383-94.
  • 28 Rastrigin LA. The convergence of the random search method in the extremal control of a many-parameter system. Automation and Remote Control 1963; 24 (11) 1337-42.
  • 29 Boender CGE, Rinnooy Kan AHG, Timmer GT, Stougie L. A stochastic method for global optimisation. Mathematical Programming 1982; 22: 125-40.
  • 30 Red WE. A method for determining the equilibrium states of dynamics systems. Journal of Optimization Theory and Applications 1977; 21 (03) 299-317.