Methods Inf Med 2005; 44(01): 89-97
DOI: 10.1055/s-0038-1633927
Original Article
Schattauer GmbH

Selection of Predictor Variables for Pneumonia Using Neural Networks and Genetic Algorithms

P. S. Heckerling 1, B. S. Gerber 1, 2, T. G. Tape 3, R. S. Wigton 3

1   Department of Medicine, University of Illinois at Chicago, Chicago, Illinois, USA
2   Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, USA
3   Department of Medicine, University of Nebraska, Omaha, Nebraska, USA

Publication History

Received: 29 December 2003

Accepted: 03 June 2004

Publication Date:
06 February 2018 (online)

Summary

Background: Artificial neural networks (ANN) can be used to select sets of predictor variables that incorporate nonlinear interactions between variables. We used a genetic algorithm, with selection based on maximizing network accuracy and minimizing network input-layer cardinality, to evolve parsimonious sets of variables for predicting community-acquired pneumonia among patients with respiratory complaints.
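A fitness score of the general form below captures the trade-off between maximizing accuracy and minimizing input-layer cardinality. This is a minimal sketch; the penalty weight and the particular accuracy measure are illustrative assumptions, not the study's exact formulation.

def fitness(accuracy, n_selected, n_candidates, penalty=0.1):
    """Reward discriminative accuracy, penalize larger input layers.
    The weight 0.1 is an assumed value for illustration only."""
    return accuracy - penalty * (n_selected / n_candidates)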

Methods: ANN were trained on data from 1044 patients in a training cohort, and were applied to 116 patients in a testing cohort. Chromosomes with binary genes representing input-layer variables were operated on by crossover recombination, mutation, and probabilistic selection based on a fitness function incorporating both network accuracy and input-layer cardinality.
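The evolutionary loop described in the Methods can be illustrated roughly as follows: bit-string chromosomes over the candidate predictors, fitness-proportionate ("probabilistic") selection, single-point crossover recombination, and bit-flip mutation. The population size, operator rates, and the evaluate_network placeholder (standing in for training an ANN on the selected variables and scoring it) are assumptions for this sketch, not the study's implementation.

import random

N_VARS = 35          # assumed number of candidate predictor variables
POP_SIZE = 50        # assumed population size
P_CROSSOVER = 0.8    # assumed crossover probability
P_MUTATION = 0.01    # assumed per-gene mutation probability
GENERATIONS = 100    # assumed number of generations

def evaluate_network(chromosome):
    """Hypothetical placeholder: train an ANN on the variables flagged by the
    chromosome and return a fitness combining network accuracy with a penalty
    on input-layer cardinality."""
    return random.random() - 0.1 * sum(chromosome) / N_VARS  # dummy score

def roulette_select(population, fitnesses):
    """Probabilistic (fitness-proportionate) selection of one parent."""
    floor = min(fitnesses)
    weights = [f - floor + 1e-9 for f in fitnesses]
    return random.choices(population, weights=weights, k=1)[0]

def crossover(a, b):
    """Single-point crossover recombination of two parent chromosomes."""
    if random.random() < P_CROSSOVER:
        point = random.randrange(1, N_VARS)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]

def mutate(chromosome):
    """Flip each binary gene with a small probability."""
    return [1 - g if random.random() < P_MUTATION else g for g in chromosome]

population = [[random.randint(0, 1) for _ in range(N_VARS)]
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    fitnesses = [evaluate_network(c) for c in population]
    offspring = []
    while len(offspring) < POP_SIZE:
        parent1 = roulette_select(population, fitnesses)
        parent2 = roulette_select(population, fitnesses)
        child1, child2 = crossover(parent1, parent2)
        offspring.extend([mutate(child1), mutate(child2)])
    population = offspring[:POP_SIZE]

best_chromosome = max(population, key=evaluate_network)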

Results: The genetic algorithm evolved best 10-variable sets that discriminated pneumonia in the training cohort (ROC areas, 0.838 for selection based on average cross entropy (ENT); 0.954 for selection based on ROC area (ROC)) and in the testing cohort (ROC areas, 0.847 for ENT selection; 0.963 for ROC selection), with no significant differences between cohorts. Best variable sets evolved using ROC selection discriminated pneumonia more accurately than variable sets derived using stepwise neural networks (ROC areas, 0.954 versus 0.879, p = 0.030) or stepwise logistic regression (ROC areas, 0.954 versus 0.830, p < 0.001). Variable sets of lower cardinality were also evolved, and these too discriminated pneumonia accurately.
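For reference, the two selection criteria above (average cross entropy and ROC area) can be computed from network outputs and observed outcomes as sketched below; the nonparametric ROC-area calculation shown is one common approach and may differ from the estimation method used in the study.

import math

def roc_area(outputs, labels):
    """ROC area (c-statistic): probability that a randomly chosen pneumonia
    case receives a higher network output than a randomly chosen non-case;
    ties count one half."""
    pos = [p for p, y in zip(outputs, labels) if y == 1]
    neg = [p for p, y in zip(outputs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_cross_entropy(outputs, labels):
    """Average cross entropy between network outputs and observed outcomes;
    lower values indicate more accurate predictions."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(outputs, labels)) / len(labels)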

Conclusion: Variable sets derived using a genetic algorithm for neural networks accurately discriminated pneumonia from other respiratory conditions, in some cases more accurately than variable sets derived using stepwise neural networks or stepwise logistic regression.

 