Methods Inf Med 2022; 61(03/04): 068-083
DOI: 10.1055/s-0042-1751043
Original Article

Breast Cancer Subtypes Classification with Hybrid Machine Learning Model

Suvobrata Sarkar
1   Department of Computer Science and Engineering, Dr. B.C. Roy Engineering College, Durgapur, West Bengal, India
Kalyani Mali
2   Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
Funding None.

Abstract

Background Breast cancer is the most prevalent heterogeneous disease among women and is characterized by distinct molecular subtypes and varied clinicopathological features. With the emergence of artificial intelligence techniques, especially machine learning, breast cancer research has reached new heights in cancer detection and prognosis.

Objective Recent developments in computer-driven diagnostic systems have enabled clinicians to detect various types of breast tumors more accurately. Our aim is to develop such a computer-driven diagnostic system to help clinicians further improve accuracy in detecting breast tumor types.

Methods In this article, we propose a breast cancer classification model that hybridizes machine learning approaches to classify triple-negative breast cancer (TNBC) and non-triple-negative breast cancer (non-TNBC) patients, using clinicopathological features collected from multiple tertiary care hospitals/centers.
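
One common way to build a GA-SVM hybrid, the model named in the Results, is a wrapper: a genetic algorithm evolves binary masks over the clinicopathological features, and each mask is scored by training a linear SVM on the selected columns only. The sketch below is illustrative and not the authors' implementation. Everything in it is an assumption: the synthetic data (only features 0 and 1 determine the label, the rest are noise), the Pegasos-style subgradient solver swapped in for a library SVM, the fitness (holdout accuracy minus a hypothetical 0.01 penalty per selected feature), and the GA operators and parameters (tournament selection, one-point crossover, bit-flip mutation, two elites).

```python
import random


def make_data(n=120, n_features=6, seed=7):
    """Synthetic stand-in data (hypothetical): only features 0 and 1 decide the label."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n)]
    y = [1 if x[0] + x[1] > 0 else -1 for x in X]
    return X, y


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2) trained with Pegasos-style subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 term
            if margin < 1.0:  # hinge loss active: add its subgradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def accuracy(w, X, y):
    hits = sum(1 for x, t in zip(X, y)
               if (sum(wj * xj for wj, xj in zip(w, x)) >= 0) == (t == 1))
    return hits / len(y)


def project(X, mask):
    """Keep only the features switched on in the mask, plus a constant 1.0 bias input."""
    return [[v for v, g in zip(x, mask) if g] + [1.0] for x in X]


def fitness(mask, train, val, penalty=0.01):
    """Return (penalised fitness, raw validation accuracy) for one feature mask."""
    if not any(mask):
        return -1.0, 0.0
    (X_tr, y_tr), (X_val, y_val) = train, val
    w = train_linear_svm(project(X_tr, mask), y_tr)
    acc = accuracy(w, project(X_val, mask), y_val)
    return acc - penalty * sum(mask), acc


def ga_select(train, val, n_features, pop_size=20, generations=15, p_mut=0.1, seed=42):
    rng = random.Random(seed)
    cache = {}

    def score(mask):  # memoise: each distinct mask trains one SVM
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, val)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: score(m)[0], reverse=True)
        nxt = [ranked[0][:], ranked[1][:]]  # elitism: carry over the two best masks
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=lambda m: score(m)[0])  # tournament selection
            p2 = max(rng.sample(pop, 3), key=lambda m: score(m)[0])
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=lambda m: score(m)[0])
    return best, score(best)[1]


X, y = make_data()
train, val = (X[:80], y[:80]), (X[80:], y[80:])
mask, acc = ga_select(train, val, n_features=6)
print("selected features:", [j for j, g in enumerate(mask) if g])
print("validation accuracy:", round(acc, 3))
```

The per-feature penalty is one simple way to push the search toward small, interpretable feature subsets. A study would more likely score each mask by cross-validated accuracy than by a single holdout split.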

Results The results of the hybrid genetic algorithm and support vector machine (GA-SVM) model were compared with those of classic feature-selection SVM hybrid models such as support vector machine-recursive feature elimination (SVM-RFE), LASSO-SVM, Grid-SVM, and linear SVM. The GA-SVM hybrid model outperformed the compared models on two distinct hospital-based datasets of patients diagnosed with breast cancer in northwestern Africa. To validate predictive accuracy, 10-fold cross-validation was applied to all models using the same multicenter datasets. Model performance was evaluated with standard metrics: mean squared error, logarithmic loss, F1-score, area under the receiver operating characteristic (ROC) curve, and the precision–recall curve.
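
The evaluation protocol described here, 10-fold cross-validation scored with mean squared error, log loss, F1, ROC AUC, and a precision–recall summary, can be sketched as follows. Several choices are assumptions and may differ from the article. The folds are stratified by class. MSE is taken between 0/1 labels and predicted probabilities (the Brier score). The precision–recall curve is summarized as average precision. A hypothetical nearest-centroid model with a sigmoid output stands in for the SVMs, because it yields probabilities in a few lines. The data are synthetic.

```python
import math
import random


def stratified_kfold(y, k=10, seed=0):
    """Deal the shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds, pos = [[] for _ in range(k)], 0
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


def mse(y, p):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def log_loss(y, p, eps=1e-15):
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip so log() never sees 0
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)


def f1(y, p, threshold=0.5):
    pred = [1 if pi >= threshold else 0 for pi in p]
    tp = sum(1 for yi, hi in zip(y, pred) if yi == 1 and hi == 1)
    fp = sum(1 for yi, hi in zip(y, pred) if yi == 0 and hi == 1)
    fn = sum(1 for yi, hi in zip(y, pred) if yi == 1 and hi == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y, p):
    """P(random positive scores above random negative); ties count one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y, p):
    """Step-wise area under the precision-recall curve (score ties not merged)."""
    order = sorted(range(len(y)), key=lambda i: p[i], reverse=True)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            tp += 1
            ap += tp / rank  # precision at each new recall level
    return ap / sum(y)


def fit_centroids(X, y):
    """Hypothetical stand-in model: one mean vector per class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return (mean([x for x, t in zip(X, y) if t == 0]),
            mean([x for x, t in zip(X, y) if t == 1]))


def predict_proba(model, X):
    """Sigmoid of (distance to class-0 centroid minus distance to class-1 centroid)."""
    c0, c1 = model
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [1.0 / (1.0 + math.exp(sqdist(x, c1) - sqdist(x, c0))) for x in X]


def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and average each metric over folds."""
    metrics = {"mse": mse, "log_loss": log_loss, "f1": f1,
               "roc_auc": roc_auc, "avg_precision": average_precision}
    totals = dict.fromkeys(metrics, 0.0)
    folds = stratified_kfold(y, k, seed)
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        p_test = predict_proba(model, [X[i] for i in fold])
        for name, fn in metrics.items():
            totals[name] += fn([y[i] for i in fold], p_test)
    return {name: total / len(folds) for name, total in totals.items()}


rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(100)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X]
results = cross_validate(X, y, k=10)
print({name: round(v, 3) for name, v in results.items()})
print(round(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 2))  # 0.75
```

Stratifying the folds guarantees that every held-out fold contains both classes, so ROC AUC and average precision are always defined. In practice, scikit-learn's `StratifiedKFold` and its `sklearn.metrics` functions (`mean_squared_error`, `log_loss`, `f1_score`, `roc_auc_score`, `average_precision_score`, `precision_recall_curve`) cover the same ground.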

Conclusion The hybrid machine learning model can be employed for breast cancer subtype classification and could help medical practitioners with treatment planning and with improving disease outcomes.



Publication History

Received: 31 December 2021

Accepted: 11 May 2022

Article published online: 12 September 2022

© 2022. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
