Methods Inf Med 2007; 46(03): 360-366
DOI: 10.1160/ME0425
Schattauer GmbH

Does Drug-target Have a Likeness?

H. Xu
1   College of Life Science, Zhejiang University, Hangzhou, Zhejiang, P. R. China
Y. Fang
2   Department of Biomedical Informatics, Columbia University, New York, NY, USA
L. Yao
2   Department of Biomedical Informatics, Columbia University, New York, NY, USA
Y. Chen
3   Department of Computational Science, National University of Singapore, Singapore
X. Chen
1   College of Life Science, Zhejiang University, Hangzhou, Zhejiang, P. R. China

Publication History

Publication Date:
20 January 2018 (online)


Objective: The discovery of new targets that are sufficiently robust to yield marketable therapeutics is an enormous challenge. Conventional target identification approaches are disease-dependent and require a heavy experimental workload and comprehensive domain knowledge. In this work, we propose that a disease-independent property of proteins, “drug-target likeness”, can be explored to facilitate genome-scale target screening in the post-genomic age.

Methods: A Support Vector Machine (SVM) classifier was trained to recognize target and non-target protein sequences compiled from the Therapeutic Target Database, DrugBank, and Pfam. Protein sequences were encoded by the composition, transition and distribution features of their residues, and a Gaussian kernel function was used in the SVM classification.
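The encoding and kernel named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single residue property (a three-class hydrophobicity grouping in the style of Dubchak et al.), one common convention for the distribution positions, and hypothetical helper names `ctd_features` and `gaussian_kernel`. The abstract does not state which properties, conventions, or kernel width were used.

```python
import math

# Assumed three-class hydrophobicity grouping (not specified in the abstract).
GROUPS = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}
CLASS_OF = {aa: g for g, aas in GROUPS.items() for aa in aas}


def ctd_features(seq):
    """Return 21 composition/transition/distribution values for one property."""
    # Map residues to class labels; unknown symbols are skipped.
    classes = [CLASS_OF[aa] for aa in seq.upper() if aa in CLASS_OF]
    n = len(classes)
    if n < 2:
        raise ValueError("need at least two recognised residues")

    # Composition: fraction of residues in each class (3 values).
    comp = [classes.count(g) / n for g in "123"]

    # Transition: fraction of adjacent pairs that switch between two classes,
    # counted in either direction (3 values).
    pairs = list(zip(classes, classes[1:]))
    trans = []
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        count = sum(1 for p, q in pairs if (p, q) in ((a, b), (b, a)))
        trans.append(count / (n - 1))

    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # residue of each class (15 values). Rounding up is an assumed convention.
    dist = []
    for g in "123":
        positions = [i + 1 for i, c in enumerate(classes) if c == g]
        if not positions:
            dist.extend([0.0] * 5)
            continue
        m = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, math.ceil(frac * m))
            dist.append(positions[k - 1] / n)

    return comp + trans + dist


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel; sigma is the kernel width to be tuned."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Two made-up peptide strings, used only to exercise the functions.
    f1 = ctd_features("MKVLAAGIVRE")
    f2 = ctd_features("GASTPHYRKED")
    print(len(f1), gaussian_kernel(f1, f2, sigma=1.0))
```

In practice such vectors would be computed for several residue properties, concatenated, and passed to an SVM library with the kernel width selected by cross-validation.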

Results: The SVM with a fine-tuned kernel width achieved a sensitivity of 66.4 ± 5.1% and a specificity of 97.2 ± 0.6%, corresponding to an overall target prediction accuracy of 94.4 ± 0.8%.
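The three reported figures are linked by the identity accuracy = sensitivity × p + specificity × (1 − p), where p is the fraction of targets in the evaluation set. The sketch below is a back-of-envelope consistency check only. The function names and the confusion-matrix counts are hypothetical, and averages over repeated splits need not satisfy the identity exactly.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def implied_positive_fraction(sensitivity, specificity, accuracy):
    """Solve accuracy = sens * p + spec * (1 - p) for p."""
    return (specificity - accuracy) / (specificity - sensitivity)


if __name__ == "__main__":
    # Mean values quoted in the Results sentence above.
    p = implied_positive_fraction(0.664, 0.972, 0.944)
    print(f"implied fraction of targets: {p:.3f}")

    # Hypothetical counts chosen only to illustrate the definitions.
    print(confusion_metrics(tp=66, fn=34, tn=972, fp=28))
```

With the quoted means, the implied fraction of targets is about 9%, so the overall accuracy is dominated by the much larger non-target class. The abstract itself does not report the class ratio.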

Conclusions: Though preliminary, these results suggest that, similar to the “drug likeness” of small chemicals, their binding partners, the drug targets, also display shared features which are reflected in their sequences and can be captured by statistical learning approaches. Further research on how to measure, accurately and interpretably, the likelihood of a protein being a drug target is promising. As in “drug likeness” studies, advances in protein descriptors and statistical learning algorithms, together with more comprehensive and accurate gold-standard data sets from disease biology research, may help to further define the “drug-target likeness” property of proteins.
