Methods Inf Med 2017; 56(03): 209-216
DOI: 10.3414/ME16-01-0116
Paper
Schattauer GmbH

A Machine Learning-based Method for Question Type Classification in Biomedical Question Answering

Mourad Sarrouti (1), Said Ouatik El Alaoui (1)
(1) Laboratory of Computer Science and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco

Publication History

received: 07 October 2016

accepted in revised form: 11 January 2017

Publication Date:
24 January 2018 (online)

Summary

Background and Objective: Biomedical question type classification is an important component of an automatic biomedical question answering system. The performance of such a system depends directly on its question type classifier, which assigns a category to each question in order to determine the appropriate answer extraction algorithm. This study aims to automatically classify biomedical questions into one of four categories: (1) yes/no, (2) factoid, (3) list, and (4) summary.

Methods: We propose a biomedical question type classification method based on machine learning that automatically assigns a category to a biomedical question. First, we extract features from biomedical questions using the proposed handcrafted lexico-syntactic patterns. Then, we feed these features to machine learning algorithms. Finally, the class label is predicted using the trained classifiers.
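The three steps above (pattern-based feature extraction, training, prediction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the regular expressions in `PATTERNS` are hypothetical surface patterns and not the handcrafted lexico-syntactic patterns of the paper, the toy questions in `TRAIN` are invented, and a simple multiclass perceptron stands in for the SVM and other classifiers evaluated in the study.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns, for illustration only. They are NOT the
# paper's handcrafted lexico-syntactic patterns, which are not listed here.
PATTERNS = {
    "aux_first": re.compile(r"^(is|are|does|do|can|has|have)\b", re.I),
    "wh_first": re.compile(r"^(what|which|who)\b", re.I),
    "wh_singular_is": re.compile(r"^(what|which)\s+(\w+\s+)?is\b", re.I),
    "wh_plural": re.compile(r"^(what|which)\s+(are\b|\w{3,}s\b)", re.I),
    "list_first": re.compile(r"^(list|name|enumerate)\b", re.I),
    "how_why_describe": re.compile(r"^(how|why|describe|explain)\b", re.I),
}

# Invented toy questions labelled with the four categories from the abstract.
TRAIN = [
    ("Is the protein expressed in liver tissue?", "yes/no"),
    ("Which gene is mutated in cystic fibrosis?", "factoid"),
    ("Which genes are associated with the disease?", "list"),
    ("List the symptoms of the syndrome.", "list"),
    ("How does the drug affect blood pressure?", "summary"),
    ("Does the treatment reduce mortality?", "yes/no"),
    ("What is the target of the drug?", "factoid"),
    ("Describe the mechanism of the enzyme.", "summary"),
]


def extract_features(question):
    """Step 1: return the names of all patterns that match the question."""
    text = question.strip()
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def predict(weights, feats):
    """Step 3: pick the label with the highest summed feature weight.

    Ties go to the first label in the (sorted) weight table.
    """
    return max(weights, key=lambda label: sum(weights[label][f] for f in feats))


def train_perceptron(examples, epochs=10):
    """Step 2: multiclass perceptron, a simple stand-in for an SVM."""
    labels = sorted({label for _, label in examples})
    weights = {label: defaultdict(int) for label in labels}
    for _ in range(epochs):
        for question, gold in examples:
            feats = extract_features(question)
            guess = predict(weights, feats)
            if guess != gold:
                # Reward the correct label and penalise the wrong guess.
                for f in feats:
                    weights[gold][f] += 1
                    weights[guess][f] -= 1
    return weights


WEIGHTS = train_perceptron(TRAIN)


def classify(question):
    """Assign one of the four question types to a new question."""
    return predict(WEIGHTS, extract_features(question))


if __name__ == "__main__":
    for q in ["Can the vaccine prevent infection?",
              "What proteins are involved in the pathway?"]:
        print(q, "->", classify(q))
```

In a fuller setting, the binary pattern features would typically be passed to an off-the-shelf SVM implementation instead of the perceptron used here; the sketch only shows how pattern matches become features that a trained classifier maps to a question type.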

Results: Experimental evaluations performed on large standard annotated datasets of biomedical questions, provided by the BioASQ challenge, demonstrated that our method performs significantly better than four baseline systems. The proposed method achieves a roughly 10-point increase in accuracy over the best baseline. Moreover, the results show that using handcrafted lexico-syntactic patterns as the feature source for a support vector machine (SVM) leads to the highest accuracy, 89.40%.

Conclusion: The proposed method can automatically classify BioASQ questions into one of four categories: yes/no, factoid, list, and summary. Furthermore, the results demonstrated that our method produced the best classification performance compared to four baseline systems.

 