A Machine Learning-based Method for Question Type Classification in Biomedical Question Answering

Mourad Sarrouti; Said Ouatik El Alaoui

doi:10.3414/ME16-01-0116

Methods Inf Med 2017; 56(03): 209-216
DOI: 10.3414/ME16-01-0116

Paper

Schattauer GmbH

Mourad Sarrouti¹, Said Ouatik El Alaoui¹

¹Laboratory of Computer Science and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco

Publication History

received: 07 October 2016

accepted in revised form: 11 January 2017

Publication Date:
24 January 2018 (online)

Summary

Background and Objective: Biomedical question type classification is one of the important components of an automatic biomedical question answering system. The performance of the latter depends directly on the performance of its biomedical question type classification system, which consists of assigning a category to each question in order to determine the appropriate answer extraction algorithm. This study aims to automatically classify biomedical questions into one of the four categories: (1) yes/no, (2) factoid, (3) list, and (4) summary.

Methods: In this paper, we propose a biomedical question type classification method based on machine learning approaches to automatically assign a category to a biomedical question. First, we extract features from biomedical questions using the proposed handcrafted lexico-syntactic patterns. Then, we feed these features to machine-learning algorithms. Finally, the class label is predicted using the trained classifiers.
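The three steps above (pattern-based feature extraction, training, prediction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four regular expressions are hypothetical stand-ins and not the authors' pattern set, the training questions are invented rather than taken from BioASQ, and a simple multiclass perceptron replaces the SVM only so that the example runs with the Python standard library.

```python
import re
from collections import defaultdict

# Hypothetical lexico-syntactic patterns, invented for this sketch.
PATTERNS = {
    "aux_first": re.compile(r"^(is|are|does|do|did|can|has|have|was|were)\b", re.I),
    "wh_singular": re.compile(r"^(what|which|who|where|when)\s+(\w+\s+)?(is|was)\b", re.I),
    "list_cue": re.compile(r"^(what|which)\s+(\w+\s+)?(are|were)\b|^list\b", re.I),
    "summary_cue": re.compile(r"^(how|why|describe|explain)\b|\b(mechanism|role)\b", re.I),
}

# Invented toy questions (not BioASQ data), one label per question.
TOY_EXAMPLES = [
    ("Does metformin cross the placenta?", "yes/no"),
    ("Which gene is mutated in cystic fibrosis?", "factoid"),
    ("Which genes are associated with Parkinson disease?", "list"),
    ("What is the mechanism of action of aspirin?", "summary"),
    ("How does insulin regulate blood glucose?", "summary"),
    ("List drugs that inhibit EGFR.", "list"),
]


def extract_features(question):
    """Step 1: names of the patterns that match, plus a constant bias feature."""
    q = question.strip()
    return [name for name, pat in PATTERNS.items() if pat.search(q)] + ["bias"]


def predict(weights, labels, feats):
    """Step 3: return the label with the highest summed feature weight."""
    return max(labels, key=lambda y: sum(weights[(y, f)] for f in feats))


def train(examples, epochs=10):
    """Step 2: multiclass perceptron (a stand-in for the SVM in the paper)."""
    weights = defaultdict(float)  # (label, feature) -> weight
    labels = sorted({gold for _, gold in examples})
    for _ in range(epochs):
        for question, gold in examples:
            feats = extract_features(question)
            guess = predict(weights, labels, feats)
            if guess != gold:
                # Reward the gold label, penalise the wrong guess.
                for f in feats:
                    weights[(gold, f)] += 1.0
                    weights[(guess, f)] -= 1.0
    return weights, labels


WEIGHTS, LABELS = train(TOY_EXAMPLES)


def classify(question):
    """Assign one of the four question types to a new question."""
    return predict(WEIGHTS, LABELS, extract_features(question))


for q in ["Are statins used to lower cholesterol?",
          "What is the incubation period of measles?"]:
    print(q, "->", classify(q))
```

With a library such as scikit-learn available, the pattern matches could instead be encoded as a binary vector and passed to an SVM implementation; the feature extraction step would stay the same.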

Results: Experimental evaluations performed on large standard annotated datasets of biomedical questions, provided by the BioASQ challenge, demonstrated that our method exhibits significantly improved performance when compared to four baseline systems. The proposed method achieves a roughly 10-point increase over the best baseline in terms of accuracy. Moreover, the obtained results show that using handcrafted lexico-syntactic patterns as the feature provider for a support vector machine (SVM) leads to the highest accuracy of 89.40%.

Conclusion: The proposed method can automatically classify BioASQ questions into one of the four categories: yes/no, factoid, list, and summary. Furthermore, the results demonstrated that our method produced the best classification performance compared to four baseline systems.

Keywords

Biomedical question answering - information retrieval - biomedical question classification - natural language processing - biomedical informatics

