Summary
Background and Objective: Biomedical question type classification is an important component of an
automatic biomedical question answering system. The performance of such a system depends
directly on the performance of its question type classifier, which assigns a category
to each question in order to determine the appropriate answer extraction algorithm.
This study aims to automatically classify biomedical questions into one of four
categories: (1) yes/no, (2) factoid, (3) list, and (4) summary.
Methods: In this paper, we propose a biomedical question type classification method based
on machine learning approaches to automatically assign a category to a biomedical
question. First, we extract features from biomedical questions using the proposed
handcrafted lexico-syntactic patterns. Then, we feed these features to machine-learning
algorithms. Finally, the class label is predicted using the trained classifiers.
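The three steps above (pattern-based feature extraction, classifier training, label prediction) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the regular expressions, the toy questions, and the names `extract_features`, `train`, and `classify` are hypothetical and are not the patterns or data used in this study. A simple multiclass perceptron stands in for the SVM so that the example needs only the standard library.

```python
import re

LABELS = ["yes/no", "factoid", "list", "summary"]

# Hypothetical lexico-syntactic patterns (illustrative only, not the study's).
PATTERNS = {
    "aux_initial": re.compile(r"^(is|are|does|do|can|has|have|was|were)\b"),
    "wh_singular": re.compile(r"^(what|which|who|where|when)\s+(is|was|\w+\s+is)\b"),
    "list_cue": re.compile(r"^(list|name)\b|^(what|which)\s+(are|\w+\s+are)\b"),
    "summary_cue": re.compile(
        r"^(how|why|describe|explain)\b|\b(role|mechanism|function) of\b"
    ),
}


def extract_features(question):
    """Step 1: one binary feature per pattern, plus a constant bias term."""
    text = question.strip().lower()
    return [1.0 if p.search(text) else 0.0 for p in PATTERNS.values()] + [1.0]


def _score(weights, x):
    return sum(w * v for w, v in zip(weights, x))


def _predict(model, x):
    # Ties are broken in favor of the label that comes first in LABELS.
    return max(LABELS, key=lambda label: _score(model[label], x))


def train(examples, epochs=10):
    """Step 2: fit a multiclass perceptron (a stand-in for the SVM)."""
    size = len(PATTERNS) + 1
    model = {label: [0.0] * size for label in LABELS}
    for _ in range(epochs):
        for question, gold in examples:
            x = extract_features(question)
            guess = _predict(model, x)
            if guess != gold:
                # Move the gold class toward x and the wrong guess away from it.
                for i, v in enumerate(x):
                    model[gold][i] += v
                    model[guess][i] -= v
    return model


def classify(model, question):
    """Step 3: predict the class label of a new question."""
    return _predict(model, extract_features(question))


# Toy training questions written for this sketch (not BioASQ data).
TOY_EXAMPLES = [
    ("Is dexamethasone used to treat asthma?", "yes/no"),
    ("Does aspirin inhibit platelet aggregation?", "yes/no"),
    ("Which gene is mutated in cystic fibrosis?", "factoid"),
    ("What is the target of imatinib?", "factoid"),
    ("Which genes are associated with breast cancer?", "list"),
    ("List the symptoms of influenza.", "list"),
    ("How does insulin regulate blood glucose?", "summary"),
    ("What is the role of p53 in apoptosis?", "summary"),
]

MODEL = train(TOY_EXAMPLES)
```

Calling `classify(MODEL, "Can metformin cause lactic acidosis?")` then returns one of the four labels. In practice the pattern inventory would be far larger, and the perceptron would be replaced by the classifier under evaluation.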
Results: Experimental evaluations performed on large standard annotated datasets of biomedical
questions, provided by the BioASQ challenge, demonstrated that our method exhibits
significantly improved performance when compared to four baseline systems. The proposed
method achieves a roughly 10-point increase over the best baseline in terms of accuracy.
Moreover, the results show that using handcrafted lexico-syntactic patterns to provide
features for a support vector machine (SVM) leads to the highest accuracy,
89.40%.
Conclusion: The proposed method can automatically classify BioASQ questions into one of the four
categories: yes/no, factoid, list, and summary. Furthermore, the results demonstrate
that our method achieves better classification performance than all four baseline
systems.
Keywords
Biomedical question answering - information retrieval - biomedical question classification
- natural language processing - biomedical informatics