Summary
Objectives: When patients complete questionnaires during health checkups, many of their responses
are subjective, making topic extraction difficult. Therefore, the purpose of this
study was to develop a model capable of extracting appropriate topics from subjective
data in questionnaires conducted during health checkups.
Methods: We employed a latent topic model to group the lifestyle habits of the study participants
and represented their responses to items on health checkup questionnaires as a probability
model. For the probability model, we used latent Dirichlet allocation to extract 30
topics from the questionnaires. The 4381 study participants were then divided into
groups based on these topics, according to the model parameters. Results from
laboratory tests, including blood glucose level, triglycerides, and estimated glomerular
filtration rate, were compared between groups, and these comparisons were then set
against those obtained by hierarchical clustering.
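
As an illustration of the grouping step, the following is a minimal sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling, with each participant then assigned to the topic with the highest estimated proportion. The abstract does not state the inference algorithm, the software, or the exact assignment rule, so all of these are assumptions. The function names, the hyperparameters `alpha` and `beta`, and the toy responses are hypothetical, and the example uses 2 topics rather than the 30 used in the study.

```python
import random


def fit_lda_gibbs(docs, n_topics, n_items, alpha=0.1, beta=0.01,
                  n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA (assumed inference method).

    docs: one list per participant, holding the ids of the questionnaire
          items that participant endorsed.
    Returns theta, the estimated topic proportions per participant.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]            # counts per participant
    topic_item = [[0] * n_items for _ in range(n_topics)]  # counts per topic
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic for every endorsed item.
    for d, doc in enumerate(docs):
        z_d = []
        for item in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_item[z][item] += 1
            topic_total[z] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, item in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_item[z][item] -= 1
                topic_total[z] -= 1
                # Resample the topic from its full conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_item[k][item] + beta)
                    / (topic_total[k] + n_items * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_item[z][item] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each participant.
    return [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


def dominant_topic(theta_row):
    """Assumed grouping rule: the topic with the largest proportion."""
    return max(range(len(theta_row)), key=theta_row.__getitem__)


# Toy data (made up): 8 questionnaire items, two response patterns.
responses = [[0, 1, 2, 3]] * 6 + [[4, 5, 6, 7]] * 6
theta = fit_lda_gibbs(responses, n_topics=2, n_items=8)
groups = [dominant_topic(row) for row in theta]
print(groups)
```

Assigning each participant to a single dominant topic is only one possible reading of "divided into groups based on these topics"; a threshold on the topic proportions would be another.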
Results: If a significant (p < 0.05) difference was observed in any of the laboratory measurements
between groups, it was considered to indicate a questionnaire response pattern corresponding
to the value of the test result. Comparing the groupings produced by the latent topic
model and by hierarchical clustering showed that the latent topic model allocated a
small set of participants who reported subjective signs of urinary disorder to a
single group.
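
The p < 0.05 criterion for a between-group difference can be sketched as follows. The abstract does not name the statistical test that was used, so this example substitutes a two-sided permutation test on the difference in group means as an assumption. The function name `permutation_p_value` and the blood glucose values are hypothetical and are not study data.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    This is an assumed stand-in for the unspecified test in the study.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up blood glucose values (mg/dL) for two questionnaire-based groups.
glucose_group_1 = [92.0, 95.0, 98.0, 101.0, 96.0, 94.0]
glucose_group_2 = [118.0, 124.0, 121.0, 130.0, 126.0, 119.0]

p_value = permutation_p_value(glucose_group_1, glucose_group_2)
is_significant = p_value < 0.05
print(is_significant)
```

With 30 groups and several laboratory measurements, many such comparisons would be made, so a correction for multiple testing may be needed; the abstract does not say whether one was applied.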
Conclusions: The latent topic model is useful for extracting the characteristics of small
groups of participants from questionnaires with a large number of items. These results show that,
in addition to chief complaints and history of past illness, questionnaire data obtained
during medical checkups can serve as useful judgment criteria for assessing the conditions
of patients.
Keywords
Health status - classification - health checkup questionnaire - latent Dirichlet allocation