How to Exploit Twitter for Public Health Monitoring?

K. Denecke; M. Krieck; L. Otrusina; P. Smrz; P. Dolog; W. Nejdl; E. Velasco

doi:10.3414/ME12-02-0010

Methods of Information in Medicine

Methods Inf Med 2013; 52(04): 326-339

Focus Theme – Original Articles

Schattauer GmbH

Authors

K. Denecke
¹Innovation Center Computer Assisted Surgery, Leipzig, Germany
⁵Forschungszentrum L3S, Hannover, Germany

M. Krieck
²Niedersächsisches Landesgesundheitsamt, Hannover, Germany

L. Otrusina
³Brno University of Technology, Brno, Czech Republic

P. Smrz
³Brno University of Technology, Brno, Czech Republic

P. Dolog
⁴Aalborg University, Aalborg, Denmark

W. Nejdl
⁵Forschungszentrum L3S, Hannover, Germany

E. Velasco
⁶Robert Koch Institut, Berlin, Germany

Abstract

Objectives: Detecting hints of public health threats as early as possible is crucial to prevent harm to the population. However, many disease surveillance strategies rely on data whose collection requires explicit reporting (data transmitted from hospitals, laboratories, or physicians). Collecting these reports takes time, which delays the response. Moreover, context information on individual cases is often lost in the collection process. This paper describes a system that tries to address these limitations by processing social media to identify information on public health threats. The primary objective is to study the usefulness of this approach for supporting the monitoring of a population's health status.

Methods: The developed system works in three main steps. Data from Twitter, blogs, and forums as well as from TV and radio channels are continuously collected and filtered by means of keyword lists. Sentences of relevant texts are classified as relevant or irrelevant using a binary classifier based on support vector machines. Using statistical methods known from biosurveillance, the relevant sentences are further analyzed, and signals are generated automatically when unexpected behavior is detected. From the generated signals, a subset is selected for presentation to a user by matching against user queries or profiles. In a set of evaluation experiments, public health experts assessed the generated signals with respect to correctness and relevance. In particular, they assessed how many relevant and irrelevant signals were generated during a specific time period.
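
The keyword filtering and signal generation steps described above can be illustrated with a minimal Python sketch. It is not the system's implementation: the keyword list, the function names, the seven-day baseline, and the rule "count exceeds baseline mean plus k standard deviations" are all assumptions chosen for illustration, and the support vector machine classification step is omitted entirely.

```python
from statistics import mean, stdev

# Hypothetical keyword list; the lists actually used by the system are not given here.
KEYWORDS = {"measles", "fever", "outbreak", "norovirus"}


def keyword_filter(texts, keywords=KEYWORDS):
    """Keep only texts that mention at least one keyword (case-insensitive)."""
    kept = []
    for text in texts:
        # Split on whitespace and strip surrounding punctuation from each token.
        tokens = {tok.strip(".,;:!?()\"'#").lower() for tok in text.split()}
        if tokens & keywords:
            kept.append(text)
    return kept


def detect_signals(daily_counts, baseline_days=7, k=3.0):
    """Return indices of days whose count exceeds the baseline threshold.

    The threshold is mean + k * standard deviation of the preceding
    `baseline_days` counts. This is an assumed, simplified stand-in for
    the biosurveillance statistics mentioned in the text.
    """
    signals = []
    for day in range(baseline_days, len(daily_counts)):
        window = daily_counts[day - baseline_days:day]
        threshold = mean(window) + k * stdev(window)
        if daily_counts[day] > threshold:
            signals.append(day)
    return signals


if __name__ == "__main__":
    messages = ["Measles outbreak reported in Berlin.", "Great match tonight!"]
    print(keyword_filter(messages))
    print(detect_signals([2, 3, 2, 4, 3, 2, 3, 20, 3]))
```

In this sketch, a day is flagged only relative to its own recent history, so a single spike raises the baseline for the following days and suppresses immediate repeat alarms.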

Results: The experiments show that the system provides information on health events identified in social media. Signals are mainly generated from Twitter messages posted by news agencies. Personal tweets, i.e., tweets from persons observing some symptoms, play only a minor role in signal generation, given the limited volume of relevant messages. Relevant signals referring to real-world outbreaks were generated by the system and monitored by epidemiologists, for example during the European football championship. However, the number of relevant signals among the generated signals is still very small: the different experiments yielded a proportion between 5 and 20% of signals regarded as “relevant” by the users. Vaccination or education campaigns communicated via Twitter, as well as the use of medical terms in contexts other than outbreak reporting, led to the generation of irrelevant signals.

Conclusions: The aggregation of information into signals reduces the monitoring effort compared to other existing systems. Contrary to expectations, only a few messages are of a personal nature, reporting on personal symptoms. Instead, media reports are distributed over social media channels. Despite the high percentage of irrelevant signals generated by the system, the users reported that monitoring aggregated information in the form of signals is less demanding than monitoring huge social media data streams manually. Developing strategies for reducing false alarms remains future work.

Keywords

Text mining - Web science - public health - population surveillance - epidemic intelligence - Medicine 2.0
