14 Oct 2013
With factors such as air travel act as a catalyst in the spread of new and novel viruses, the need to improve global population monitoring and enhance surveillance of infectious diseases is more pressing than ever. Patty Kostkova (UCL) discusses how the real-time streams of user data generated on social networks like Twitter can be used for monitoring the health of large populations, providing a potential early warning function for pandemics, detecting flu spikes weeks before official surveillance systems. Watch Patty talk on this subject at the OII.
Ed: Could you briefly outline your study?
Patty: We investigated the role of Twitter during the 2009 swine flu pandemics from two perspectives. Firstly, we demonstrated the role of the social network to detect an upcoming spike in an epidemic before the official surveillance systems – up to week in the UK and up to 2-3 weeks in the US – by investigating users who “self-diagnosed” themselves posting tweets such as “I have flu / swine flu”. Secondly, we illustrated how online resources reporting the WHO declaration of “pandemics” on 11 June 2009 were propagated through Twitter during the 24 hours after the official announcement [1,2,3].
Ed: Disease control agencies already routinely follow media sources; are public health agencies aware of social media as another valuable source of information?
Patty: Social media are providing an invaluable real-time data signal complementing well-established epidemic intelligence (EI) systems monitoring online media, such as MedISys and GPHIN. While traditional surveillance systems will remain the pillars of public health, online media monitoring has added an important early-warning function, with social media bringing additional benefits to epidemic intelligence: virtually real-time information available in the public domain that is contributed by users themselves, thus not relying on the editorial policies of media agencies.
Public health agencies (such as the European Centre for Disease Prevention and Control) are interested in social media early warning systems, but more research is required to develop robust social media monitoring solutions that are ready to be integrated with agencies’ EI services.
Ed: How difficult is this data to process? Eg: is this a full sample, processed in real-time?
Patty: No, obtaining all Twitter search query results is not possible. In our 2009 pilot study we were accessing data from Twitter using a search API interface querying the database every minute (the number of results was limited to 100 tweets). Currently, only 1% of the ‘Firehose’ (massive real-time stream of all public tweets) is made available using the streaming API. The searches have to be performed in real-time as historical Twitter data are normally available only through paid services. Twitter analytics methods are diverse; in our study, we used frequency calculations, developed algorithms for geo-location, automatic spam and duplication detection, and applied time series and cross-correlation with surveillance data [1,2,3].
Ed: What’s the relationship between traditional and social media in terms of diffusion of health information? Do you have a sense that one may be driving the other?
Patty: This is a fundamental question. “Does media coverage of certain topic causes buzz on social media or does social media discussion causes media frenzy?” This was particularly important to investigate for the 2009 swine flu pandemic, which experienced unprecedented media interest. While it could be assumed that disease cases preceded media coverage, or that media discussion sparked public interest causing Twitter debate, neither proved to be the case in our experiment. On some days, media coverage for flu was higher, and on others Twitter discussion was higher; but peaks seemed synchronized – happening on the same days.
Ed: In terms of communicating accurate information, does the Internet make the job easier or more difficult for health authorities?
Patty: The communication of risk in any public health emergencies is a complex task for government and healthcare agencies; this task is made more challenging when citizens are bombarded with online information, from a variety of sources that vary in accuracy. This has become even more challenging with the increase in users accessing health-related information on their mobile phones (17% in 2010 and 31% in 2012, according to the US Pew Internet study).
Our findings from analyzing Twitter reaction to online media coverage of the WHO declaration of swine flu as a “pandemic” (stage 6) on 11 June 2009, which unquestionably was the most media-covered event during the 2009 epidemic, indicated that Twitter does favour reputable sources (such as the BBC, which was by far the most popular) but also that bogus information can still leak into the network.
Ed: What differences do you see between traditional and social media, in terms of eg bias / error rate of public health-related information?
Patty: Fully understanding quality of media coverage of health topics such as the 2009 swine flu pandemics in terms of bias and medical accuracy would require a qualitative study (for example, one conducted by Duncan in the EU ). However, the main role of social media, in particular Twitter due to the 140 character limit, is to disseminate media coverage by propagating links rather than creating primary health information about a particular event. In our study around 65% of tweets analysed contained a link.
Ed: Google flu trends (which monitors user search terms to estimate worldwide flu activity) has been around a couple of years: where is that going? And how useful is it?
Patty: Search companies such as Google have demonstrated that online search queries for keywords relating to flu and its symptoms can serve as a proxy for the number of individuals who are sick (Google Flu Trends), however, in 2013 the system “drastically overestimated peak flu levels”, as reported by Nature. Most importantly, however, unlike Twitter, Google search queries remain proprietary and are therefore not useful for research or the construction of non-commercial applications.
Ed: What are implications of social media monitoring for countries that may want to suppress information about potential pandemics?
Patty: The importance of event-based surveillance and monitoring social media for epidemic intelligence is of particular importance in countries with sub-optimal surveillance systems and those lacking the capacity for outbreak preparedness and response. Secondly, the role of user-generated information on social media is also of particular importance in counties with limited freedom of press or those that actively try to suppress information about potential outbreaks.
Ed: Would it be possible with this data to follow spread geographically, ie from point sources, or is population movement too complex to allow this sort of modelling?
Patty: Spatio-temporal modelling is technically possible as tweets are time-stamped and there is a support for geo-tagging. However, the location of all tweets can’t be precisely identified; however, early warning systems will improve in accuracy as geo-tagging of user generated content becomes widespread. Mathematical modelling of the spread of diseases and population movements are very topical research challenges (undertaken by, for example, by Colliza et al. ) but modelling social media user behaviour during health emergencies to provide a robust baseline for early disease detection remains a challenge.
Ed: A strength of monitoring social media is that it follows what people do already (eg search / Tweet / update statuses). Are there any mobile / SNS apps to support collection of epidemic health data? eg a sort of ‘how are you feeling now’ app?
Patty: The strength of early warning systems using social media is exactly in the ability to piggy-back on existing users’ behaviour rather than having to recruit participants. However, there are a growing number of participatory surveillance systems that ask users to provide their symptoms (web-based such as Flusurvey in the UK, and “Flu Near You” in the US that also exists as a mobile app). While interest in self-reporting systems is growing, challenges include their reliability, user recruitment and long-term retention, and integration with public health services; these remain open research questions for the future. There is also a potential for public health services to use social media two-ways – by providing information over the networks rather than only collect user-generated content. Social media could be used for providing evidence-based advice and personalized health information directly to affected citizens where they need it and when they need it, thus effectively engaging them in active management of their health.
[1.] M Szomszor, P Kostkova, C St Louis: Twitter Informatics: Tracking and Understanding Public Reaction during the 2009 Swine Flu Pandemics, IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology 2011, WI-IAT, Vol. 1, pp.320-323.
[2.] Szomszor, M., Kostkova, P., de Quincey, E. (2010). #swineflu: Twitter Predicts Swine Flu Outbreak in 2009. M Szomszor, P Kostkova (Eds.): ehealth 2010, Springer Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering LNICST 69, pages 18-26, 2011.
[3.] Ed de Quincey, Patty Kostkova Early Warning and Outbreak Detection Using Social Networking Websites: the Potential of Twitter, P Kostkova (Ed.): ehealth 2009, Springer Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering LNICST 27, pages 21-24, 2010.
[4.] B Duncan. How the Media reported the first day of the pandemic H1N1) 2009: Results of EU-wide Media Analysis. Eurosurveillance, Vol 14, Issue 30, July 2009
[5.] Colizza V, Barrat A, Barthelemy M, Valleron AJ, Vespignani A (2007) Modeling the worldwide spread of pandemic influenza: Baseline case an containment interventions. PloS Med 4(1): e13. doi:10.1371/journal. pmed.0040013
Further information on this project and related activities, can be found at: BMJ-funded scientific film: http://www.youtube.com/watch?v=_JNogEk-pnM ; Can Twitter predict disease outbreaks? http://www.bmj.com/content/344/bmj.e2353 ; 1st International Workshop on Public Health in the Digital Age: Social Media, Crowdsourcing and Participatory Systems (PHDA 2013): http://www.digitalhealth.ws/ ; Social networks and big data meet public health @ WWW 2013: http://www2013.org/2013/04/25/social-networks-and-big-data-meet-public-health/
Patty Kostkova was talking to blog editor David Sutcliffe.
Dr Patty Kostkova is a Principal Research Associate in eHealth at the Department of Computer Science, University College London (UCL) and held a Research Scientist post at the ISI Foundation in Italy. Until 2012, she was the Head of the City eHealth Research Centre (CeRC) at City University, London, a thriving multidisciplinary research centre with expertise in computer science, information science and public health. In recent years, she was appointed a consultant at WHO responsible for the design and development of information systems for international surveillance.
Researchers who were instrumental in this project include Ed de Quincey, Martin Szomszor and Connie St Louis.