A Survey of Internet-based Disease Outbreak Discovery Resources

The recent pandemic influenza outbreaks H5N1[1] and H1N1[2] have triggered interest in Internet-based disease outbreak discovery and infection count tracking resources.

In this paper we identify and describe a number of resources that provide various health-related data and disease outbreak detection capabilities. These resources fall into four categories: data aggregation, search analysis, government resources and social media. Each category is discussed below with multiple examples collected as of November 2012.

These materials were developed by the following staff of the Informatics Technology Resource of MIDAS, at RTI International: Douglas Roberts, Diane Wagener, Nathan Gaddis, Phil Cooley, Laxminarayana Ganapathi and Susan (Neely) Kaydos-Daniels.

Data Aggregation

Data aggregation sites collect information from various sources and combine it into a summary analysis.

Below we provide three examples of data aggregation sites: HealthMap, EpiSPIDER and FluTrackers.


HealthMap is available here: (Accessed November 2012.)

HealthMap[3] is an infectious disease-tracking Web site that culls through news Web sites, public health listservs, the World Health Organization's online pages and other Web sites in nine different languages to pinpoint outbreaks of disease that real-world doctors can then act on. The freely available Web site and mobile app, 'Outbreaks Near Me,' deliver real-time intelligence on a broad range of emerging infectious diseases for a diverse audience including libraries, local health departments, governments and international travelers.

HealthMap brings together disparate data sources, including online news aggregators, eyewitness reports, expert-curated discussions and validated official reports, to achieve a unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health. Through an automated process, updating 24/7/365, the system monitors, organizes, integrates, filters, visualizes and disseminates online information about emerging diseases in nine languages, facilitating early detection of global public health threats.

Both the full Web version and the mobile app allow the user to report new disease outbreak instances, thus providing a social network component to the resource.

Figure 1 shows the interactive HealthMap Web-based interface.

Figure 1.

Figure 2 shows how the user can drill down to specific reported disease outbreak instances with HealthMap.

Figure 2.

HealthMap receives 1,000 to 10,000 visits per day from around the world. It is cited as a resource on the sites of agencies such as the United Nations, the National Institute of Allergy and Infectious Diseases, the U.S. Food and Drug Administration and the U.S. Department of Agriculture[4].


EpiSPIDER is available here: (Accessed November 2012.)

The EpiSPIDER project was implemented in January 2006 to serve as a visualization supplement to the ProMED-mail reports. Using publicly available software, EpiSPIDER displays the topic intensity of ProMED-mail reports on a map. Additionally, EpiSPIDER automatically converts the topic and location information of the reports into RSS feeds[4]. At the time of this writing, EpiSPIDER was actively collating 21 feed streams, as well as uClassify classifications of the latest Google Health News, ProMED Reports, Twitter stream and DayPi aggregated news feeds.

EpiSPIDER also produces a unique "Wordle" visualization generated from titles of news articles posted within the past 24 hours. See Figure 3 for an example.

Figure 3.
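A word cloud of this kind is driven by word frequencies. The following is a minimal sketch of the counting step only, not EpiSPIDER's actual implementation: the function name, the stopword list and the sample titles are all hypothetical, and filtering to the past 24 hours is assumed to have happened before the titles are passed in.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"the", "and", "for", "with", "from", "new"}


def title_word_counts(titles, stopwords=STOPWORDS):
    """Count how often each word appears across a list of article titles."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and split it into alphanumeric tokens.
        words = re.findall(r"[a-z0-9']+", title.lower())
        # Skip stopwords and very short tokens, which add little to a word cloud.
        counts.update(w for w in words if w not in stopwords and len(w) > 2)
    return counts


# Made-up titles, NOT taken from any real feed.
sample_titles = [
    "Flu outbreak reported in school",
    "New flu cases in the region",
    "Dengue outbreak spreads",
]

counts = title_word_counts(sample_titles)
for word, n in counts.most_common(5):
    print(word, n)
```

In a rendered word cloud, the font size of each word would then be scaled by its count.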

MedPedia has a site called FluTrackers, which can be found here:

FluTrackers is an international non-profit investigating infectious diseases. This site is a collection of forums addressing different areas, viruses or diseases in humans and animals. Contrary to the name, it is not limited to flu viruses and diseases. Subscribed users report instances of flu or other diseases. The reports are grouped into forums, where each forum represents a continent. Sub-forums are used to report by country, region and, where appropriate, state. As of November 2012, the site had more than 3,000 members and more than 400,000 posts of active disease situations.

Search Analysis

Google search analysis sites collect data from Internet searches and correlate the data to perform trend analysis.

We provide three examples of search analysis sites: Google Flu Trends, Google Dengue Trends and Google Correlate.

Google Flu Trends (GFT)

The Google Flu Trends home page: (Accessed November 2012.)

Google Flu Trends allows users to explore flu trends around the world. In an article recently published in PLoS ONE, "Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic"[5], the authors describe the site as follows: "Google Flu Trends (GFT) uses anonymized, aggregated Internet search activity to provide near-real-time estimates of influenza activity. GFT estimates have shown a strong correlation with official influenza surveillance data. The 2009 influenza virus A (H1N1) pandemic [pH1N1] provided the first opportunity to evaluate GFT during a non-seasonal influenza outbreak. In September 2009, an updated United States GFT model was developed using data from the beginning of pH1N1."

Figure 4 shows how Google Flu Trends provides current indicators of influenza outbreak.

Figure 4.

Figure 5, taken from the Google Flu Trends Web site, illustrates the correlation between Google Flu Trends and influenza-like illness (ILI) data for the United States.

Figure 5.


We observed that the vertical axis in Figure 5 is not labeled, and therefore it was unclear whether this axis represents ILI counts or verified influenza counts. This issue is clarified in the PLoS ONE GFT publication[5], which clearly states that GFT produces estimates for ILI counts. To verify this fact, we obtained the ILI count data from the CDC for the period 2003-2012 and plotted it against the GFT predictions for that same period. The CDC ILI data can be downloaded from here:

and the GFT data can be downloaded from here:

The results are shown in Figure 6 and do not exactly match the results presented in Figure 5. The time scale for Figure 6 extends beyond that of Figure 5. While the shapes of the two infection count curves in Figure 6 are in fairly close agreement prior to mid-2009, the CDC ILI count data do not exactly match the "United States Data" curve in Figure 5. We attempted to contact Google to discuss the discrepancies in ILI count data between Google's plot and Figure 6, but Google did not respond to our inquiries.
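A comparison of this kind can also be summarized numerically. The sketch below aligns two weekly series on their shared weeks and computes the Pearson correlation coefficient. It is an illustration under stated assumptions, not the procedure used to produce Figure 6: the week labels, the dictionary layout and all values are made up, and no real CDC or GFT data are included.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def align_by_week(series_a, series_b):
    """Keep only the weeks present in both {week: value} mappings."""
    weeks = sorted(set(series_a) & set(series_b))
    return weeks, [series_a[w] for w in weeks], [series_b[w] for w in weeks]


# Made-up illustrative values, NOT real CDC or GFT data.
cdc_ili = {"2009-W40": 5.0, "2009-W41": 6.0, "2009-W42": 7.5, "2009-W43": 7.0}
gft_est = {"2009-W41": 5.5, "2009-W42": 6.5, "2009-W43": 6.8, "2009-W44": 6.0}

weeks, cdc_vals, gft_vals = align_by_week(cdc_ili, gft_est)
r = pearson_r(cdc_vals, gft_vals)
print(f"{len(weeks)} overlapping weeks, r = {r:.3f}")
```

Aligning on shared weeks matters here because the two sources need not cover the same time span, as is the case for Figures 5 and 6.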

In May 2009 there was an outbreak of pandemic influenza A virus (pH1N1). One of the first outbreaks resulted in the closure of a semi-rural Pennsylvania elementary school[6]. We believe that it was this pandemic outbreak that caused the large discrepancy shown in Figure 6 between GFT and CDC United States data in late 2009 and again in early 2011.

It has been noted that Google Flu Trends is better at monitoring nonspecific respiratory illnesses - bad colds and other infections, such as SARS, that seem like the flu - than it is at monitoring flu itself[7]. This is not surprising, given that Google Flu Trends was not designed to monitor confirmed flu infections, but is based on Internet search analysis.

Figure 6.

Google Dengue Trends (GDT)

The home page for Google Dengue Trends is at: (Accessed November 2012.)

Similar to Google Flu Trends, Google Dengue Trends analyzes Internet searches that correlate to dengue fever. Figure 7 shows GDT counts for predicted dengue fever infections, compared to actual dengue infection count data for Brazil. Google Dengue Trends allows the user to explore dengue fever trends around the world.

Figure 7.

Google Correlate

Google Correlate can be found at (Accessed November 2012.)

Google Correlate was one of the tools used to develop Google Flu Trends. It allows for automated query selection across millions of candidate queries for any temporal or spatial pattern of interest. Similar to a previous Google product, Trends and Insights for Search, Google Correlate is an online system and can surface its results in real time[8].

We experimented with Google Correlate in an attempt to determine whether meaningful influenza outbreak correlations could be produced using search terms for common over-the-counter medications, such as "ibuprofen". The results of this simple search were somewhat ambiguous. The second-highest correlated result, with a correlation coefficient (r) of 0.9490, is shown in Figure 8. We have no explanation for the high degree of correlation between Internet searches for "ibuprofen" and for "gateway bible". Gateway Bible is a searchable online Bible in more than 100 versions.

Figure 8.
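The idea of ranking candidate queries by their correlation with a target series can be sketched as follows. This is a simplified illustration, not Google Correlate's actual algorithm, which searches millions of queries: the function names, the query labels and the weekly values are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_candidates(target, candidates):
    """Sort candidate series by correlation with the target, highest first."""
    scored = [(name, pearson_r(target, series)) for name, series in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up weekly search volumes, NOT real query data.
target_series = [1, 2, 3, 4, 5]
candidate_series = {
    "query_a": [2, 4, 6, 8, 10],  # rises in step with the target
    "query_b": [5, 4, 3, 2, 1],   # falls as the target rises
    "query_c": [1, 3, 2, 5, 4],   # loosely follows the target
}

ranking = rank_candidates(target_series, candidate_series)
for name, r in ranking:
    print(f"{name}: r = {r:.4f}")
```

A high r in such a ranking indicates only that two series move together over the period examined. As the "ibuprofen" and "gateway bible" result suggests, it does not by itself indicate a meaningful relationship.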

Government Resources

Government resources are provided by organizations that are funded primarily by governments, or that are part of a government, and that provide health information.

We provide three examples of government resource sites: Centers for Disease Control and Prevention (CDC), the U.S. Department of Health and Human Services and the Public Health Agency of Canada. We also include mention of the World Health Organization (WHO), which is the directing and coordinating authority for health within the United Nations system.

Centers for Disease Control and Prevention - CDC

The CDC Web site can be found at

There are several sites within the CDC Web site that have information about flu and other diseases, including the CDC Flu Activity and Surveillance site and the Weekly Summary Reports site.

The CDC Flu Activity and Surveillance Web site[9] presents current and weekly influenza surveillance reports. This resource summarizes and presents data from the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet), as shown in Figure 9. The graph shows what percentage of patient visits to health care institutions were due to ILI.

Figure 9.
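The quantity plotted in such a graph is a simple ratio. The sketch below computes an unweighted percentage of visits due to ILI from hypothetical weekly counts. The function name and all numbers are illustrative assumptions, and published surveillance figures may apply weighting that this sketch omits.

```python
def ili_percentage(ili_visits, total_visits):
    """Return the percentage of patient visits attributed to ILI."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    if not 0 <= ili_visits <= total_visits:
        raise ValueError("ili_visits must be between 0 and total_visits")
    return 100.0 * ili_visits / total_visits


# Made-up weekly counts as (ILI visits, total visits), NOT real ILINet data.
weekly_counts = {
    "week 1": (25, 1000),
    "week 2": (40, 1000),
    "week 3": (66, 1200),
}

for week, (ili, total) in weekly_counts.items():
    print(f"{week}: {ili_percentage(ili, total):.1f}% of visits due to ILI")
```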

The CDC weekly summary site provides a weekly update on influenza activity in the United States. Information provided by this site includes:

U.S. Department of Health and Human Services

The Web site for the U.S. Department of Health and Human Services regarding flu can be found at

This U.S. Government site contains general resources for avoiding flu, such as a flu vaccine locator. The site provides information collected from the CDC, the Department of Homeland Security, the Department of Education, the Federal Trade Commission, the Food and Drug Administration and the National Institutes of Health, among others. In general, this site provides information about flu, with links to different Web sites that contain data about flu or vaccine availability.

The World Health Organization (WHO)

The WHO Web site is

WHO is the directing and coordinating authority for health within the United Nations system. It is responsible for providing leadership on global health matters, shaping the health research agenda, setting norms and standards, articulating evidence-based policy options, providing technical support to countries and monitoring and assessing health trends.

Various summary global health reports can be found on the WHO site, such as WHO's annual World Health Statistics reports, and the site presents the most recent health statistics for the WHO Member States.

The Global Public Health Intelligence Network (GPHIN)

The Web site for the Public Health Agency of Canada is The GPHIN site location is

GPHIN is a resource provided by the Public Health Agency of Canada. GPHIN's mission is to be an indispensable source of early warning for potential public health threats worldwide, including chemical, biological, radiological and nuclear (CBRN) threats. GPHIN's goal is to use leading-edge communications technology and automated processes on a real-time, 24/7 basis, complemented by human analysis, to monitor media sources worldwide and to provide organized, relevant information to users, allowing them to respond to potential health threats in a timely manner.

GPHIN also covers infectious diseases. General information about the diseases, their treatment and prevention is available to the public, and outbreak data are available for some diseases. The site also links to pages containing detailed information about individual diseases.

Figure 10 shows a recent, publicly available map of influenza activity. Data are also available showing the number of regions reporting widespread or localized influenza activity and the overall number of influenza outbreaks by week, as well as the number of positive tests and influenza-like illness (ILI) visits. Other detailed information is available through subscription.

Figure 10.

Social Media

Various social media sites are actively being mined for data that can be used to conduct studies of infectious disease trends.

Below we discuss social media sites generally. There are many sites being developed, with a variety of purposes.

Facebook, Twitter, Google, Etc.

In June 2011, the New York Times reported on the following outbreak in an article entitled "Social Media Join Toolkit for Hunters of Disease"[10]:

On a chilly February night in Los Angeles, attendees at the DomainFest Global Conference crushed together in a tent at the Playboy Mansion for cocktails and dancing. Two days later, Nico Zeifang, a 28-year-old Internet entrepreneur from Germany, woke up with chest pains, chills and a soaring fever. Four colleagues shared his symptoms, Mr. Zeifang soon learned.

So he did what any young techie would: He logged on to Facebook and posted a status update. "Domainerflu count," it said. "Who else caught the disease at D.F.G.?"

Within hours, 24 conference attendees from around the world added themselves to Mr. Zeifang's Facebook list; within a week, the number climbed to 80. Many of them "friended" him to get information and to compare notes on their fevers and phlegmy coughs. Almost everyone, it seemed, had a theory about the source of the infection. Many suspected the artificial fog that permeated the tent.

Los Angeles County health authorities and the federal Centers for Disease Control and Prevention stepped in to investigate a few days later. By that time, victims from across the globe already had arrived at their own diagnosis - legionellosis - and had posted their own Wikipedia entry on the outbreak.

Social media - Facebook, Google, Twitter, location-based services like Foursquare and more - are changing the way epidemiologists discover and track the spread of disease. At one time these guardians of public health swooped onto the scene of an outbreak armed with diagnostic kits and a code of silence. Officials spent weeks interviewing victims privately, gathering test results and data, rarely even acknowledging in public that an investigation was under way. The results might not be announced for weeks or months.

Numerous other articles that discuss the role of social media in disease detection can be found via Web search. Here are a few:

  • Social Network Data Predicts Disease Outbreaks[11]
  • Social media could help detect pandemics, MD says[12]
  • Monitoring Influenza Trends through Mining Social Media[13]


Web and social media provide resources for detecting increases in influenza-like illness (ILI), as well as other types of disease outbreak. As has been demonstrated by Google Flu Trends, this method of detecting disease onset trends can provide more timely results than traditional data collection and reporting methods. However, events such as the 2009 H1N1 pandemic can affect the quality of GFT predictions. The Web resources mentioned in this paper have greatly increased the rate at which information propagates for many types of disease outbreak.

Social network tools and usage patterns are evolving rapidly. New mashups of news feeds, live social network feeds, maps and other types of visualizations will undoubtedly be developed as their potential to facilitate the dissemination of information about disease outbreaks is explored.


We would like to thank the NIGMS-funded Models of Infectious Disease Agent Study (MIDAS) project (U24 GM087704) for supporting this activity and the MIDAS Network.


1. (2004) ASIL Insights, American Society of International Law. Available: Accessed 2011 October 10.

2. CDC - 2009 H1N1 Early Outbreak and Disease Characteristics. (2009) Available: Accessed 2011 October 9.

3. (2008) Web-Crawling Program ID's Disease Outbreaks. Available: Accessed 2011 October 7.

4. Keller M, Blench M, Tolentino H, Freifeld CC, Mandl KD, Mawudeku A, et al. Use of unstructured event-based reports for global infectious disease surveillance. Emerg Infect Dis [serial on the Internet]. 2009 May [2011 October 25]. Available from

5. Cook S, Conrad C, Fowlkes AL, Mohebbi MH (2011) Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic. PLoS ONE 6(8): e23610. doi:10.1371/journal.pone.0023610

6. Clin Infect Dis. (2011) 52 (suppl 1): S154-S160. doi:10.1093/cid/ciq058

7. (2010) Available: Accessed 2011 October 23.

8. (2011) Google Correlate Whitepaper. Available:

9. CDC - Seasonal Influenza (Flu) - Flu Activity & Surveillance. Available: Accessed 2011 October 7.

10. (2011) Social Media Join Toolkit for Hunters of Disease, NYT. Available: Accessed 2011 October 11.

11. (2010) Available: Accessed 2011 October 13.

12. (2011) Available: Accessed 2011 October 13.

13. (2011) Available: Accessed 2011 October 11.