Crowdbreaks tracks disease trends through social media
By: Seth Palmer
Marcel Salathé explains, "We are crowd-sourcing disease surveillance by collecting raw data from online social media websites, and by asking the crowd to help us with the evaluation of that data," adding that he foresees this method being "an important addition to classical disease surveillance approaches."
Toward that end, crowdbreaks is partnering with HealthMap, a web-based tool for real-time surveillance of emerging public health threats, to respond to the Now Trending Challenge. Sponsored by the U.S. Office of the Assistant Secretary for Preparedness and Response (ASPR), the challenge calls for web-based applications that track Twitter to identify trending illnesses.
So how exactly does crowdbreaks work?
A software program developed by Shashank Khandelwal of the Salathé research group connects to Twitter's streaming application programming interface (API), a specification used by software components to communicate with each other, and receives a stream of tweets that match a specified set of disease-related keywords.
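To make the keyword step concrete, here is a minimal Python sketch of matching tweets against a keyword list. It is not the crowdbreaks code: the keyword list, the sample tweets, and the function names are all hypothetical, and an in-memory list stands in for Twitter's stream so the example runs without network access or credentials.

```python
import re
from typing import Iterable, Iterator, List, Pattern

# Hypothetical keyword list; the article does not say which terms crowdbreaks tracks.
KEYWORDS = ["flu", "influenza", "fever", "measles"]


def build_matcher(keywords: List[str]) -> Pattern[str]:
    """Compile a case-insensitive, whole-word pattern for the keywords."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    # \b keeps "flu" from matching inside words such as "fluent".
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_tweets(stream: Iterable[dict], keywords: List[str]) -> Iterator[dict]:
    """Yield only the tweets whose text mentions at least one keyword."""
    matcher = build_matcher(keywords)
    for tweet in stream:
        if matcher.search(tweet.get("text", "")):
            yield tweet


# Stand-in for the live stream (assumed shape: dicts with "id" and "text").
sample_stream = [
    {"id": 1, "text": "Home sick with the flu again"},
    {"id": 2, "text": "Fluent in three languages now"},
    {"id": 3, "text": "Measles cases reported downtown"},
    {"id": 4, "text": "Great game last night"},
]

matched_ids = [t["id"] for t in filter_tweets(sample_stream, KEYWORDS)]
print(matched_ids)  # [1, 3]
```

Keyword matching alone still lets through tweets that use a disease word in an unrelated sense, which is why a further evaluation step is needed.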
Visitors to crowdbreaks.com are prompted to evaluate whether the tweets are, in fact, disease-related, and their input is fed to a machine-learning algorithm that analyzes contextual patterns and then attempts to filter out any irrelevant data.
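The article does not name the machine-learning algorithm involved, so the following Python sketch swaps in a small naive Bayes text classifier purely as an illustration of the idea: visitors' votes are collapsed into one label per tweet, the labeled tweets train a model, and the model then judges new tweets. The sample tweets, the votes, and every function and class name here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def majority_label(votes):
    """Collapse several visitors' votes on one tweet into a single label."""
    return Counter(votes).most_common(1)[0][0]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of tweets
        self.vocab = set()

    def train(self, text, label):
        self.label_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus the smoothed log likelihood of each word.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical crowd input: (tweet text, visitors' votes on "disease-related?").
crowd_votes = [
    ("I have the flu and a high fever", [True, True, False]),
    ("Stuck in bed with flu, feeling awful", [True, True]),
    ("Bieber fever is sweeping the school", [False, False, True]),
    ("Catching football fever before the big game", [False, False]),
]

model = NaiveBayes()
for text, votes in crowd_votes:
    model.train(text, majority_label(votes))

print(model.predict("Down with the flu and feeling awful"))
print(model.predict("Football fever before the game"))
```

In this toy data the word "fever" appears in both relevant and irrelevant tweets, so the surrounding words decide the outcome, which is the sense in which such a model picks up on context.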
HealthMap's own aggregated feeds supply additional contextual data to crowdbreaks.
As HealthMap's co-founder John Brownstein notes, "We are using both news and official reports from HealthMap to help contextualize the social media data mined with crowdbreaks and provide an overall more robust public health surveillance tool."
Both sites benefit further from the data sharing. HealthMap can aggregate crowdbreaks' data into its own web system and use it to populate its mobile application, Outbreaks Near Me, while crowdbreaks can incorporate HealthMap's Local tool to show visitors to crowdbreaks.com a map of current disease trends in their area.
More than the sum of its parts...
Although crowdbreaks is still in beta release, Dr. Salathé has even bigger plans for the software that drives the site.
While he says his initial goal was to create an entirely crowd-sourced tool, Dr. Salathé has broadened his focus to include the potential for other researchers to use the crowdbreaks software to study other areas such as population dynamics of noninfectious diseases, vaccination trends, patterns of behavior, and trends related to diet and exercise.
He explains, "The strength of the system is that it is not constrained to infectious disease only: it's easily adaptable to almost anything people talk about on social media websites."
Dr. Brownstein adds, "If organized and filtered properly, social media [data] has tremendous potential to identify events early and provide key situational awareness on emerging health threats."
About the researchers
Dr. Salathé is an assistant professor of biology at Penn State, an adjunct faculty member in computer science and engineering, a faculty member of the Huck Institutes' graduate program in bioinformatics and genomics, and head of the Salathé research group — developers of crowdbreaks — in the Center for Infectious Disease Dynamics.
Mr. Khandelwal is a software and application developer with the Salathé research group in the Center for Infectious Disease Dynamics.