NIH Record - National Institutes of Health

Tracking ‘Digital Exhaust’

Digital Disease Detection Needs Boost, Brownstein Says

Dr. John Brownstein

The current global coronavirus pandemic is going to be a poster child for all sorts of future public health preparation, but perhaps nowhere more actionably than in real-time surveillance of an infectious disease as it explodes across the world.

“Every domain except health care has data—shopping, travel, news, entertainment, learning—that helps with our decision-making,” said Dr. John S. Brownstein, professor of biomedical informatics at Harvard Medical School and chief innovation officer at Boston Children’s Hospital. “Covid-19 has put a real focus on the inadequacies of public health data collection.”

Brownstein offered a whirlwind overview of how digital disease detection can be improved at the National Library of Medicine/Medical Library Association’s annual Joseph Leiter Lecture on Aug. 11.

“What data streams,” he asked, “can we tap into that unwind the hierarchical—and time-consuming—structure” that governed how the world came to know about a novel coronavirus that was first publicly reported on Dec. 30, 2019, when 7 patients in Wuhan, China, came down with a mysterious illness?

Early reports moved slowly along a linear chain: from the public to health care workers, to laboratories, to national health agencies (such as the CDC) and finally to world bodies such as the WHO.

“Our view is that all stakeholders in public health should have access immediately,” said Brownstein. Monitoring public health threats should be as obvious and simple as whipping out your smartphone to check the weather, he said.

Brownstein speaks virtually to NIH audience.

“Why this doesn’t happen in public health baffles us, right?” he continued. “A National Weather Service for disease outbreaks is something people keep talking about. But we lack the underlying data sets, we lack access to APIs [application programming interfaces, software systems that allow two applications to talk to each other], there is no ecosystem of tools that you can turn to.”

It’s not that the clues aren’t everywhere, it’s that no one’s aggregating them, he argued.

“People move through the world and they have a ‘digital exhaust,’” said Brownstein. “They search online, they tweet, they use their Fitbit. There is a subset of your digital exhaust that is health-related. We need to tap into that data, at scale. If you aggregate it, you can get amazing insights about population-level events.”

Fifteen years ago, Brownstein and others envisioned such an early alert system. Using funds from NLM, they created HealthMap, with the goal of tapping into the huge troves of online data.

The effort now includes 171 public and private sources, 15 languages and more than 200,000 websites.

“It can tease out important events,” said Brownstein, “but we need to structure the massive noise of data.” Two tools familiar to modern librarians, natural language processing and machine learning, “help tag the information churned through daily on HealthMap. This is how the WHO learned of the alarming spread of Covid-19, along with ProMED [the Program for Monitoring Emerging Diseases, run by the International Society for Infectious Diseases].”

Sites in place before Covid-19

These tracking sites were in place well before Covid-19. Examples of their utility include:

  • Detection of H1N1 (swine flu) in 2009 in Veracruz, Mexico, which enabled authorities to track the spread from country to country.
  • In 2013, the same technology identified H7N9 flu in China. Brownstein suggested that parsing of data from Facebook and Twitter could be the next step in bolstering digital epidemiology.
  • Bots knew that Ebola was coming in West Africa in 2014. “Access to [airline] passenger data out of West Africa could enable rapid threat assessment—where is the disease going next?” said Brownstein. His team built a zoonotic niche map to label high-risk environments where Ebola might flourish.
  • In 2016, Zika virus expansion was tracked by digital sleuths.

The technology is not foolproof. “We found that tracking flu through Google search query data did a poor job on H1N1,” Brownstein reported. However, “Wikipedia is incredibly valuable in tracking influenza, just by counting the page views seeking flu data.”
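
The page-view idea can be illustrated with a short sketch. It is not the method Brownstein's team uses; it simply flags weeks in which views of a flu-related page rise well above the recent baseline. The function name, the four-week window, the threshold and the weekly counts are all assumptions made up for illustration.

```python
from statistics import mean, stdev

def flag_spikes(weekly_views, window=4, threshold=2.0):
    """Return indices of weeks whose views stand out from the recent baseline.

    A week is flagged when its count is more than `threshold` standard
    deviations above the mean of the preceding `window` weeks.
    (Hypothetical helper; window and threshold are illustrative choices.)
    """
    flagged = []
    for i in range(window, len(weekly_views)):
        baseline = weekly_views[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A perfectly flat baseline has sigma == 0; skip it rather than divide by zero.
        if sigma > 0 and (weekly_views[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up weekly page-view counts for a flu-related article.
views = [1000, 1040, 980, 1020, 1010, 1600, 2400, 1900]
print(flag_spikes(views))
```

With these invented numbers, the two weeks of sharp increase are flagged, while the later week is not because the baseline has already risen. A real system would also need to separate curiosity driven by news coverage from actual illness.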

Other unexpected klaxons of public health issues include:

  • OpenTable, the online reservation system, “has been valuable in Covid-19,” said Brownstein. The availability of table reservations is a predictor of flu-like illness; cancellations are an indicator of social disruption.
  • Yelp reviews. Brownstein said that about 10 percent of all Yelp reviews relate to food poisoning. “That’s an incredibly valuable health care source.”
  • Through a partnership with Twitter, Brownstein and colleagues found that “many people, surprisingly, tweet about their diarrhea and food-related issues. We captured that information online from millions of posts and built a tool for public health. We created a social media dashboard for public health, to discuss food-poisoning data…There was a huge number of foodborne illness searches in the wake of Hurricane Harvey.”

More traditional platforms

Platforms that are more traditional in structure and intent include:

  • Google Mobility Project, which collects data from 300 million users from 243 countries/territories, representing 65 percent of Earth’s habitable surface. “The goal is to develop a global human movement typology that can track the spread of disease,” said Brownstein.
  • Crowdsourcing, or “putting the public back in public health,” on sites such as Flu Near You, where 100,000 users report weekly on their symptoms.
  • Covid Near You—a collaboration with Amazon, Google, Apple and others—has already been deployed in the U.S., Canada and Mexico, attracting more than 1 million users in the U.S. alone. A symptom-based tracker, it has been “incredibly effective at identifying [pathogen] emergence in populations, determining age-based attack rates, etc.,” said Brownstein. There are plans to merge the flu and covid versions.
  • EpiCore, already in use for Covid-19, which taps into a network of experts. “Essentially, it’s a Bat Phone for epidemiologists—an information exchange,” said Brownstein. “It pushes information to stakeholders like WHO and the CDC.”
  • Health care chatbots. “This is the next iteration of patient-engagement tools,” said Brownstein. The Symptom Checker, for example, is a self-assessment questionnaire. “These tools are providing an incredible level of diagnostic accuracy, as well as triage accuracy, shifting care to the right place, anywhere from the ER to a telemedicine visit…It also helps with surveillance. There’s been massive growth in these tools as covid has emerged.”

New platforms include Alexa, the voice-activated search tool. “If you’re using them in your consumer life, you can use them in health care as well,” Brownstein said. Other online platforms include KidsMD, to quickly determine the level of care a patient needs, and Flu Doctor, which is also applicable to covid; it is expected to help with vaccine effectiveness and side-effect reporting, and help counter misinformation.

With the coronavirus pandemic, Brownstein said his team has been “heads-down for the last half year.” They are using data-mining tools to track spread but are limited to scans of news and social media. Still, what began as a small volunteer effort in January has become a massive enterprise.

“We’ve built a global repository of cases, with many partners around the world [including Baidu, the Chinese version of Google]. We used it to measure the impact of the lockdown in China. It proved that lockdowns are effective.”

New tools and tricks

The big new tool is Global.health, which launched this summer. Although built as a Covid-19 data science initiative, it is expected to be a boon in future pandemics. A collaboration between Google and a wide range of institutions, it will make public health data freely available in real time.

Even satellite data (photos taken from space) has a public health application. By comparing images of hospital parking lots in Wuhan with images taken a year before the outbreak, researchers found an uptick in usage last fall, indicating more visits and more need for care. Parking lot photos have helped predict flu season in Venezuela, too, reported Brownstein.

A partnership with the popular survey software SurveyMonkey, offered as an option for those doing other kinds of surveys, showed Brownstein and his team that “half of the people we surveyed in the U.S. wouldn’t get a vaccine as soon as it became available; 1 in 8 would not want to get it ever. Those age 75 and older are the most eager to get it right away, without hesitation.”

The researchers also surveyed mask-wearing, an important avenue to lowering R-naught, or the reproductive number, a measure of virus transmission. At least half the people in a population need to be masked to budge the number toward viral extinction.

Aware that Covid-19 has disproportionately ravaged minority communities, Brownstein’s team is also tracking access to testing, including travel time to the nearest testing center. They will apply Vaccine Finder, a tool begun for H1N1 flu, to Covid-19, to make testing more convenient.
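
As a rough illustration of the access measure, the sketch below finds the closest testing site to a given location using straight-line (great-circle) distance. This is a simplification and an assumption: the team tracks travel time, which depends on roads and transit, and the site names and coordinates here are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_site(home, sites):
    """Return (name, distance_km) for the site closest to `home`.

    `home` is a (lat, lon) pair; `sites` is a list of (name, lat, lon) tuples.
    Straight-line distance stands in for travel time here.
    """
    name, lat, lon = min(sites, key=lambda s: haversine_km(home[0], home[1], s[1], s[2]))
    return name, haversine_km(home[0], home[1], lat, lon)

# Invented sites and home location, for illustration only.
sites = [("Site A", 42.36, -71.06), ("Site B", 42.45, -71.23), ("Site C", 42.25, -71.00)]
name, km = nearest_site((42.34, -71.10), sites)
print(f"Nearest: {name}, about {km:.1f} km away")
```

Repeating the calculation for every neighborhood in a region would show where the gaps in access are largest, which is the kind of comparison the paragraph above describes.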

The Joseph Leiter NLM/MLA Lectureship was established in 1983. Brownstein’s full, and rather dizzying, lecture is available at https://videocast.nih.gov/watch=38269.