NIH Record - National Institutes of Health

Scientists Use Machine Learning to Better Identify Long Covid

Transmission electron micrograph of SARS-CoV-2 virus particles, isolated from a patient.

An NIH-supported research team identified characteristics of people with long Covid and those likely to have it. Using machine learning techniques, scientists analyzed an unprecedented collection of electronic health records (EHRs) available for Covid-19 research to better identify who has long Covid. 

Exploring de-identified EHR data in the National Covid Cohort Collaborative (N3C), a centralized public database led by NCATS, the team found more than 100,000 likely long Covid cases as of October 2021 (as of May 2022, the count is more than 200,000). The findings appear in The Lancet Digital Health.

Long Covid is marked by wide-ranging symptoms, including shortness of breath, fatigue, fever, headaches, “brain fog” and other neurological problems that last for many months or longer after an initial Covid-19 diagnosis. Its symptoms mimic those of other diseases and conditions, often making it hard to identify. 

The N3C data enclave currently includes information representing more than 13 million people nationwide, including nearly 5 million Covid-19-positive cases. The resource enables rapid research on emerging questions about Covid-19 vaccines, therapies, risk factors and health outcomes.

The new research is part of a related, larger trans-NIH initiative, Researching Covid to Enhance Recovery (RECOVER), which aims to improve understanding of the long-term effects of Covid-19, called post-acute sequelae of SARS-CoV-2 (PASC). 

In the Lancet study, researchers examined patient demographics, health care use, diagnoses and medications in the health records of 97,995 adult Covid-19 patients in the N3C. They used this information, along with data on hundreds of long Covid patients from several clinics, to create three machine learning models.

In machine learning, scientists “train” computational methods to rapidly sift through large amounts of data to reveal new insights, patterns and clues.

The models focused on identifying potential long Covid patients among both hospitalized and non-hospitalized Covid-19 patients.

“Once you’re able to determine who has long Covid in a large database of people, you can begin to ask questions about those people,” said Dr. Josh Fessel, NCATS senior clinical advisor and a RECOVER scientific program lead. “Was there something different about those people before they developed long Covid? Did they have certain risk factors? Was there something about how they were treated during acute Covid that might have increased or decreased their risk for long Covid?”

The models searched for common features, including new medications, doctor visits and new symptoms in patients who were at least 90 days out from their acute infection. The research team hopes to use its long Covid patient classifier for clinical trial recruitment.
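As a rough illustration of the kind of inputs described above, the sketch below checks whether a patient is at least 90 days past acute infection and counts the three feature types the article names. It is a minimal sketch, not the research team's method: the record layout, the field names and the choice to count every event after the infection date are assumptions made for this example, and it does not reflect the actual N3C data model or the published classifier.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

MIN_DAYS_OUT = 90  # the article's "at least 90 days out" threshold


@dataclass
class PatientRecord:
    """Hypothetical record layout for illustration; not the N3C schema."""
    patient_id: str
    infection_date: date          # date of the acute Covid-19 diagnosis
    medication_dates: List[date]  # dates new medications were started
    visit_dates: List[date]       # dates of doctor visits
    symptom_dates: List[date]     # dates new symptoms were recorded


def is_eligible(record: PatientRecord, as_of: date) -> bool:
    """Return True if the patient is at least 90 days past acute infection."""
    return (as_of - record.infection_date).days >= MIN_DAYS_OUT


def post_infection_features(record: PatientRecord, as_of: date) -> Dict[str, int]:
    """Count events after the infection date, up to and including as_of.

    Assumption: the counting window is chosen for this sketch only.
    """
    def count(dates: List[date]) -> int:
        return sum(1 for d in dates if record.infection_date < d <= as_of)

    return {
        "new_medications": count(record.medication_dates),
        "doctor_visits": count(record.visit_dates),
        "new_symptoms": count(record.symptom_dates),
    }
```

Counts like these could then be passed to a classifier as input features; which model to use and how to train it are outside the scope of this sketch.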
