Celi Cautions Developers, Clinicians to Beware of Bias in Healthcare AI Models
Artificial intelligence (AI) in healthcare is advancing at a rapid pace.
“Those who are at the cutting edge of AI are convinced we are a year or two away from dumping a whole dataset into an AI model and asking it to write a full research manuscript,” said Dr. Leo Anthony Celi, clinical research director and principal research scientist at the Massachusetts Institute of Technology’s Laboratory for Computational Physiology. He spoke at a recent NIH AI symposium in Masur Auditorium.
The day-long symposium brought together researchers from a broad range of disciplines to share their AI-related research. It was sponsored by NIH’s National Heart, Lung, and Blood Institute and NIH Office of Intramural Research, in partnership with the Foundation for Advanced Education in the Sciences (FAES).
Electronic health records are the “de facto building blocks” of AI healthcare models, he said. Despite their importance, these records were never designed for that role.
“Data is not an objective representation of the world,” Celi said. “It is a representation of the world as seen through the lenses of the observers.”
Anyone who trains AI models on electronic health records must be aware of bias, which can significantly affect a model’s effectiveness and fairness. Healthcare AI models that fail to account for bias often perform worse than models that control for it.
“You need to take caution when you’re using electronic health record data for developing AI,” Celi warned.
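Celi did not walk through code, but the kind of bias audit he describes can be sketched in a few lines. The Python snippet below is a minimal illustration, not his team’s actual pipeline; the column names (label for the true outcome, score for the model’s prediction and a grouping field such as race) are hypothetical placeholders for fields in an EHR-derived dataset.

```python
# A minimal sketch of a subgroup bias audit -- illustrative only.
# Column names ("label", "score") and the grouping field are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score

def audit_by_subgroup(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Report model discrimination (AUROC) separately for each subgroup."""
    rows = []
    for group, sub in df.groupby(group_col):
        if sub["label"].nunique() < 2:
            continue  # subgroup too homogeneous to score reliably
        rows.append({
            group_col: group,
            "n": len(sub),
            "auroc": roc_auc_score(sub["label"], sub["score"]),
        })
    return pd.DataFrame(rows)
```

A large gap in AUROC between subgroups is one warning sign that a model has absorbed bias from its training data.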
Some medical devices do not perform consistently across different populations. For instance, Celi’s team found that skin tone can affect the accuracy of a pulse oximeter’s oxygen saturation reading. The models don’t just know that. “You need to give it context,” he said.
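In the spirit of that pulse oximetry finding, one way to surface device bias is to compare paired oximeter readings (SpO2) against arterial blood gas measurements (SaO2) within each skin tone group. The sketch below is an assumption-laden illustration: the column names and the reassuring-SpO2/low-SaO2 thresholds are placeholders, not a clinical rule.

```python
# Illustrative "hidden hypoxemia" check: readings where the pulse oximeter
# (spo2) looks reassuring while the blood gas (sao2) shows low oxygen.
# Column names and thresholds are assumptions made for this sketch.
import pandas as pd

def hidden_hypoxemia_rate(df: pd.DataFrame,
                          spo2_ok: float = 92.0,
                          sao2_low: float = 88.0) -> pd.Series:
    """Fraction of paired readings, per group, where SpO2 masks low SaO2."""
    masked = (df["spo2"] >= spo2_ok) & (df["sao2"] < sao2_low)
    return masked.groupby(df["skin_tone_group"]).mean()
```

If that rate differs systematically across skin tone groups, any model trained on SpO2 inherits the measurement bias, which is exactly the context Celi says developers must supply.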
Anyone who builds AI models must think about where the data came from, what instruments and devices measured the signals and who collected the data, he said.
AI developers must also be aware of how models influence human behavior. One of Celi’s colleagues studied how an AI medical imaging tool affected the performance of radiologists. The researchers found experienced radiologists performed worse after they began using the tool.
The radiologists were confident in their abilities until the tool began interpreting images differently. They realized it caught potential abnormalities that they had missed. Afterward, they were less confident and made more mistakes.
“AI will change the behaviors of users,” Celi said. “When AI becomes more and more accurate, our tendency is to just hit accept, accept, accept.”
Despite these challenges, Celi is excited about AI’s potential. “We have an immense opportunity to start with a clean slate and truly improve the way we learn.”
A few years ago, Celi began teaching courses in AI at MIT. Because the field is moving too fast for any single tool or technique to stay current, he focuses on teaching his students how to think critically.
“We need to teach our students how to ask the right questions, how to be able to evaluate those answers, how to know when their understanding of a problem is limited, how to seek help and how to seek other expertise to be able to come up with a good study design,” he said.
Celi’s team created several publicly available databases, including the Medical Information Mart for Intensive Care (MIMIC), a freely available, de-identified, electronic health record dataset. More than 10,000 papers have cited the database.
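MIMIC is distributed through PhysioNet to credentialed researchers as a set of relational tables. As a rough sketch only (the file paths below are placeholders, and the actual schema should be taken from the MIMIC documentation for the release in use), linking two of its tables with pandas looks like this:

```python
# Rough sketch of assembling a cohort from MIMIC-style tables.
# Paths are placeholders; consult the MIMIC docs for the release you use.
import pandas as pd

patients = pd.read_csv("mimic/patients.csv")      # one row per patient
admissions = pd.read_csv("mimic/admissions.csv")  # one row per hospital stay

# Join stays to patient demographics on the shared subject_id key.
cohort = admissions.merge(patients, on="subject_id", how="left")
print(cohort.head())
```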
His lab regularly organizes “datathons,” where experts from data science and healthcare backgrounds come together to attend workshops and evaluate the performance of AI models in different medical fields. An upcoming datathon will assess models for depression recognition. Patients, psychologists, social workers, psychiatrists and nurses will meet to evaluate the performance of the models.
Currently, Celi is measuring the impact of these events. He has found that attending increases the chances that scientists will collaborate with others outside their specialties.
The events also bring together high school students and experts. Organizers ask participants to read papers with profound implications for the application of health AI, then lead a discussion.
“We ask students questions such as: What are the worst scenarios that could happen as a result of this? What should we do? What policies can we erect? What about guardrails and how do we change the incentive structure?”
By focusing on critical thinking, developers and clinicians alike will be better able to take advantage of the promise healthcare AI models offer.
“There’s so much energy around AI,” Celi concluded. “I think we will be remiss if we waste that.”