As artificial intelligence systems improve, their applications in medicine — particularly in radiology — have drawn interest. A June study from Brown researchers, published in European Radiology, examines how radiologists are influenced when using AI as an aid to interpret chest radiograph results for cancer.
Michael Bernstein, assistant professor of diagnostic imaging at the Warren Alpert Medical School and the first author of the study, told The Herald that the study aimed to determine if AI “causes performance detriments” for radiologists when it is wrong. After determining if and how AI negatively impacts radiologists’ readings of chest X-rays, researchers can then begin to mitigate the consequences.
“What our research focuses on is not the AI itself, but how should the AI actually be used clinically?” Bernstein said. “Which is a question of psychology, not actually of statistics or computer scientists.”
To test this phenomenon, Bernstein, who is a research scientist, worked with his colleagues and radiologists to create a fake AI system. The system, dubbed “sham AI,” was designed to occasionally give false readings on the chest X-rays used in the study. Radiologists were then given the same X-rays to read under various conditions.
Other co-authors included Elizabeth Dibble, Saurabh Agarwal, Robert Ward, Terrance Healey and Grayson Baird, associate professors of diagnostic imaging at Warren Alpert, as well as Aaron Maxwell and Adib Karam, assistant professors of diagnostic imaging at Warren Alpert.
“In order to show that radiologists had to be misled, we had to manipulate the AI feedback to be wrong,” said Grayson Baird, associate professor of diagnostic imaging at Warren Alpert and another author of the study.
Throughout the study, radiologists were asked to read chest radiographs for a pulmonary lesion, with an AI indicating whether each X-ray was “abnormal or normal.” Not all of the X-rays showed a pulmonary lesion, so radiologists were being tested on their ability to screen for lung cancer.
The authors found that accompanying AI readings could have detrimental effects on radiologists’ final interpretations of whether X-rays were abnormal or normal, though those effects were mitigated when the radiographs had a box drawn around a “region of interest.”
For example, some radiologists who were shown incorrect AI readings made errors in their final interpretations when their initial interpretations would have been correct without the AI. This resulted in an increase in both false positives, reporting a cancer finding where radiologists should not have detected cancer, and false negatives, ignoring or missing a cancer finding when the AI said it was not there.
The study also found that if radiologists are told the AI interpretation will be saved in a patient’s file, the clinician is more likely to agree with the AI, resulting in more false negatives and false positives. Bernstein and Baird said that this might stem from fear of liability.
“It doesn’t matter if you were right 100 times in disagreeing with the AI,” Baird said. “That one time is now in the patient’s file, so that can be used against you.”
When this technology is used in radiology spaces, according to Bernstein, “you’re going to have AI contradict the radiologist on more ambiguous cases. And because it’s more ambiguous, they are probably going to be more susceptible to being misled.”
But an unexpected result of the study was that even in less ambiguous cases, “we still found that (radiologists) were susceptible to being misled,” Bernstein said.
This is an example of automation bias, the tendency to accept a computer-generated result over one’s own judgment. The study’s results also show anchoring bias, where an early suggestion, whether from a physician or AI, may obscure or prevent consideration of alternatives, Baird said.
Another study, from Germany and the Netherlands, “correlated and saw where AI was wrong (and) radiologists were wrong” when looking at a set of mammograms, according to Baird.
“They looked at radiologists with AI, and where AI was wrong, radiologists were more likely to be wrong,” Baird said. But that study could not demonstrate that AI had a psychological impact on radiologists. It “can’t show causation, because they forgot to have them read it without AI.”
Bernstein, Baird and Michael K. Atalay, professor of diagnostic imaging and medicine at Warren Alpert, published a letter to the editor in Radiology Oct. 17 emphasizing the German study’s limitations. “To fully capture the effect of incorrect AI, it is important to compare radiologists’ performance with the same cases where AI was wrong versus when AI was not used — the counterfactual,” they wrote.
According to Bernstein, many AI programs are designed to be used in clinical settings — “but no one is actually thinking about how to implement them.”
“The question isn’t, ‘should (AI) be used?’” Baird said. “It should be used, but could you enhance the performance of radiologists with AI depending on how it’s implemented?”
Martha Mainiero, a professor of diagnostic imaging focusing on breast imaging at Warren Alpert, agreed. “AI is an exciting new tool, but like all tools it has appropriate uses and ways that it can be misused,” she wrote in an email to The Herald.
Mainiero noted that AI is not currently being used for mammography at Brown, but that “radiologists will need to be trained about how to interact with these systems to maximize its potential and minimize the potential negative effects.”
Francesca Grossberg is a Staff Writer covering Science and Research. She is a first-year from New York City planning to concentrate in Health and Human Biology.