A significant number of artificial intelligence tools given the green light by the U.S. Food and Drug Administration (FDA) are missing crucial clinical validation data. According to a recent study published in Nature Medicine, over 40% of FDA-authorized AI-enabled medical devices lack published clinical validation data supporting their safety and effectiveness.
The research, conducted by the University of North Carolina School of Medicine, scrutinized 521 AI-enabled medical devices authorized by the FDA between 1995 and 2022 and found that 226 of them, approximately 43%, lacked published clinical validation data, highlighting a substantial gap in the evidence behind these tools.
“The FDA and medical AI developers should publish more clinical validation data and prioritize prospective studies for device validation,” urged the study authors in their August 26 article.
Much of this medical AI technology has been cleared under the FDA’s 510(k) process. This pathway is notably less rigorous: rather than requiring comprehensive clinical validation, it asks manufacturers to demonstrate that their products are substantially equivalent to a previously authorized predicate device.
“A lot of those technologies legally did not need [validation data], but the argument we made in the paper is it could potentially slow adoption if we’re not seeing evidence of clinical validation, even if the device is substantially equivalent to a predicate,” explained Sammy Chouffani El Fassi, the study’s first author and an M.D. candidate at UNC.
Additionally, minor algorithm adjustments can significantly change how a device performs once it is deployed, according to Chouffani El Fassi. This underscores the importance of testing devices in clinical settings to accurately gauge their real-world performance.
The research categorized AI medical devices based on the presence or absence of clinical validation data, meaning testing on real patient data for safety and efficacy. Among the 292 devices that had validation data, 144 underwent retrospective studies, tests that used data collected before the devices were implemented in patient care. Meanwhile, 148 devices were validated prospectively, meaning they were tested during actual patient care or using data gathered after a trial started. Notably, only 22 of these prospectively tested devices were subjected to randomized controlled trials.
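To see how those counts fit together, here is a minimal sketch using the figures reported above; the category labels are illustrative shorthand rather than the study's exact terminology.

```python
# Validation-status counts reported in the Nature Medicine study, as summarized above.
# Labels are illustrative shorthand, not the paper's exact category names.
counts = {
    "retrospective": 144,      # tested on data collected before clinical deployment
    "prospective": 148,        # tested during patient care or on data gathered after a trial began
    "no_published_data": 226,  # no published clinical validation data
}
rct_subset = 22                # randomized controlled trials, a subset of the prospective group

total_authorized = 521
with_validation = counts["retrospective"] + counts["prospective"]

print(f"Devices with published validation data: {with_validation}")  # 292
print(f"Share lacking published validation: {counts['no_published_data'] / total_authorized:.0%}")  # ~43%
```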
Chouffani El Fassi stressed the value of prospective studies in understanding how an AI device functions in real-life clinical settings. For instance, as part of a team at Duke University, he validated an algorithm designed to detect cardiac decompensation using electronic health records. The team conducted both retrospective and prospective studies, allowing cardiologists to use the algorithm and report their agreement with its diagnoses of cardiac decompensation.
“Prospective validation was particularly helpful because we could see what needed to be improved in the device,” Chouffani El Fassi noted. The team discovered the user interface required enhancements to make the device more user-friendly and efficient.
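The article does not describe the Duke team's actual data pipeline, but the agreement-reporting idea can be illustrated with a generic sketch; the record fields and sample values below are invented for illustration and are not from that project.

```python
# Generic illustration of the agreement-reporting idea described above: clinicians
# review each algorithm flag and record whether they concur. Fields and values are
# hypothetical, not taken from the Duke project.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Review:
    encounter_id: str
    algorithm_flagged: bool   # algorithm predicted cardiac decompensation
    clinician_agreed: bool    # reviewing cardiologist concurred with the flag

def agreement_rate(reviews: list[Review]) -> float:
    """Fraction of algorithm flags that reviewing clinicians agreed with."""
    flagged = [r for r in reviews if r.algorithm_flagged]
    return sum(r.clinician_agreed for r in flagged) / len(flagged) if flagged else float("nan")

sample = [
    Review("e01", True, True),
    Review("e02", True, False),
    Review("e03", True, True),
]
print(f"Clinician agreement: {agreement_rate(sample):.0%}")  # 67%
```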
Prospective testing also uncovers potential confounding variables that may not be apparent in retrospective studies. For example, a device reading chest X-rays might reveal new patterns in post-COVID-19 pandemic data that pre-2020 data wouldn’t show.
Chouffani El Fassi emphasized that prospective studies don’t need to be overly complex or resource-intensive. A simple scenario could involve a physician using an AI-optimized ultrasound probe on a patient and then rating its usefulness on a scale of 1 to 5. “It’s really basic, but we would consider that a prospective study because the confounding variables are there. Let’s say a doctor has a hard time using it because it has a bad user interface; he or she is going to give it a 1 out of 5,” Chouffani El Fassi explained. “You learned something new; you saw how that device actually works in real life. We see that as the most valuable kind of data.”
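As a rough sketch of how little infrastructure such a study needs, the scenario above can be summarized in a few lines; the ratings here are invented for illustration.

```python
# Minimal sketch of the simple prospective design described above: each physician
# who uses the AI-optimized ultrasound probe rates its usefulness from 1 to 5.
# The ratings below are invented for illustration.
from statistics import mean

ratings = [4, 5, 1, 3, 4, 2, 5]  # one rating per physician encounter

low_scores = [r for r in ratings if r <= 2]  # e.g., flags like "bad user interface"
print(f"Mean usefulness: {mean(ratings):.1f} / 5")
print(f"Encounters rated 2 or lower: {len(low_scores)} of {len(ratings)}")
```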