OpenEvidence, Doximity, and the FDA
The Gray Zone in Clinical AI
Another great question submitted by a reader. Send in your question about AI in Medicine for the next edition of the Ashoo Review. And as always, subscribe and tell a friend.
Sam
One of the most common questions I hear about clinical AI tools is:
“Why aren’t these products regulated as Software as a Medical Device (SaMD)?”
At first glance, it seems obvious that they should be. Clinicians are increasingly using ChatGPT, Gemini, Claude, OpenEvidence, Doximity Ask, and similar systems during real patient care. Questions about diagnosis, treatment, imaging, ECGs, and differential generation are becoming routine.
Yet most of these tools currently operate outside the traditional FDA medical device framework. Why? The answer has less to do with whether the software uses AI and more to do with what function it performs.
Clinical Decision Support v. Medical Device
Historically, the FDA has distinguished between software that supports clinician judgment and software that independently performs diagnostic or treatment functions. That distinction matters.
A tool that retrieves evidence, summarizes guidelines, or helps brainstorm differential diagnoses may qualify as clinical decision support rather than a regulated medical device. The clinician remains responsible for interpreting the information and making the final decision.
That’s very different from software that says:
“This patient has pneumonia.”
or
“This ECG demonstrates atrial fibrillation.”
Once software begins analyzing patient-specific data and generating diagnostic conclusions, it starts looking much more like Software as a Medical Device (SaMD).
The Image Interpretation Problem
This is where things become more complicated for modern multimodal AI systems.
Most current AI assistants are careful in their public positioning. OpenEvidence emphasizes evidence retrieval and clinical knowledge synthesis. Doximity Ask focuses on workflow support, clinical questions, and documentation. ChatGPT, Gemini, and Claude are broadly positioned as general-purpose AI systems rather than diagnostic engines.
At the same time, clinicians are increasingly uploading:
Chest X-rays
CT scans
ECGs
Pathology images
and asking these systems for interpretation. That creates a fascinating gray zone.
The official positioning is “informational support.” The real-world usage looks like diagnostic interpretation.
My Own Testing
In my own testing, both OpenEvidence and Doximity Ask allowed me to upload ECG images and generate clinical interpretations.
OpenEvidence
OpenEvidence included the disclaimer:
“This is not a formal interpretation and should be correlated with clinical context, prior ECGs, and laboratory data.”
Doximity
Doximity Ask displayed:
“Image use is experimental and may make mistakes. Please use accordingly.”
These disclaimers matter. They reinforce that the systems are intended to support clinician judgment rather than autonomously diagnose disease. At the same time, the systems are still functionally analyzing patient-specific data and generating clinical interpretations. That’s precisely where the FDA framework becomes more complicated.
The FDA Language
The FDA specifically discusses software that:
“assess[es] or interpret[s] the clinical implications or clinical relevance of a signal, pattern, or medical image.”
The agency includes ECGs, EEGs, CT scans, X-rays, pathology images, ultrasound, MRI, and dermatologic images within this broader category of patient-specific medical data interpretation.
An AI system summarizing atrial fibrillation guidelines occupies one regulatory category.
An AI system interpreting an uploaded ECG occupies another.
The issue is not whether companies are violating FDA rules. The issue is that multimodal AI is beginning to challenge the traditional distinction between informational support tools and diagnostic software.
Capability Versus Intent
The FDA traditionally focuses heavily on intended use. That intent can be inferred from marketing language, workflow integration, product demonstrations, clinician targeting, sales materials, and overall product design.
Disclaimers matter, but they are not the whole story. A company cannot realistically build a radiology interpretation engine, integrate it deeply into clinical workflow, and avoid regulatory scrutiny simply by adding: “For informational purposes only.” Function still matters.
At the same time, general-purpose AI systems still occupy a different category from purpose-built diagnostic software. A chatbot that occasionally comments on an uploaded ECG is different from an FDA-cleared ECG interpretation platform marketed specifically for arrhythmia detection.
FDA-Cleared AI Already Exists
Importantly, the FDA has already cleared multiple AI-powered SaMD platforms.
Viz.ai helps identify stroke findings and rapidly alert specialists. Aidoc offers radiology triage and acute finding detection tools. HeartFlow analyzes coronary CT imaging. Several ECG interpretation systems now use AI-assisted analysis, and newer sepsis prediction platforms combine clinical data with machine learning risk assessment.
These systems analyze patient-specific clinical data and produce a condition-specific finding, alert, risk estimate, or interpretation. That is fundamentally different from a general-purpose chatbot discussing medical knowledge.
Ambient Scribing, Coding, and Billing
Another interesting wrinkle is that many of these same platforms are no longer functioning solely as information retrieval tools. Both OpenEvidence and Doximity now offer ambient scribing workflows integrated into clinical documentation.
OpenEvidence also markets Coding Intelligence features that automatically suggest:
ICD-10 diagnoses
CPT codes
E/M coding levels
from encounter documentation.
Interestingly, FDA guidance generally treats billing and administrative support differently from diagnostic software. Claims processing, financial records, coding support, and administrative workflows are often excluded from device oversight.
But these distinctions may become harder to separate as platforms increasingly combine:
clinical reasoning
image interpretation
documentation
coding
billing
workflow automation
inside a single integrated AI environment.
The more these systems become embedded into real clinical and financial decision-making, the harder it may become to maintain clear regulatory boundaries between administrative support, clinical decision support, and diagnostic software.
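To picture why combined platforms blur these boundaries, here is a small Python sketch. It is only an illustration of the framing in this article, not an FDA rule set or a legal test. The Category names, the FUNCTION_CATEGORY mapping, and the categories_spanned helper are all hypothetical, and the mapping is deliberately partial.

```python
from enum import Enum


class Category(Enum):
    ADMINISTRATIVE = "administrative support"
    DECISION_SUPPORT = "clinical decision support"
    DIAGNOSTIC = "diagnostic software"


# Hypothetical mapping that mirrors this article's framing only.
# It is an assumption for illustration, not regulatory guidance.
FUNCTION_CATEGORY = {
    "evidence retrieval": Category.DECISION_SUPPORT,
    "guideline summary": Category.DECISION_SUPPORT,
    "differential brainstorming": Category.DECISION_SUPPORT,
    "image interpretation": Category.DIAGNOSTIC,
    "ecg interpretation": Category.DIAGNOSTIC,
    "coding": Category.ADMINISTRATIVE,
    "billing": Category.ADMINISTRATIVE,
}


def categories_spanned(functions):
    """Return the set of categories that a platform's functions touch."""
    unknown = [f for f in functions if f not in FUNCTION_CATEGORY]
    if unknown:
        raise ValueError(f"No category assumed for: {unknown}")
    return {FUNCTION_CATEGORY[f] for f in functions}


# A narrow reference tool touches one category.
reference_tool = categories_spanned(["guideline summary"])

# An integrated platform touches all three at once.
integrated_platform = categories_spanned(
    ["evidence retrieval", "ecg interpretation", "coding", "billing"]
)
```

In this toy model the reference tool sits in a single category, while the integrated platform spans all three. That overlap is the boundary problem described above.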
What does FDA clearance or approval actually involve?
This is another area that is often misunderstood.
FDA oversight is not simply a bureaucratic label. Companies pursuing SaMD clearance must demonstrate that the software performs reliably and safely for its intended clinical use.
That usually means clearly defining the intended use, specifying the patient population and clinical setting, documenting how the algorithm functions, validating performance on clinical datasets, and establishing systems for software quality control and version management.
For image and ECG AI, companies typically run retrospective and prospective validation studies on clinical datasets labeled by expert physicians.
The FDA may review:
sensitivity and specificity
false positive and false negative rates
subgroup performance
external validation cohorts
reproducibility across settings
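To make the first two items concrete, here is a minimal Python sketch of how sensitivity, specificity, and the matching error rates fall out of a validation study's confusion counts. The counts are invented for illustration, and the names ConfusionCounts and validation_metrics are hypothetical rather than taken from any FDA submission or vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # algorithm positive, expert label positive
    fp: int  # algorithm positive, expert label negative
    tn: int  # algorithm negative, expert label negative
    fn: int  # algorithm negative, expert label positive


def validation_metrics(c):
    """Compute basic diagnostic accuracy metrics from confusion counts."""
    positives = c.tp + c.fn  # cases the experts labeled as disease present
    negatives = c.tn + c.fp  # cases the experts labeled as disease absent
    if positives == 0 or negatives == 0:
        raise ValueError("Need at least one positive and one negative case")
    return {
        "sensitivity": c.tp / positives,
        "specificity": c.tn / negatives,
        "false_positive_rate": c.fp / negatives,
        "false_negative_rate": c.fn / positives,
    }


# Made-up example: 100 ECGs with atrial fibrillation, 200 without.
example = ConfusionCounts(tp=90, fp=40, tn=160, fn=10)
metrics = validation_metrics(example)
```

With these made-up counts, sensitivity is 0.90 and specificity is 0.80, so the false negative rate is 0.10 and the false positive rate is 0.20. Subgroup performance is the same calculation repeated on a subset of cases, such as one age group or one site.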
In many cases, companies spend years and millions of dollars generating evidence before obtaining clearance.
Why is this difficult for large language models?
Traditional SaMD products usually perform narrow clinical tasks with relatively stable outputs and measurable endpoints. A stroke detection model either flags hemorrhage or it does not. An ECG algorithm either identifies atrial fibrillation or it does not.
Large language models are much harder to regulate within that framework. Their outputs are probabilistic, open-ended, context-dependent, and continuously evolving. Even defining intended use becomes challenging.
Is the model summarizing literature? Generating differential diagnoses? Interpreting images? Educating clinicians? Drafting notes? Triaging patients?
Sometimes it is doing all of those things simultaneously. That creates major regulatory friction.
The Next FDA Challenge
The next FDA challenge in AI may not come from traditional medical device companies. It may come from general-purpose multimodal AI systems that increasingly behave like diagnostic tools while carefully avoiding explicit diagnostic claims.
For now, many of these platforms remain positioned as clinician-support systems rather than autonomous diagnostic engines. But multimodal AI continues to improve rapidly. And once AI systems can reliably interpret images, ECGs, and pathology slides at scale, the line between “medical reference” and “medical device” may become increasingly difficult to defend.
A few questions are worth watching:
Will regulators focus more on real-world use than stated intent?
Does multimodal capability itself eventually trigger a new oversight framework?
How much clinician oversight is enough?
At what point does “clinical support” become “clinical interpretation”?
Can a general-purpose AI remain outside device regulation once clinicians routinely use it diagnostically?
We are still early in that conversation.




