Transparency Isn't a Safeguard
The FDA Loosened AI Oversight. Your Liability Didn't Move.
On January 6, 2026, the FDA published revised guidance on clinical decision support (CDS) software, and the coverage has largely been positive. Less red tape. Faster innovation. Tools that can finally say what they actually mean instead of hedging behind padded lists of possibilities.
There’s real merit to that change. But there’s also a version of this story that hasn’t been told yet, one that is especially relevant in emergency medicine and critical care.
Let’s get into it.
As always, if you enjoy reading, I encourage you to subscribe and tell a friend.
Sam
What Actually Changed
For years, one of the FDA’s more awkward regulatory quirks was that CDS software offering a single recommendation was more likely to be classified as a medical device than software offering multiple options. The perverse result: developers were incentivized to dilute outputs, presenting three or four choices even when the evidence clearly favored one. Clinicians had to sort through the noise. Nobody loved it.
The new guidance fixes that. The FDA will now allow single-recommendation CDS without triggering device classification, provided the logic, data sources, and guideline basis behind that recommendation are visible to the clinician. They call it a “glass box” model: not opaque AI rendering a verdict, but transparent reasoning a clinician can inspect before acting.
There’s also an expanded “general wellness” carveout for consumer wearables. Devices reporting metrics like blood pressure and oxygen saturation can now stay outside device regulation as long as they don’t make diagnostic claims. That part is not controversial.
What’s important to understand is that this isn’t complete deregulation. The FDA still asserts authority over opaque models and tools that substitute for clinical judgment. The line just moved meaningfully in the direction of “trust the clinician to evaluate the output.”
The Case for Optimism (And It’s a Real One)
I want to be fair here because my job isn’t to reflexively oppose change. It’s to evaluate it honestly.
The previous regulatory logic was genuinely broken. When guidelines and patient data point in one direction, medicine often has a right answer. Forcing CDS to pretend otherwise didn’t make things safer; it just made the tools harder to use.
A well-designed, transparent AI that surfaces the relevant evidence, flags the applicable guideline, and tells you its reasoning is a useful clinical assistant. If the FDA’s revised framework actually enables more of that and less of the multi-option noise, that’s a good thing for both clinicians and patients.
This Should Give You Pause
The entire framework rests on one assumption: that transparency functions as a reliable safeguard. It’s worth asking whether that assumption holds in practice.
In theory, a glass box lets you inspect the AI’s reasoning before accepting its recommendation. In practice, in an overcrowded ED, at the tenth hour of a twelve-hour shift, with a septic patient in bay 3 and a chest pain in bay 7, are you clicking through the reasoning panel? Are your colleagues? Are you confident that institutional productivity pressures won’t quietly reward the physicians who just accept the output and move on?
Cognitive offloading isn’t a character flaw. It’s a predictable human response to cognitive overload. The FDA guidance even acknowledges automation bias as a concern, but it doesn’t solve it. It just names it and hands the responsibility back to the clinician.
And here’s the real kicker: the FDA explicitly carved emergency and time-critical CDS out of the loosened framework. The guidance states that software intended for urgent, high-stakes decisions where the clinician lacks time to independently review the logic does not qualify for the exemption, specifically citing automation bias in those settings: our propensity to accept what the machine is telling us even when there is contradictory evidence.
Read that again: the specialty with the highest acuity, the fastest decision cycles, and the most cognitively demanding environment is the one the FDA flagged as highest-risk. If you’re practicing emergency medicine, the tools most likely to influence your practice are the ones that still require close regulatory scrutiny. Which means some of what’s entering your ED workflow may not meet that bar, and you may not know which is which.
The Liability Math Nobody Is Talking About
Here’s where it gets uncomfortable.
The FDA declined to define what “clinically appropriate” means when it comes to single-recommendation CDS. That decision gets made by the developers. And when an AI-influenced recommendation leads to a bad outcome, the responsibility lands where it always has: with the physician who accepted it.
More AI authority in the workflow. Same physician accountability. That’s not necessarily wrong. It’s how medicine has always worked with every tool we use. But it’s worth being clear-eyed about the asymmetry. The guidance accelerates the path for tools to enter your workflow while leaving unchanged the standard of care you’re held to when they’re wrong.
Your license. Your responsibility. That’s not just a tagline. It’s the legal and ethical reality that the FDA’s framework reinforces.
The LLM Blind Spot
One more thing worth noting: the guidance is nearly silent on generative AI.
The tools that are actually proliferating at the bedside right now (AI scribes, ambient documentation platforms, chatbot-style decision support embedded in the EHR) are largely built on large language models. And LLMs present a specific transparency challenge that rule-based systems don’t: their outputs are probabilistic, not deterministic. A rule-based system follows a fixed set of rules and reaches the same conclusion every time. An LLM generates its answer by sampling from a range of likely outputs, so the same question can produce different answers on different runs. The “glass box” concept is much harder to apply when the reasoning isn’t a traceable logic chain.
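The deterministic-versus-probabilistic distinction is easier to see in a toy sketch. Everything below is hypothetical and for illustration only: the threshold, the answer labels, and the weights are invented, and a real LLM is vastly more complex than a weighted draw. The point is just that one function always returns the same output for the same input, while the other can return different outputs.

```python
import random
from collections import Counter

def rule_based_cds(risk_score: float) -> str:
    """Deterministic rule: the same input always yields the same output."""
    # Hypothetical threshold, invented for illustration (not clinical guidance).
    return "flag for review" if risk_score >= 0.5 else "no flag"

def sampled_cds(weights: dict[str, float], rng: random.Random) -> str:
    """Toy stand-in for an LLM: draw one answer from a probability distribution."""
    answers = list(weights)
    return rng.choices(answers, weights=[weights[a] for a in answers], k=1)[0]

# Hypothetical output distribution for one fixed prompt.
TOY_WEIGHTS = {"flag for review": 0.7, "no flag": 0.2, "repeat the test": 0.1}

if __name__ == "__main__":
    rng = random.Random(0)
    # The rule gives exactly one distinct answer across 1000 identical calls.
    print({rule_based_cds(0.8) for _ in range(1000)})
    # The sampler spreads 1000 identical calls across several answers.
    print(Counter(sampled_cds(TOY_WEIGHTS, rng) for _ in range(1000)))
```

With the rule, you can point to the line that produced the answer. With the sampler, the most you can inspect is the distribution, which is roughly the problem a reasoning panel faces with an LLM.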
The FDA’s guidance doesn’t really address this. Whether that represents regulatory humility or a gap that needs to be filled is an open question. Either way, it means clinicians are navigating a rapidly evolving LLM-enabled ecosystem without clear guidance on how those tools fit into the framework.
A Case Study
Abstract regulatory language is easier to evaluate when it touches something real. So let’s apply the FDA’s four criteria to a tool many emergency physicians are already using: OpenEvidence.
OpenEvidence allows physicians to enter patient-specific clinical information, including protected health information, and receive synthesized answers and recommendations from peer-reviewed literature. It cites its sources. It also draws conclusions.
Walk it through the criteria.
✅ Criterion 1: Data inputs. OpenEvidence ingests text-based clinical information: symptoms, labs, diagnoses, history. This isn’t imaging data or signals from diagnostic hardware. Criterion 1 is probably satisfied.
✅ Criterion 2: Displaying and analyzing medical information. The software matches patient-specific data against clinical literature and guidelines, which is precisely the FDA’s own example of what this criterion covers. Criterion 2 is satisfied.
❓ Criterion 3: Supporting versus directing judgment. This is where it gets murky. The FDA draws a sharp line between software that presents options for a clinician to weigh and software that summarizes answers and draws conclusions. OpenEvidence’s outputs function more like directives than option lists. The answer to this criterion depends on exactly how its recommendations are framed, and that’s worth looking at closely.
❌ Criterion 4: Independent reviewability. This is where the ED context becomes decisive, and where the FDA’s own language is crystal clear.
The guidance states directly that software intended for critical, time-sensitive decisions does not meet Criterion 4, because clinicians are unlikely to have sufficient time to independently review the basis of the recommendations. The FDA states that in urgent situations, the pressure to act accelerates the tendency to accept AI output without independent scrutiny.
OpenEvidence used by a primary care physician working up a chronic condition, with time to click through citations and evaluate the reasoning, might satisfy all four criteria. The same tool, used by an emergency physician making a time-critical disposition decision, certainly doesn’t. Not because the software changed. Because the context did.
That distinction matters more than most clinicians realize. The FDA’s framework isn’t tool-specific; it’s context-specific. And a lot of what’s currently running in ED workflows may be operating in a regulatory gray zone that neither clinicians nor hospital administrators have fully reckoned with.
Citing sources isn’t the same as giving clinicians time to read them. In the ED, the first rarely comes with the second.
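For readers who think in code, the walkthrough above can be sketched as a small function. This is a simplified paraphrase of the four criteria as described in this piece, not the FDA’s legal test. The class, field, and function names are all hypothetical, and the True/False/None inputs are my reading of the case study rather than any official determination.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class UseContext:
    """One tool in one setting. None means 'unclear'. All fields are hypothetical."""
    text_based_inputs: bool               # Criterion 1: no imaging or device signals
    matches_data_to_literature: bool      # Criterion 2: displays and analyzes medical information
    supports_not_directs: Optional[bool]  # Criterion 3: supports rather than directs judgment
    time_to_review_basis: bool            # Criterion 4: clinician can independently review

def _label(value: Optional[bool]) -> str:
    if value is None:
        return "unclear"
    return "met" if value else "not met"

def assess(ctx: UseContext) -> dict[str, str]:
    """Label each criterion as 'met', 'unclear', or 'not met'."""
    return {
        "criterion_1": _label(ctx.text_based_inputs),
        "criterion_2": _label(ctx.matches_data_to_literature),
        "criterion_3": _label(ctx.supports_not_directs),
        "criterion_4": _label(ctx.time_to_review_basis),
    }

def overall(ctx: UseContext) -> str:
    """Any failed criterion fails the whole check; otherwise any unclear one leaves it unclear."""
    labels = assess(ctx).values()
    if "not met" in labels:
        return "not met"
    return "unclear" if "unclear" in labels else "met"

if __name__ == "__main__":
    # Same tool, two settings: only the time available for review changes.
    clinic = UseContext(True, True, None, True)
    ed = replace(clinic, time_to_review_basis=False)
    print("primary care:", overall(clinic))
    print("emergency department:", overall(ed))
```

The only field that differs between the two contexts is time_to_review_basis, which is the whole argument in one line: the software is identical, and the answer still flips.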
What to Do With All This
The FDA’s January guidance isn’t reckless, and it isn’t trivial. It’s a deliberate bet that clinical AI can move faster without sacrificing safety if clinicians stay meaningfully engaged with what the tools are telling them and why.
Whether that bet pays off depends almost entirely on us. Before your department adopts a new AI CDS tool, here’s what I’d want to know:
Does the reasoning actually surface in the workflow, or is it buried three clicks deep?
What does the vendor say about performance in high-acuity, time-critical settings specifically?
Has it been validated on a patient population that resembles yours?
Who reviewed the validation data, and was it anyone independent of the company selling it?
And critically, what happens when it’s wrong, and how is that tracked?
The FDA has done its part by drawing a clearer map. But we’re the ones practicing in the territory, and the terrain in an ED at 2 a.m. looks nothing like the conference room where these policies get written.
Transparency is a good start. Reflection is the part we have to supply ourselves.


