UpDoc Is Making Headlines

The clearance documents deserve the same attention.

Jun 29, 2026

Pull up the actual FDA clearance documents for UpDoc, and you’ll find a story that looks pretty different from the one in the press releases. In today’s newsletter, I’ll spend some time breaking down those details and why some scrutiny is warranted.

Meanwhile, if you enjoy reading, subscribe and tell a friend.

Sam

What the media is saying

The narrative told by WSJ Pro, MedCity News, and multiple others goes something like this: two Stanford physicians, Sharif Vakili and Ashwin Nayak, ran a rigorous randomized controlled trial proving an AI system could autonomously manage insulin titration in patients with type 2 diabetes. The results were dramatic. The AI group hit glycemic control targets in 15 days on average. Fewer than half the standard care group got there at all within eight weeks. The headline result: 81% of the AI group reached glycemic control versus 25% under standard care.

They then built UpDoc on that clinical foundation, got it FDA-cleared as a Software as a Medical Device, and are deploying what the company calls “physician-grade agentic AI” to autonomously adjust insulin doses between office visits. They claim it is the first of its kind, ushering in a new era of care.

That’s the story. The trial and the clearance documents tell a more complicated one.

What the study actually tested

The MIVA trial (Managing Insulin with Voice AI) was a real RCT, published in JAMA Network Open, and conducted at four Stanford primary care clinics from March 2021 to December 2022. Thirty-two adults with type 2 diabetes were randomized into two groups. Half got the voice AI intervention; half got standard care. The results were genuinely strong for a pilot of that size.

Here’s what the coverage consistently leaves out: the voice AI in that trial was Amazon Alexa. Not a generative model, and nothing resembling what most people picture when they hear “agentic AI.” Alexa in 2021 was a narrow, task-specific voice assistant built on wake word detection, intent classification, and programmed rules. When a patient said “my sugar was 140 this morning,” the system mapped that to glucose_value: 140 by filling a predefined slot. Alexa’s speech recognition does rely on trained models, but everything downstream of the recognized value was fixed: no generative inference, and no learned judgment about what to do with the number.
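
To make "rigid" concrete, here is a minimal Python sketch of that kind of mapping. It is an illustration only, not the trial's code: the regular expression, the function name parse_glucose, and the two-to-three digit limit are my assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: the word "sugar" or "glucose", then the first
# two- or three-digit number that follows it. Not the trial's actual grammar.
GLUCOSE_PATTERN = re.compile(r"\b(?:sugar|glucose)\b\D*?(\d{2,3})\b", re.IGNORECASE)


def parse_glucose(utterance: str) -> Optional[dict]:
    """Return {"glucose_value": n} if the utterance fits the pattern, else None."""
    match = GLUCOSE_PATTERN.search(utterance)
    if match is None:
        return None  # no guess, no inference: the pattern fits or it doesn't
    return {"glucose_value": int(match.group(1))}


print(parse_glucose("my sugar was 140 this morning"))
print(parse_glucose("I feel a bit off today"))
```

The point of the sketch is the failure mode: an utterance that doesn't fit the pattern produces nothing at all, rather than a plausible-sounding guess.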

The insulin titration logic itself came straight from clinical guidelines published by the American Association of Clinical Endocrinologists and the American College of Endocrinology. If glucose is X, adjust dose by Y. An extremely well-executed, patient-friendly, clinically grounded flowchart, but still a flowchart.
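
A flowchart like that reduces to a lookup table. The sketch below shows the shape of such a rule set in Python. Every number in it is a placeholder I made up for illustration; these are not the AACE/ACE values, not the trial's values, and not clinical guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    min_glucose: int  # inclusive lower bound of the band, in mg/dL
    dose_change: int  # units to add (or subtract) when the band matches


# PLACEHOLDER bands and adjustments, invented for this example only.
# Ordered from the highest band to the lowest.
RULES = (
    Rule(min_glucose=180, dose_change=4),
    Rule(min_glucose=140, dose_change=2),
    Rule(min_glucose=100, dose_change=0),
    Rule(min_glucose=0, dose_change=-2),
)


def adjust_dose(current_dose: int, fasting_glucose: int) -> int:
    """Apply the first matching band; the dose never goes below zero."""
    for rule in RULES:
        if fasting_glucose >= rule.min_glucose:
            return max(0, current_dose + rule.dose_change)
    raise ValueError("glucose must be non-negative")


print(adjust_dose(20, 200))  # highest band applies
print(adjust_dose(20, 120))  # in-range band, no change
```

The same inputs always produce the same output. There is nothing in this structure that could be called a decision.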

What the MIVA trial proved is that a deterministic, rules-based system could dramatically outperform standard care at insulin titration. That’s a legitimate and important finding. It didn’t prove that AI was necessary. It proved that consistent protocol execution was sufficient and that the existing standard of care was failing patients badly enough that almost any disciplined approach would beat it.

What UpDoc actually built

The commercial UpDoc product is architecturally different from what ran in the MIVA trial. The FDA documents describe three software components: a provider-facing web portal, a patient mobile app, and a cloud-based application with a “Conversation Service” (the UpDoc Agent) and a “Clinical Service.”

The Conversation Service is where the LLM lives. It handles patient interaction: collecting glucose readings, asking about symptoms, communicating instructions back in natural language. This is the part UpDoc markets as “agentic AI.” The Clinical Service is where the dosing decision actually happens. It computes insulin instructions based on treatment parameters defined by the ordering physician. That’s the deterministic calculator.

The architecture is sensible. When your favorite AI needs to do arithmetic, it calls a Python tool rather than generating an answer probabilistically. You want a specific and correct (deterministic) answer to a math problem, not the most likely (probabilistic) answer from an LLM. UpDoc uses the same design. It’s brilliant. The problem is the marketing claims about it.
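
For readers who want to see that separation of duties, here is a rough Python sketch. The function names, the plan fields, and the single-step dosing rule are all hypothetical; the FDA documents describe the two services but not their code. The extractor argument stands in for the LLM.

```python
from typing import Callable


def clinical_service(current_dose: int, fasting_glucose: int,
                     step: int, target_max: int) -> int:
    """Deterministic layer: identical inputs always yield the identical dose.
    The one-step rule here is a placeholder, not UpDoc's logic."""
    if fasting_glucose > target_max:
        return current_dose + step
    return current_dose


def conversation_service(extract: Callable[[str], int],
                         message: str, plan: dict) -> str:
    """Conversational layer: turns free text into a number, then hands off.
    It never computes a dose itself; it only phrases the result."""
    glucose = extract(message)  # in a real system, this is the probabilistic part
    new_dose = clinical_service(plan["current_dose"], glucose,
                                plan["step"], plan["target_max"])
    return f"Tonight, take {new_dose} units."


# Physician-defined parameters (hypothetical values).
plan = {"current_dose": 20, "step": 2, "target_max": 130}
# A stub extractor stands in for the LLM so the sketch runs on its own.
print(conversation_service(lambda text: 165, "It was 165 this morning", plan))
```

Note where the risk sits in this design: the dose is only as good as the number the extractor hands over.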

What the FDA actually cleared

Pull up the FDA K253281 submission and decision summaries for UpDoc. Two things struck me.

First: “No clinical testing was performed.”

The MIVA trial appears nowhere in either FDA document. It is not mentioned as supporting evidence, as a reference, or in the bibliography. The FDA never evaluated the 32-patient sample. It never weighed the 81% versus 25% outcome data. The Stanford trial played no role in the clearance.

Second: The clearance rests on substantial equivalence to the d-Nav System, a handheld insulin dose calculator made by Hygieia, Inc., which was cleared in 2018. It’s not an AI product or an LLM. It’s a software-based dose calculator that predates the MIVA trial by three years. The product code is NDC, and the regulation is 21 CFR 868.1890 “Predictive pulmonary-function value calculator,” a classification repurposed for insulin calculators. The word “AI” doesn’t appear in the regulatory classification.

What the FDA evaluated was software testing per IEC 62304, cybersecurity review, and human factors validation studies. That’s it. UpDoc is a Class II device cleared because it’s substantially equivalent to a prior calculator, with a voice-and-chat interface as its primary differentiating feature.

The change plan is the most revealing document

The Predetermined Change Control Plan (PCCP) is a roadmap of modifications UpDoc can make post-clearance without filing a new 510(k). It contains a critical sentence:

All future modifications must “maintain deterministic insulin dosing logic without altering core clinical decision-making.”

The FDA locked this in as a condition of clearance. UpDoc can’t swap in a probabilistic LLM for dosing decisions without filing a new submission. Whatever “agentic AI” means in the press releases, the cleared device’s dosing engine is, by regulatory requirement, deterministic. The regulators saw the architecture, understood which layer was doing which job, and explicitly required the calculator layer to stay a calculator.

The PCCP also specifies “zero tolerance for deviation and incorrect unit conversion rates of zero” for alternative data input methods, including voice. In other words, the plan identifies the handoff between the LLM interface and the deterministic dosing engine as a risk point. That’s the right call. It also quietly acknowledges that the handoff is where the probabilistic layer touches a safety-critical function, and that this interface has never been clinically validated.
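
What a guard at that handoff might look like, in a minimal Python sketch. This is my illustration, not anything described in the PCCP: the function names and the plausibility bounds are assumptions, and 18.0 is the commonly used approximate factor for converting glucose from mmol/L to mg/dL.

```python
MGDL_PER_MMOLL = 18.0  # commonly used approximate conversion factor for glucose


def to_mgdl(value: float, unit: str) -> float:
    """Convert a reading to mg/dL, refusing any unit it does not recognize."""
    if unit == "mg/dL":
        return value
    if unit == "mmol/L":
        return value * MGDL_PER_MMOLL
    raise ValueError(f"unknown unit: {unit!r}")


def accept_reading(value: float, unit: str,
                   low: float = 20.0, high: float = 600.0) -> float:
    """Pass a reading to the dosing engine only if it is plausible in mg/dL.
    The bounds are hypothetical defaults chosen for this example."""
    mgdl = to_mgdl(value, unit)
    if not (low <= mgdl <= high):
        raise ValueError(f"implausible reading: {mgdl} mg/dL")
    return mgdl


print(accept_reading(7.0, "mmol/L"))  # converted before it reaches the calculator
try:
    accept_reading(7.0, "mg/dL")      # same number, wrong unit label
except ValueError as err:
    print("rejected:", err)
```

A check like this catches a mislabeled unit, but it cannot catch a misheard value that still lands in a plausible range. That residual error rate is the number nobody has published.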

“Agentic AI” — and what that actually means

UpDoc’s press release calls the platform “physician-grade agentic AI” at least three times. The coverage has largely accepted this framing without much scrutiny.

Agentic AI has a reasonably specific meaning in the field: a system that perceives its environment, makes autonomous decisions across multiple steps, selects and uses tools, and pursues a goal over time while adapting its approach based on intermediate results. The defining characteristic of a true agent is that it decides how to accomplish something, not just what to output when given a specific input.

To its credit, UpDoc has been transparent about what it means by the term. Their press release defines agentic through a three-step workflow: the system monitors patient data and identifies trends requiring intervention, executes insulin titration within physician-approved parameters, then closes the loop by triggering follow-up lab orders and documenting the intervention in the EHR.

Map each of those steps against what the clearance documents actually describe, and the picture looks a lot less agentic.

“Monitors patient data and identifies trends”: The system receives glucose values the patient reports or a CGM transmits, then checks them against pre-defined thresholds. That’s threshold alerting. A blood pressure cuff that beeps when you’re hypertensive does the same thing.

“Executes titration within physician-approved parameters”: This is the deterministic calculator we’ve already covered. This is very good and safe for patients. But to be clear, the AI has no agency here. It can’t deviate, can’t reason an alternative approach, can’t decide that a different protocol might fit better.

“Triggers follow-up lab orders and documents in the EHR”: This is the most plausibly agentic-sounding item on the list and genuinely new relative to the predicate (comparison) device. But “triggers necessary follow-up labs” almost certainly means the physician pre-specified which labs fire under which clinical conditions. It’s another if/then rule in the protocol, executed automatically. Important, but still not agentic reasoning.
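
If that reading is right, the lab trigger would look something like the Python sketch below. This is speculation made concrete, not a description of UpDoc's system: the rule structure, the lab names, and the day counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LabRule:
    lab: str                        # which order to fire
    after_days_on_stable_dose: int  # physician-set condition for firing it


def labs_due(rules: Sequence[LabRule], days_on_stable_dose: int) -> List[str]:
    """Return every lab whose pre-specified condition is met. No discretion."""
    return [rule.lab for rule in rules
            if days_on_stable_dose >= rule.after_days_on_stable_dose]


# Hypothetical protocol entries a physician might configure.
protocol = [LabRule("HbA1c", 90), LabRule("basic metabolic panel", 180)]
print(labs_due(protocol, 100))
```

The system fires whatever the table says to fire. Choosing which labs belong in the table is the clinical judgment, and in this design that stays with the physician.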

The “physician-governed” framing they use to address safety concerns is the clearest argument against the agentic claim. They describe the physician as prescribing the treatment plan while the AI implements it within defined boundaries, with zero tolerance for deviation, as required by the PCCP. A system that’s fully constrained by a pre-specified protocol, with no discretion and no ability to adapt its approach, isn’t an agent. It’s an automated executor with a very detailed job description.

For comparison, insulin pumps automate delivery. Ventilators automate titration. Pacemakers automate rhythm correction. Automated pharmacy refill systems initiate patient outreach. None of those are called agentic AI because automation and agency aren’t the same thing. What’s genuinely new about UpDoc is the natural language interface, the EHR integration, and the physician governance model. Those are real innovations worth evaluating on their own terms. Calling them agentic doesn’t make them more impressive; it makes the term less meaningful.

The conflict of interest worth understanding

The Medscape coverage flagged that Nayak and co-authors disclosed owning UpDoc stock at the time of publication. UpDoc was founded three months after the MIVA trial was completed. The company didn’t exist when the trial was designed or conducted, so there was nothing to disclose at conception. The trial appears to have been designed and executed cleanly.

What the disclosure reflects is that by the time the paper was published, the researchers had become the founders. The conflict isn’t in the data. It’s in how that data has since been used. The researchers who designed the study are now the executives with the most to gain from that study being accepted as definitive validation of their commercial product. They’re the most prominent voices promoting it. They’re the ones driving the conflation of the MIVA findings with UpDoc’s commercial viability.

That’s not misconduct. Physician-researchers commercializing their findings is how medical innovation is supposed to work. But it does mean the most enthusiastic advocates for the study’s conclusions have the strongest financial interest in those conclusions being stretched beyond what a 32-patient pilot can actually support.

The liability question nobody asked

One detail from the WSJ piece deserves attention. CEO Vakili drew a clear legal line: UpDoc is liable for accurately implementing the physician’s care plan. It’s not responsible if the care plan itself is faulty.

That’s a meaningful posture, and it’ll be tested. When an autonomous AI executes a physician’s protocol and something goes wrong, the line between “bad protocol” and “bad execution” is exactly what litigation will contest. A plaintiff’s attorney doesn’t need to prove the algorithm malfunctioned. They only need to raise credible doubt about where the failure originated. The Cleveland Clinic’s executive framing of UpDoc as liable for implementation is a clean division of responsibility in a press release. It will look considerably more complicated in a deposition.

What’s genuinely worth crediting

The care gap UpDoc is targeting is real and large. Basal insulin titration requires frequent patient contact, glucose logs, clinician availability, and patient follow-through. The MIVA trial demonstrated that even a fully deterministic system dramatically outperforms passive standard care. If UpDoc’s commercial product can replicate that in a larger, more diverse population, patients will be better off.

The FDA pathway they chose is the right one. Robert Califf noted they sought regulatory scrutiny rather than avoiding it. That’s notable in a space where plenty of clinical AI tools deploy without any regulatory engagement at all. And the architecture (physician sets the protocol, algorithm executes it, LLM handles the conversation) is well-reasoned for this use case. Deterministic dosing logic is exactly what you want when you’re adjusting medications autonomously. You want rule-following fidelity, not creative inference.

The concern isn’t the product. It’s the story being told about it.

The three-layer disconnect

The MIVA trial tested a deterministic Alexa-based system, not UpDoc. It validated consistent protocol execution. UpDoc then added an LLM to that architecture, and that addition has never been clinically validated. The FDA, meanwhile, cleared a dose calculator based on a 2018 predicate and never saw the trial or evaluated the LLM. Three layers. Three separate stories. What UpDoc is selling is the version where they all fuse into one: the trial validates the product, the product earns the clearance, the clearance confirms the AI. None of those connections hold up.

Your license. Your responsibility.

UpDoc may become an important tool. The clinical problem is real. The regulatory pathway was handled responsibly. The underlying architecture is defensible.

But before your health system signs on or you prescribe this as a treating physician, ask the questions the press coverage didn’t:

What, specifically, does the LLM component do, and what does the deterministic clinical service do? Get that in writing.

Has the LLM interface layer been clinically validated in a population comparable to yours? The MIVA trial didn’t test it. The FDA didn’t evaluate it. Who did?

What happens when the LLM misparses a patient’s glucose report? What’s the actual error rate at the handoff between the conversational layer and the dosing engine? The PCCP mandates zero tolerance, but mandating and demonstrating are different things.

What does the liability split mean in practice for your institution when something goes wrong?

The algorithm that ran in the MIVA trial followed AACE and ACE guidelines faithfully. That system worked. Know what’s running in the commercial product that replaced it. That’s our job.

Ashoo Review: AI in Medicine
