Same Model. Different Database. Different Answer.
Rag is the Reason These Ai Tools Feel So Different
If you have been enjoying this AI in Medicine newsletter, I invite you to subscribe and tell a friend. There is no greater endorsement than a personal recommendation. Help spread the word… and ask questions! There is a comment section, and I love talking about these topics. No such thing as a silly question. I’m sure others are thinking the same thing you are.
Sam
In the last post, I said I’d break down RAG and why it changes how you should think about tools like OpenEvidence, DoximityGPT (now Doximity Ask), and ChatGPT for Clinicians. This is the part that actually matters because without it, all of these tools start to look the same.
First, how RAG works
RAG stands for Retrieval-Augmented Generation. The model doesn’t just answer your question from its own knowledge; it searches for information first and then writes an answer using what it finds. That search step runs on a specific database, and that detail drives the quality of the output.
Every RAG system sits on top of a defined body of literature. Many systems rely on PubMed abstracts or summaries. A smaller number incorporate society guidelines. An even smaller group has access to full-text journal articles. Some systems mix multiple sources, others stay narrow.
Those differences aren’t small. An abstract gives you a compressed version of a study. A guideline gives you consensus and synthesis. A full-text article gives you methods, limitations, and nuance. The model can only work with what it receives.
So the real question becomes: what database did this system search, and how deep is that access?
Why this matters clinically
If you’ve used a large language model long enough, you’ve seen confident answers with incorrect details or outdated recommendations. That behavior reflects the limits of training alone.
RAG changes the behavior by allowing the system to pull recent papers, surface guidelines, and anchor answers to identifiable sources. That shift makes these tools feel more grounded.
The failure point doesn’t disappear. It moves into the retrieval layer.
Now the weak link sits in the database itself, and in how the system searches it. A system that only sees abstracts will produce a different answer than one that can read the full text. A system that lacks access to guidelines will miss consensus recommendations entirely.
A confident answer built on thin retrieval will still sound convincing.
Why these tools don’t behave the same way
A lot of clinicians assume these tools are interchangeable. They aren’t. They’re different retrieval systems built around similar models, and they point at very different databases.
OpenEvidence takes a very explicit approach. It has direct partnerships with journals and specialty societies, and it includes access to full-text content in addition to abstracts. That matters. It is not just searching more, it is searching deeper. It also appears to lean toward guidelines and systematic reviews, placing those results up front. The exact ranking process is not fully transparent, but the behavior is noticeable when you use it.
DoximityGPT (AKA Doximity Ask) is doing something different. Doximity acquired a company called Pathway, which built a large structured medical dataset for AI. That dataset includes guidelines, drugs, and landmark trials across specialties.
That sounds similar to literature retrieval, but it is not the same thing. This is not raw access to journals. It is a processed layer on top of the literature. The data has already been selected, structured, and tied back to sources before the model ever sees it.
That can make answers faster and cleaner. It also means you are relying on how that dataset was built. You are one step removed from the original paper. The retrieval and ranking processes are also not clearly visible.
ChatGPT for Healthcare / Clinicians is different from both of these. It does not come with a built-in clinical database. It may search PubMed when it perceives the need. It is not pulling guidelines unless you explicitly connect those sources.
Out of the box, it is just a model generating answers from training.
It only becomes a RAG system if you give it something to retrieve from. That could be web browsing, your institution’s documents, or a defined knowledge base. The database is whatever you connect it to. Same model class, completely different behavior.
It is also worth realizing that OpenEvidence and DoximityGPT are just the visible layer. A lot of the most advanced RAG systems are not standalone products. They are being built inside health systems. These systems pull from EHRs, institutional guidelines, and internal knowledge bases, then combine that with the literature. That is a very different model from a tool that only searches PubMed or a structured dataset. This is where things are heading: systems that pull from multiple sources at once, not just one database.
The mental shift
You’re not using a single model. You’re using a pipeline that interprets your question, searches a specific database, ranks results, and then generates an answer. Each step carries its own potential failures.
If the system searches a shallow or limited dataset, the answer reflects that limitation. If the ranking favors less relevant material, the synthesis reflects that bias. That explains why you’ll sometimes see correct answers with weak citations, incorrect answers with convincing citations, and disagreement across tools that sound equally confident.
How to actually use these tools
A few practical habits help:
Look at the source, not just the answer
Ask what database the system searched
Check whether the citations reflect abstracts, guidelines, or full text
Use more than one system for higher-stakes questions
Recognize that missing evidence may reflect a retrieval gap
You’re evaluating both a search process and a generated summary at the same time.
Where this is going
RAG systems are improving quickly, with better indexing, stronger ranking, and deeper integration with clinical knowledge sources. The direction points toward systems that combine reasoning with targeted access to high-quality information.
That shift changes the skill set. Knowing how to question the system in front of you becomes as important as knowing the underlying facts.


