AI Tools Don't Fail — Data Infrastructure Fails Them
More than 85% of healthcare AI pilots never scale. The reason isn't the models — it's the fragmented, unstandardized data underneath them.

There is a graveyard in healthcare AI, and it is growing faster than anyone wants to admit. Not a graveyard of bad ideas — a graveyard of tools that worked in the pilot but never survived contact with production data.
A recent Becker's Hospital Review article by Punit Soni, CEO of Suki, puts numbers to the problem: more than 85% of AI pilots in healthcare never reach full-scale deployment, and 95% report no measurable return on investment. Enterprises are spending $30 to $40 billion on generative AI globally with minimal returns. The tools don't explode — they fade. Budgets shift, clinical champions move on, and once-promising AI solutions quietly disappear.
The article identifies four root causes: workflow integration failures, trust deficits, wrong success metrics, and data fragmentation. The first three get most of the attention. The fourth is the one that actually kills the most projects.
The Data Problem Nobody Wants to Talk About
When an AI pilot runs in a single department at a single hospital, the data looks manageable. The team curates a clean dataset, maps the relevant codes, and trains or fine-tunes their model. The demo works. The metrics look strong.
Then someone tries to deploy it across the health system.
Suddenly the same medication is represented three different ways. One facility codes diagnoses with ICD-10-CM, another uses legacy SNOMED mappings, and a third has free-text entries from a decade-old EHR migration. Lab results use different LOINC codes for the same test. Provider identifiers don't resolve across facilities.
The AI model hasn't gotten worse. The data underneath it has gotten real.
This is the pattern that produces what Soni calls the "slow, silent fade." The AI tool still technically works, but its outputs become unreliable at scale because the underlying data is inconsistent. Clinicians lose trust after a few bad results — and as the article notes, "if you lose trust the first time around, getting people to retry a tool is an uphill battle."
Why Data Standardization Is the Missing AI Strategy
The Becker's article advocates four strategies for breaking out of pilot purgatory: data infrastructure first, workflow embedding, outcome-based metrics, and change management. That first strategy — data infrastructure — is the prerequisite for the other three.
Consider what "workflow embedding" actually requires. An AI tool that surfaces medication interaction alerts inside the EHR needs to resolve drug codes in real time. If the system stores medications as NDC codes in one context and RxNorm CUIs in another, the AI needs a reliable mapping layer just to understand what drug it is looking at. Without that, the alert either misfires or stays silent. Both outcomes erode trust.
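A minimal sketch of what that mapping layer buys you. The lookup table and function here are hypothetical illustrations (a real deployment would back the lookup with a terminology service rather than an in-memory map), but the decision logic is the point: an unresolved code should surface a data-quality problem, not silently suppress or misfire an alert.

```typescript
// Hypothetical ingredient map: NDC and RxNorm codes → canonical ingredient.
// In production this lookup would be backed by a terminology service.
const ingredientByCode: Record<string, string> = {
  "ndc:0069-3150-83": "atorvastatin", // Lipitor 20mg product
  "rxnorm:83367": "atorvastatin",     // RxNorm ingredient CUI
  "rxnorm:617314": "atorvastatin",    // branded clinical drug
};

// Only compare drugs when both codes actually resolve.
// An unresolved code means "don't know" — not "no interaction".
function sameIngredient(codeA: string, codeB: string): boolean | null {
  const a = ingredientByCode[codeA];
  const b = ingredientByCode[codeB];
  if (!a || !b) return null; // unresolved → flag a data-quality issue instead
  return a === b;
}

console.log(sameIngredient("ndc:0069-3150-83", "rxnorm:83367")); // true
console.log(sameIngredient("ndc:0069-3150-83", "rxnorm:99999")); // null
```

The three-valued return is deliberate: collapsing "unresolved" into "false" is exactly the silent failure mode described above.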
Or consider "outcome-based metrics." You can't measure whether an AI tool reduces readmissions if the diagnosis codes it ingests aren't normalized. A model that performs well on ICD-10 I50.9 (heart failure, unspecified) but has never seen the same condition coded as I50.22 (chronic systolic heart failure) or the SNOMED equivalent will produce inconsistent results that make outcomes measurement meaningless.
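To make that concrete, here is a hypothetical rollup of the heart failure codes mentioned above into a single analytic cohort. The cohort set and function names are illustrative assumptions, not a real system's API, but they show why normalization is a precondition for outcomes measurement: readmission rates get computed on one population instead of several fragments.

```typescript
// Hypothetical code-to-condition rollup for outcomes measurement.
// Several ICD-10-CM I50.x codes and a SNOMED CT equivalent all
// normalize to one heart failure cohort.
const heartFailureCohort = new Set([
  "icd10:I50.9",     // heart failure, unspecified
  "icd10:I50.22",    // chronic systolic (congestive) heart failure
  "icd10:I50.32",    // chronic diastolic (congestive) heart failure
  "snomed:84114007", // SNOMED CT: heart failure (disorder)
]);

function inCohort(diagnosisCodes: string[]): boolean {
  return diagnosisCodes.some((c) => heartFailureCohort.has(c));
}

// Three patients coded three different ways — one cohort.
console.log(inCohort(["icd10:I50.9"]));     // true
console.log(inCohort(["icd10:I50.22"]));    // true
console.log(inCohort(["snomed:84114007"])); // true
```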
The tools that survive — ambient clinical documentation is the best current example — succeed in part because they side-step the terminology problem entirely. They work with natural language, not coded data. But most clinical AI applications don't have that luxury. Drug interaction checks, clinical decision support, population health analytics, claims adjudication — these all depend on standardized, coded clinical data.
What Standardization Actually Looks Like
Data standardization in healthcare isn't a one-time ETL job. Clinical code systems update continuously: the FDA publishes new NDC records monthly, CMS releases ICD-10 updates annually, NLM refreshes RxNorm weekly. A "normalized" dataset from six months ago is already drifting.
This is where terminology services become infrastructure rather than tooling. Instead of each AI application maintaining its own code mappings, you need a shared layer that resolves codes in real time.
Here's a concrete example. Suppose an AI model is analyzing medication data across three facilities, and each represents the same drug differently:
```typescript
import { Fhirfly } from '@fhirfly-io/terminology';

const client = new Fhirfly({ apiKey: process.env.FHIRFLY_API_KEY });

// Facility A stores NDC codes
const fromNdc = await client.ndc.getProduct('0069-3150-83');
// → { proprietary_name: "Lipitor", active_ingredients: ["atorvastatin calcium"] }

// Facility B stores RxNorm CUIs
const fromRxNorm = await client.rxnorm.getConcept('83367');
// → { name: "atorvastatin", tty: "IN" }

// Facility C has a brand name string: "Lipitor 20mg"
const fromSearch = await client.rxnorm.search('Lipitor 20mg');
// → [{ rxcui: "617314", name: "atorvastatin calcium 20 MG Oral Tablet [Lipitor]" }]

// All three resolve to the same drug. Without normalization,
// your AI model sees three unrelated medications.
```
This is not a sophisticated ML pipeline. It is plumbing. But it is the plumbing that determines whether your AI model sees one patient on atorvastatin or three patients on three different drugs.
The Infrastructure Beneath the Intelligence
The Becker's article makes an important observation: when AI integration succeeds, the technology becomes "almost invisible — operating in the background, handling documentation, surfacing insights, and streamlining decisions without adding burden to clinical teams."
The same principle applies to data infrastructure. Terminology services, code normalization, and interoperability layers shouldn't be visible to the AI application or the clinician. They should be foundational — a reliable substrate that every AI tool in the health system can depend on.
Health systems that invest in this layer before deploying AI tools will see a compounding return. Every subsequent AI application benefits from the same normalized data. Every model trains on consistent representations. Every clinical validation produces meaningful results.
Health systems that skip it will keep adding headstones to the graveyard.
Key Takeaways
- More than 85% of healthcare AI pilots fail to scale — and data fragmentation is the most underappreciated reason
- Terminology normalization is infrastructure, not a feature — it needs to be shared across all AI applications, not embedded in each one
- Code systems update continuously — static mappings decay; real-time terminology resolution is required
- Trust is fragile — a single bad result from inconsistent data can kill adoption permanently
- The successful AI tools are the invisible ones — and the most invisible infrastructure of all is standardized data
The AI graveyard will keep growing until health systems treat data standardization as a prerequisite, not a follow-up project. The question isn't whether your AI model is good enough. It's whether your data is ready for it.