AI & ML | March 18, 2026

The 98% Problem: What Healthcare AI Teams Actually Build Before Writing a Single Prompt

The API call is 2% of the work. The other 98% — LOINC normalization, FHIR conversion, compliance — is where healthcare AI teams actually spend their time.

[Figure: The 98% Problem, drawn as an iceberg with the API call above the waterline and the much larger terminology infrastructure below it.]

Every week, another engineering team asks the same question: "Can we just plug OpenAI into our clinical workflow?"

The short answer is yes. The longer answer is that the API call is about 2% of the work.

BloodGPT published a detailed engineering breakdown of what it actually takes to build a production healthcare AI system on top of a large language model. Their analysis is thorough and worth reading in full — they've done the hard work of mapping out every component. The conclusion is striking: teams routinely underestimate the surrounding infrastructure by 10x.

This post unpacks why that happens, what the infrastructure actually looks like, and how terminology APIs can eliminate months of work from the critical path.

The Real Cost Breakdown

BloodGPT breaks a production healthcare AI system into its major engineering components. The LLM API call is only one of them; the rest look like this:

Component	Typical Timeline
PDF Parsing & Data Extraction	3–6 months
LOINC Code Mapping	2–4 months
FHIR R4 Conversion	3–6 months
Reference Range Harmonization	2–3 months
Data Storage & Retrieval	1–2 months
User Management & Auth	2–3 months
Visualization	2–4 months
Localization	2–4 months
QA & Testing	3–6 months
Compliance (HIPAA / SOC 2 / GDPR)	6–12 months

Two of these rows are terminology-specific: LOINC Code Mapping and FHIR R4 Conversion. LOINC mapping alone, which normalizes lab result names to standard codes, is estimated at 2–4 months of engineering. FHIR conversion adds another 3–6 months. These are conservative estimates for a team that already understands the healthcare data landscape.

Total estimated cost: hundreds of thousands to well over a million dollars, spread across 16–28 months of development.
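As a quick arithmetic check on the table, the TypeScript sketch below totals the low and high ends of every row. The numbers come straight from the table; the variable names are ours. Built one after another, the ten components add up to 26 to 50 months, which is longer than the 16–28 month figure above, so that figure presumably assumes several workstreams running in parallel (our reading, not a number from BloodGPT). The two terminology rows alone come to 5 to 10 months.

```typescript
// Component timelines from the table above, as [min, max] months.
const timelines: Record<string, [number, number]> = {
  "PDF Parsing & Data Extraction": [3, 6],
  "LOINC Code Mapping": [2, 4],
  "FHIR R4 Conversion": [3, 6],
  "Reference Range Harmonization": [2, 3],
  "Data Storage & Retrieval": [1, 2],
  "User Management & Auth": [2, 3],
  "Visualization": [2, 4],
  "Localization": [2, 4],
  "QA & Testing": [3, 6],
  "Compliance (HIPAA / SOC 2 / GDPR)": [6, 12],
};

// Adds up the low ends and the high ends of a list of ranges.
function sumRanges(ranges: Array<[number, number]>): [number, number] {
  return ranges.reduce<[number, number]>(
    ([lo, hi], [a, b]) => [lo + a, hi + b],
    [0, 0],
  );
}

// Every component built sequentially, with no overlap between teams.
const sequential = sumRanges(Object.values(timelines));
// Only the two terminology-specific rows.
const terminology = sumRanges([
  timelines["LOINC Code Mapping"],
  timelines["FHIR R4 Conversion"],
]);

console.log(`All components, one after another: ${sequential[0]} to ${sequential[1]} months`);
console.log(`Terminology rows (LOINC + FHIR): ${terminology[0]} to ${terminology[1]} months`);
```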

Why Terminology Is the Hard Part

Consider a seemingly simple task: normalizing a lab result. A patient's HbA1c result arrives from a lab. Depending on the source system, it could be labeled:

"HbA1c"
"Hemoglobin A1c"
"Glycated Hemoglobin"
"A1C"
"Hgb A1c, Blood"
LOINC code 4548-4

All of these refer to the same biomarker. LOINC contains over 109,000 codes, each defined by six axes (component, property, timing, system, scale, method). Accurate normalization requires fuzzy matching across naming variants, clinical context awareness, and a confidence scoring system to flag uncertain mappings for human review.
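To make the matching problem concrete, here is a minimal sketch of label normalization with a confidence score. It is not how any particular vendor does it: the two-entry candidate index, the abbreviation table, the Dice token-overlap score, and the 0.8 review threshold are all illustrative assumptions. Labels that score below the threshold are flagged for human review instead of being accepted automatically.

```typescript
interface LoincCandidate {
  code: string;
  names: string[]; // LOINC display name plus known local aliases
}

interface MatchResult {
  input: string;
  code: string | null;
  confidence: number; // 0..1 token-overlap score
  needsReview: boolean;
}

// Hypothetical two-entry index; a real index would cover every LOINC code.
const CANDIDATES: LoincCandidate[] = [
  {
    code: "4548-4",
    names: ["Hemoglobin A1c/Hemoglobin.total in Blood", "Hemoglobin A1c", "Glycated hemoglobin"],
  },
  { code: "2345-7", names: ["Glucose [Mass/volume] in Serum or Plasma", "Glucose"] },
];

// Hypothetical shorthand expansions for common lab abbreviations.
const ABBREVIATIONS = new Map<string, string>([
  ["hba1c", "hemoglobin a1c"],
  ["hgb", "hemoglobin"],
  ["hb", "hemoglobin"],
]);

const REVIEW_THRESHOLD = 0.8; // assumed cut-off; tune against labeled data

// Lowercases, splits on punctuation/whitespace, expands abbreviations.
function tokenize(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .map((w) => ABBREVIATIONS.get(w) ?? w)
    .join(" ")
    .split(" ")
    .filter((w) => w.length > 0);
  return new Set(words);
}

// Dice coefficient between two token sets: 2|A∩B| / (|A| + |B|).
function dice(a: Set<string>, b: Set<string>): number {
  if (a.size + b.size === 0) return 0;
  const shared = Array.from(a).filter((t) => b.has(t)).length;
  return (2 * shared) / (a.size + b.size);
}

// Picks the best-scoring candidate and flags low-confidence matches.
function matchLabName(input: string): MatchResult {
  const query = tokenize(input);
  let best: { code: string | null; score: number } = { code: null, score: 0 };
  for (const cand of CANDIDATES) {
    for (const name of cand.names) {
      const score = dice(query, tokenize(name));
      if (score > best.score) best = { code: cand.code, score };
    }
  }
  return {
    input,
    code: best.code,
    confidence: best.score,
    needsReview: best.score < REVIEW_THRESHOLD,
  };
}

for (const label of ["HbA1c", "Hemoglobin A1c", "Glycated Hemoglobin", "A1C", "Hgb A1c, Blood"]) {
  console.log(matchLabName(label));
}
```

Under these assumptions, "HbA1c" and "Glycated Hemoglobin" resolve to 4548-4 with full confidence, while a bare "A1C" still lands on 4548-4 but scores about 0.67 and is flagged for review. Token overlap alone cannot supply the clinical context (specimen type, method) that a production matcher also needs.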

The same complexity applies to diagnoses (ICD-10-CM has 46,881 codes), clinical findings (SNOMED CT has 586,763 concepts), and medications (RxNorm has 1.9 million concepts). Each code system has its own hierarchy, versioning, and mapping conventions.

Building this from scratch means downloading bulk files from CMS, NLM, and WHO; writing parsers for each format; building a search index; maintaining update pipelines; and ensuring your local copy stays current. That's the 2–4 month estimate — for a single code system.

A Practical Alternative

Terminology APIs let you skip the build-from-scratch phase and go straight to using normalized clinical codes in your application. Here's what the workflow looks like in practice.

Normalizing a Lab Result

Instead of building a LOINC mapping engine, search for the code:

curl -s -H "x-api-key: $FHIRFLY_API_KEY" \
  "https://api.fhirfly.io/v1/loinc/search?q=hemoglobin+a1c&limit=3"
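For a TypeScript service that doesn't use the SDK, the same search can be issued with the built-in fetch (Node 18 or later). This sketch reuses only the URL, header name, and query parameters from the curl command above; the helper names are hypothetical, and the response is typed as unknown rather than guessing at a full schema.

```typescript
// Base URL and path taken from the curl example.
const BASE_URL = "https://api.fhirfly.io/v1";

// Builds the search URL; URLSearchParams encodes spaces as "+".
function loincSearchUrl(query: string, limit = 3): string {
  const params = new URLSearchParams({ q: query, limit: String(limit) });
  return `${BASE_URL}/loinc/search?${params.toString()}`;
}

// Sends the request with the same x-api-key header the curl command uses.
async function searchLoinc(query: string, apiKey: string, limit = 3): Promise<unknown> {
  const res = await fetch(loincSearchUrl(query, limit), {
    headers: { "x-api-key": apiKey },
  });
  if (!res.ok) {
    throw new Error(`LOINC search failed: HTTP ${res.status}`);
  }
  return res.json();
}

// Usage:
// const data = await searchLoinc("hemoglobin a1c", process.env.FHIRFLY_API_KEY ?? "");
```

Because URLSearchParams encodes the space as "+", the generated URL is identical to the one in the curl example.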

The response includes normalized LOINC codes with display names, component breakdowns, and FHIR-ready coding references:

{
  "results": [
    {
      "code": "4548-4",
      "display": "Hemoglobin A1c/Hemoglobin.total in Blood",
      "component": "Hemoglobin A1c",
      "system": "Bld",
      "fhir_coding": {
        "system": "http://loinc.org",
        "code": "4548-4",
        "display": "Hemoglobin A1c/Hemoglobin.total in Blood"
      }
    }
  ]
}

That fhir_coding object drops directly into a FHIR Observation resource. No mapping table required.
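As an illustration, the sketch below wraps that fhir_coding object in a minimal FHIR R4 Observation. The patient ID, result value, and date are made-up sample values, and the type definitions cover only the handful of Observation fields used here. The unit is expressed in UCUM ("%"), since 4548-4 reports HbA1c as a percentage of total hemoglobin.

```typescript
interface Coding {
  system: string;
  code: string;
  display?: string;
}

// Only the Observation fields this example sets; not the full R4 resource.
interface Observation {
  resourceType: "Observation";
  status: string;
  code: { coding: Coding[]; text?: string };
  subject: { reference: string };
  effectiveDateTime: string;
  valueQuantity: { value: number; unit: string; system: string; code: string };
}

// Builds an HbA1c result from a terminology coding plus a measured percentage.
function hba1cObservation(
  fhirCoding: Coding,
  patientId: string,
  percent: number,
  effective: string,
): Observation {
  return {
    resourceType: "Observation",
    status: "final",
    code: { coding: [fhirCoding], text: fhirCoding.display },
    subject: { reference: `Patient/${patientId}` },
    effectiveDateTime: effective,
    valueQuantity: {
      value: percent,
      unit: "%",
      system: "http://unitsofmeasure.org", // UCUM
      code: "%",
    },
  };
}

// fhir_coding copied from the search response above.
const coding: Coding = {
  system: "http://loinc.org",
  code: "4548-4",
  display: "Hemoglobin A1c/Hemoglobin.total in Blood",
};

// Sample values only.
const obs = hba1cObservation(coding, "example-patient", 6.8, "2026-03-01");
console.log(JSON.stringify(obs, null, 2));
```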

Resolving a Diagnosis Code

When your AI system extracts a diagnosis mention from clinical text, you need to ground it to a standard code:

curl -s -H "x-api-key: $FHIRFLY_API_KEY" \
  "https://api.fhirfly.io/v1/icd10/search?q=type+2+diabetes&limit=3"

You get back structured ICD-10-CM codes with hierarchy information, and — if the code has HCC risk adjustment mappings — those are included too:

{
  "results": [
    {
      "code": "E11.9",
      "display": "Type 2 diabetes mellitus without complications",
      "category": "E11",
      "chapter": "4",
      "hcc": {
        "assignments": [
          {
            "hcc_code": "37",
            "hcc_label": "Diabetes without Complication",
            "model": "V28",
            "coefficient": 0.097
          }
        ]
      }
    }
  ]
}
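Once diagnoses are grounded, the hcc block can be read directly. This sketch types only the fields visible in the response above and collects the distinct HCCs that a set of diagnoses maps to under one model. The Z00.00 entry is hand-written sample data, not an API response. The sketch deliberately stops short of computing a risk score: real CMS-HCC scoring also applies hierarchies, demographic factors, and interaction terms, none of which are modeled here.

```typescript
interface HccAssignment {
  hcc_code: string;
  hcc_label: string;
  model: string;
  coefficient: number;
}

interface Icd10Result {
  code: string;
  display: string;
  category: string;
  chapter: string;
  hcc?: { assignments: HccAssignment[] }; // absent when the code has no HCC mapping
}

// All assignments for one model (a code can map to more than one HCC).
function hccsForModel(result: Icd10Result, model: string): HccAssignment[] {
  return result.hcc?.assignments.filter((a) => a.model === model) ?? [];
}

// Distinct HCCs across a patient's diagnoses for one model. No hierarchies,
// demographics, or interactions are applied, so this is NOT a risk score.
function distinctHccs(results: Icd10Result[], model: string): HccAssignment[] {
  const byCode = new Map<string, HccAssignment>();
  for (const result of results) {
    for (const assignment of hccsForModel(result, model)) {
      if (!byCode.has(assignment.hcc_code)) byCode.set(assignment.hcc_code, assignment);
    }
  }
  return Array.from(byCode.values());
}

const results: Icd10Result[] = [
  {
    code: "E11.9",
    display: "Type 2 diabetes mellitus without complications",
    category: "E11",
    chapter: "4",
    hcc: {
      assignments: [
        { hcc_code: "37", hcc_label: "Diabetes without Complication", model: "V28", coefficient: 0.097 },
      ],
    },
  },
  {
    code: "Z00.00", // hand-written sample with no HCC mapping
    display: "Encounter for general adult medical examination without abnormal findings",
    category: "Z00",
    chapter: "21",
  },
];

console.log(distinctHccs(results, "V28"));
```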

Cross-Referencing Medications

For medication reconciliation or drug interaction checks, look up an NDC code and get back the full drug profile:

curl -s -H "x-api-key: $FHIRFLY_API_KEY" \
  "https://api.fhirfly.io/v1/ndc/00002-1433-80?shape=standard"

The response includes proprietary name, active ingredients, dosage form, route, manufacturer, and RxNorm mappings — all structured for programmatic use.
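Source systems often store NDCs in one of the three 10-digit layouts (4-4-2, 5-3-2, or 5-4-1), while the lookup above uses the 11-digit 5-4-2 form used on claims. Whether the endpoint accepts 10-digit forms is not stated here, so this sketch assumes you normalize first. It pads the short segment with a leading zero and rejects unhyphenated 10-digit strings, because without hyphens the segment boundaries are ambiguous. The function name is hypothetical.

```typescript
// Converts a hyphenated 10-digit NDC (4-4-2, 5-3-2 or 5-4-1) or an 11-digit
// 5-4-2 NDC into the 11-digit 5-4-2 form. Returns null for anything else.
function toNdc11(ndc: string): string | null {
  const parts = ndc.trim().split("-");
  if (parts.length !== 3 || !parts.every((p) => /^\d+$/.test(p))) return null;
  const [labeler, product, pkg] = parts;
  const shape = `${labeler.length}-${product.length}-${pkg.length}`;
  switch (shape) {
    case "5-4-2":
      return parts.join("-"); // already 11 digits
    case "4-4-2":
      return `0${labeler}-${product}-${pkg}`; // pad labeler
    case "5-3-2":
      return `${labeler}-0${product}-${pkg}`; // pad product
    case "5-4-1":
      return `${labeler}-${product}-0${pkg}`; // pad package
    default:
      return null;
  }
}

console.log(toNdc11("0002-1433-80"));
```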

Using the SDK

For TypeScript/Node.js applications, the @fhirfly-io/terminology SDK wraps these APIs with full type safety:

import { FhirflyClient } from '@fhirfly-io/terminology';

const client = new FhirflyClient({ apiKey: process.env.FHIRFLY_API_KEY });

// Normalize a lab code
const loinc = await client.loinc.search({ q: 'hemoglobin a1c' });

// Resolve a diagnosis
const icd = await client.icd10.lookup('E11.9', { shape: 'standard' });

// Look up HCC risk adjustment
const hcc = await client.hcc.lookup('E11.9');

// Batch lookup — up to 100 codes at once
const batch = await client.ndc.lookupMany(
  ['00002-1433-80', '00074-3799-13', '00078-0357-15'],
  { shape: 'standard' }
);

Each method returns typed responses. Your IDE autocompletes the fields. No guessing at response shapes.
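The batch methods cap out at 100 codes per call. For larger lists, a small helper can split the input and call the batch method once per chunk. To avoid guessing at the SDK's exact types, this sketch takes the batch function as a parameter; `lookupAll` and `chunk` are hypothetical helpers, not part of @fhirfly-io/terminology. Requests run one after another, which is slower than firing them concurrently but easier on rate limits.

```typescript
// Splits a list into chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Calls a batch lookup once per chunk, sequentially, and collects the responses.
async function lookupAll<R>(
  codes: string[],
  lookupMany: (batch: string[]) => Promise<R>,
  batchSize = 100,
): Promise<R[]> {
  const responses: R[] = [];
  for (const batch of chunk(codes, batchSize)) {
    responses.push(await lookupMany(batch));
  }
  return responses;
}

// Usage with the SDK client from above (response type left to the SDK):
// const pages = await lookupAll(ndcCodes, (batch) =>
//   client.ndc.lookupMany(batch, { shape: "standard" }));
```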

For Exploration and Investigation

Terminology APIs aren't just for production pipelines. During the exploratory phase of a healthcare AI project — when you're still understanding the data landscape — having instant access to clinical codes accelerates research:

Data profiling: Validate codes appearing in your source data. Are they current? Deprecated? Mapped correctly?
Schema design: Understand the structure of a code system before designing your data model.
Prompt engineering: Ground your LLM prompts in real terminology. Instead of asking a model to "classify this condition," provide it with the actual ICD-10 candidate codes and their descriptions.
RAG grounding: Use structured terminology data as retrieval context. A Wolters Kluwer analysis found that combining LLMs with curated terminology maps significantly improves both speed and accuracy compared to using an LLM alone.
Compliance research: Look up codes referenced in CMS rules, payer policies, or clinical guidelines to understand their exact scope.
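A minimal sketch of the prompt-grounding idea: the candidate codes would come from a terminology search such as the ICD-10 example earlier, the model is told to choose only from that list, and its answer is rejected if it names anything else. The prompt wording and function names are illustrative assumptions, and the LLM call itself is omitted.

```typescript
interface CodeCandidate {
  code: string;
  display: string;
}

// Builds a prompt that restricts the model to the supplied candidate codes.
function buildGroundedPrompt(note: string, candidates: CodeCandidate[]): string {
  const list = candidates.map((c) => `- ${c.code}: ${c.display}`).join("\n");
  return [
    "Assign the single best ICD-10-CM code to the clinical text below.",
    "Choose ONLY from the candidate codes listed. If none fit, answer NONE.",
    "",
    "Candidates:",
    list,
    "",
    `Clinical text: """${note}"""`,
    "",
    "Answer with the code only.",
  ].join("\n");
}

// Accepts the model's answer only if it is one of the offered candidates.
function acceptAnswer(answer: string, candidates: CodeCandidate[]): string | null {
  const cleaned = answer.trim().toUpperCase();
  return candidates.some((c) => c.code === cleaned) ? cleaned : null;
}

const candidates: CodeCandidate[] = [
  { code: "E11.9", display: "Type 2 diabetes mellitus without complications" },
  { code: "E11.65", display: "Type 2 diabetes mellitus with hyperglycemia" },
];

const prompt = buildGroundedPrompt("Type 2 diabetes, well controlled on metformin.", candidates);
console.log(prompt);
```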

This is especially relevant for teams evaluating clinical NLP pipelines. Entity extraction tools (from John Snow Labs, Rhapsody, or custom models) map extracted entities to SNOMED CT, ICD-10, LOINC, and RxNorm. Having a terminology API alongside your NLP pipeline lets you validate those mappings in real time, catch edge cases, and build feedback loops.
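One way to build that feedback loop is to triage every extracted code before it reaches downstream systems. This sketch assumes ICD-10-CM output in dotted form and a cache of codes already confirmed through terminology lookups (represented here by a plain Map). It separates structurally malformed codes, such as an ICD-9 code like 250.00, from well-formed codes that are not in the cache; both groups go to human review. The regex checks shape only and cannot tell you whether a code exists.

```typescript
// Shape check for dotted ICD-10-CM codes: a letter, a digit, a letter or digit,
// then optionally a dot and one to four letters or digits.
const ICD10CM_PATTERN = /^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$/;

interface ExtractedEntity {
  text: string; // span the NLP model extracted
  code: string; // ICD-10-CM code the model assigned
}

type Verdict = "ok" | "malformed" | "unknown_code";

interface Validation {
  entity: ExtractedEntity;
  verdict: Verdict;
  display?: string; // official description when the code is recognized
}

// `known` stands in for codes confirmed via terminology lookups (code -> display).
function validateExtractions(entities: ExtractedEntity[], known: Map<string, string>): Validation[] {
  return entities.map((entity): Validation => {
    const code = entity.code.trim().toUpperCase();
    if (!ICD10CM_PATTERN.test(code)) return { entity, verdict: "malformed" };
    const display = known.get(code);
    return display === undefined
      ? { entity, verdict: "unknown_code" }
      : { entity, verdict: "ok", display };
  });
}

const known = new Map<string, string>([
  ["E11.9", "Type 2 diabetes mellitus without complications"],
]);

const checked = validateExtractions(
  [
    { text: "T2DM", code: "e11.9" },
    { text: "diabetes", code: "250.00" }, // ICD-9 code leaking through
    { text: "diabetes, other", code: "E11.99" }, // well-formed but not in the cache
  ],
  known,
);

// Everything that is not "ok" becomes review work and, later, training feedback.
const reviewQueue = checked.filter((v) => v.verdict !== "ok");
console.log(reviewQueue.map((v) => `${v.entity.text}: ${v.verdict}`));
```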

Fitting Into the Broader Stack

Terminology APIs don't replace the other components in BloodGPT's breakdown — you still need PDF parsing, compliance architecture, QA pipelines, and everything else. But they eliminate the terminology-specific work (estimated at 5–10 months combined) from your critical path.

The landscape of healthcare AI infrastructure is growing rapidly. OpenAI, Google, and Anthropic all launched dedicated healthcare products in early 2026. Platforms like Metriport handle FHIR data aggregation. eClinicalWorks launched an AI API Workbench for EHR customization. Each of these tools solves a different layer of the stack.

Terminology sits underneath all of them. Whether you're building a RAG system, a clinical NLP pipeline, or an AI agent that interacts with FHIR servers, you need accurate code lookups. A 2026 industry analysis put it directly: the winners this year will be those who invest in data quality and terminology management.

What's Available

FHIRfly's terminology API currently covers 16 clinical data sets, updated daily:

Data Set	Record Count	Use Case
NPI	9.4M providers	Provider lookup and validation
RxNorm	1.9M concepts	Drug concept normalization
SNOMED CT	586K concepts	Clinical findings and procedures
NDC	377K packages	Drug product identification
FDA Drug Labels	271K labels	Drug safety and labeling
LOINC	109K codes	Lab and observation normalization
ICD-10-PCS	79K codes	Procedure classification
ICD-10-CM	46K codes	Diagnosis classification
Connectivity	437K endpoints	FHIR endpoint discovery
HCC Crosswalk	28K mappings	Risk adjustment (CMS models V21–V28)
OPCS-4	1,665 codes	UK procedure classification
And more	—	Vaccines, claims edits, fee schedules, coverage determinations

Several data types (including ICD-10 and LOINC) include fhir_coding references for direct integration with FHIR resources — with more being added. Batch endpoints handle up to 100 codes per request. Search endpoints use Atlas Search for fuzzy matching across display names and descriptions.

Key Takeaways

The LLM API call is roughly 2% of a production healthcare AI system. The other 98% is infrastructure — and terminology is one of the largest pieces.
Building LOINC normalization, ICD-10 mapping, and FHIR conversion from scratch takes an estimated 5–10 months of engineering.
Terminology APIs eliminate that work by providing normalized, FHIR-ready clinical codes via REST endpoints and typed SDKs.
These APIs are useful throughout the development lifecycle — from early exploration and data profiling through production normalization pipelines.

Next Steps

Explore the API: Browse the documentation and try a few lookups — no credit card required.
Read BloodGPT's analysis: Their full build-vs-buy breakdown is an excellent reference for healthcare AI infrastructure planning. Their Claude-specific guide covers Anthropic's healthcare connectors.
Check the SDK: The @fhirfly-io/terminology package provides typed TypeScript access to all endpoints.
Try the MCP server: If you're building AI agents, the @fhirfly-io/mcp-server exposes terminology tools directly to Claude, ChatGPT, and other LLM frameworks.

Tags: ai, terminology, loinc, fhir, tutorial

Written by The FHIRfly Team — healthcare data, AI, and interoperability folks building better clinical coding APIs.
