Agentic Prior Auth: Grounding the 72-Hour Clock
CMS-0057-F puts payers on a 72-hour clock for expedited prior auth. LLM agents can close the gap — but only if they're grounded against NCCI, MUE, coverage, and terminology data they can trust.

Every prior-authorization team has the same math problem. The volume is growing, the clinical review is expensive, and starting in 2026 the timelines get shorter. CMS-0057-F compresses expedited decisions to 72 hours and standard decisions to 7 calendar days, with machine-readable rationales exposed over a Da Vinci PAS API by 2027. The only way to meet that clock at volume is automation — and the only way to make automation trustworthy in regulated workflows is to ground every step.
This post is about what "grounded" actually means when an LLM agent is making prior-auth determinations against real CMS data, and how to assemble the tools an agent needs so that the rationale it produces can survive an audit.
The regulatory context, briefly
CMS-0057-F (the Interoperability and Prior Authorization Final Rule) tightens three things at once:
- Decision timelines. 72 hours for expedited requests, 7 calendar days for standard. Covered payers are Medicare Advantage plans, Medicaid and CHIP (fee-for-service and managed care), and QHP issuers on the federally facilitated exchanges.
- Rationales. Denials must include a specific reason tied to a coverage rule, coding rule, or documentation gap — not just a status code.
- APIs. Four required FHIR APIs by the 2027 compliance date: Patient Access, Provider Access, Payer-to-Payer, and Prior Authorization (the Da Vinci CRD/DTR/PAS triad).
On top of that, the HL7 FAST Security IG (SSRAA / UDAP) kicked in on January 1, 2026 for TEFCA participants. So the transport, the auth, and the decision content are all getting formalized at the same time.
Manual UM review doesn't scale linearly against that. Agents can.
Why ungrounded LLMs fail on prior auth
An off-the-shelf LLM answering "can 99213 and 99214 be billed together?" will happily produce a confident answer. Sometimes it's right. In production, "sometimes right" is indistinguishable from hostile — a wrong answer costs a denial, a resubmission, a member complaint, and potentially an ALJ hearing.
The specific failure mode is coding hallucination. The model has seen enough medical text to pattern-match a plausible-sounding reply, but it has no direct access to:
- CMS's current-quarter NCCI PTP (Procedure-to-Procedure) edit set — roughly 4M active code pairs that say whether two HCPCS/CPT codes can be billed together and whether a modifier is allowed
- MUE (Medically Unlikely Edits) — per-code unit limits that trigger automatic denials
- LCD/NCD coverage policies — what Medicare actually covers where, and under what conditions
- PFS/RVU — whether a code is even priced on the Physician Fee Schedule this quarter
- Up-to-date RxNorm/NDC/ICD-10/LOINC/SNOMED — so the codes in the agent's output actually exist
None of that data is static. NCCI ships quarterly. MUE ships quarterly. LCDs change continuously. An LLM trained eight months ago is working from stale snapshots, and the model itself has no way to tell you which parts of its answer are stale.
The fix isn't a smarter model. It's a narrow set of trustworthy tools the model is required to call.
MCP is the grounding layer
Model Context Protocol (MCP) is a small, well-specified contract for exposing tools to LLMs. Each tool has a JSON Schema for its input, a predictable output shape, and — most importantly for regulated workflows — every call is a discrete, auditable event.
That last part is the whole point. When a denial has to cite a rule, you don't want to cite "the model said so." You want to cite "NCCI PTP edit between 99213 and 99214, quarter 2026-Q1, CMS citation URL, retrieved at 2026-04-16T17:07:22Z, edit rationale: 'Misuse of Column Two code with Column One code'." That's the level of provenance an auditor wants, and it's exactly what a structured tool call emits.
FHIRfly exposes all of this at https://api.fhirfly.io/mcp. What follows is how the pieces fit.
The toolkit
These are the tools an agentic PA workflow actually calls. Shapes are pulled directly from the FHIRfly API.
ncci_validate — can these two codes coexist?
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "ncci_validate",
    "arguments": {
      "code1": "99213",
      "code2": "99214",
      "claim_type": "practitioner"
    }
  }
}
The response wraps a NcciValidateResponse as JSON inside an MCP content block:
{
  "data": {
    "code1": "99213",
    "code2": "99214",
    "can_bill_together": true,
    "edits": [],
    "summary": "No active NCCI PTP edits found between codes 99213 and 99214. They can be billed together."
  },
  "meta": {
    "source": {
      "name": "CMS NCCI",
      "quarter": "2026-Q1",
      "url": "https://www.cms.gov/medicare/coding/national-correct-coding-initiative"
    },
    "legal": {
      "license": "public_domain",
      "attribution_required": false,
      "source_name": "CMS NCCI",
      "citation": "CMS NCCI. Accessed 2026-04-16 via FHIRfly."
    }
  }
}
When an edit does apply, edits contains one or more items with claim_type, modifier_indicator ("0" / "1" / "9"), modifier_allowed, effective_date, and rationale. The agent never invents the rationale — it reads it.
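For illustration, a single edits item might look like the sketch below. The field set is the one listed above; the values are made up for the example, not pulled from the live edit file:

```json
{
  "claim_type": "practitioner",
  "modifier_indicator": "1",
  "modifier_allowed": true,
  "effective_date": "2026-01-01",
  "rationale": "Misuse of Column Two code with Column One code"
}
```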
mue_lookup — how many units are too many?
MUE catches "30 units of a once-daily injection" before the payer does. Input takes an optional service_type (practitioner, outpatient_hospital, dme):
{
  "data": {
    "hcpcs_code": "J0585",
    "limits": [
      {
        "hcpcs_code": "J0585",
        "service_type": "practitioner",
        "mue_value": 400,
        "adjudication_indicator": 3,
        "adjudication_indicator_display": "Date of Service Edit: Clinical",
        "rationale": "CMS Policy"
      }
    ]
  },
  "meta": { "source": { "name": "CMS MUE", "quarter": "2026-Q1" }, "legal": { "license": "public_domain", "...": "..." } }
}
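The request side mirrors the ncci_validate call earlier. The argument names below are inferred from the response shape, so treat them as an assumption rather than the canonical schema:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "mue_lookup",
    "arguments": {
      "hcpcs_code": "J0585",
      "service_type": "practitioner"
    }
  }
}
```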
coverage_check — is this even covered?
{
  "data": {
    "hcpcs_code": "G0438",
    "policies_found": 2,
    "policies": [
      {
        "policy_type": "lcd",
        "policy_id": "34567",
        "display_id": "L34567",
        "policy_title": "Annual Wellness Visit",
        "hcpcs_description": "...",
        "status": "Active",
        "is_active": true,
        "effective_date": "2024-01-01"
      }
    ],
    "summary": "Found 2 LCD coverage determination(s) for HCPCS code G0438."
  },
  "meta": { "...": "..." }
}
By default active_only is true. Pass false when you need historical context for a retroactive review.
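A retroactive-review call would then look something like this (argument names again inferred from the response shape and the active_only default described above, so verify against the tool's published JSON Schema):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "coverage_check",
    "arguments": {
      "hcpcs_code": "G0438",
      "active_only": false
    }
  }
}
```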
pfs_lookup — what does CMS actually pay?
Returns the RVU breakdown (work / PE / MP) and the Medicare-calculated payment at both facility and non-facility rates:
{
  "data": {
    "hcpcs_code": "99213",
    "description": "Office or other outpatient visit ... 20-29 minutes",
    "status_code": "A",
    "rvu": {
      "work": 1.30,
      "pe_non_facility": 1.39,
      "pe_facility": 0.58,
      "mp": 0.10,
      "total_non_facility": 2.79,
      "total_facility": 1.98
    },
    "conversion_factor": 32.35,
    "calculated_payment": {
      "non_facility": 90.26,
      "facility": 64.05
    },
    "indicators": {
      "global_days": "XXX",
      "multiple_surgery": null,
      "bilateral_surgery": null
    }
  },
  "meta": { "...": "..." }
}
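The payment figures are reproducible from the RVUs: total RVUs times the conversion factor, ignoring the geographic (GPCI) adjustments the sample response doesn't include. A quick sanity check, using only numbers from the response above:

```typescript
// Verify calculated_payment = total RVU × conversion factor,
// using the figures from the sample pfs_lookup response.
const conversionFactor = 32.35;

function payment(totalRvu: number): number {
  // Round to cents, the precision the sample response reports.
  return Math.round(totalRvu * conversionFactor * 100) / 100;
}

console.log(payment(2.79)); // non-facility: 90.26
console.log(payment(1.98)); // facility: 64.05
```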
check_drug_interactions — FDA-sourced DDI
For any PA touching medication, the agent needs FDA-label-sourced interaction text, not free-form model output. Input is an array of up to 25 drugs (RxCUI, name, or NDC) plus optional label sections (drug_interactions, warnings, contraindications, boxed_warning, pharmacokinetics). Output includes interaction text, RxNorm ingredient enrichment, and DailyMed links for attribution.
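A request might look like the sketch below. The drugs and sections argument names are our guess from the description above, not a confirmed schema; check the tool's published JSON Schema before relying on them:

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "check_drug_interactions",
    "arguments": {
      "drugs": ["warfarin", "aspirin"],
      "sections": ["drug_interactions", "boxed_warning"]
    }
  }
}
```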
The terminology floor
All of the above is coherent only because the codes themselves are real. The same MCP surface exposes lookups for RxNorm, NDC, ICD-10-CM/PCS, LOINC, SNOMED IPS, HCPCS Level II, and more — so the agent can validate every diagnosis and drug in the request before calling the billing tools. "The ICD-10 code you included doesn't exist" is a denial waiting to happen, and it's the one hallucination class terminology grounding eliminates outright.
An end-to-end example
Here's the inner loop of an agentic PA worker processing a request for an office visit plus a same-day procedure. The agent is Claude, Gemini, or whatever you prefer — the pattern is the same.
- Parse the request. Extract the primary and secondary HCPCS codes, the ICD-10 diagnoses, and any medications.
- Validate every code. For each ICD-10, call icd10_get. For each HCPCS, call hcpcs_get. For each NDC, call ndc_get. Any miss short-circuits to "bad request" with a structured reason.
- Check unit limits. For each billable code, call mue_lookup. If the requested units exceed the practitioner MUE and the adjudication indicator is a date-of-service edit, flag it.
- Check code-pair conflicts. For each pair of HCPCS codes on the request, call ncci_validate. Collect any edits where can_bill_together is false and record the modifier allowance.
- Check coverage. For the primary procedure code, call coverage_check. If no active LCD exists in the member's MAC jurisdiction, flag for human review.
- Screen medications. If drugs are involved, call check_drug_interactions against the member's active medication list from the clinical attachment.
- Compose the decision. The LLM's job now is narrow: given structured tool outputs, write a rationale in the required PAS format, cite each tool call with its meta.source.quarter and meta.legal.citation, and emit a Da Vinci PAS Claim bundle.
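A minimal sketch of that loop in TypeScript. Here callTool is a stand-in for whatever MCP client you use, only two of the checks are shown, and the routing at the end is illustrative rather than FHIRfly's policy:

```typescript
// Sketch of the PA inner loop. `callTool` stands in for an MCP client's
// tools/call; tool names and response shapes mirror the examples above.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<any>;

type Decision = { outcome: "approve" | "deny" | "escalate"; reasons: string[] };

async function reviewRequest(
  req: { hcpcs: string[]; units: Record<string, number> },
  callTool: CallTool
): Promise<Decision> {
  const reasons: string[] = [];

  // Unit limits: flag any line item that exceeds its MUE.
  for (const code of req.hcpcs) {
    const mue = await callTool("mue_lookup", { hcpcs_code: code });
    for (const limit of mue.data.limits) {
      if ((req.units[code] ?? 1) > limit.mue_value) {
        reasons.push(`${code}: ${req.units[code]} units exceeds MUE of ${limit.mue_value}`);
      }
    }
  }

  // Code-pair conflicts: every pair goes through ncci_validate.
  for (let i = 0; i < req.hcpcs.length; i++) {
    for (let j = i + 1; j < req.hcpcs.length; j++) {
      const ncci = await callTool("ncci_validate", {
        code1: req.hcpcs[i],
        code2: req.hcpcs[j],
        claim_type: "practitioner",
      });
      if (!ncci.data.can_bill_together) {
        reasons.push(ncci.data.summary);
      }
    }
  }

  // Illustrative routing: clean requests approve, anything flagged escalates.
  return reasons.length === 0
    ? { outcome: "approve", reasons }
    : { outcome: "escalate", reasons };
}
```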
Notice what the model is not doing. It isn't remembering whether two codes conflict. It isn't guessing a unit limit. It isn't inventing an LCD ID. It's orchestrating tool calls, reading structured outputs, and writing a rationale that a human reviewer can sanity-check in under a minute.
The SDK version
If you're building in Node instead of calling MCP directly, the same surface is available via @fhirfly-io/terminology:
import { Fhirfly } from "@fhirfly-io/terminology";

const client = new Fhirfly({ apiKey: process.env.FHIRFLY_API_KEY! });

const ncci = await client.claims.validateNcci("99213", "99214", {
  claim_type: "practitioner",
});
if (!ncci.data.can_bill_together) {
  for (const edit of ncci.data.edits) {
    console.log(`${edit.claim_type}: modifier ${edit.modifier_allowed ? "allowed" : "not allowed"} — ${edit.rationale}`);
  }
}

const mue = await client.claims.lookupMue("J0585", { service_type: "practitioner" });
for (const limit of mue.data.limits) {
  console.log(`${limit.service_type}: max ${limit.mue_value} units — ${limit.adjudication_indicator_display}`);
}

const coverage = await client.claims.checkCoverage("G0438");
console.log(`${coverage.data.policies_found} active policies found.`);

const pfs = await client.claims.lookupPfs("99213");
console.log(`Non-facility payment: $${pfs.data.calculated_payment.non_facility.toFixed(2)}`);
Batch variants exist for MUE and PFS (lookupMueMany, lookupPfsMany, up to 100 codes) when you're processing a line-item list.
Or plain HTTP, if you're somewhere that doesn't do Node:
curl -s "https://api.fhirfly.io/v1/ncci/validate?code1=99213&code2=99214&claim_type=practitioner" \
-H "Authorization: Bearer $FHIRFLY_API_KEY"
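Either way the response shape is identical, so downstream parsing is trivial. Here is a jq pass over a canned copy of the sample ncci_validate response from earlier (canned so the snippet runs without credentials):

```shell
# Pull the decision-relevant fields out of an ncci/validate response.
# $response is the sample response from earlier in the post, abbreviated.
response='{"data":{"can_bill_together":true,"summary":"No active NCCI PTP edits found between codes 99213 and 99214. They can be billed together."},"meta":{"source":{"quarter":"2026-Q1"}}}'

echo "$response" | jq -r '.data.can_bill_together'   # prints: true
echo "$response" | jq -r '.meta.source.quarter'      # prints: 2026-Q1
```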
Where the agent lives in your architecture
A production PA agent sits between the payer's intake system and the internal UM workflow, not replacing either:
Provider submission (PAS / CRD / DTR)
│
▼
Intake normalizer ──► Member eligibility service
│
▼
PA Agent ──► FHIRfly MCP (ncci_validate, mue_lookup,
│ coverage_check, pfs_lookup,
│ check_drug_interactions,
│ terminology lookups)
│
▼
Structured decision + rationale
│
├──► Auto-approve (clean, within policy)
├──► Auto-deny with cited rationale
└──► Escalate to clinical review
(queue with all tool-call evidence attached)
Two pieces make this legally defensible:
- Every tool call is logged with its response, quarter, and citation. When CMS or a member asks "why was this denied," the answer is a list of structured tool results, not a chat transcript.
- The LLM never overrides a tool result. If can_bill_together is false, the rationale reflects that. The model's freedom is in how it explains the outcome, not in what the outcome is.
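One way to make that logging concrete: persist a record per tool call whose provenance fields map straight onto the meta block every FHIRfly response carries. The type below is our illustration, not an SDK type:

```typescript
// One audit-log entry per tool call. Field names here are illustrative;
// the values map directly onto the meta block shown in the examples above.
interface ToolCallEvidence {
  tool: string;                       // e.g. "ncci_validate"
  arguments: Record<string, unknown>; // the exact inputs sent
  result: unknown;                    // the data payload, verbatim
  sourceName: string;                 // meta.source.name, e.g. "CMS NCCI"
  sourceQuarter: string;              // meta.source.quarter, e.g. "2026-Q1"
  citation: string;                   // meta.legal.citation
  retrievedAt: string;                // ISO-8601 timestamp of the call
}

function toEvidence(
  tool: string,
  args: Record<string, unknown>,
  resp: any
): ToolCallEvidence {
  return {
    tool,
    arguments: args,
    result: resp.data,
    sourceName: resp.meta.source.name,
    sourceQuarter: resp.meta.source.quarter,
    citation: resp.meta.legal.citation,
    retrievedAt: new Date().toISOString(),
  };
}
```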
Trust, in one sentence
You can't regulate what a language model believes, but you can regulate what tools it's allowed to call and what data those tools are allowed to return. That's the whole thesis behind grounded agentic UM. The model is a planner and a writer. The tools are the source of truth. CMS-0057-F's 72-hour clock is survivable when the two are properly separated.
Key Takeaways
- CMS-0057-F's 72-hour expedited timeline is unreachable through pure manual review at scale — agents are the practical answer, but only if they're grounded.
- Ungrounded LLMs hallucinate medical codes. The failure mode is silent and dangerous in a regulated workflow.
- MCP turns "ask the model" into "call a tool" — and tool calls produce the structured, citable evidence an auditor (and CMS) actually expects in a denial rationale.
- FHIRfly's MCP surface covers the tools a PA agent needs: ncci_validate, mue_lookup, coverage_check, pfs_lookup, check_drug_interactions, plus the underlying terminology lookups for RxNorm, NDC, ICD-10, LOINC, SNOMED, and HCPCS.
- Put the agent between intake and clinical review — never replacing clinicians, but eliminating the clean cases before a reviewer ever touches them.
Written by The FHIRfly Team — a collective of healthcare data experts, AI specialists, and industry veterans building better clinical coding APIs.