AI & Technology · Underwriting Risk

Context Bloat: The Hidden Liability in AI-Assisted Underwriting

Minervian AI Research · 12 min read

There is a moment in almost every AI-assisted underwriting session when something goes quietly wrong. You have been working the model for an hour, feeding it the rent roll, the T-12, the broker OM, your own assumptions. You change the exit cap rate by 25 basis points. The model adjusts the output. You move on.

What you may not realize is that you didn't just change a number. You altered a structure the model has been building, layer by invisible layer, since prompt #1. A spreadsheet recalculates cleanly from first principles; the model doesn't go back. It goes forward, carrying the entire context from the first prompt: the OM, the tweaks, the code it built.

This is context bloat. In commercial real estate underwriting, where precision is the product and IC credibility is the currency, it may be the most underappreciated risk in adopting AI at scale.

Three Forces Are Working Against You

Understanding context bloat requires understanding how large language models actually process information. Unlike a human analyst, who can selectively recall relevant facts, an LLM processes the entire conversation context (up to the limit of its context window) on every prompt. Every exchange is present and influential every time you ask something new: your questions and corrections, the model's prior outputs, the caveats it generated, and the assumptions it made when you were ambiguous.

Three structural forces turn this into a liability.

The first is architecture. LLMs don't have working memory in the human sense. They have a context window: a document that grows with every prompt and response. By prompt #10 of a substantive underwriting session, that document might contain your original deal brief, three rounds of cash flow modeling, a sensitivity table the model generated, a clarification you made about the lease structure, and a correction you issued when the model miscalculated vacancy. All of these are active. All of these are influencing the next output.
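The architecture point can be made concrete with a small Python sketch. `ChatSession`, `build_request`, and the message text are hypothetical; real chat products differ in detail, and some truncate or summarize older turns, but the common pattern is that the model has no memory beyond the transcript sent with each request. Note how the turn-1 assumption is still riding along at prompt #11.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Hypothetical stand-in for a chat UI: each request resends the whole transcript."""
    history: list[tuple[str, str]] = field(default_factory=list)

    def build_request(self, user_msg: str) -> str:
        # No separate working memory: the model's "memory" is this transcript,
        # re-sent (and re-read) in full with every new prompt.
        lines = [f"{role}: {text}" for role, text in self.history]
        lines.append(f"user: {user_msg}")
        return "\n".join(lines)

    def record(self, user_msg: str, reply: str) -> None:
        self.history.append(("user", user_msg))
        self.history.append(("assistant", reply))


session = ChatSession()
# Turn 1: an ambiguous brief, and an implicit assumption the model makes.
session.record(
    "The lease is NNN with some landlord obligations.",
    "Modeling landlord obligations at 3% of revenue.",
)

sizes = []
for n in range(2, 11):
    request = session.build_request(f"Prompt #{n}: update the model.")
    sizes.append(len(request.split()))  # word count as a rough proxy for tokens
    session.record(f"Prompt #{n}: update the model.", f"Updated output #{n}.")

final_request = session.build_request("Prompt #11: run the IRR sensitivity.")
print(sizes)  # request size grows with every turn
print("3% of revenue" in final_request)  # the turn-1 assumption is still in play
```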

The second force is entropy. Ambiguity compounds. Early in a session, you might describe a lease as “NNN with some landlord obligations.” That phrase is doing a lot of work. The model makes an implicit assumption about what “some” means. Perhaps it weights landlord obligations at 3% of revenues, but it never tells you this. You move on. Two prompts later, you ask for a stabilized NOI. The model builds on its earlier interpretation. By prompt #8, when you ask for an IRR sensitivity, you are getting a number that is downstream of an assumption you never made, never reviewed, and may not even know exists.

The third force is human behavior. Analysts iterate. They do not restart. In a normal underwriting workflow, you refine assumptions progressively; you don't tear up the model every time you learn something new. This is efficient when you're working with a spreadsheet, where formulas recalculate deterministically. It is risky when you're working with an LLM, where each refinement doesn't replace earlier context; it is added to that context and compounds on it.

The Problem You Won't See Until You're Presenting

The insidious quality of context bloat is that it doesn't announce itself. The model continues to produce confident, well-formatted, internally coherent language. The numbers may not look wrong. The narrative seems to hold together. The sensitivity table has the right column headers.

What degrades is the relationship between those outputs and your actual deal.

Consider a concrete scenario. You are underwriting a suburban office-to-medical conversion. Early in the session, you ask the model to sketch a leasing schedule based on your initial assumptions: 18-month lease-up, one major anchor tenant, three smaller tenants. The model generates a schedule and, in doing so, makes assumptions about tenant improvement allowances and free rent periods consistent with its training on similar deals.

You never explicitly validate those assumptions. You move on to the debt sizing. Then the waterfall. Then you come back and tell the model the anchor tenant is actually taking 40% less space than originally discussed.

The model updates the leasing schedule. But does it recalibrate the TI assumptions it embedded several prompts earlier? Does it flag that the free rent periods it assumed were sized for a larger tenant? Often, it does not. The anchor tenant just got smaller. The TI amount, quietly, did not.
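In a deterministic model, TI is a derived value rather than a remembered one. A minimal Python sketch, assuming a hypothetical `Lease` type and illustrative figures (a 40,000 SF anchor at a $60/SF TI allowance) that do not come from any real deal:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Lease:
    tenant: str
    square_feet: float
    ti_per_sf: float  # tenant improvement allowance, $/SF (hypothetical figure)

    @property
    def ti_total(self) -> float:
        # Derived, not stored: recomputed from the inputs every time it is read.
        return self.square_feet * self.ti_per_sf


anchor = Lease("Anchor", square_feet=40_000, ti_per_sf=60.0)
stale_ti = anchor.ti_total  # the figure a drifting session might keep carrying forward

# The anchor takes 40% less space than originally discussed.
anchor_revised = replace(anchor, square_feet=anchor.square_feet * (1 - 0.40))

print(f"Stale TI:      ${stale_ti:,.0f}")
print(f"Recomputed TI: ${anchor_revised.ti_total:,.0f}")
```

Making `ti_total` a property means there is no stored TI figure to go stale: shrinking the anchor's footprint changes TI automatically. Free rent, which is negotiated rather than formula-driven, would still need a human to revisit it.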

By the time you are presenting levered returns to your investment committee, you may be defending outputs that are downstream of five or six unexamined assumption layers. The session drifted, and the outputs' quality drifted with it.

The Specific Risks for CRE Underwriting

Commercial real estate amplifies these dynamics in specific ways. Deal complexity creates more surface area for entropy. A value-add multifamily acquisition might involve unit mix assumptions, renovation costs, lease-up phasing, refi assumptions, and promote structures. Each of these is a potential site of implicit model inference that can silently distort downstream outputs. Long sessions create more drift, and market condition sensitivity creates brittleness: a late-session cap rate change can destabilize outputs in ways that are genuinely difficult to trace.

Dilution of Critical Data Points

When a model's context window is saturated with documents, inputs that appear early or are buried deep in long documents (a lease expiration cliff, a below-market anchor tenant, a carve-out in the environmental report) can receive disproportionately little “attention.” The model may surface a coherent narrative that glosses over the detail that would have killed the deal.

Contradictory Source Reconciliation

Deal packages routinely contain conflicting figures: broker-adjusted NOI versus in-place actuals, sponsor projections versus third-party appraisals. With heavy context loads, LLMs may blend contradictory figures rather than flag the discrepancy, producing underwriting outputs that appear internally consistent but are built on unresolved conflicts.
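The alternative to blending is an explicit check that refuses to proceed when sources disagree. A minimal Python sketch, assuming a hypothetical `reconcile` helper, an illustrative 2% tolerance, and made-up NOI figures:

```python
def reconcile(label: str, figures: dict[str, float], tolerance: float = 0.02) -> str:
    """Flag conflicting source figures instead of silently averaging them.

    `tolerance` is a hypothetical threshold: the maximum relative spread
    (max vs. min) accepted before a human must resolve the conflict.
    """
    low, high = min(figures.values()), max(figures.values())
    if low <= 0:
        raise ValueError("figures must be positive")
    spread = (high - low) / low
    if spread <= tolerance:
        return f"{label}: consistent ({spread:.1%} spread)"
    detail = ", ".join(f"{src} ${val:,.0f}" for src, val in figures.items())
    return f"{label}: CONFLICT ({spread:.1%} spread), resolve before modeling: {detail}"


noi = {"broker-adjusted (OM)": 4_600_000, "in-place (T-12)": 4_150_000}
print(reconcile("NOI", noi))
```

The point of the design is that a conflict stops the pipeline and names both sources, rather than producing a blended number that looks internally consistent.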

Hallucination Under Compression

When context approaches token limits, models may begin interpolating rather than retrieving: generating plausible-sounding rent comps, cap rate assumptions, or lease terms that have no basis in the source documents. In CRE, where a 25 bps cap rate error on a large asset can shift value by millions, confident confabulation is a structural risk.
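The stakes are simple direct-capitalization arithmetic (value = NOI / cap rate). A short Python sketch, assuming a hypothetical $5M stabilized NOI:

```python
def direct_cap_value(noi: float, cap_rate: float) -> float:
    """Direct capitalization: value = stabilized NOI / cap rate."""
    return noi / cap_rate


noi = 5_000_000  # hypothetical stabilized NOI
base = direct_cap_value(noi, 0.0500)
shifted = direct_cap_value(noi, 0.0525)  # a 25 bps higher cap rate
print(f"At 5.00%: ${base:,.0f}")
print(f"At 5.25%: ${shifted:,.0f}")
print(f"Swing:    ${base - shifted:,.0f}")
```

On this assumed NOI, a single confabulated 25 bps moves value by roughly $4.8 million, and the swing scales linearly with NOI.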

The Right Tool for the Right Job

LLMs belong in the underwriting workflow, but not everywhere in it. The highest-value applications are well-defined, short-session tasks:

Extracting structured data from rent rolls, OMs, and leases in short sessions
Synthesizing qualitative market research
Flagging risks in narrative form
Generating initial assessments and summaries
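Short-session extraction can be structured so that each document gets a fresh context and every output is validated against a fixed schema. A minimal Python sketch: `RentRollRow`, `extract_rent_roll`, and the `call_model` callable are hypothetical, and `fake_model` is an offline stub standing in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RentRollRow:
    unit: str
    tenant: str
    square_feet: float
    annual_rent: float


def extract_rent_roll(pages: list[str], call_model: Callable[[str], list[dict]]) -> list[RentRollRow]:
    """Extract each page in its own short, fresh session, then validate.

    `call_model` (hypothetical) receives ONLY the instruction and one page:
    no prior turns, so no accumulated context to drift on.
    """
    rows: list[RentRollRow] = []
    for page in pages:
        prompt = "Return each rent roll line with unit, tenant, square_feet, annual_rent.\n\n" + page
        for raw in call_model(prompt):
            row = RentRollRow(
                unit=str(raw["unit"]),
                tenant=str(raw["tenant"]),
                square_feet=float(raw["square_feet"]),
                annual_rent=float(raw["annual_rent"]),
            )
            # Schema and sanity checks happen in code, not in the conversation.
            if row.square_feet <= 0 or row.annual_rent < 0:
                raise ValueError(f"Implausible row, needs review: {row}")
            rows.append(row)
    return rows


def fake_model(prompt: str) -> list[dict]:
    # Offline stub for illustration: parses "unit|tenant|sf|rent" lines from the page.
    page = prompt.split("\n\n", 1)[1]
    keys = ("unit", "tenant", "square_feet", "annual_rent")
    return [dict(zip(keys, line.split("|"))) for line in page.splitlines() if line.strip()]


pages = ["101|Clinic A|2500|87500\n102|Dental B|1800|61200", "201|Imaging C|6000|228000"]
rows = extract_rent_roll(pages, fake_model)
print(len(rows), rows[0])
```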

Complex cash flow projection is a different problem entirely. A DCF model with proper lease-by-lease waterfalls, debt sizing logic, and IRR sensitivity isn't something that should be reconstructed from a prompt each session. It requires a dedicated, deterministic engine: purpose-built underwriting software where the logic is fixed, auditable, and consistent across deals.
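“Fixed, auditable, and consistent” means the same inputs always produce the same outputs. A minimal Python sketch of that deterministic core (NPV plus an IRR solved by bisection), assuming annual periods and conventional cash flows with a single sign change; the cash flows are hypothetical, and a production engine would add lease-by-lease detail, debt sizing, and waterfall tiers.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value, with cash_flows[0] at time 0 and annual periods."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], lo: float = -0.99, hi: float = 1.0, tol: float = 1e-10) -> float:
    """IRR by bisection; assumes NPV crosses zero exactly once within [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("IRR not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


# Hypothetical levered cash flows: equity in, five years of distributions, sale.
flows = [-10_000_000, 600_000, 650_000, 700_000, 750_000, 12_800_000]
deal_irr = irr(flows)
print(f"IRR: {deal_irr:.2%}")
```

Bisection is used here rather than Newton's method because it cannot diverge once the root is bracketed. Either way, the result is a pure function of its inputs, which is exactly the property a model reconstructed from a long prompt history lacks.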

Legacy general-purpose software creates another failure mode. Spreadsheets duct-taped together over years, with hidden rows, stale links, and version ambiguity, introduce their own entropy before an LLM ever touches the deal. Replacing them isn't a luxury; it is a prerequisite for a modern CRE underwriting workflow, one that can cut tens of hours of data extraction per deal and free that time for physical due diligence and relationship building.

AI-assisted underwriting fails when AI is asked to do everything. It works when LLMs handle extraction, research, and qualitative synthesis; a deterministic engine owns the financial model; and neither is asked to compensate for the other's blind spots.

The Underwriter's Edge, Reconsidered

The professionals who will get the most from AI-assisted underwriting are not the ones who trust the model most, but the ones who pair it with the right tools and reserve their own judgment for the work that neither AI nor software can do.

In practice, that also means replacing legacy patchwork before it poisons the inputs to either the LLM or the underwriting engine, so that each tool does only the job it is suited for. When those pieces are aligned, context bloat stops being a hidden liability and becomes a contained, manageable risk.

The Underwriter's Advantage

That frees up the resource that doesn't scale with software: the underwriter's time and presence. It means making the physical site visit where you notice deferred maintenance the OM doesn't mention; building the broker relationship that surfaces an off-market deal before it is widely marketed; having the conversation with a tenant rep that reveals a renewal is shakier than the rent roll suggests; and applying the design and engineering judgment that spots a capital expenditure the model had no way to anticipate.

These are not tasks that better prompting will solve. They require being in the room, on the ground, and in the relationship. They are where underwriting edge is actually built.
