AI & Technology · Product Strategy

Why an LLM Alone Can't Underwrite Your Deal, and What Actually Can

Minervian AI Research · 10 min read

The conversation around AI in commercial real estate has moved fast, perhaps too fast for the tools to keep up with the narrative. A wide spectrum of approaches is being bundled under the label of “AI-powered underwriting,” and the differences between them are not cosmetic. They determine whether you end up with a workflow that is genuinely faster, more consistent, and more defensible, or one that generates sophisticated-looking outputs you cannot fully trust.

The most common misconception is that a large language model, given enough context, can underwrite a deal from end to end. In reality, it can produce an impressive first pass, but the reliability of everything after that first pass remains a gray area, and that matters enormously when you are deciding whether to deploy capital.

What a Language Model Actually Does

To understand where AI fits in underwriting, and where it does not, it helps to be precise about what a large language model is actually doing when it processes a document or responds to a financial question.

A language model is, at its core, a probability engine. It has been trained on an enormous amount of text and has learned to predict what tokens (words, numbers, symbols) follow other tokens in a given context. When you ask it to analyze a rent roll or build a cash flow projection, it is not executing a calculation the way a spreadsheet formula does. It is generating a sequence of outputs that, given its training, look like what a correct answer would look like. Usually the output is plausible, and can be quite good. But the mechanism is fundamentally generative, not computational.

This distinction has real consequences in financial modeling. Ask a language model to produce the same 10-year cash flow twice and you may get two answers that differ in subtle but meaningful ways: different assumptions about how recoveries are structured, different treatment of free rent periods, different formula logic for reversion value. Both outputs can be internally coherent. Neither may match the other.

Language Model
Generative

Produces outputs that look statistically correct. Two runs of the same prompt may yield different formula logic, different assumption treatments, and different numerical results.

Dedicated Calc Engine
Deterministic

Same inputs, same outputs, every time. Logic is auditable, versioned, and tested against real transactions. Variance is zero by design.

For tasks where variance is acceptable, such as summarizing a market report, drafting memo sections, or answering lease clause questions, this distinction does not matter much. For financial modeling, where differences in assumption treatment compound across a 10-year projection and directly influence your go/no-go decision, it matters entirely.
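To make "deterministic" concrete, here is a minimal sketch of what a calc engine does for one small piece of the model. The function name `project_base_rent`, the use of `Decimal`, and the rounding rule are illustrative assumptions, not a description of any particular product.

```python
from decimal import Decimal, ROUND_HALF_UP


def project_base_rent(year1_rent: Decimal, growth: Decimal,
                      years: int = 10) -> list[Decimal]:
    """Annual base rent per projection year, compounding `growth` once a year.

    Hypothetical helper: a real engine would follow each lease's own
    escalation schedule rather than a single growth rate.
    """
    cents = Decimal("0.01")
    rents = []
    rent = year1_rent
    for _ in range(years):
        # Round only the reported figure; carry the unrounded value forward.
        rents.append(rent.quantize(cents, rounding=ROUND_HALF_UP))
        rent = rent * (1 + growth)
    return rents


# Two runs with identical inputs produce identical schedules.
run_a = project_base_rent(Decimal("1000000"), Decimal("0.02"))
run_b = project_base_rent(Decimal("1000000"), Decimal("0.02"))
assert run_a == run_b
```

The point is not the arithmetic, which is trivial, but that the logic is written once, can be tested, and cannot drift between runs.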

The Specific Failure Modes in CRE Underwriting

Commercial real estate cash flow modeling is not a straightforward calculation problem. It is a layered one, where the outputs of one calculation feed the inputs of another in ways that require careful ordering and consistent logic.

Consider what a proper 10-year cash flow model for a mixed-use retail asset needs to handle: base rent schedules with contractual bumps on different escalation structures, vacancy and credit loss applied differently to anchor versus inline tenants, expense recoveries that vary by lease type, tenant improvement allowances and leasing commissions that affect free cash flow timing, debt service across a financing structure that may include senior, mezzanine, and preferred equity, and a disposition calculation that correctly reflects stabilized NOI at exit, net of closing costs.

Each component has internal logic that must remain consistent across every period of the projection. The recovery calculation in Year 4 needs to use the same methodology as Year 7. The reversion NOI must be calculated using the same operating expense assumptions as the in-place cash flow, adjusted for stabilization. When a language model is asked to generate this model from scratch, it is being asked to write and execute a complex program on the fly, in a single pass, with no testing.
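The layering and the "same methodology in Year 4 as in Year 7" requirement can be sketched as follows. This is a deliberately simplified illustration: the names `YearInputs`, `recoveries`, and `project_noi` are hypothetical, a single vacancy rate and a single recovery ratio stand in for the tenant-by-tenant logic a real model needs, and applying vacancy to recoveries is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearInputs:
    base_rent: float           # contractual rent for the year
    operating_expenses: float  # recoverable operating expenses for the year


def recoveries(opex: float, recovery_ratio: float) -> float:
    """One recovery methodology, reused for every projection year."""
    return opex * recovery_ratio


def project_noi(years: list[YearInputs], vacancy_rate: float,
                recovery_ratio: float) -> list[float]:
    noi_by_year = []
    for y in years:
        # Order matters: recoveries feed gross income, which feeds vacancy.
        recovered = recoveries(y.operating_expenses, recovery_ratio)
        potential_gross = y.base_rent + recovered
        vacancy_loss = potential_gross * vacancy_rate
        effective_gross = potential_gross - vacancy_loss
        noi_by_year.append(effective_gross - y.operating_expenses)
    return noi_by_year


example = project_noi(
    [YearInputs(1000.0, 400.0), YearInputs(1030.0, 412.0)],
    vacancy_rate=0.25,
    recovery_ratio=0.5,
)
```

Because every year passes through the same `recoveries` function in the same order, a methodology change is made in one place and applies to the whole projection.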

The probability that every component is correct, consistent with every other component, and applied correctly across all ten projection years, is low. Not because the model lacks intelligence, but because that is not what it was designed to do.

In practice, LLM-generated financial models exhibit characteristic failure patterns:

Simplification of recovery structures to avoid modeling complexity
Inconsistent treatment of vacancy across the projection period
Reversion calculations that do not correctly use stabilized NOI
Debt service calculations applying the wrong amortization logic or missing fee structures entirely
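Two of the items above, reversion and debt service, reduce to short formulas once the inputs are fixed. The sketch below uses the standard level-payment amortization formula and a direct-capitalization exit value; the function names and the treatment of closing costs as a percentage of gross value are assumptions made for illustration.

```python
def net_reversion(stabilized_noi: float, exit_cap_rate: float,
                  closing_cost_pct: float) -> float:
    """Exit value from stabilized NOI, net of closing costs.

    Assumes closing costs are a flat percentage of gross sale value.
    """
    if exit_cap_rate <= 0:
        raise ValueError("exit_cap_rate must be positive")
    gross_value = stabilized_noi / exit_cap_rate
    return gross_value * (1 - closing_cost_pct)


def level_monthly_payment(principal: float, annual_rate: float,
                          amort_years: int) -> float:
    """Fixed monthly payment on a fully amortizing loan."""
    n = amort_years * 12
    r = annual_rate / 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


exit_value = net_reversion(1_000_000, 0.05, 0.02)
payment = level_monthly_payment(100_000, 0.06, 30)
```

An interest-only period, origination fees, or a mezzanine tranche would each be an additional explicit function, which is exactly the kind of detail that gets dropped when the logic is regenerated from scratch on every run.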


The Latency and Cost Problem

The mechanical limitations of LLM-driven underwriting are compounded by practical ones. A typical pure LLM workflow (uploading documents, running rounds of prompts to extract structured data, generating the financial model, then validating outputs and iterating) takes 20 to 30 minutes to reach a first round of outputs.

Unlike in a human-driven workflow, each additional iteration still consumes substantial time and tokens. A simple change, such as increasing rent growth from 2% to 2.5%, should take an analyst seconds. In a pure LLM workflow, it can take several minutes, because the entire context from deal inception must be re-sent to the server for processing. With that amount of context, attention quality declines and coherence drifts. An AI that built the first draft with ease can struggle to provide coherent answers by the eighth or ninth prompt in the same session.
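The re-sending effect is easy to underestimate, so here is a rough sketch of how it accumulates. The function name and every number in it are illustrative assumptions, not measurements: a starting context of 1,000,000 tokens, 50,000 tokens added per turn, and 12 turns.

```python
def cumulative_tokens_sent(base_context: int, tokens_added_per_turn: int,
                           turns: int) -> int:
    """Total input tokens when the full history is re-sent on every turn."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context                  # the whole context goes out again
        context += tokens_added_per_turn  # and grows before the next turn
    return total


# Illustrative numbers only: 12 turns on a 1M-token starting context.
total_tokens = cumulative_tokens_sent(1_000_000, 50_000, 12)
```

Under these assumptions the session sends 15.3 million input tokens, even though only about 1.55 million tokens of distinct content exist by the final turn. The total grows roughly with the square of the number of turns, which is why late iterations are the expensive ones.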

For a deal team screening 30 opportunities a month, this latency and quality drift compounds quickly. More importantly, it inverts the value proposition: if the AI workflow takes longer than a prepared analyst with a well-built template, you have not gained any advantage.

20–30 minutes to first round of outputs
per deal, in a pure LLM workflow

$225–300 per deal in API costs at volume
at 15–20M tokens per full analysis with iterations

15–20M tokens consumed per deal cycle
across OM, rent roll, T12, code, and iteration context over 10–15 runs

The cash flow calculation does not benefit from a language model's capabilities. It does not require interpretation, synthesis, or reasoning about ambiguous information. It requires deterministic, auditable arithmetic applied consistently to structured inputs. Running a language model to perform that arithmetic is similar to engaging a specialist to add up a column of numbers. The capability is there, but the fit is wrong, and you pay accordingly.
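The per-deal figures cited in this section can be cross-checked with simple arithmetic. The sketch below derives the blended price per million tokens that those figures imply and scales it to the 30-deals-a-month team mentioned earlier; treating every screened deal as a full analysis is an assumption of the sketch, and the function name is hypothetical.

```python
def implied_rate_per_million(cost_usd: float, tokens_millions: float) -> float:
    """Blended USD price per million tokens implied by a cost and a volume."""
    return cost_usd / tokens_millions


low_rate = implied_rate_per_million(225, 15)   # low end of both ranges
high_rate = implied_rate_per_million(300, 20)  # high end of both ranges

# Assumption: all 30 screened deals receive a full analysis.
deals_per_month = 30
monthly_cost_low = deals_per_month * 225
monthly_cost_high = deals_per_month * 300
```

Both ends of the range work out to the same implied rate of $15 per million tokens, and the monthly bill lands between $6,750 and $9,000 for arithmetic that a deterministic engine performs at negligible marginal cost.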

Where AI Actually Belongs in the Underwriting Stack

The right framing is not “AI-powered underwriting” versus “traditional underwriting.” It is: which parts of the underwriting process genuinely benefit from AI capabilities, and which parts require something else entirely?

The architecture that actually works combines both: AI at the extraction and interpretation layer, a deterministic calculation engine at the modeling layer. AI reads documents and populates assumptions. The calc engine runs the model. AI then helps the analyst interrogate results in natural language, while the engine runs those scenarios instantly and consistently.

Layer 1: Language Model
Extraction & Interpretation
Where AI wins

Reading and extracting OM assumptions
Parsing rent rolls, lease classification, expiration mapping, embedded options
Identifying risk signals in trailing financials
Qualitative market and submarket analysis
Lease clause interpretation
Natural-language scenario interrogation

Structured assumptions passed to engine

Layer 2: Dedicated Engine
Deterministic Calculation
No AI needed

Base rent schedules with contractual bumps
Expense recoveries (NNN, gross, modified gross, CAM caps)
Vacancy and credit loss by tenant cohort
Debt service across senior, mezz, and preferred structures
Reversion NOI and capitalized exit value
10-year period-by-period cash flow projection
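The handoff between the two layers, "structured assumptions passed to engine," is where a clean boundary matters most. One way to sketch it: the language model emits JSON, and the engine accepts it only after it has been parsed into a typed, validated record. The schema below (`LeaseAssumption` and its fields) is a hypothetical example, not a real product's data model.

```python
import json
from dataclasses import dataclass

ALLOWED_LEASE_TYPES = {"NNN", "gross", "modified_gross"}


@dataclass(frozen=True)
class LeaseAssumption:
    tenant: str
    lease_type: str          # one of ALLOWED_LEASE_TYPES
    annual_base_rent: float  # USD per year
    annual_bump_pct: float   # e.g. 0.03 for 3% contractual bumps


def parse_extracted_lease(raw_json: str) -> LeaseAssumption:
    """Validate model-extracted JSON before it reaches the calc engine."""
    data = json.loads(raw_json)
    lease = LeaseAssumption(
        tenant=str(data["tenant"]),
        lease_type=str(data["lease_type"]),
        annual_base_rent=float(data["annual_base_rent"]),
        annual_bump_pct=float(data["annual_bump_pct"]),
    )
    if lease.lease_type not in ALLOWED_LEASE_TYPES:
        raise ValueError(f"unknown lease type: {lease.lease_type!r}")
    if lease.annual_base_rent < 0:
        raise ValueError("annual_base_rent must be non-negative")
    if not 0 <= lease.annual_bump_pct <= 1:
        raise ValueError("annual_bump_pct must be between 0 and 1")
    return lease


lease = parse_extracted_lease(
    '{"tenant": "Anchor Co", "lease_type": "NNN", '
    '"annual_base_rent": 250000, "annual_bump_pct": 0.03}'
)
```

Anything the model misreads either fails validation loudly or shows up as an explicit, editable input, rather than being buried inside generated formula logic.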

This architecture produces something a pure LLM workflow cannot: a result you can defend. When a lender, LP, or investment committee asks how you got to a particular number, the answer is a traceable calculation with documented inputs, not a black box.

The Traditional Software Problem Is Real, But Different

The case for AI in CRE underwriting is not made against a strong incumbent. Traditional dedicated software that has dominated institutional CRE for decades has its own serious limitations: desktop-only access, license costs that put sophisticated modeling out of reach for smaller operators, feature development measured in years rather than weeks, and user interfaces that reflect design sensibilities from an earlier era.

Spreadsheets solve the access problem but create others. Flexibility without structure is fragility. A complex commercial lease model in a spreadsheet is typically brittle: formula errors propagate silently, scenarios require manual duplication, and the model becomes progressively harder to audit as it grows. Commercial lease projection at any real scale, including recoveries, co-tenancy, percentage rent, and portfolio roll-ups, strains spreadsheet software to the point where the tool itself becomes a source of risk rather than a check against it.

The problem with traditional tools is not that they are too powerful. It is that they are inaccessible, slow to evolve, and structurally ill-suited to the modern deal team's workflow. AI alone does not solve those problems. A purpose-built platform that uses AI in the right places does.

What the Right Tool Looks Like

The underwriting workflow that actually serves deal teams in the current environment has four defining characteristics.

1
Qualitative layer

LLMs should be deployed where they are strongest: ingesting unstructured information (emails, PDFs, screenshots), extracting and verifying data, market research, comps analysis, and drafting and reasoning over memo sections.

2
Speed, at the right place

Deterministic outputs should take seconds. The model runs instantly once assumptions are in. The analyst reviews results rather than waiting. AI-intensive work runs in the background, not blocking the workflow.

3
Consistency

The same deal run twice must produce the same output. The same methodology applies to every asset class. A new analyst on the team runs the model the same way the senior analyst does and gets comparable results.

4
Auditability

Every output traces back to an explicit input. Financing calculations, recovery structures, and exit assumptions are all visible, editable, and documentable. When someone challenges a number, the answer is a clear chain of logic.
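One lightweight way to support both consistency and auditability is to store each run with a fingerprint of its inputs and the engine version that produced it. The sketch below is an assumption about how that could look; `ENGINE_VERSION`, `audit_record`, and the record layout are hypothetical.

```python
import hashlib
import json

ENGINE_VERSION = "0.1.0"  # hypothetical version label for the calc logic


def audit_record(inputs: dict, outputs: dict) -> dict:
    """Bundle a run's inputs and outputs with a reproducible fingerprint."""
    # Sorted keys and fixed separators make the hash independent of the
    # order in which assumptions happened to be entered.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "engine_version": ENGINE_VERSION,
        "inputs_sha256": digest,
        "inputs": inputs,
        "outputs": outputs,
    }


run_1 = audit_record({"rent_growth": 0.02, "exit_cap": 0.06}, {"noi": 1})
run_2 = audit_record({"exit_cap": 0.06, "rent_growth": 0.02}, {"noi": 1})
run_3 = audit_record({"rent_growth": 0.025, "exit_cap": 0.06}, {"noi": 1})
```

Two runs with the same fingerprint and engine version should produce the same outputs; if they do not, something other than the assumptions changed, and that is the question an investment committee will ask.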

The Bottom Line

The firms that will extract the most value from AI in CRE over the next few years are not the ones that hand the most work to a general-purpose language model. They are the ones that build or adopt workflows where AI and deterministic computation each do what they are genuinely suited for, and where the handoff between the two is clean enough that neither degrades the other.

An LLM alone cannot underwrite your deal. But the right platform, one that uses AI deliberately within a reliable calculation architecture, can change how fast and how confidently you do it yourself.
