There is a moment, and it happens every day in offices in midtown Manhattan, in the glass towers of Century City, and in the low buildings off the highway where mid-market funds do their quiet work, when an analyst sits down before a screen and opens an offering memorandum. The question they have been given to answer is complicated not in the way that philosophical questions are, but in the specific and demanding way of commercial real estate: will this deal work? Will the numbers, under honest scrutiny, produce a return sufficient to justify the risk?
The work is not glamorous. It consists of pulling a rent roll and reading every line of it — not skimming it — and understanding not merely what the tenant is paying today, but what the lease says they will pay in year three, and what happens if they go dark. It consists of modeling those cash flows month by month, working through recovery clauses with the precision of an accountant. There is no shortcut to the task.
It is into this world of irreducible calculation that the technology industry arrived with a proposition: a new kind of machine, trained on more text than any human had ever read, could now understand the language of commercial real estate, extract structure from unstructured documents, summarize and synthesize and respond. The problem the industry carried was real. The solution being offered is partial. The distance between those two facts is where fortunes will be made and lost in the years ahead.
The Yellow Brick Road That Was Never There
Joe Schmidt, writing at Andreessen Horowitz in the spring of 2025, offered the clearest structural map of the territory in which this confusion has flourished. His argument was that the companies which would survive the consolidation of the AI application boom were not the ones that had taken a frontier model, surrounded it with connectors any developer could access, and called it a product. Those companies, Schmidt argued, were walking the Yellow Brick Road: a path that appeared to lead somewhere, smoothly paved and visually convincing, but that terminated in a revelation that the wizard was not what the travelers had been promised. The durable companies were the ones deep in Oz — vertical by design, threading models through proprietary calculation logic, domain-specific workflows, and institutional data so particular to their problem that no horizontal platform could absorb it.
This argument is correct. But applied to commercial real estate underwriting, it surfaces a failure mode Schmidt's framework identifies but does not fully anatomize. The problem in CRE is not only that horizontal AI apps face structural disadvantages, but that commercial real estate underwriting was never on the Yellow Brick Road to begin with. The road does not pass through the calculation layer.
A language model is, at the level of its architecture, a probability distribution over tokens. This is not a limitation that the next model generation will resolve. It is a description of what the technology is: a system that has absorbed an extraordinary quantity of human language and learned to predict, with impressive fluency, which token should follow which. This capability is genuinely powerful in the domain of language: interpretation, extraction, summarization, the recognition of patterns in text.
What it structurally cannot be is a system capable of computing recovery structures for dozens of tenants with staggered lease commencement dates, individually negotiated CAM caps, tiered commissions specific to each market, and multi-tranche financing with tailored loan advance mechanisms. That calculation requires precision, not probability. The deal does not pencil to approximately the right IRR.
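To make the point about precision concrete, here is a minimal sketch of a single tenant's capped CAM recovery. It is an illustration under stated assumptions, not any particular platform's method: the `Tenant` fields, the `cam_recovery` function, and the reading of a cap as a ceiling on year-over-year growth in the tenant's recovery are all hypothetical, and real leases define caps in many other ways. The arithmetic uses exact decimals, so the same inputs always produce the same cents.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

CENT = Decimal("0.01")


@dataclass(frozen=True)
class Tenant:
    """Hypothetical lease terms for one tenant (illustrative field names)."""
    name: str
    pro_rata_share: Decimal        # fraction of building CAM allocated to this tenant
    prior_year_recovery: Decimal   # amount the tenant reimbursed last year
    cam_cap: Optional[Decimal]     # max year-over-year increase, e.g. 0.05; None means uncapped


def cam_recovery(tenant: Tenant, total_cam: Decimal) -> Decimal:
    """Return the tenant's CAM reimbursement for the year, rounded to the cent."""
    uncapped = tenant.pro_rata_share * total_cam
    if tenant.cam_cap is None:
        owed = uncapped
    else:
        # Assumed cap mechanics: recovery may not exceed last year's amount
        # grown by the negotiated percentage.
        ceiling = tenant.prior_year_recovery * (Decimal(1) + tenant.cam_cap)
        owed = min(uncapped, ceiling)
    return owed.quantize(CENT, rounding=ROUND_HALF_UP)


capped = Tenant("Tenant A", Decimal("0.25"), Decimal("110000"), Decimal("0.05"))
uncapped = Tenant("Tenant B", Decimal("0.10"), Decimal("0"), None)
total = Decimal("500000")
print(capped.name, cam_recovery(capped, total))
print(uncapped.name, cam_recovery(uncapped, total))
```

Even this toy case has a branch that changes the answer by thousands of dollars. Multiply it by dozens of tenants, each with its own cap language, and the case for exact rather than probable arithmetic becomes plain.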
What the Analysts Built in the Tools That Were Never Designed for Them
The commercial real estate analyst of the early 2000s, of the 2010s, of last year, was a sophisticated practitioner responsible for analysis on which hundreds of millions of dollars of institutional capital might rest. The tool available was, at its core, a spreadsheet application designed for general-purpose business calculation, one that had never contemplated the recursive complexity of commercial lease structures.
And yet what those analysts built in Excel — in multi-sheet workbooks with thousands of linked formulas, with custom VBA functions that encoded the firm's institutional knowledge, with careful manual checks because the software would not catch the error — was extraordinary. They built, on general-purpose infrastructure, something that functioned as purpose-built infrastructure, because they had no alternative and the work had to be done. Those professionals deserve better infrastructure. Not infrastructure that asks them to accept less precision in exchange for a faster interface. Infrastructure that meets them at the level of complexity they have always operated at.
The Architecture of the Two Layers
There is a distinction that must be made clearly, because it is one on which everything else rests. It is the distinction between the stochastic and the deterministic: between the layer of the stack that operates on probability and the layer that operates on precision.
The stochastic layer is where an offering memorandum arrives and is translated into structured, verified input. Frontier models are genuinely capable at extraction, summarization, and pattern recognition in text. The analyst who spends hours manually transcribing data from a PDF should understand that this time cost is real and that the technology to reduce it substantially already exists.
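What "verified" might mean in practice can be sketched briefly. The example below assumes a model has already extracted a rent roll from a document. The `ExtractedRentRoll` type and the `verify_rent_roll` function are hypothetical names, and the two checks shown (no negative rents, line items reconciling to the stated total) are only the simplest of the checks a real pipeline would want.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass(frozen=True)
class ExtractedRentRoll:
    """Values a model pulled from a document (hypothetical structure)."""
    monthly_rents: Dict[str, Decimal]  # suite or tenant label -> monthly base rent
    stated_total: Decimal              # total monthly rent as printed in the document


def verify_rent_roll(roll: ExtractedRentRoll) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    for tenant, rent in roll.monthly_rents.items():
        if rent < 0:
            problems.append(f"{tenant}: negative rent {rent}")
    # Reconcile the extracted line items against the document's own total.
    computed = sum(roll.monthly_rents.values(), Decimal("0"))
    if computed != roll.stated_total:
        problems.append(
            f"total mismatch: lines sum to {computed}, document states {roll.stated_total}"
        )
    return problems


roll = ExtractedRentRoll(
    monthly_rents={"Suite 100": Decimal("12500.00"), "Suite 200": Decimal("8300.50")},
    stated_total=Decimal("20900.00"),
)
for problem in verify_rent_roll(roll):
    print(problem)
```

The design choice worth noting is that the extraction is never trusted on its own. It is admitted to the calculation only after deterministic checks have passed, and any mismatch is routed back to the analyst.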
The deterministic layer is the waterfall: the multi-tranche capital structure with a construction facility tied to capex spending, a permanent loan that closes at stabilization, and mezzanine debt whose repayment is governed by an intercreditor agreement. This is not a language problem. It requires a purpose-built engine that takes clean structured inputs and produces exact outputs whose derivation is auditable at every step.
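A deliberately simplified sketch shows what "auditable at every step" can look like. It assumes a single period, a strict seniority order, and fixed amounts due. The `run_waterfall` function and its record fields are hypothetical, and real intercreditor terms, cure rights, and reserves are ignored.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def run_waterfall(
    cash: Decimal, claims: List[Tuple[str, Decimal]]
) -> Tuple[List[Dict[str, object]], Decimal]:
    """Pay claims in the order given (most senior first) and record each step.

    Returns the audit trail and the residual cash left for equity.
    """
    if cash < 0:
        raise ValueError("available cash cannot be negative")
    remaining = cash
    trail: List[Dict[str, object]] = []
    for tranche, due in claims:
        paid = min(remaining, due)  # a tranche never receives more than it is owed
        remaining -= paid
        trail.append(
            {
                "tranche": tranche,
                "due": due,
                "paid": paid,
                "shortfall": due - paid,
                "cash_after": remaining,
            }
        )
    return trail, remaining


trail, residual = run_waterfall(
    Decimal("100000"),
    [("senior", Decimal("70000")), ("mezzanine", Decimal("40000"))],
)
for row in trail:
    print(row)
print("residual to equity:", residual)
```

Every dollar is accounted for in the trail, so a reviewer can trace any output back to the step that produced it. That property, rather than the arithmetic itself, is what a probabilistic system cannot offer.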
The platform that conflates these two layers produces outputs that look correct and are not — which is materially worse than outputs that are obviously wrong, because outputs that are obviously wrong get caught.
What Accumulates in the Engine Over Time
The tribal knowledge of commercial real estate underwriting is not stored in any training corpus. It lives in the production behavior of a system that has processed thousands of deals across dozens of markets: in the pattern of exceptions that get escalated, in the recovery structures that flag anomalies, in the understanding of how institutional buyers in certain markets interpret lease language that technically admits of two readings but practically has always been read one way. This knowledge cannot be acquired by reading. It can only be acquired by running — by processing, by accumulating the feedback that comes from doing the work in production.
The calculation engine, built and refined through that accumulation, is the system of work. Not the AI layer above it — which can, and should, be swapped as better models emerge — but the deterministic engine beneath, which encodes decades of institutional knowledge and improves with every deal it processes. That engine is what the labs cannot absorb, cannot replicate, cannot reach with a horizontal platform, because it was built in production, through the work, over time, and the work is not text.