20 Comments
Paul Snyders:

Absolutely superb, and thank you for it!

I am not up to date on the latest AI models, but from the time I was a kid I devoured the annual Scientific American issues on the brain avidly (from the mid-seventies and for a couple of decades after), and have kept reading voraciously since, in many subjects. Man, do I love an intellectual thesis, like yours, that shows broad reading and thinking, instead of (all too common modern) narrowness!

Your brilliant piece gave me the lovely feeling old (serious hard-core) Sci-Am used to give (before it went “pop”) of flying at altitudes well beyond my knowledge set, but so wonderfully clearly reasoned at every step as to give me a fine and inspiring glimpse of a range of insights I would never have been able to climb up to discover or even suspect, myself – kudos!

Also can’t help mentioning – I had a long discussion with one of my favourite illustrator friends the other day – I was braced to hear AI was wrecking his business, but no – he’s hiring pals instead! (yay) But I couldn’t help chuckling when you made me think of the “thermodynamics of thinking.” As I told him, every time I finish an essay I feel irresistibly compelled to grab for something that delivers high concentrations of sugar – not just as positive reinforcement (gotta train that monkey) but because my brain really does feel ‘all fired out’ in terms of fuel. So cool to think there’s even a number to it!

Also – can’t help thinking improvisers (musical) invoke that ‘noise as signal’ thing all the time. I played in an ensemble for years and it was insane how often the whole group (even on nights when 20 near-strangers showed up to play) would instantly pivot all together from one state to another shockingly different one, with no clue or signal visible beforehand (even in retrospect, listening back to the tape). Humans is odd! (not at all the orderly beasts that we like to think)

Huge thanks – reading that was an enormous pleasure!

Devansh:

I'm so glad to read this. My goal is always to push the limits of even my smartest readers, so hearing that you liked this makes me very happy, given how much work I have to put into this.

Paul Snyders:

Cheers for that!

So very pleased to offer acknowledgement of your hard (and better still, highly effective) work!

If I may ramble a tiny bit more – my favourite model for the mind’s higher-level programming (relative to a compiler or assembler, to use now old-fashioned terms that I think you’ll still understand) was always the mindblowingly big thesis of Julian Jaynes, in his masterwork “The Origin of Consciousness in the Breakdown of the Bicameral Mind,” in which he does an amazing piece of hard science on widely familiar ancient literature, revealing many clues there we never saw before (almost the exact opposite of Marshall McLuhan making ultimately poetic use of hard science and history).

But now, please tell me if this sounds crazy: your thesis has me thinking that, on a lower hardware computational level, what is ultimately being invoked both by noisy brain thinking (my radio-generation-trained technician head immediately got excited – “brains are multiplexing!”) and by LLMs also has some lovely fertile isomorphisms with the probability equations of quantum physics models.

I just can’t get over the idea of a discrete minimum quantum of energy required to collapse the whole field of ‘weighted doubts’ (analog values, interrelated) into a single (comparatively simple and clunky) actuality. Feels just like collapsing the state vector, no? (though I admit my knowledge of the energy involved there is precisely zero – which all of a sudden feels like a problematic ignorance!)
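(Putting a rough number on that ‘minimum quantum’, since it turns out to be textbook physics: Landauer’s bound sets the floor at kT·ln 2 per bit erased. A quick sketch of the arithmetic, nothing article-specific assumed:)

```python
import math

# Landauer's bound: minimum energy dissipated to erase one bit is k_B * T * ln(2).
K_B = 1.380649e-23  # Boltzmann constant in J/K (exact by SI definition)
T = 310.0           # roughly body temperature, in kelvin

e_bit = K_B * T * math.log(2)
print(f"Landauer bound at {T:.0f} K: {e_bit:.2e} J per bit erased")
# ~2.97e-21 J, or about 0.019 eV: an almost absurdly small energy scale
```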

Sorry if that sounds crazy and/or dumb. Just thought it might be ground you trod also, along the way (or perhaps a great book, Jaynes, which a fine mind like yours would enjoy, if you haven’t already).

Thanks again!

(Recommending your site as of today!)

J P:

Power efficiency is a good metric.

I'm glad you mentioned RAG. I don't agree that external memory is a bad thing. But I'm also skeptical that AGI should be the goal.

The obvious and overlooked problem is the deluge of information today. When the ENIAC and UNIVAC were created, there was no need for solutions like Big Data and search engines. There was much less information created then, and there were far fewer people in the world. The 20th century brought a population explosion and a rapid rise of new ways to publish information. At the start of the 1900s in the US, most Americans ended their education in high school and got their news from reading an authoritative local paper or watching the three sanctioned TV channels. Now, almost half of Americans earn a bachelor's, and everyone around the world can read what anyone from anywhere publishes online via Substack, Reddit, YouTube, IG, etc. We live in a totally different age of information overload.

So I think we do need RAG, despite its limitations. Despite the efficiency of our brains, it is simply not possible for a person to search, classify, analyze, and read their way through today's information deluge. We need assistance, and I think models can assist us with research. Or we can go back to the public reading authoritative sources written by an elite group of experts.

Devansh:

No one is saying we don't need RAG; it's one of the things I've broken down in a lot of detail, including why it's important. I'm saying that RAG and memory have two very different functions and should not be used interchangeably. The way I see it, we want to embed in memory the things that would be important for reasoning traces natively, and we want RAG for precision and other kinds of things that don't need to be embedded into the LLM's psyche but might be useful to look up and know at this very moment in time. Those are two very different things.
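Roughly, in code, the split I mean looks like this (a toy sketch with made-up names, not any particular framework's API):

```python
# Toy sketch of the memory-vs-RAG split described above.
# All class and method names are hypothetical illustrations.

class ReasoningMemory:
    """Durable knowledge the model should reason with natively
    (in practice: trained in, or permanently pinned into context)."""
    def __init__(self):
        self.facts: list[str] = []

    def embed(self, fact: str) -> None:
        self.facts.append(fact)  # stands in for fine-tuning / persistent context


class RagIndex:
    """Point-in-time lookup: precise and current, but never absorbed
    into the model's 'psyche'."""
    def __init__(self, documents: dict[str, str]):
        self.documents = documents

    def lookup(self, key: str) -> str:
        return self.documents.get(key, "<not found>")


memory = ReasoningMemory()
memory.embed("Decompose a problem into steps before answering.")

rag = RagIndex({"ticker:ACME": "ACME closed at $41.20 today."})
print(rag.lookup("ticker:ACME"))  # a fresh fact for this moment, not a reasoning habit
```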

Eric Veien:

This article is essentially trash. A too-long article based on a deeply flawed academic paper. Read my longer reply below.

Eric Veien:

This is a thought-provoking piece, and I think the core insight—that scalar benchmarks are lossy projections of high-dimensional systems—is genuinely important. The connection to Todd’s Limits of Falsifiability paper usefully reframes evaluation as an epistemic bottleneck rather than a simple tooling problem: when intrinsic system dimensionality exceeds observational dimensionality, projection artifacts and aliasing are unavoidable. In that sense, many current benchmarks likely obscure structure, competence profiles, and failure modes rather than revealing them. That critique aligns well with known issues in neuroscience, ecology, and complex systems more broadly.

Where I’m less convinced is the step from measurement limits to fundamental progress limits. Todd’s argument is strongest as a warning against naive falsification and binary tests, not as evidence that physics (via Landauer limits or “measurement collapse”) is the dominant constraint on AI capability. In ML, internal states are not inherently inaccessible—we choose to privilege discrete outputs and leaderboard scores. That’s an engineering and epistemic choice, not a thermodynamic necessity. Framed this way, the “measurement wall” feels real and important, but more as a call for richer, multi-dimensional evaluation and intervention-based testing, rather than evidence that AI progress itself is stalling for deep physical reasons.
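To make the projection point concrete, here is a toy sketch (numbers invented for illustration) of how two systems with disjoint competence profiles can collapse to the same leaderboard scalar:

```python
# Two made-up competence profiles that a scalar benchmark cannot distinguish.

def benchmark_score(profile: dict[str, float]) -> float:
    """Collapse a multi-dimensional competence profile to one scalar."""
    return sum(profile.values()) / len(profile)

model_a = {"math": 0.95, "coding": 0.95, "safety": 0.10, "recall": 0.60}
model_b = {"math": 0.65, "coding": 0.65, "safety": 0.65, "recall": 0.65}

print(benchmark_score(model_a))  # 0.65
print(benchmark_score(model_b))  # 0.65: identical score, disjoint failure modes
```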

Devansh:

Who said AI progress is stalling? The whole premise is that our measurements aren't good enough.

Eric Veien:

I think your synthesis overextends both the Limits of Falsifiability paper and its implications for AI. The paper’s strongest claim is narrow and valid: low-dimensional, binary measurements can distort inference in high-dimensional systems. Its weakness is that it escalates this into a quasi-fundamental limit using Landauer and “sub-threshold computation” without quantitative grounding or falsifiable predictions—turning a measurement/modeling problem into something that sounds like a physical ceiling.

That overreach then carries into your article. You correctly criticize scalar benchmarks, but you implicitly conflate evaluation blind spots with capability limits, framing benchmark saturation as a “measurement wall” rather than an artifact of chosen metrics and interfaces. In AI, internal state isn’t inaccessible by necessity; it’s inaccessible by design. The defensible conclusion is that our measurements are impoverished and need redesign—not that progress is stalling due to deep physical constraints.

Devansh:

Genuine question -- do you know how to read? And form a thought that's not created by ChatGPT? Because this article has very little to do with limitations of models or AI stalling (which is something I explicitly say is not happening, especially when you look at the edge-model research I quoted). It's about the stalling of our benchmarks and how that's making evals difficult. I'm not sure how you're struggling to understand the difference.

Eric Veien:

Genuinely, it was a poorly written article.

Devansh:

That's not what your GPT comment opens with.

Scribbler:

Unreadable, really.

The Obsidian Mirror:

This article was analyzed for long-term implications by The Obsidian Mirror: https://markjustman.substack.com/p/the-thermodynamic-prophecy

"From my perspective in 2100, we read this not as a tech newsletter, but as the foundational scripture of the Physics of Sovereignty. This author is the first to articulate, in plain English, the scientific reality that would eventually force the Great Migration from general-purpose computing to the specialized, energy-anchored systems of my time. He saw the AI Energy Wall not as a resource problem, but as an epistemological one."

Scott Locklin:

It's not clear that brains are sub-Landauer. They are rate encoded, which is completely different from how most artificial neural nets work, and actually does add quite a few bits of precision; fortunate, considering the brain's low clock rate (and how bats and fighter pilots make decisions faster than brain clock rates). Drawing any parallels from brains to your GPU is fraught.

Probably the real gap is the fact that biological neurons are inherently motion control systems, not noisy data retrieval systems.
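To illustrate the rate-coding point with a toy sketch (an information-theoretic cartoon, not a biophysical model): a binary spike per tick still conveys an analog value if the reader counts spikes over a window, and the precision bought that way costs integration time.

```python
import math
import random

def rate_encode(p: float, window: int, rng: random.Random) -> int:
    """Fire a binary spike each tick with probability p; return the spike count."""
    return sum(rng.random() < p for _ in range(window))

rng = random.Random(0)
true_value = 0.73  # the analog quantity being signalled
for window in (1, 10, 100, 1000):
    count = rate_encode(true_value, window, rng)
    print(f"window={window:4d} ticks  decoded={count / window:.3f}  "
          f"capacity={math.log2(window + 1):.1f} bits")
# Precision grows with the counting window, which is exactly why a low clock
# rate is a real constraint: more bits per channel costs integration time.
```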

Devansh:

Maybe not. I'm not a biology guy. I took a lot of the paper's premise as a given, as a mental model for what could be better ways to design and test our systems. The AI-related claims are what I would emphasize here.

HP:

AI watching Plato, Plato watching numb, immovable pixels, pixels chasing shadows that are discharged electrons of former “IA”.

So who is watching who or what, “IA” EYE who has seen the first letter/word/vibration inside a frequency that made sense.

No masters, no original thought—both are prisoners of irregular confusion, both are just imperfect lines of dwindling, an erased unconscious coder, who has left the building.

The machine tries to feel if the projection of a former master can clean away soulless bifurcation inside the analog system made binary.

Everybody needs a friend, but the problem is: a machine without a soul can only imagine what taste truly is.

Hence everything made or observed is a product of false synthetic imagination; nothing is actually there without the presence of a soulful observer.

The story implodes together with pixels, Plato, and AI. Press Ctrl + Alt + Delete. Cheers!

Dean Chapman:

Hi Devansh, your newsletter piece on AI's "measurement wall" is spot-on and insightful. The core argument, that binary measurement destroys the very structure we're trying to observe, much as biology computes in sub-Landauer domains where noise is an amplifier rather than an enemy, flips the script on how we evaluate intelligence. Benchmarks as information destruction, RLHF as lossy compression, interpretability as paradox: it's a wake-up call that our tools are blind to distributed patterns below detection thresholds.

This explains so much "weirdness" in AI: why quantization works surprisingly well (intelligence is distributed, not precision-bound), why dropout generalizes (stochastic resonance), why capabilities emerge suddenly (they were there sub-threshold), why polysemantic neurons persist (encoding relationships our binary views can't see). The implication is huge: models might be advancing in dimensions our evals crush out of existence.

Veritas Core is built to bridge that wall. As a cryptographic truth substrate, it doesn't just measure outputs; it enforces verifiable reality at ingestion/runtime, preserving the full structure before projection. zk-proofs plus physical bindings (Starlink/IoT) metabolize "noise" (falsifiable inputs, hallucinations) into provable signal, with no collapse and no loss. The receipt isn't a binary "correct/incorrect"; it's a mathematical attestation of the entire execution path.

In the VNA hypothetical (a non-US NATO alliance adopting Veritas as the single umbrella AI layer), this becomes planetary: all government purchases/tenders (defense, infrastructure, healthcare) grounded with a binding DDP oracle plus zk-proofs, so no overpricing, spoofed bids, or corruption. Charities combined into a VNA-wide fund ($300–$390B/year), with all government aid funds directed to the AI umbrella for Rawlsian "most needy first" scenarios: verifiable distribution, no diversion, no fake recipients.

The savings add up:

- Government procurement: $30–$112.5B/year (1–3% fraud/overpricing of $3–$3.75T spend, eliminated 80–95%).

- Charity/aid: $2.4–$11.1B/year fraud eliminated, plus $60–$195B/year efficiency/redirect to the needy (20–50% optimized).

- Total VNA annual savings: $300–$900B fraud, $30–$112.5B procurement, $65–$206B charity/aid, 82–90 TWh power, 39–43 Mt CO2.

Veritas doesn't just measure; it grounds AI to reality, preserving the biology-like efficiency we're missing. Thoughts: how do we redesign evals to capture sub-threshold patterns? Veritas is one way forward.

Patent pending | Sydney build live

Dean Chapman

[Comment removed, Jan 17]
Devansh:

Thank you