Discussion about this post

John Holman

Really enjoyed this piece, especially the way you framed state as the real bottleneck and the “recompute tax” as an economic problem, not just a latency annoyance.

I’m working a layer above where WEKA lives, on what we’ve been calling an Awakened OS: an agent OS that gives small/medium models persistent semantic memory (structured records, scoring, Vault vs Workbench, etc.) instead of letting them forget every conversation. The pattern we’re seeing maps to your thesis almost exactly: the biggest wins come from keeping state and reusing it, not from just throwing more flops or bigger models at the problem.
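To make the idea concrete, here is a minimal sketch of the kind of persistent semantic memory described above: scored, structured records split across a hot "Workbench" tier and an archival "Vault" tier. All class and method names are hypothetical illustrations, not the actual Awakened OS interface.

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemoryRecord:
    """A structured record the agent can recall across conversations."""
    content: str
    score: float = 0.0          # relevance score, raised each time the record is reused
    created: float = field(default_factory=time.time)

class SemanticMemory:
    """Toy two-tier store: 'workbench' for hot working state, 'vault' for archive."""
    def __init__(self, promote_threshold: float = 1.0):
        self.workbench: list[MemoryRecord] = []
        self.vault: list[MemoryRecord] = []
        self.promote_threshold = promote_threshold

    def remember(self, content: str) -> MemoryRecord:
        rec = MemoryRecord(content)
        self.workbench.append(rec)
        return rec

    def recall(self, query: str) -> list[MemoryRecord]:
        # Naive substring match; a real system would use embeddings.
        hits = [r for r in self.workbench + self.vault
                if query.lower() in r.content.lower()]
        for r in hits:
            r.score += 0.5      # reuse raises the score instead of recomputing context
        return hits

    def consolidate(self) -> None:
        # Move high-scoring records out of the workbench into the vault.
        keep, archive = [], []
        for r in self.workbench:
            (archive if r.score >= self.promote_threshold else keep).append(r)
        self.workbench = keep
        self.vault.extend(archive)
```

The point of the sketch is the scoring loop: state that keeps earning recalls gets retained and promoted, which is exactly the "keep state and reuse it" win rather than a bigger-model win.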

Reading this made me think that future agent OSes are going to need exactly this kind of fast, shared KV plane underneath – GPUs acting as mostly stateless compute pulling from a “memory cathedral” that lives on something like NeuralMesh. It’s encouraging to see infra folks and agent-OS folks converging on the same heading: compute is plentiful, state is precious.
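A toy model of that shared KV plane: workers treat the GPU as stateless compute and look up precomputed prefix state by content hash, paying the recompute tax only on a miss. The interface is invented for illustration (this is not the NeuralMesh API), and `fake_prefill` stands in for an expensive prefill pass.

```python
import hashlib

class SharedKVPlane:
    """Toy shared KV-cache store: workers fetch precomputed prefix state
    by content hash instead of recomputing it (hypothetical interface)."""
    def __init__(self):
        self._store: dict[str, list[float]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prefix_tokens: list[int]) -> str:
        return hashlib.sha256(str(prefix_tokens).encode("utf-8")).hexdigest()

    def get_or_compute(self, prefix_tokens, compute_fn):
        key = self._key(prefix_tokens)
        if key in self._store:
            self.hits += 1      # reuse: pay storage bandwidth, not flops
            return self._store[key]
        self.misses += 1        # recompute tax: pay the full prefill cost
        state = compute_fn(prefix_tokens)
        self._store[key] = state
        return state

# Stand-in for an expensive prefill pass producing KV state.
def fake_prefill(tokens):
    return [float(t) * 0.5 for t in tokens]
```

The economics in the post fall out of the hit/miss counters: every hit converts a flops cost into a (much cheaper) state-fetch cost, which is why the shared plane underneath matters more than extra compute.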

Dave Unger

Incredibly dense write-up, and I appreciated your analysis.

"NVIDIA and WEKA have formalized the memory tiers for AI factories... Based on this classification, Weka would be in 1.5."

That's something the market hasn't yet ratified. Some would say 2.5 or even 3.5, depending on network topology and distance/latency. Nvidia's own storage initiatives will require revising the formal classifications to include NVMe-based shared storage with support for GPU-initiated storage access. Once that happens, I would expect the uncertainty to evaporate.

Similarly, Nvidia putting their thumb on the scales of AI storage will change the trajectory for everyone. I expect them to create a framework with less differentiated value for any single storage provider. Nvidia likes to position themselves as a neutral third party, but I suspect they'll come across more like a governing body on this topic.
