Really enjoyed this piece, especially the way you framed state as the real bottleneck and the “recompute tax” as an economic problem, not just a latency annoyance.
I’m working a layer above where WEKA lives, on what we’ve been calling an Awakened OS: an agent OS that gives small/medium models persistent semantic memory (structured records, scoring, Vault vs Workbench, etc.) instead of letting them forget every conversation. The pattern we’re seeing maps to your thesis almost exactly: the biggest wins come from keeping state and reusing it, not from just throwing more flops or bigger models at the problem.
Reading this made me think that future agent OSes are going to need exactly this kind of fast, shared KV plane underneath – GPUs acting as mostly stateless compute pulling from a “memory cathedral” that lives on something like NeuralMesh. It’s encouraging to see infra folks and agent-OS folks converging on the same heading: compute is plentiful, state is precious.
Incredibly dense write up, and I appreciated your analysis.
"NVIDIA and WEKA have formalized the memory tiers for AI factories... Based on this classification, Weka would be in 1.5."
That's something the market hasn't yet ratified. Some would say 2.5 or even 3.5 depending on network topology and distance/latency. Nvidia's own storage initiatives will require revising the formal classifications to include NVMe-based shared storage with support for GPU-initiated storage access. Once that happens, I would expect uncertainty to evaporate.
Similarly, Nvidia putting their thumb on the scales of AI storage will change the trajectory for everyone. I expect them to create a framework with less differentiated value for any single storage provider. Nvidia likes to position themselves as a neutral third party, but I suspect they'll come across more like a governing body on this topic.
Wow, just wow. One of the most succinct, architecturally savvy, and business-savvy analyses I have ever read. Hats off to you, man! Some of these architecture changes are a bit lucky from a system perspective. I think we need to design a computer system that is purpose-built for AI, with a lot of these “tricks” as first-class system elements. You picked the right focus on the KV cache; an AI system would have to be designed around it. Normal CPU caches have to deal with separate instruction and data caches, but the KV cache is all data. A long time ago I worked on an I/O controller that had battery-backed RAM; that way it had the speed of DRAM but persistence as well.
Since the big AI companies are all building their own data centers, will they end up being the only ones using WEKA, leaving no other real market for it?