Discussion about this post

Synthetic Civilization

This is a great breakdown of why RL struggles, but the deeper issue isn't algorithmic. It's architectural.

RL tries to build intelligence from reward; LLMs build it from prediction. But both inherit the same limitation: they optimize inside a frozen objective.

General intelligence needs something neither paradigm provides: a system that can update its own objective as its world model grows. Understanding before optimization, not optimization pretending to be understanding.

Neural Foundry

Excellent breakdown of RL's economic collapse. The cost-per-skill arithmetic is brutal and not talked about enough. Each narrow capability eats tens of millions in compute with zero transfer; it's basically like building a company where every new feature requires rebuilding the entire infra from scratch. The moment you realize AlphaGo's 45k years of experience can't even play checkers, the whole "scale to AGI" narrative kinda falls apart. What really got me, though, is how foundation models inadvertently prove the point by doing all the heavy lifting through pretraining while RL just steers at the end.

