18 Comments
Andriy Batutin

Reread this again this morning - great article! The point about moving to a smaller model to regain control is spot on: leverage the explainability and observability of a smaller LLM to control a bigger one.

Caitlin Marie Connors

I haven't stopped thinking about this - thank you for writing it. The shift to spatial geometry approaches to understanding and working with LLM intelligence is so huge - for use cases, for effective governance, and for public understanding of what we're actually engaging with.

My boyfriend asked me last week: "I've had four conversations this week about people who think we're going to hit a capability plateau - what do you think?"

I said, "We're treating a lot of problems like 'big code' problems instead of geometry problems. When that opens up, we're going to get rid of so much of the noise, and what's actually possible with the capability is going to be a lot clearer to people. And that will move fast, because it will actually WORK, which will upend a lot of assumptions about cost, validity, transparency, governance. It will be this year."

And yeah. 'This year' for sure, because it's already happening.

And delighted that you're sharing rather than hoarding this. Thank you.

Devansh

In the meantime, I would really appreciate any help you can give with publicity.

Devansh

You might like the larger project we're working on, about the nature of AI intelligence. It touches on a lot of these themes much more rigorously.

Will also be openly sharing that because intelligence belongs to humanity, not to individuals.

Suhrab Khan

This is a masterclass in extracting reasoning from existing LLMs. The distinction between generation and judgment, making generation dumb and judgment smart, is a critical insight too few appreciate. The $0.50 proof-of-concept underscores that the real bottleneck isn’t model size or knowledge, it’s access and control. Your approach makes reasoning inspectable, controllable, and scalable, which is exactly the paradigm shift the field needs.

Juan

Interesting exploration. To train the judges, do you use synthetic data of the form - (query -> plan, score)?

Mark Vickers

Is there an academic paper to go with this?

Devansh

I'm not in academia, so I don't know the process there. But the GitHub and experiments are linked so anyone can see them.

ToxSec

Really interesting. Love the angle you took for it.

Devansh

Thank you. Please do share it around since we need more contributors

Lex Ovi

The question is if 50 would then just use the model to troll and make documentaries on P. Diddy.

Devansh

that's peak AGI

Christieinitaly

Thanks for this. I just published something called ‘prompts don’t matter, patterns do’ and would love your input!