26 Comments
Leon Liao's avatar

The key is to separate three things: the model, the narrative, and the workflow.

Mythos is not some magical fully autonomous hacker that suddenly appeared out of nowhere. A big part of what people are reacting to is actually a high-capability model embedded inside a much broader security research pipeline. So the real shift is not “one model woke up,” but that strong models plus strong security workflows are starting to produce genuinely useful offensive and defensive results.

That also means the moat is narrower than the hype suggests. Anthropic’s edge seems to lie in exploit development, multi-step reasoning, tool use, and reliability, rather than in being the only lab whose models can detect these classes of vulnerabilities.

But the correction should not swing too far in the other direction. Even after adjusting for hype, the underlying capability jump is real. Frontier models are moving beyond just helping read CVEs or draft PoCs and toward something much closer to operational usefulness in software understanding and attack-path exploration. That is the real boundary, and also the real issue.

Devansh's avatar

Very true. That's why I said that Mythos is impressive.

Mikey B's avatar

WOW!

Reads like a murder mystery.

Amazing information as provided by one or more brilliant AI minds.

What great fun to read and learn from.

Thank you Sir Devansh (Dev<3)

Devansh's avatar

I'm glad you enjoyed this

IndustryReport's avatar

This piece really sharpens something most people overlook: the illusion of control creators feel on platforms like Substack. On the surface, you’re given clean dashboards, views, open rates, subscriber growth, but those are outputs, not the underlying drivers. Even Substack itself emphasizes high-level metrics like revenue, subscribers, and engagement as the core lens to judge performance.

What’s interesting is how that shapes behavior. When visibility is limited to outcomes, creators naturally optimize what’s measurable: formats, hooks, posting cadence, while the deeper levers (distribution pathways, network effects, recommendation systems) remain abstract or invisible.

The real insight here is that platforms don’t just host content; they co-author outcomes.

And unless you understand that invisible layer, you’re essentially iterating inside a system you don’t fully see, which makes progress feel random even when effort is consistent.

Devansh's avatar

Well said

Vaibhav's avatar

If scaffolding is at play here, I wonder what they would do with analysis tools like Joern in place.

Devansh's avatar

That's a good question

Gabe's avatar

Nice article Dev

Devansh's avatar

Thank you <3. Put a lot into it

Gabe's avatar

I know!! I have to read it again, but it calmed my curiosity itch 😁 Thank you for the great effort 🙏 Love it

Devansh's avatar

Nothing but the best for you

James's avatar

Incredible analysis!

Devansh's avatar

Thank you.

ScienceGrump's avatar

Thanks for an incredibly thorough, informative dive into the hype. I'm curious whether Anthropic published the false positive rate for Mythos prior to *any* human assessment. When they say x% validated, did any curation happen first? My suspicion is that model performance is not so much jagged as purely stochastic; there is no rhyme or reason for when a given model will seem brilliant or fall on its face. The high false positive rate in AISLE might not be so different from Mythos.

Devansh's avatar

They did no research.

Tanel's avatar

Thank you for this excellent exposé. Way too many people think Anthropic are the "good guys".

Devansh's avatar

Yeah. I love that they stood against surveillance systems but they have a long history of sketchy behavior

Louisa Nicola's avatar

This is a solid correction to the narrative. The real takeaway isn’t that Anthropic faked capability, but that the edge is narrower than advertised. If smaller models plus good pipelines can replicate core findings, then the advantage is engineering, not intelligence, and that changes the entire competitive story.

Aseem Athale's avatar

I am unsure why everyone keeps praising Anthropic about their stance on surveillance systems. Their only problem with the DoD contract and everything was that they don't think their models are accurate enough to be used in that context, which might lead to unintended harm.

Which implies that they'd be fine with their use in surveillance systems, but only once their models are accurate enough.

Leslie's avatar

Been building a primary-source record of the governance and money side of this. hhhrecord.org. Your technical teardown is the other half.

Robert's avatar

I'm actually more concerned about this model being released than before I read this 😅

If Mythos is not much better at finding bugs but significantly better at building multi-stage exploits, then its benefits are weighted strongly towards the red team.

Austin Morrissey's avatar

When are we getting the post on constructing vulnerability research pipelines?

L. rufus's avatar

Am showing this to one of our clients on Monday. Their CIO called frantically after the Claude Mythos story “broke”. He was adamant we “do something”. After some discussion I offered to disconnect their network from the Internet and then we could go hit a bucket of balls at the nearby golf course. I think he actually considered disconnection (yikes).

ToeKneeEi's avatar

Shhh… The Mythos sky-is-falling hype is mostly a play to spook the White House to kiss and make up with Anthropic because national security, if you buy the hype, trumps the Hegseth feud. It’s bought Dario a meet with Susie Wiles and some words of defusement from all. Look to see Anthropic’s federal contracts restored if Pennsylvania Avenue doesn’t read Devansh’s post too carefully.