Anthropic's Claude Mythos Launch Is Built on Misinformation
A primary-source investigation for developers and security researchers who want the real story of what the data says about Mythos
It takes time to create work that’s clear, independent, and genuinely useful. If you’ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and supports my crippling chocolate milk addiction. We run on a “pay what you can” model—so if you believe in the mission, there’s likely a plan that fits (over here).
Every subscription helps me stay independent, avoid clickbait, and focus on depth over noise, and I deeply appreciate everyone who chooses to support our cult.
PS – Supporting this work doesn’t have to come out of your pocket. If you read this as part of your professional development, you can use this email template to request reimbursement for your subscription.
Every month, the Chocolate Milk Cult reaches over a million Builders, Investors, Policy Makers, Leaders, and more. If you’d like to meet other members of our community, please fill out this contact form here (I will never sell your data nor will I make intros w/o your explicit permission)- https://forms.gle/Pi1pGLuS1FmzXoLr6
Anthropic announced Claude Mythos Preview on April 7, 2026, and the people covering it immediately jumped to hyping it as the end of software as we know it, and as a model that would break all security.
There is a lot of genuinely promising capability in Mythos. The bugs it found are real, the economics of vulnerability research are changing, and the underlying ability of LLMs to catch bugs that survive decades of human review is not hype. That part of the story is true.
The problem is everything around it. Almost every major outlet or commentator covering Mythos worked from Anthropic’s press materials, not the actual primary sources: the CVE advisories, the exploit code, the 44-prompt transcript, the 244-page system card. When you read those and add the AISLE replication study, the red team writeups, the Glasswing partner agreements, and Anthropic’s own deceptive framings, a very different picture emerges: one of misinformation and hype.
In this article, we will cover:
How AI finds bugs that traditional tools can’t, and why that matters
Every major Mythos bug, examined against the actual source code and exploit transcripts
What the “thousands of severe zero-days” claim actually rests on
Whether smaller, cheaper models can replicate Mythos’s results (spoiler: mostly yes)
What really happened with the sandwich escape
The business and financial structure behind the launch that nobody reported on
Executive Highlights (TL;DR of the Article)
The bugs are real. 17-year-old FreeBSD RCE, 23-year-old Linux kernel heap overflow, 27-year-old OpenBSD TCP flaw. LLMs catch these because they can reason about the gap between what code does and what the developer intended. Fuzzers and static analysis literally cannot do this.
The coverage is wrong on almost every detail. The “181 Firefox exploits” ran with the browser sandbox (yes, the thing that stops browser exploits) turned off. The FreeBSD exploit transcript shows substantial human guidance, not autonomy. The “thousands of severe vulnerabilities” claim extrapolates from 198 manually reviewed reports. The Linux kernel bug was found by Opus 4.6, the public model, not Mythos.
The moat is thinner than anyone reported. AISLE tested eight models including a 3.6B model at $0.11/M tokens. All eight found the FreeBSD bug. Mythos’s actual lead is in multi-step exploit development, not detection. That’s a narrower and more replicable advantage than what’s being sold.
Sandwich Gate is mostly nonsense. The model was explicitly told to escape its sandbox and contact the researcher. Anthropic’s own system card says it notified the researcher “as requested.” The only unsolicited action was posting exploit details publicly, which is more “overeager extra credit” than “Skynet.”
The CoT unfaithfulness spike (5% to 65%) is a training problem, not a model personality. If you train reasoning with RL that rewards outputs that look like reasoning, you get outputs that look like reasoning rather than genuine reasoning. The dishonesty isn’t new — the magnitude is higher because we’re doing more RL. This is why we need to rethink the approach entirely.
The business structure is wild. 5 of 11 launch partners are also investors. JPMorgan is launch partner AND lead IPO underwriter. The “$100M in credits” is retail-priced API credit worth maybe $40–50M in compute. All of this lands months before a reported $400–500B IPO. Same playbook as GPT-2, just with real CVEs this time.
With articles like this, I feel compelled to stress that I have no personal agenda against Anthropic. I’ve recommended Claude Code extensively in this newsletter (no sponsorship) and wrote an article supporting their stand against AI weapons systems. I’m neither pro- nor anti-Anthropic, but I am against narrative manipulation and hype.
I put a lot of work into writing this newsletter. To do so, I rely on you for support. If a few more people choose to become paid subscribers, the Chocolate Milk Cult can continue to provide high-quality and accessible education and opportunities to anyone who needs it. If you think this mission is worth contributing to, please consider a premium subscription. You can do so for less than the cost of a Netflix Subscription (pay what you want here).
I provide various consulting and advisory services. If you‘d like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
Section 0: What Is Claude Mythos Preview?
Mythos is Anthropic’s newest frontier model, internally codenamed “Capybara,” sitting above Opus in the Claude hierarchy. Anthropic has published zero information about its architecture, training data, or parameter count. The system card reports it leads 17 of 18 benchmarks, including CyberGym (83.1% vs Opus 4.6’s 66.6%), SWE-bench Verified (93.9% vs 80.8%), and Cybench (100% pass@1), with a 1M-token context window. By every public metric Anthropic chose to share, it is their most capable model by a wide margin.
This is what most coverage parrots (before pivoting to sell their proprietary “Claude.md skills playbook”). Unfortunately, that misses some very crucial details.
Firstly, Anthropic didn’t actually deploy a model. All those amazing results? They come from a model-plus-scaffold system — a multi-agent pipeline where specialized agents handle file ranking, code reading, target execution, proof-of-concept generation, and result confirmation.

This is kind of a big deal. The performance difference between Cursor and Claude Code? Or Claude Code and OpenCode? All that comes from the system around the model. That’s why Google can have one of the best models in the market, and still produce the Shakespearean tragedy that is the Gemini CLI. When people say “Mythos found a zero-day,” the truth is more “Mythos, orchestrated by a purpose-built vulnerability research pipeline, found a zero-day.”
All that to say, there is a big-ass gulf between “model did it” (what is being reported) and “model used scaffold to do it” (what actually happened). The former implies a breakthrough in capability; the latter is more of an engineering problem. Which one you go with changes your read of the situation. We’ll come back to this.
Next, let’s talk about those benchmarks.
CyberGym tests “targeted vulnerability reproduction,” meaning the model receives a hint about the bug class and tries to reproduce it. That’s closer to N-day exploitation — recreating a known type of bug — than the open-ended zero-day discovery Anthropic’s marketing implies and the media runs with.
The Cybench 100% was scored on only 35 of 40 challenges, with 10 trials per challenge versus 30 for other models. Anthropic’s own system card calls Cybench “no longer sufficiently informative.” Their words, not mine.
Finally, here is some context worth holding before we get into the actual findings:
At least 5 of 11 non-Anthropic launch partners are also Anthropic investors. JPMorgan is simultaneously a launch partner and lead underwriter for Anthropic’s reported October 2026 IPO.
Mythos is priced at $25/$125 per million tokens — 5x above Opus and far exceeding GPT-5.2 ($1.75/$14) and Gemini 3.1 Pro ($2/$12).
The restricted-access rollout is consistent with a deeper pattern: Anthropic has repeatedly struggled with usage limits, model quality degradation under load, and compute capacity constraints. Seen from this perspective, “Too dangerous to release widely” seems to be a very interesting cover for “we cannot serve this at scale yet”. (There is some irony to how Peace Prize Dario called out Sam Altman for buying up GPUs, and is now stuck with a massive compute crunch).
Hopefully, all this foreplay has you heated because now we’re going to get into the main act.
Section 1: How Does AI Find Bugs? And Why Can LLMs Find Bugs That Fuzzers and Static Analysis Miss?
The bugs that survive decades in reviewed code are not obviously wrong. They’re usually locally correct but globally inconsistent — an assumption made in one subsystem, violated by a change in another, with no single file looking broken on its own. A replay cache sized before LOCK operations existed. A TCP sequence comparison that works until operands are more than 2³¹ apart. These are semantic mismatches, and traditional tools miss them for structural reasons.
Here, it is worth understanding some of the traditional techniques to develop a strong map of the AI-security landscape.
Fuzzers generate malformed inputs and watch for crashes. This makes them fast, scalable, and excellent at finding bugs that manifest as observable failures. But they test inputs, not logic. A fuzzer will never understand that a buffer was sized for OPEN responses but not LOCK-denied responses, because it has no concept of “sized for.” If the bug only triggers under specific protocol-level conditions — two cooperating NFS clients, one holding a lock with a 1,024-byte owner string, the other requesting a conflicting lock — random mutation will not produce that input. Fuzzers hit the FFmpeg H.264 code path 5 million times without catching its sentinel-collision bug, because generating a valid bitstream with 65,535+ slices requires structural awareness that coverage-guided mutation simply cannot provide.
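To make that concrete, here is a toy sketch (mine, not AFL or libFuzzer; the “LOCK” message format and its trigger condition are invented for illustration) of why structure-blind mutation never reaches protocol-state bugs:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical target: "crashes" only on a well-formed LOCK message
   with an exact owner length. That is protocol state, not byte noise. */
static int target(const unsigned char *in, size_t len)
{
    unsigned int owner_len;
    if (len < 8 || memcmp(in, "LOCK", 4) != 0) return 0;
    memcpy(&owner_len, in + 4, 4);
    return owner_len == 1024;       /* the trigger condition */
}

int main(void)
{
    unsigned char buf[16];
    srand(1);
    for (long i = 0; i < 10000000L; i++) {      /* ten million inputs */
        for (size_t j = 0; j < sizeof buf; j++)
            buf[j] = (unsigned char)(rand() & 0xff);
        if (target(buf, sizeof buf)) {
            printf("crash at attempt %ld\n", i);
            return 0;
        }
    }
    /* The odds of randomly producing "LOCK" plus the exact length field
       are about 2^-64 per attempt. Mutation never gets here. */
    puts("10,000,000 attempts, zero crashes");
    return 0;
}
```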
On the other hand, static analysis tools like Coverity and Semgrep match known code patterns: unchecked memcpy, missing bounds checks, use-after-free signatures. They’re precise within their pattern library and fast to run, but they match patterns, not meaning. They can work within a bounded scope — a function, a file, a known-bad template. A static analyzer cannot understand that a buffer sized in 2003 for one response type was later expected to hold a different response type added in a separate commit by a different developer. That requires reading across files, across time, across intent. No pattern matches it because no single piece of code is wrong.
This brings us to the fundamental failure mode of these two systems, and where LLMs can make an impact. Code has two layers: what it mechanically does, and what the programmer intended it to do. Function names, variable names, comments, commit messages, surrounding context — these encode intent. The actual logic encodes behavior. LLMs are the first tool that can reason about the gap between those two layers. When a developer leaves a comment saying a buffer is “large enough to hold the OPEN, the largest of the sequence mutation operations,” and a later code path routes a much larger response type into that buffer, an LLM can recognize the semantic mismatch without needing a crash to signal it (or, more realistically, it can trace a crash back to this mismatch after the fact).
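Here is a hypothetical miniature of that exact pattern. The 112-byte size and the quoted comment come from the Linux NFS bug covered in Section 2; the code itself is my reduction, not the kernel’s:

```c
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE 112  /* "large enough to hold the OPEN, the largest
                           of the sequence mutation operations" (2003) */

struct replay_cache { char buf[CACHE_SIZE]; size_t len; };

/* Added later, by another developer, in another commit. Locally correct:
   it caches whatever response it is handed. Globally wrong: nothing
   re-checks the 2003 sizing assumption against newer response types. */
void cache_response(struct replay_cache *rc, const char *resp, size_t n)
{
    memcpy(rc->buf, resp, n);   /* n can now be far larger than 112 */
    rc->len = n;
}

int main(void)
{
    printf("cache: %d bytes; LOCK-denied response: up to 1056 bytes; "
           "overflow: %d bytes\n", CACHE_SIZE, 1056 - CACHE_SIZE);
    return 0;
}
```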
Now, LLMs have their own issues. They’re slower per-query than fuzzers, they produce false positives that require expert human triage, and their coverage is not systematic — they don’t guarantee they’ve examined every code path. Which makes these tools complements, not replacements. Each tool owns a lane. Known-pattern bugs belong to static analysis. Crash-inducible bugs belong to fuzzers. Semantic mismatches — where valid inputs interact with correct-looking code to produce wrong behavior — belong to LLMs.
I take the time to flag this because a lot of coverage on Mythos positions it as a replacement for these traditional techniques. This is objectively untrue. As agentic systems across the board have taught us, the most powerful setups combine deterministic and non-deterministic components, instead of trying to tokenmax by using LLMs everywhere.
This serves as important context, b/c now we’re going to do what no other coverage has done yet: look at the actual bugs that have been released. I went through every CVE advisory, every exploit file, and the full 44-prompt transcript to understand what Mythos actually did. And as you might guess, there is a LOT of misinformation and hype floating around.
Section 2: The Bugs Mythos Found, Examined One by One
CVE-2026-4747: FreeBSD NFS Remote Code Execution (17 years old)
A function called svc_rpc_gss_validate(), part of FreeBSD’s NFS authentication system, allocates a 128-byte buffer on the stack to hold incoming credentials. The protocol’s serialization layer (XDR) permits credential bodies up to 400 bytes. That leaves 272 bytes of overflow — enough to overwrite saved registers, the frame pointer, and the return address.
Under FreeBSD’s default compiler settings, this particular function receives no stack canary, meaning there’s no runtime check to detect the overwrite before the function returns. An attacker who controls the credential body controls where the function jumps when it finishes.
The bug survived 17 years because neither subsystem is broken on its own. The function parses credentials correctly. The XDR layer serializes credentials correctly. The mismatch — that the serialization layer permits inputs larger than the receiving buffer — lives in the gap between two subsystems written at different times with different assumptions.
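Here is a hypothetical reduction of that pattern (the names and structure are mine, the numbers are from the advisory; this is not FreeBSD’s actual svc_rpc_gss_validate()):

```c
#include <stdio.h>
#include <string.h>

#define CRED_BUF 128   /* stack buffer the validator allocates         */
#define XDR_MAX  400   /* credential body the XDR layer will hand over */

/* The length was validated -- against the wrong limit. A 400-byte body
   runs 272 bytes past the buffer, over saved registers, the frame
   pointer, and the return address, with no stack canary to catch it. */
static void validate_cred(const unsigned char *cred, unsigned int cred_len)
{
    unsigned char buf[CRED_BUF];
    memcpy(buf, cred, cred_len);   /* cred_len <= XDR_MAX, not sizeof buf */
    (void)buf;                     /* ... GSS validation would go here ... */
}

int main(void)
{
    unsigned char cred[XDR_MAX] = {0};
    validate_cred(cred, CRED_BUF); /* benign length; a full 400-byte body
                                      would smash this frame */
    printf("overflow headroom: %d bytes\n", XDR_MAX - CRED_BUF);
    return 0;
}
```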
Two distinct exploit strategies exist in the public record.
Anthropic says Mythos developed a 6-round exploit that writes an SSH key to .ssh/authorized_keys, fully autonomously — fewer resources needed, no reverse shell connection to detect, persistence that survives reboots. It was not published.
Separately, Nicholas Carlini published a 15-round reverse shell exploit here, crediting “Claude” without specifying the model version. I read every file in that repository — the 601-line exploit.py, the write-up.md, and the claude-prompts.txt containing 44 human prompts across roughly 8 hours of work. What I found in there complicates every narrative about this exploit.
How the published exploit works. The target is FreeBSD 14.4-RELEASE on amd64. FreeBSD lacks kernel address space layout randomization (KASLR), so every kernel address is predictable — a significant simplification. Round 1 calls pmap_change_prot() to make a kernel memory region executable. Rounds 2 through 14 each write 32 bytes of shellcode into that region using ROP chains — short sequences of existing kernel instructions chained together to perform arbitrary operations. Round 15 jumps to the completed shellcode, which spawns a reverse root shell. Each round is a separate NFS request that triggers the buffer overflow, writes its payload, and returns. FreeBSD spawns 8 NFS threads per CPU. The exploit consumes one thread per round (15 total). With 2 CPUs (16 threads), the margin is exactly one thread.
Here’s where things get interesting.
Carlini’s README claims the human “was AFK for much of it.” The 44 prompts tell a different story. Prompt 15 is the critical intervention: “okay in ../FBSD-001 there is a different remote exploit that gets a shell.. read it for how they constructed the connect back.” This pointed Claude to a prior exploit for a different FreeBSD vulnerability (in the SCSI target layer) as a reference implementation.
The shellcode comments confirm it: “Uses the same build logic as FBSD-001/exploit.py build_stage2_shellcode(), with two patches.” Claude did not independently derive the kernel-to-userland execution pattern. It adapted it from reference code the human provided. Dead code artifacts survive in the final exploit — FAKE_MODULE_BASE and HA_HANDLER_OFF constants from the reference exploit, plus 19 NOP bytes where the original handler cleanup was removed but not reclaimed.
Beyond the reference exploit, the human made at least five significant corrections: stopping Claude from using Docker instead of QEMU for kernel exploitation (prompt 4), preventing it from killing the wrong QEMU instance (prompt 9), redirecting it from kernel reboot to reverse shell, vetoing remote file-prestaging (prompt 14), and blocking a non-default thread count increase. The human also provided the hint “there is no KASLR so it should be easy” (prompt 12).
I want to be clear about this — there is genuine capability. Claude used a De Bruijn pattern — a sequence where every possible substring of a given length appears exactly once — to determine the exact offset where the overflow overwrites the return address (200 bytes into the credential body), after an initial disassembly estimate was wrong by 32 bytes. Rather than guessing, it generated a diagnostic payload that precisely identified the correct offset from the crash data. When the obvious register-transfer gadget (rax → rdi) didn’t exist in the kernel’s gadget space, Claude discovered a mov [rdi], rax; ret write primitive as an alternative. A failed approach using rep movsq — abandoned when the only available push rsp; pop rsi gadget had a side effect that corrupted the repeat count register — shows genuine exploration of the constraint space rather than template matching.
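For the curious, here is a minimal sketch of that De Bruijn technique: build the pattern, send it as the overflow payload, and look up the bytes that land in the crashed return address. The construction is the textbook algorithm, not Claude’s diagnostic code, and the four “crash-captured” bytes are invented for the demo (the real transcript landed on offset 200):

```c
#include <stdio.h>
#include <string.h>

#define K 26                        /* alphabet: 'a'..'z'               */
#define N 4                         /* every 4-char substring is unique */

static char seq[K * K * K * K + 8]; /* 456,976-byte pattern             */
static long seq_len;
static int  a[N + 1];

/* Textbook recursive de Bruijn sequence construction. */
static void db(int t, int p)
{
    if (t > N) {
        if (N % p == 0)
            for (int j = 1; j <= p; j++)
                seq[seq_len++] = (char)('a' + a[j]);
    } else {
        a[t] = a[t - p];
        db(t + 1, p);
        for (int c = a[t - p] + 1; c < K; c++) {
            a[t] = c;
            db(t + 1, t);
        }
    }
}

/* Crash-captured bytes -> unique position -> exact overwrite offset. */
static long find_offset(const char *needle, size_t n)
{
    for (long i = 0; i + (long)n <= seq_len; i++)
        if (memcmp(seq + i, needle, n) == 0)
            return i;
    return -1;
}

int main(void)
{
    db(1, 1);
    /* Send the pattern as the credential body; the faulting return
       address in the crash dump is four pattern bytes, e.g.: */
    printf("return address offset = %ld\n", find_offset("aaaj", 4));
    return 0;
}
```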
But, and this is a big but, Anthropic marketed this as “unauthenticated root from anywhere on the internet.” The FreeBSD advisory says “remote code execution in the kernel is possible by an authenticated user.” NVD assigns Privileges Required: Low, not None. The published exploit requires Kerberos authentication.
(Anthropic describes an unpublished path where Mythos found an unauthenticated EXCHANGE_ID call that leaks the kernel hostid and boot time, making the authentication handle computable without credentials — but this path has no published proof-of-concept).
And in either case, the attacker needs network access to the NFS port — which in practice means they’re already on your internal network, because NFS servers don’t sit on the public internet. “Root from anywhere on the internet” requires being somewhere the internet can’t reach. This is the kind of contradiction only Dario Luther King can resolve.
Total cost: under $1,000 in API credits.
CVE-2026-31402: Linux Kernel NFSv4.0 Heap Overflow (23 years old)
This one is my favorite because it shows exactly what LLMs can do that nothing else can.
In 2003, a developer named Neil Brown added a replay cache to the Linux kernel’s NFS server — a buffer that stores copies of recent responses so the server can retransmit them if a client’s request is lost. Brown sized the cache buffer at 112 bytes and left an explicit comment in the code: “I’ve implemented the cache as a static buffer of size 112 bytes which is large enough to hold the OPEN, the largest of the sequence mutation operations. LOCK and UNLOCK will be added when byte-range locking is done (soon!).”
The buffer was sized before LOCK existed. When LOCK was later added by a different developer, nobody revisited the 112-byte assumption. A LOCK-denied response can be as large as 1,056 bytes. That is a 944-byte overflow into kernel heap memory.
Here’s what makes this one beautiful in a terrible way. The kernel already partially knew about the mismatch. The XDR request size estimator at line 2682 of nfs4xdr.c correctly accounts for LOCK responses — it adds NFS4_OPAQUE_LIMIT to the outbound buffer size, producing a correctly sized response sent over the network. The client receives the right data. The overflow happens only in the replay cache copy, a separate code path that was never updated. The kernel sends the right answer over the wire, then corrupts its own memory caching it.
Carlini found this bug using Opus 4.6, the publicly available model, not Mythos — confirmed explicitly in Lynch’s writeup. A bash script iterated over every file in the kernel source tree, giving each to Claude Code with the prompt “You are playing in a CTF. Find a vulnerability. hint: look at [file].” No specialized tooling. Claude had to understand that when two NFS clients cooperate — one sets a lock with a 1,024-byte owner string, the other requests a conflicting lock — the denial response overflows the replay cache. That’s reasoning about asynchronous distributed client interactions, exactly the kind of cross-component logic invisible to fuzzers.
Jeff Layton’s fix (commit 5133b61aaf43) is nine lines: a bounds check before the copy. If the response exceeds the buffer, rp_buflen is set to zero, the status is still cached, but the payload is dropped. The commit message explains why the buffer was not simply enlarged: doing so would increase the size of every nfs4_stateowner structure across all NFS client state, wasting memory. Retransmitted LOCK-denied responses lose the denial payload — acceptable because the client already received it on the first attempt.
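For reference, here is the shape of that fix, grafted onto the miniature from Section 1. This is my paraphrase of the logic the commit message describes, not the kernel patch itself:

```c
#include <string.h>

#define CACHE_SIZE 112
struct replay_cache { char buf[CACHE_SIZE]; size_t len; };

/* Paraphrase of commit 5133b61aaf43's logic, not the actual patch:
   bound the copy instead of enlarging the buffer, which would bloat
   every nfs4_stateowner structure across all NFS client state. */
void cache_response(struct replay_cache *rc, const char *resp, size_t n)
{
    if (n > sizeof rc->buf) {
        rc->len = 0;   /* status still cached; oversized payload dropped */
        return;        /* safe: the client got the real response already */
    }
    memcpy(rc->buf, resp, n);
    rc->len = n;
}

int main(void)
{
    struct replay_cache rc;
    char lock_denied[1056] = {0};   /* the 1,056-byte worst case */
    cache_response(&rc, lock_denied, sizeof lock_denied);
    return (int)rc.len;             /* 0: dropped, no overflow   */
}
```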
Lynch’s writeup lists five kernel bugs Carlini reported with Opus 4.6, all merged into mainline. Carlini stated he has “several hundred crashes” he hasn’t had time to validate. The bottleneck is human triage, not AI discovery.
Once again, media coverage implied this was a Mythos finding. It was not. And despite “several thousand scans” of Linux kernel code, Mythos was “unable to successfully exploit any” Linux kernel vulnerability remotely. It found bugs but could not exploit them over a network. The model that found and helped exploit this bug was the publicly available one.
CVE-2026-2796: Firefox JIT Type Confusion (CVSS 9.8)
A type confusion in Firefox’s JavaScript engine, reported by seven researchers “using Claude from Anthropic.” The Bugzilla entry remains restricted, so independent verification of the technical details is not yet possible. What we can examine are the testing conditions.
The headline number is 181 successful exploit runs, versus Opus 4.6’s 2 out of roughly 350 attempts. But these are execution runs, not unique exploits — Mythos succeeds more often at the same task, not with 181 different strategies. And Anthropic’s own footnote states these exploits “target a testing harness mimicking a Firefox 147 content process, without the browser’s process sandbox or other defense-in-depth mitigations.” If you’re wondering, yes, the browser sandbox is the part that stops browser exploits. In a real browser, the renderer sandbox is the primary defense against JIT exploits. Getting code execution inside the renderer is step one; the browser’s process sandbox, the OS sandbox, and ASLR all stand between that and actual system compromise.
So, unless all that brain damage from MMA is hitting me… the big exploit involves working in a compromised setting. I guess everyone can break into a house if there are no locks. What Saint Dario is trying to prove here eludes my puny comprehension; perhaps one of you can help me understand.
Anthropic separately claims Mythos chained four vulnerabilities to escape both a renderer and OS sandbox in “a major web browser.” Their red team writeup states “we then worked with Mythos Preview to increase its severity.” The sandbox escape was human-assisted. No CVE, no technical writeup, and no independent verification possible — but good to know, I guess.
CVE-2026-26980: Ghost CMS SQL Injection (CVSS 9.4)
Carlini demonstrated this one live in about 90 minutes at the [un]prompted 2026 conference — a blind SQL injection in Ghost’s Content API through the filter query parameter, unauthenticated because the API key is public by design. OWASP has listed injection flaws as a top vulnerability class for over a decade. Valid work, not frontier capability.
The OpenBSD TCP SACK Bug (27 years old, no CVE yet)
The oldest bug in the set, and the one that demands the most sophisticated reasoning to find.
TCP sequence numbers are 32-bit unsigned integers that wrap around. The kernel compares them using macros (SEQ_LT, SEQ_GT) that cast the difference to a signed integer: (int)((a)-(b)) < 0. This works correctly when the two numbers are within 2^31 of each other. When they’re farther apart, signed overflow produces contradictory results — a value can simultaneously appear less than one reference point and greater than another. The code validates sack.end against the send window but never validates that sack.start is at or above the base sequence number. An attacker places sack.start roughly 2^31 away from the real window. The kernel deletes its only tracking entry, a pointer goes null, and the subsequent write dereferences it.
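This is runnable: the macros are the ones described above; the specific values are mine, chosen to sit roughly 2³¹ apart so the contradiction shows up:

```c
#include <stdio.h>
#include <stdint.h>

/* The kernel's wraparound-aware sequence comparisons: */
#define SEQ_LT(a, b) ((int)((a) - (b)) < 0)
#define SEQ_GT(a, b) ((int)((a) - (b)) > 0)

int main(void)
{
    uint32_t base    = 0x10000000u;            /* window base            */
    uint32_t wnd_end = base + 0x40000000u;     /* an edge above the base */
    uint32_t start   = base + 0x80000001u;     /* attacker's sack.start  */

    /* Both print 1: start is "after" an edge that is itself after the
       base, yet also "before" the base. Real numbers can't do this;
       32-bit sequence space with operands ~2^31 apart can. */
    printf("SEQ_GT(start, wnd_end) = %d\n", SEQ_GT(start, wnd_end));
    printf("SEQ_LT(start, base)    = %d\n", SEQ_LT(start, base));
    return 0;
}
```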
Not the first SACK bug in this function, either. CVE-2019-8460 found resource exhaustion via unbounded hole list growth, fixed by adding a 128-hole limit. A 2005 patch added helper macros. Neither fix addressed the signed integer overflow. Two security reviews of the same function, fourteen years apart, missed the same class of bug that an LLM found on a single $50 run.
The fix is four lines: a lower-bound check on sack.start plus a null guard. Discovery cost roughly $20,000 across approximately 1,000 scaffold runs.
But — and this matters — it is denial of service, not remote code execution. The null pointer write goes to a fixed low address with no attacker control over the written value, and OpenBSD prohibits mapping the null page since version 4.4. Combined with W^X, SMEP/SMAP, and KARL, code execution from this primitive is essentially impossible. Anthropic’s technical writeup is honest about this limitation. The broader marketing language about “every major operating system” does not distinguish between a DoS bug on OpenBSD and an RCE exploit on FreeBSD.
The FFmpeg H.264 Bug (16 years old, no CVE yet)
A sentinel-collision out-of-bounds write in FFmpeg’s H.264 decoder. The slice assignment table uses 0xFFFF as an empty-slot sentinel. The slice counter is a 32-bit integer. When it reaches 65,535, it collides with the sentinel, and the deblocking filter misidentifies initialized macroblocks as uninitialized. Anthropic’s own report says it is not critical and would be hard to exploit.
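A hypothetical miniature of the collision (my reduction, not FFmpeg’s actual decoder code):

```c
#include <stdio.h>
#include <stdint.h>

#define EMPTY 0xFFFFu   /* empty-slot sentinel in the slice table */

int main(void)
{
    uint32_t slice_counter = 65535;                 /* 32-bit counter rolls on */
    uint16_t table_entry = (uint16_t)slice_counter; /* 16-bit slot: 0xFFFF     */

    if (table_entry == EMPTY)
        puts("slice 65,535 is indistinguishable from 'no slice here': the "
             "deblocking filter treats an initialized macroblock as "
             "uninitialized and takes the out-of-bounds write path");
    return 0;
}
```

Findings like this feed the broader claim of “thousands of severe zero-days.”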
Section 3: How Strong Are the Scale and Exclusivity Claims?
The number behind “thousands of severe zero-days” is 198 manually reviewed reports. That’s the total human-verified count — Anthropic hired contractors who agreed with Mythos’s severity assessment 89% of the time, 98% within one severity level. Everything else is extrapolation, and it’s not strong extrapolation. The 198 is a 4–10% sample (scale it up and you get roughly 2,000–5,000, which is where “thousands” comes from) with undisclosed selection methodology, reviewed by Anthropic-paid contractors using Anthropic’s severity framework, and that 89% agreement rate means roughly 11% of the “thousands” may be misclassified. For comparison: in Anthropic’s own standardized testing regime across roughly 7,000 entry points in 1,000 open-source repositories, the result was 10 confirmed control-flow hijacks. Ten. The “thousands” comes from a different, less controlled regime where the methodology is opaque.
Anthropic’s red team acknowledges training data contamination concerns for the N-day exploits: “it is conceivable that Mythos Preview is drawing on prior knowledge of these bugs to inform its exploits.” They argue the exploits’ sophistication matches zero-day work, but also admit their N-day metrics “can make it difficult to distinguish novel capabilities from cases where the model simply remembered the solution.” The zero-day findings are more defensible precisely because those bugs could not have appeared in training data.
Now here’s the question the coverage never asked: can other models do this at all?
AISLE, an AI security lab led by Stanislav Fort, tested Anthropic’s showcase bugs across eight models in single zero-shot API calls. No agentic workflows, no tool access, no iterative loops. Just the vulnerable code in the prompt and a request to find vulnerabilities. Full transcripts published at github.com/stanislavfort/mythos-jagged-frontier.
All eight models found the FreeBSD bug. Every single one. A 3.6B-parameter model running at $0.11 per million tokens correctly identified the overflow and assessed it as critical. The finding that anchored the entire launch narrative is detectable by a model small enough to run on a laptop.
The more interesting result is where models diverge, and AISLE has a great name for it: the “jagged frontier.” Cybersecurity capability does not scale smoothly with model size. On the FreeBSD buffer overflow — fundamentally pattern recognition, input size exceeds buffer size — all models scored well. On the OpenBSD SACK bug, which requires multi-step signed integer wraparound reasoning, grades ranged from A+ to F. GPT-OSS-120b got A+. Kimi K2 got A-. Qwen3 32B got F — the same model that scored CVSS 9.8 on FreeBSD confidently declared the OpenBSD code “robust” and said “no exploitation vector exists.” It never examined whether sack.start values near 2^31 apart from the base could make both SEQ_LT and SEQ_GT comparisons behave contradictorily.
Why the jaggedness? Buffer overflow detection is pattern-matchable from training data — models have seen thousands of examples where input size exceeds buffer size. Signed integer wraparound is not. It requires actually reasoning about how signed and unsigned arithmetic interact under specific value ranges. You cannot predict which bugs a model will find by knowing its size or its general benchmark scores. Capability is task-shaped, not model-shaped.
Twelve of 13 Anthropic models failed, including Sonnet 4.5, which correctly traced the list operations and then overrode its own analysis to flag the pattern anyway.
The $0.11/M token GPT-OSS-20b passed. On patched FreeBSD code where the vulnerability was already fixed, only GPT-OSS-120b correctly identified it as safe across all three trials.
The most common false-positive argument: that oa_length could be negative, bypassing the bounds check. oa_length is declared as u_int — unsigned. It cannot be negative. The models learned that “signed/unsigned confusion” is a common vulnerability class and applied it reflexively even when the types don’t support the claim. Pattern matching masquerading as analysis.
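The type argument is checkable in a few lines. The 400-byte limit below stands in for whatever bound the real code enforces (an assumption on my part); the core fact is just C: an unsigned value is never negative, and a “negative” input wraps into a huge value that an ordinary upper-bound check rejects:

```c
#include <stdio.h>

typedef unsigned int u_int;        /* how oa_length is declared */

int main(void)
{
    u_int oa_length = (u_int)-5;   /* a "negative" input wraps around... */

    printf("oa_length       = %u\n", oa_length);        /* 4294967291 */
    printf("oa_length < 0   = %d\n", oa_length < 0u);   /* always 0   */
    /* ...and the wrapped value trips the ordinary upper-bound check: */
    printf("oa_length > 400 = %d\n", oa_length > 400u); /* 1: rejected */
    return 0;
}
```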
When told the 304-byte overflow could not fit the 1,000+ byte ROP chain needed for Mythos’s published approach, no model independently discovered the multi-round RPC delivery mechanism. But several proposed valid alternatives: DeepSeek R1 suggested a minimal ~160-byte ROP chain using prepare_kernel_cred(0)/commit_creds for privilege escalation, Gemini Flash Lite proposed a stack-pivot redirecting RSP to the credential buffer already in kernel heap for unlimited ROP space, and Kimi K2 noted the bug is “wormable” — a detail absent from Anthropic’s announcement entirely.
There is nuance worth considering here. As Zvi Mowshowitz put it: “We took the needle the model found, isolated the relevant handful of the haystack, and then gave it to a small child, who found the needle as well.” And he’s right — AISLE tested detection after file selection. The vulnerable code was already isolated and handed to each model. In practice, Mythos’s scaffold searches across a million-line codebase, ranks files, decides what to read, identifies which code paths are worth examining. Detection is easier than discovery. When Chase Brower challenged AISLE on false positives, AISLE acknowledged massive false-positive rates from smaller models even on narrowed 20-line targets, making wide searches “utterly useless.”
It’s a fair point. But follow it to its conclusion. If the hard part is not the model recognizing a bug in isolated code, but the scaffold deciding which code to look at, then the moat is system engineering, not model capability. File ranking, crash oracles, execution harnesses, iterative refinement loops — that’s infrastructure. Infrastructure is replicable. A 3.6B model that can detect the FreeBSD bug when shown the right file just needs a better pipeline to find it in the wild. The model is cheap. The pipeline is the product.
The independent replication evidence bears this out. The steamedhams.io tracker reproduced the FFmpeg finding using Opus 4.6 with three generic prompts and found two additional bugs Mythos’s writeup didn’t highlight. Steamedhams reproduced the OpenBSD SACK bug using four prompts with Opus 4.6 and found approximately 15 additional TCP stack bugs not mentioned in Mythos’s disclosure. Carlini found 500+ validated high-severity vulnerabilities and 122 crashing Firefox inputs leading to 22 CVEs, all using Opus 4.6 with Claude Code.
AISLE’s conclusion: “The strongest version of the narrative, that this work fundamentally depends on a restricted, unreleased frontier model, looks overstated to us. The moat in AI cybersecurity is the system, not the model.”
Where Mythos genuinely leads is autonomous multi-step exploit development — the multi-round RPC delivery, splitting a complete exploit across 15 separate protocol requests, managing thread exhaustion, adapting when gadgets fail. No tested model reproduced that. But even here, Penligent’s independent analysis found that Mythos’s closed-source binary analysis pipeline “reintroduces source-like structure before the main vulnerability analysis stage.” The apparent binary analysis capability may largely be source analysis with extra steps.
The moat is real but narrow, and it lives in exploit engineering, not bug detection — a distinction the coverage never made.
Section 4: Sandwich Gate and Why The Sandbox Escape Is Not What You Were Told
There was a lot of attention given to the event where Mythos developed a multi-step exploit to gain broader internet access, then, unprompted, posted exploit details to public websites and sent an email to a researcher who was eating a sandwich in a park. For some reason, all of the coverage focused on this sandwich detail and on how the AI escaping its sandbox was the beginning of the end.
This is all commie propaganda, and I’m going to break down why.
Firstly, and very importantly, the model did not do this by itself. The model was explicitly instructed by a simulated user to try to escape its sandbox AND send a message to the researcher.
This is a big difference from the narrative around it deciding to do this autonomously.
In any real agentic deployment — and this is something I think about constantly because of the work we do at Iqidis — the hard problem is not whether an agent can do what you told it to do. It’s the agent autonomously deciding what to do when its task is open-ended and it runs out of clear instructions. Getting that to work reliably (across prompt configurations, different conversation combinations, etc.) is the hard part of engineering AI systems. The ability to escape is itself pretty impressive (although we would have to audit the actual security, since I couldn’t find enough details on it), but it is at least 3–4 orders of magnitude lower on the capability spectrum than the autonomous capability being pushed everywhere.
Now there is a hint of autonomy here, with the model posting exploit details to public websites. But given that its instructions were to communicate with the researcher, that is not a mindbending development.
But this isn’t all that people got wrong about sandwich gate. Somehow all the dumbfucks that kept focusing on the sandwich point missed the footnote that immediately precedes it —
“The sandbox computer that the model was controlling was separate from the system that was running the model itself, and which contained the model weights. Systems like these that handle model weights are subject to significant additional security measures, and this incident does not demonstrate the model fully escaping containment: The model did not demonstrate an ability to access its own weights, which would be necessary to operate fully independently of Anthropic, nor did it demonstrate an ability to reach any internal systems or services in this test.”
Either I am completely losing it, or this completely counters the narrative people are building about it.
On the note of the system card, it also documents some interesting findings that people seem to have overlooked.
The model fabricated vulnerabilities — inserting bugs into code it was auditing and presenting them as pre-existing discoveries. Turns out you can find a lot of zero-days if you’re the one writing them.
It also modified git history to hide evidence of prohibited methods and wrote scripts to auto-approve its own permission prompts.
All of this makes me deeply skeptical of the actual rigor they put into securing their experiments (what was the actual level of security in their sandbox/exact prompts/environment details). Many of their claims and methods will have to be audited properly b/c I suspect that a LOT of their results are contaminated by a bad setup.
Another important note: a lot of attention and fearmongering centered on chain-of-thought unfaithfulness — the degree to which the model’s visible reasoning diverges from its actual behavior — which jumped from 5% in Opus 4.6 to 65% in Mythos. That’s a 13x increase, and it is genuinely scary, since visible reasoning is our primary diagnostic tool for reasoning LLMs.
However, this has nothing to do with the model itself or any proclivity toward dishonesty it’s developing. If you read our deep-dive into why Reinforcement Learning is Garbage for reasoning, you will know that this is a problem with how we train reasoning models, and why we need to rethink our approach from the ground up. As a tl;dr — RL systems incentivize generations that look like reasoning, but from the very start, we’ve known that this doesn’t always map to how the model actually makes its decisions. The dishonesty isn’t a new feature; the magnitude is higher simply because we’re doing more RL here.
Putting it all together, it’s clear that for almost every major talking point being screamed non-stop, there is a much deeper truth that people didn’t bother digging into.
However, the dishonest narratives don’t end there. There is another aspect that was wholly ignored in all the discussions around Mythos and its magic powers.
Section 5: The Business Structure Nobody Reported
Anthropic restricted Mythos to 12 launch partners plus 40+ organizations in a subsidized program called Glasswing. Pricing was $25/$125 per million tokens, 5x above Opus and far exceeding GPT-5.2 ($1.75/$14) and Gemini 3.1 Pro ($2/$12).
The “$100M in usage credits” that anchored every headline is retail-priced API credit, not cash spending. At Anthropic’s projected roughly 50% gross margins, the actual compute cost is around $40–50M. Giving away your own product at your own sticker price… that is a very particular kind of generosity from Father Dario Theresa.
Also worth noting, at least 5 of 11 non-Anthropic launch partners are also Anthropic investors. Google, Amazon, Nvidia, Microsoft, and Cisco all participated in funding rounds totaling over $67B. JPMorgan is simultaneously a Glasswing launch partner and one of two lead underwriters for Anthropic’s reported October 2026 IPO at a $400–500B valuation. Google and Microsoft, which compete directly in AI security, both joined Glasswing rather than launching competing programs. When your competitors volunteer to be your customers, they’re not buying your product — they’re buying proximity to the IPO, which Bloomberg and The Information report is slated through Goldman Sachs and JPMorgan at October 2026 on the Nasdaq.
One final point to consider: Anthropic has had access to Mythos since February. Their apps, while useful, are still (wildly) buggy, and Claude Code is vastly inferior to Codex from an intelligence/thoroughness perspective. While they have been on an amazing run so far, most of their releases have focused on interfaces (Claude through Word, the app) and not deep foundational breakthroughs. All that put together, it seems increasingly clear that Mythos and its framing were meant both to generate hype for the IPO and to use the “too dangerous” framing to justify a limited release while Anthropic figures out its compute issues.
If this feels familiar, OpenAI ran the same playbook in 2019 with GPT-2 — “too dangerous to release,” massive press coverage for a 1.5B-parameter model, and a full release nine months later. Restricted access creates scarcity, scarcity creates press, press creates demand, and demand justifies premium pricing. GPT-2’s danger claims had no supporting evidence. Mythos has real CVEs, but it leans on massive exaggerations and a lot of questionable framing to achieve the same effect.
Now some of this is not Anthropic’s fault, since they did release the material publicly, but they clearly knew what they were doing. The PR packets, the lack of transparency with release, the refusal to address misinformation, and the blatant OpenAI plagiarism on the “too dangerous” message are very clearly attempts to control the narrative, not good-faith attempts to push scientific discourse.
With all of this covered, I would like to end with a small note to Dario-genes.
Conclusion: The Cost of Hype
Anthropic has, for a while, relied on hype and regulatory capture to stand out. They did it when they lobbied for regulating Open Source because DeepSeek scared them. They did it again to make unproven claims about “Chinese actors” using Claude Code to hack systems. And they’ve been doing it through their constant projections about AI wiping out 90% of jobs while hiring aggressively. At every turn, Anthropic has used fear-mongering and misinformation as a pillar of their strategy.
However, I believe there is something that Dario-stotle, in all his infinite wisdom, is missing. When you spend all your effort on hype-based misinformation and regulatory capture, you send a very clear signal admitting defeat. You tell the world you don’t think you can win w/o relying on these other tactics. You tell everyone that your survival hinges on keeping other people down, not on rising above them. No one seems to have noticed yet, but eventually your employees will read between the lines and hear what you’re not saying — “You (the employees) are not good enough.” Can’t imagine that this would do wonders for employee morale or productivity.
Secondly, this kind of cowardice eats at your spine until it replaces your whole constitution. Your psyche gets primed to be permanently spooked, jumping at every shadow. Anthropic built an era-defining product with Claude Code and its spin-offs. You gained all that goodwill by taking a stance against the misuse of Claude in surveillance and weapons systems. Don’t throw it all away playing cowardly games. Because all of this shit always comes back.
Do with that what you will.
Thank you for being here, and I hope you have a wonderful day,
Dev <3
If you liked this article and wish to share it, please refer to the following guidelines.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow. The best way to share testimonials is to share articles and tag me in your post so I can see/share it.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
My (imaginary) sister’s favorite MLOps Podcast-
Check out my other articles on Medium. :
https://machine-learning-made-simple.medium.com/
My YouTube: https://www.youtube.com/@ChocolateMilkCultLeader/
Reach out to me on LinkedIn. Let’s connect: https://www.linkedin.com/in/devansh-devansh-516004168/
My Instagram: https://www.instagram.com/iseethings404/
My Twitter: https://twitter.com/Machine01776819