It takes time to create work that’s clear, independent, and genuinely useful. If you’ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and support my crippling chocolate milk addiction. We run on a “pay what you can” model—so if you believe in the mission, there’s likely a plan that fits (over here).
Every subscription helps me stay independent, avoid clickbait, and focus on depth over noise, and I deeply appreciate everyone who chooses to support our cult.
PS – Supporting this work doesn’t have to come out of your pocket. If you read this as part of your professional development, you can use this email template to request reimbursement for your subscription.
Every month, the Chocolate Milk Cult reaches over a million Builders, Investors, Policy Makers, Leaders, and more. If you’d like to meet other members of our community, please fill out this contact form here (I will never sell your data nor will I make intros w/o your explicit permission)- https://forms.gle/Pi1pGLuS1FmzXoLr6
Thanks to everyone for showing up to the live-stream. Mark your calendars for 8 PM EST on Sundays so you can come in live and ask questions.
Bring your moms and grandmoms into my cult.
Community Spotlight:
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me directly. There are no rules: you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re tackling, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Additional Recommendations (not in Livestream)
HSMAI Foundation Case Study: How Generative AI is Reshaping Executive Hiring in Hospitality.
“Cache-to-Cache: Direct Semantic Communication Between Large Language Models”: “Multi-LLM systems harness the complementary strengths of diverse Large Language Models, achieving performance and efficiency gains unattainable by a single model. In existing designs, LLMs communicate through text, forcing internal representations to be transformed into output token sequences. This process both loses rich semantic information and incurs token-by-token generation latency. Motivated by these limitations, we ask: Can LLMs communicate beyond text? Oracle experiments show that enriching the KV-Cache semantics can improve response quality without increasing cache size, supporting KV-Cache as an effective medium for inter-model communication. Thus, we propose Cache-to-Cache (C2C), a new paradigm for direct semantic communication between LLMs. C2C uses a neural network to project and fuse the source model’s KV-cache with that of the target model to enable direct semantic transfer. A learnable gating mechanism selects the target layers that benefit from cache communication. Compared with text communication, C2C utilizes the deep, specialized semantics from both models, while avoiding explicit intermediate text generation. Experiments show that C2C achieves 8.5-10.5% higher average accuracy than individual models. It further outperforms the text communication paradigm by approximately 3.0-5.0%, while delivering an average 2.0x speedup in latency. Our code is available at this https URL.”
we need to talk about the Phil Foden situation...: a good conversation on media bias, how it affects players, and how race plays a factor.
The Most Dangerous Myth in Medicine: The Average Patient
Making Software for Your Non-Tech Small Business
Companion Guide to the Livestream
This guide expands the core ideas and structures them for deeper reflection. Watch the full stream for tone, nuance, and side-commentary.
1. Gemini 3 Pro — Good, Not Great
The Event — Google released Gemini 3 Pro with deep think mode, expanded agentic capabilities through Antigravity, and benchmark results that had the usual suspects declaring victory. The model is fast, handles multimodal inputs well, and appears to incorporate diffusion-based architectures somewhere in the pipeline.
Why it matters less than you think — The benchmarks look impressive. The actual experience is mid. In direct testing, Gemini 3 Pro fails to consistently outperform Gemini 2.5 Pro across the use cases that matter. The suggestions are reasonable, the insights occasionally useful, but nothing knocks it out of the park. Compare this to Claude 4.5 Sonnet’s release, which felt like a genuine step function, or even GPT 5.0’s initial promise before the instruction-following collapse.
The diffusion angle — If Gemini 3 does incorporate diffusion-based text generation, that’s genuinely exciting—not for what the model does today, but for what it signals about architectural diversity. Google has been researching text diffusion for years, and a production deployment would validate an entire research direction. This is similar to how Kimi’s release earlier this year mattered more as a signal (muon activations, synthetic data pipelines) than as a standalone product. Gemini 3 may be the same: a proof of concept dressed as a flagship.
Insight — The model itself is a B+. The architecture underneath might be an A. Don’t confuse the two.
Read more—
Google’s Nano Banana is the start of a Massive AI Trend [Markets]
2. GPT 5.1 — The Instruction-Following Collapse
The Event — OpenAI shipped GPT 5.1 with improved base capabilities and a new Codex Max tier for agentic coding. Initial impressions suggested the model was competitive with Claude and Gemini on raw intelligence. Then people started using it for production workloads.
What went wrong — GPT’s core value proposition was never intelligence. It was reliability. You could dump a system prompt with fifteen tool definitions, specify exact calling conventions, and GPT would follow them consistently. The same instructions, the same outputs, every time. That made it the default choice for agentic orchestration—not because it was smart, but because it was predictable.
GPT 5.1 broke this. The base model improved, sometimes matching or exceeding Claude and Gemini on suggestions and recommendations. But the instruction-following degraded catastrophically. OpenAI apparently tried to compete on vibes and forgot why people were actually using their API.
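The reliability gap is easy to check mechanically. Here is a minimal sketch of the kind of strict validator an orchestration layer runs on every model response — the tool schema and function names are hypothetical, but the point stands: a model that drifts from the mandated format even occasionally fails this check and breaks the pipeline.

```python
import json

# Hypothetical format a system prompt might mandate for every tool call
TOOL_SCHEMA = {"name": str, "arguments": dict}

def is_valid_tool_call(raw: str) -> bool:
    """Return True only if the model emitted exactly the mandated JSON shape."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False  # model answered in prose instead of the required format
    return (isinstance(call, dict)
            and set(call) == set(TOOL_SCHEMA)
            and all(isinstance(call[k], t) for k, t in TOOL_SCHEMA.items()))

print(is_valid_tool_call('{"name": "search", "arguments": {"q": "llm"}}'))  # True
print(is_valid_tool_call('Sure! I will call search with q=llm.'))           # False
```

A model that passes this check 100% of the time is boring and deployable. A model that passes it 97% of the time is smarter-sounding and useless for orchestration.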
Strategic implications — This is a self-inflicted wound. The reliability moat was real and defensible. Now developers have to choose between a model that’s smart but unreliable (GPT 5.1) or one that’s smart and follows instructions (Claude). That’s not a hard choice. OpenAI may have permanently ceded the agentic orchestration market by chasing the wrong benchmark.
Insight — Intelligence without reliability is a demo. Reliability without intelligence is a product. OpenAI chose wrong.
How to Build Agentic AI 2 (with frameworks) [Agents]
3. The Pre-Training Hierarchy — What the Labs Won’t Tell You
The Claim — Gemini has the best pre-training of the major labs. OpenAI has the worst. Claude sits in the middle. This isn’t speculation—it’s testable.
The Test — Give any model a PDF and ask it to reproduce a chapter word-for-word. Specify exact fidelity. No paraphrasing, no summarization, just print the text. GPT will fail. It will capture 100% of the concepts and ideas but lose the precise wording. Gemini will get much closer. The reason is compression loss during the embedding process—GPT’s pre-training introduces more information loss when converting text to latent representations.
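You can quantify the drift yourself. A rough sketch with a toy pair of strings standing in for a real PDF chapter and a model’s reply: word-level sequence similarity separates “captured the ideas” from “reproduced the text.”

```python
from difflib import SequenceMatcher

source = "the quick brown fox jumps over the lazy dog"
# Stand-in for a model reply: same meaning, drifted wording
output = "the fast brown fox leaps over the lazy dog"

# Ratio of 1.0 means word-for-word reproduction; anything lower is drift
ratio = SequenceMatcher(None, source.split(), output.split()).ratio()
print(f"word-level fidelity: {ratio:.2f}")  # 0.78
```

Run the same measurement on a long passage across models and the gap the claim describes becomes a number rather than a vibe.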
Why this matters — Pre-training quality determines the ceiling for everything else. Post-training (RLHF, instruction tuning, tool use) can only refine what the base model already understands. OpenAI has compensated with aggressive post-training recipes, which is why their models often feel polished despite the weaker foundation. But you can’t post-train your way out of fundamental compression artifacts.
This also explains GPT 5.1’s multimodal instability. When you feed it images, the encoding inconsistencies compound. The model becomes unpredictable because the foundation is lossy.
Insight — Google has the best pre-training. OpenAI has the best post-training. Anthropic is trying to win on alignment. The question is which layer matters most as models commoditize.
4. Google’s Internal Fragmentation — Why Three Products Exist Where One Should
The Event — Google released Antigravity (agentic platform), Jules (coding agent), and Gemini CLI within weeks of each other. All three occupy overlapping territory. None of them integrate cleanly.
The backstory — Google has been in a slow-motion leadership crisis for over a year. Internal factions have been positioning for influence, with some senior leaders openly questioning whether Sundar Pichai should remain CEO. This isn’t gossip—it’s affecting product strategy in visible ways.
Antigravity isn’t competing with Cursor or Claude Code. It’s competing with Notebook LM. The Notebook LM team gained significant internal influence after their viral success, and now other teams are scrambling to ship agentic products that can claim similar wins. Jules and the CLI team have their own incentives. The result is three products that should be one platform, built by teams that may be actively rooting for each other’s failure.
Why this matters — Google’s technical research remains best-in-class. Their pre-training is superior, their infrastructure is unmatched, their talent density is extraordinary. But none of that matters if the organization can’t ship coherent products. The internal power dynamics are now visible in the product portfolio, and that’s a serious problem.
Insight — Google’s biggest competitor isn’t OpenAI or Anthropic. It’s Google.
5. Anthropic’s Security Theater — The Chinese Cyberattack That Wasn’t
The Event — Anthropic publicly claimed that Claude Code was used in an attempted AI-orchestrated cyberattack by a Chinese state group, which they heroically detected and stopped. Dario Amodei positioned this as evidence of both the danger of advanced AI and Anthropic’s responsible stewardship.
Why this is bullshit — Anthropic has a documented history of narrative engineering. They funded protesters to amplify x-risk fears. They lobbied aggressively for export controls that would disadvantage Chinese competitors. When DeepSeek demonstrated that frontier capabilities don’t require frontier compute, Anthropic’s strategic position weakened considerably. Manus AI—a Chinese product—is now widely considered the leading agentic system. Open-source deployment data shows Chinese models gaining significant share.
The “cyberattack” story arrives at a suspiciously convenient moment. It reinforces the China-threat narrative that justifies export controls. It positions Anthropic as a responsible partner for government contracts. It creates fear that benefits closed-source incumbents over open-source alternatives. The pattern matches previous Anthropic PR operations perfectly.
What’s actually happening — Anthropic is scared. Their moat was supposed to be safety and alignment, but that’s hard to monetize when open-source models are catching up on capabilities. The pivot to government/defense positioning requires a threat narrative, and “Chinese state hackers using AI” is the perfect story. Whether there was any actual attack, whether it was sophisticated or trivial, whether Anthropic’s response was meaningful or routine—none of that matters. The PR value is in the claim itself.
Insight — When a company’s strategic position depends on fear, be skeptical of the fears they amplify.
6. Infrastructure Developments — Skepticism Required
xAI + Saudi Arabia — Elon Musk announced a 500MW data center buildout in Saudi Arabia with nationwide Grok deployment. The scale is impressive on paper. The execution probability is low. Both parties have extensive histories of announcing mega-projects that never materialize. Saudi Arabia in particular has a graveyard of failed infrastructure initiatives that ignored basic logistics—like water availability for cooling in a desert. File this under “interesting if true, probably not true.”
D-Matrix Raises $275M for In-Memory Inference — Digital in-memory computing promises faster, more energy-efficient inference by eliminating data movement between memory and compute. The pitch is compelling. The question is total cost of ownership. In-memory chips require specialized manufacturing at scale. If the production costs are high enough, the inference savings may not justify the capital expenditure. The industry tends to quote inference cost per token while ignoring amortized development and manufacturing costs. Skepticism is warranted until we see full economic breakdowns.
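The TCO point is just arithmetic, but it’s worth making concrete. A toy calculation where every number is invented for illustration: a chip that looks nearly free per token stops looking free once development and manufacturing capex is amortized over its lifetime output.

```python
# All figures are hypothetical, for illustration only
capex = 50_000_000          # chip development + manufacturing ($)
lifetime_tokens = 2e12      # tokens the hardware serves before retirement
marginal_cost = 0.05 / 1e6  # quoted inference cost: $0.05 per 1M tokens

true_cost = capex / lifetime_tokens + marginal_cost
print("quoted: $0.05 per 1M tokens")
print(f"amortized: ${true_cost * 1e6:.2f} per 1M tokens")  # $25.05
```

With these made-up numbers the amortized cost is ~500x the quoted marginal cost, which is exactly why per-token pricing claims mean little without the full capex breakdown.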
NVIDIA Enters Quantum Research — NVIDIA announced a quantum computing partnership with Japan’s RIKEN institute. The strategic logic is diversification beyond GPUs, but the technical fit is unclear. GPUs excel at parallel computation; quantum computing’s bottlenecks are elsewhere. This may be more about narrative positioning than actual research synergy.
Insight — Infrastructure announcements are cheap. Deployed infrastructure is expensive. Discount accordingly.
7. Platform and Tooling Updates — The Agentic Stack Shakes Out
Codex Max (GPT 5.1) — Early reports suggest solid performance for coding tasks. Worth monitoring, especially given GPT’s instruction-following issues in other contexts. The jury is still out on whether the coding-specific fine-tuning compensates for base model instability.
Claude Code — Remains excellent despite Anthropic’s broader issues. The Sonnet 4.5 backbone is genuinely good for development work. The subscription is worth the money for serious users.
Augment Code — Currently the best option for developers who want agentic coding assistance. Less hype than Cursor, better actual performance.
Cursor — Raised significant funding for an “agentic coding OS.” The product remains underwhelming relative to the valuation. Expect security vulnerabilities as they scale—the architecture has visible gaps that will become exploitable under pressure.
Antigravity — Buggy. Follows instructions but weak as an editor. The Google fragmentation problem discussed above applies directly here.
Insight — The agentic coding market is consolidating around Claude Code and Augment for quality, Codex for OpenAI loyalists, and Cursor for people who read TechCrunch. Choose accordingly.
8. Regulatory Landscape — Motion Without Movement
US AI Litigation Task Force — Blocked. Insufficient political support for dedicated AI enforcement infrastructure.
EU AI Act Rollback — The enforcement mechanisms are being softened before they’re even implemented. The pattern is familiar: ambitious regulation followed by quiet accommodation of industry concerns.
UN AI Red Lines Initiative — Proposes prohibitions on dangerous AI applications by 2026. The signatories include organizations funded by defense contractors actively deploying AI for surveillance and weapons systems. The cynicism writes itself.
Insight — Regulation follows power, not principles. Watch what gets funded, not what gets signed.
9. Apple-Google Siri Partnership — The Dual-Stack Future
The Event — Apple confirmed that Siri will use Google’s Gemini for complex queries, with Apple retaining all user data in their private cloud infrastructure. Simple tasks remain on-device; complex reasoning goes to the cloud.
Why it matters — This is textbook LLM architecture: route cheap queries to cheap compute, expensive queries to expensive compute. The interesting question is handoff logic. How does Siri decide what’s “complex”? The seams between on-device and cloud processing will determine user experience. Get it wrong and you have jarring latency spikes. Get it right and you have an assistant that feels seamlessly intelligent.
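Nobody outside Apple knows the actual handoff logic, but the shape of the problem is easy to sketch. Here is a toy router — the intent list and length threshold are entirely made up — that keeps cheap, well-understood intents on-device and escalates everything else to the cloud:

```python
# Hypothetical routing heuristic: cheap intents stay local, the rest escalate
ON_DEVICE_INTENTS = {"timer", "alarm", "call", "message", "weather"}

def route(query: str) -> str:
    tokens = query.lower().split()
    # Short queries matching a known local intent never leave the device
    if len(tokens) <= 6 and any(t in ON_DEVICE_INTENTS for t in tokens):
        return "on-device"
    return "cloud"  # complex reasoning goes to the hosted model

print(route("set a timer for ten minutes"))     # on-device
print(route("plan a three day trip to Kyoto"))  # cloud
```

The real system is presumably a learned classifier rather than a keyword list, but the failure modes are the same: route too eagerly to the cloud and every timer request lags; route too conservatively and “complex” queries get dumb on-device answers.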
Strategic context — Apple gets frontier capabilities without building frontier models. Google gets distribution into the most valuable device ecosystem on the planet. Both companies avoid direct competition while strengthening their respective moats. This is the kind of deal that makes sense for everyone except the companies not invited to the table.
Insight — The future of AI deployment is hybrid. On-device for latency and privacy, cloud for capability. Whoever solves the routing problem elegantly wins the interface layer.
Subscribe to support AI Made Simple and help us deliver more quality information to you-

Flexible pricing available—pay what matches your budget here.
Thank you for being here, and I hope you have a wonderful day.
Dev <3
If you liked this article and wish to share it, please refer to the following guidelines.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow. The best way to share testimonials is to share articles and tag me in your post so I can see/share it.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
My (imaginary) sister’s favorite MLOps Podcast-
Check out my other articles on Medium: https://machine-learning-made-simple.medium.com/
My YouTube: https://www.youtube.com/@ChocolateMilkCultLeader/
Reach out to me on LinkedIn. Let’s connect: https://www.linkedin.com/in/devansh-devansh-516004168/
My Instagram: https://www.instagram.com/iseethings404/
My Twitter: https://twitter.com/Machine01776819

