
Why Legal AI Hallucinations Are Three Different Problems, And Most Tools Only Catch One

It takes time to create work that’s clear, independent, and genuinely useful. If you’ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and supports my crippling chocolate milk addiction. We run on a “pay what you can” model—so if you believe in the mission, there’s likely a plan that fits (over here).

Every subscription helps me stay independent, avoid clickbait, and focus on depth over noise, and I deeply appreciate everyone who chooses to support our cult.

Help me buy chocolate milk

PS – Supporting this work doesn’t have to come out of your pocket. If you read this as part of your professional development, you can use this email template to request reimbursement for your subscription.

Every month, the Chocolate Milk Cult reaches over a million Builders, Investors, Policy Makers, Leaders, and more. If you’d like to meet other members of our community, please fill out this contact form here (I will never sell your data nor will I make intros w/o your explicit permission)- https://forms.gle/Pi1pGLuS1FmzXoLr6


I spoke to Ryan Estes about Legal AI, Open Source Research, and why Legal Services need to be more accessible over here. The conversation was very well received, so I’m sharing it here with Ryan’s permission.

I hope you enjoy it.

Companion Guide to the Livestream:

This guide expands the core ideas and structures them for deeper reflection. Watch the full stream for tone, nuance, and side-commentary.

1. The Three Hallucinations Hiding Under One Word

The Event — I broke hallucinations into three categories on the stream. Category one: the AI makes up a case that doesn’t exist. Category two: the case exists and the quote is real, but it’s from the wrong jurisdiction or doesn’t apply to your domain. Category three: the AI gives you an argument that looks correct on its own, but somewhere else in your documents there’s something that contradicts it, and the system never connected the two.

Why this matters — Everyone talks about category one because it’s the most obvious. A lawyer cites a fake case, gets sanctioned, it makes the news. But category one is also the easiest to fix. You just check whether the case exists. A basic validator catches almost all of it. Categories two and three are where the real damage happens, and they’re much harder to catch. The lawyer reads the brief, everything looks right, the citation is real, the quote checks out. They file. Then opposing counsel tears them apart because the cited authority was overturned in their jurisdiction, or because a deposition transcript on page 723 contradicts the whole argument.
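To make concrete why category one is the "easy" one: existence is a lookup, not a reasoning problem. Here's a minimal sketch in Python. Every case name is invented, and a real validator would query an actual case-law database rather than a hardcoded set; note that this check says nothing at all about categories two and three.

```python
# Stand-in for a real citation lookup (e.g., an API call to a case-law
# database). All case names below are hypothetical.
KNOWN_CASES = {
    "Smith v. Jones, 500 F.3d 100 (9th Cir. 2007)",
    "Doe v. Acme Corp., 123 N.E.2d 456 (N.Y. 1954)",
}

def category_one_check(citations, known_cases=KNOWN_CASES):
    """Category one is pure existence: flag any citation that isn't
    in the database at all. No legal reasoning required."""
    return [c for c in citations if c not in known_cases]

cited = [
    "Smith v. Jones, 500 F.3d 100 (9th Cir. 2007)",    # exists in our toy DB
    "Roe v. Nowhere Inc., 999 F.2d 1 (1st Cir. 1993)", # fabricated
]
print(category_one_check(cited))
# -> ['Roe v. Nowhere Inc., 999 F.2d 1 (1st Cir. 1993)']
```

A fabricated cite fails this check instantly. A real cite from the wrong jurisdiction, or one contradicted by a document on page 723, passes it cleanly, which is exactly the point.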

Most legal AI tools are RAG wrappers. You upload documents, the system cuts them into chunks, turns the chunks into vectors, and when you ask a question it finds the chunks that look most similar to your question. This works fine for simple retrieval like “find the indemnification clause.” It does not work for “is this argument actually supported across all my documents?” Cosine similarity doesn’t know what a contradiction is. It doesn’t know about jurisdictions or whether a ruling is still valid. Two laws from different states will sit right next to each other in vector space even if they say opposite things. And “document 47 invalidates the claim in document 12” is a logical relationship that vector search can’t represent at all.

Every category-two and category-three hallucination is a potential malpractice claim. The tool makes you faster, you trust it, you file work that has buried contradictions, and the first time it costs a client real money your insurance situation changes permanently. The speed improvement means nothing if the work product carries hidden liability. This is why legal AI has to move to architectures that handle context across documents natively. The wrappers will be fine for boilerplate. They’ll fail at everything that actually matters.

2. Why Irys Doesn’t Wrap, It Rebuilds

The Event — Ryan asked whether Irys’s “infinite context” works the same way Harvey’s does. It doesn’t. Harvey chunks your documents and uses vector similarity to find relevant pieces. Irys builds a knowledge graph that links entities, propositions, assertions, and contradictions across all your documents, and updates that graph every time you add a new file. When you ask a question, the system walks the graph instead of doing similarity search.

Why this matters — Vector search became the default approach to “my documents don’t fit in the context window” because it was cheap and it worked for simple use cases. Then people treated it as a permanent solution. It was always limited. Chunks are independent of each other. There’s no cross-document reasoning. There’s no way to bind a claim in one document to evidence in another. None of this mattered when the use case was “summarize this PDF.” It matters a lot when the use case is “build me a litigation strategy across 100,000 pages.” I was writing about these structural limits in 2022, before RAG was even a common term.

The fix isn’t bigger context windows or better embeddings. The fix is a system that explicitly tracks entities like parties, dates, jurisdictions, and claims, that maintains contradiction edges between propositions, and that reorganizes itself when new documents arrive. That’s how you catch “document 47 contradicts document 12” before the LLM ever starts drafting. That’s what Irys does. And this is why the wrappers can’t close the gap by adding features. Their entire stack assumes chunks are independent and retrieval is similarity. To move to graph-native context, they’d have to throw it all away and start over.
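In toy form, the difference from retrieval is that the relationship is stored explicitly at ingestion time rather than inferred from proximity at query time. Here's a minimal sketch; the node names, schema, and documents are all invented for illustration, and none of this is Irys's actual (non-public) implementation:

```python
# Hypothetical graph-native context: propositions are nodes, and
# ingestion adds typed edges ("supports", "contradicts") between them.
propositions = {
    "doc12:claim":   "Defendant had no notice of the defect before March.",
    "doc47:exhibit": "Email of Feb 3 shows defendant was told about the defect.",
    "doc03:ruling":  "Notice must precede injury for liability to attach.",
}

# Typed edges are maintained when documents arrive, not when you query.
edges = [
    ("doc47:exhibit", "contradicts", "doc12:claim"),
    ("doc03:ruling",  "supports",    "doc12:claim"),
]

def conflicts_with(node, edges):
    """Walk the graph for contradiction edges touching a node."""
    return [(a, b) for (a, rel, b) in edges
            if rel == "contradicts" and node in (a, b)]

# Before drafting anything from doc12's claim, surface its conflicts:
print(conflicts_with("doc12:claim", edges))
# -> [('doc47:exhibit', 'doc12:claim')]
```

The contradiction surfaces as a one-line graph walk because the work was done at ingestion. A chunk-based retriever has no edge to walk; it can only hope the two passages happen to land in the same context window and that the model notices.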

3. Why GitHub Copilot Couldn’t Become Cursor, And Cursor Couldn’t Become Claude Code

The Event — Ryan asked what stops someone from copying Irys. This is the answer that prompted Harvard Business School to reach out about using it in one of their courses. Software doesn’t just have features. It has assumptions baked into it about what the user controls, what the AI controls, where data lives, what gets automated, and what gets left to human judgment. GitHub Copilot had Microsoft’s distribution and money. It couldn’t become Cursor. Cursor had the developer tool category to itself. It couldn’t become Claude Code. Each product was built around a different set of assumptions, and those assumptions aren’t portable. They’re in every layer of the code and they compound with every release.

Why this matters — When founders talk about moats they usually talk about data, brand, and distribution. Those are visible from the outside. Architectural assumptions are not. If you build assuming the AI is an autocomplete assistant, you get Copilot. If you build assuming the AI is an autonomous agent that the user occasionally interrupts, you get Claude Code. The surface features can look the same but the products are completely different underneath. You can’t retrofit one into the other because every API, every state machine, every piece of the UX was built on the original assumption. Unwinding it costs more than starting over.

This is the moat for Irys. We assumed in 2022 that vector search would not be enough for legal context. So every layer of the system was built around graph-based context aggregation. Harvey assumed RAG was enough. They have a year of rebuilding before they can even start the conversation we finished three years ago. And by the time they’re done, we’ve shipped two more iterations on top.

There’s a second moat on the product side. The more you use Irys, the more it learns how you work. It doesn’t train on your data. But it learns which argument structures you prefer, what memo formats you use, how you weigh jurisdictions. All of that accumulates in your private workspace. If you leave, you lose all of that embedded knowledge and have to rebuild it somewhere else from scratch.

4. Why Open Source Is Cheaper Than Marketing

The Event — Irys is free to sign up. We’ve open-sourced major pieces of our reasoning infrastructure, including a lightweight version of the latent space reasoning engine. By normal startup logic, this makes no sense. We have a defensible product, real funding, well-funded competitors, and we’re giving away the technical work for free.

Why this matters — The usual way to think about open source is as a cost. Every piece of IP you publish saves your competitor some R&D time. That’s true if the game is static. In a fast-moving technical field, the thing that actually matters is who has access to information about what’s coming next. Who’s in the room when the labs are deciding the next generation of model capabilities. Who knows what’s shipping in six months.

Published work is what buys access to those rooms. NVIDIA’s senior engineers don’t take meetings with random startups. They take meetings with people whose technical work they’ve already seen. DeepMind doesn’t share roadmap previews with companies that haven’t contributed anything back to the field. The newsletter and the open-source reasoning engine aren’t marketing. They’re what get us the partnerships with the labs. That’s how we know what’s coming before it’s announced, and that’s how we make architecture decisions that our competitors don’t know they need to make yet.

The competitor who copies our open source saves maybe a quarter of engineering time. But they’re still nine months behind on the decisions that actually matter because they don’t have the relationships that tell them where the field is going. We didn’t lose value by publishing. We traded code for visibility into the technical horizon.

There’s also a recruiting benefit. The engineers who read the reasoning paper, run the lightweight implementation, and reach out about it are exactly the kind of people you can’t find through normal hiring channels. You can’t buy that pipeline. You can only earn it.

5. The Newsletter Is Peer Recruitment, Not Customer Acquisition

The Event — Ryan assumed the Chocolate Milk Cult newsletter (250,000+ subscribers, about 1.5M monthly reach) was the lead generation engine for Irys. I corrected him. Lawyers don’t read deep dives on quantization math or GPU pricing curves. The newsletter doesn’t sell legal seats. It serves a completely different purpose.

Why this matters — Most founder content advice treats your audience as a sales channel. That works when your audience and your customer are the same person. A fitness creator selling a fitness app. A finance creator selling a finance tool. When your audience is a different group from your buyer, the sales-channel framing leads you to measure the wrong things. You track conversion and click-through, you report the wrong wins, and eventually you conclude the audience isn’t worth the effort because the numbers don’t show a return.

What the audience actually does, when it’s decoupled from the customer base, is give you information you can’t get any other way. Lab researchers DM me about unpublished work. Engineers send me preprints. Investors share data they wouldn’t put in a pitch deck. None of that shows up as a Substack metric, but all of it changes what Irys ships next quarter.

The point for founders is simple. Be honest about who your audience is and what they’re actually for. If you’re building vertical SaaS for accountants, an audience of ML researchers won’t sell seats. But it might give you a technical edge that makes the product worth buying when your actual sales channel starts working. The two things serve different purposes and you can’t optimize both with the same content.

6. The Eat-Shit Theorem of Legal Access

The Event — Ryan asked why I picked legal when I could have pointed the technical foundation at any industry. The answer is the moral case. India has a ten-year backlog on civil cases. If your employer steals your wages and you go to court tomorrow, you won’t get a hearing for a decade. In NYC, tenants have landlords who let buildings rot and overcharge rent, and they can’t do anything because they can’t afford a lawyer. Indian farmers get oversold pesticides, fall into debt, and end up dealing with loan sharks. I get hit with baseless defamation suits over newsletter coverage, designed not to win but to bleed me on legal fees. The pattern is always the same. The legal system is gated by money, and ordinary people pay the price.

Why this matters — Democratization here is not a marketing word. It’s the reason every other decision gets made the way it does. Irys is free to sign up because if we charged what the market would bear, the people who need access most would never get in. We open-source the reasoning infrastructure because the category needs to advance whether we win or not. We’re building Irys Lite because the full product doesn’t reach a wage-theft case in rural India or a tenant in the Bronx.

For founders, the closing point from the stream is the one I’d keep. Be honest with yourself about whether you’re building a business or a mission. Both are fine. The question is which one survives the next downturn, the customer churn, the eighteen months where nothing works. If it’s a business, say so to yourself, your team, and your investors. Don’t dress it up as world-changing because that confusion is what burns founders out around year three. If it’s a mission, then it has to be the thing that’s still motivating you when a competitor with twenty times your funding announces the same product. The mission is whatever’s still there when it stops being fun. Everything else in the company is downstream of that.

The full conversation with Ryan is on AI for Founders, here. Try the platform free at iqidis.ai. The open-sourced latent space reasoning work is on the Chocolate Milk Cult archives. If your attorney is still billing you in fifteen-minute increments for things a knowledge graph does in fifteen seconds, send them this guide.


Subscribe to support AI Made Simple and help us deliver more quality information to you-

Flexible pricing available—pay what matches your budget here.

Thank you for being here, and I hope you have a wonderful day.

Dev <3

If you liked this article and wish to share it, please refer to the following guidelines.

Share

That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow. The best way to share testimonials is to share articles and tag me in your post so I can see/share it.

Reach out to me

Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.

Small Snippets about Tech, AI and Machine Learning over here

AI Newsletter- https://artificialintelligencemadesimple.substack.com/

My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/

My (imaginary) sister’s favorite MLOps Podcast-

Check out my other articles on Medium: https://machine-learning-made-simple.medium.com/

My YouTube: https://www.youtube.com/@ChocolateMilkCultLeader/

Reach out to me on LinkedIn. Let’s connect: https://www.linkedin.com/in/devansh-devansh-516004168/

My Instagram: https://www.instagram.com/iseethings404/

My Twitter: https://twitter.com/Machine01776819
