New to the infra side, so genuine question: if cognition/agent design enforces stable prompts, strict output schemas, and routing to small models for easy tasks, does that meaningfully boost the same wins you’re getting from batching/graphs/quant (cache hits, shorter contexts, fewer retries)? Curious where you’ve seen that complement your Tier-1/2 work.
Policy "Tuna" and Chocolate Milk is tomorrow's lunch. Only high performance, thank you.
😂😂