Critique of blind multi-agent coding workflows and the underrated need for agent output auditability

The 100-Agent Problem Nobody Wants to Talk About

Boris Cherny, the creator of Claude Code, said something at a recent Sequoia AI session that stopped me mid-scroll. “100% of my code is written by Claude Code. I run around 100 agents at one time.” He’s an Anthropic engineer reportedly compensated at $750K per year. He built the tool. And he trusts it completely.

I believe him. The output is probably good. But I think what he’s describing, and what the broader community is rushing to copy, has a blind spot that we’re going to regret.

The Auditability Cliff

Here’s a simple truth about agentic coding workflows. When you run one agent, you read its output. When you run five, you spot-check. When you run a hundred, you’re not reviewing anything. You’re reading a summary at best, and trusting your test suite at worst.

That’s not a workflow. That’s delegation without oversight.

I’m not being precious about it. Tests catch bugs. Linters catch style violations. CI/CD pipelines catch regressions. But none of those tools tell you why a decision was made, what alternatives the agent considered, or whether the architecture that passed tests is the one you’d have chosen with full context. You find that out six months later when someone has to modify the code and has no idea what it’s doing or why.

Speed Is Not the Same as Control

The framing around 100-agent workflows is always about throughput. Look how much I shipped. Look how fast the codebase grew. And yes, that’s real. One person with good agentic tooling can build what used to require a team. I’ve experienced this. It changes what’s possible.

But there’s a difference between moving fast and moving blind. Fast with visibility is a superpower. Fast without it is just accumulating a debt you haven’t invoiced yet.

The thing that worries me isn’t that Claude Code writes bad code. It often writes excellent code. What worries me is that at 100 parallel agents, you’ve made a structural decision to stop being the author and become the approver, and most people making that switch haven’t thought through what approval actually requires at that scale.

What Auditability Actually Means

This isn’t about reading every line. That’s not realistic at scale, and it’s not what I’m arguing for. Auditability means you can reconstruct the reasoning behind a decision. It means that when something breaks in production, you have a path back to why the code was written the way it was. It means your agents leave a trail, not just a diff.

Right now, most agentic coding setups don’t do this well. You get the output. Sometimes you get a brief summary. You rarely get structured logs of what the agent tried, what it rejected, and what tradeoffs it made. That’s the gap.

Some teams are starting to think about this. Structured agent traces, decision logging, intermediate output checkpointing. None of it is standard yet. It should be.
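To make the idea concrete, here is a minimal sketch of what a per-decision trace entry could look like. This is a hypothetical schema, not part of Claude Code or any existing agent framework: the `AgentDecision` record and `log_decision` helper are illustrative names, and the fields are one plausible answer to "what the agent tried, what it rejected, and what tradeoffs it made."

```python
import json
import time
from dataclasses import dataclass, field, asdict

# Hypothetical trace schema: captures the reasoning behind a change,
# not just the diff. None of these names come from a real framework.
@dataclass
class AgentDecision:
    agent_id: str
    task: str
    decision: str               # what the agent chose to do
    alternatives: list[str]     # options it considered and rejected
    rationale: str              # why, in the agent's own words
    files_touched: list[str]
    timestamp: float = field(default_factory=time.time)

def log_decision(record: AgentDecision, path: str = "agent_trace.jsonl") -> None:
    """Append one decision as a JSON line, so a reviewer can later
    reconstruct why the code was written the way it was."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")

# Example entry for a single agent in a parallel run:
log_decision(AgentDecision(
    agent_id="agent-17",
    task="add retry logic to payment client",
    decision="wrap calls with exponential backoff",
    alternatives=["fixed-interval retry", "fail fast and surface the error"],
    rationale="payment API rate-limits bursts; backoff avoids retry storms",
    files_touched=["payments/client.py"],
))
```

Append-only JSON lines keep the trail cheap to write at 100-agent scale and trivial to grep when something breaks in production; the point is that the log exists alongside the diff, whatever the exact schema.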

Where This Goes Wrong in Practice

The failure mode isn’t a spectacular crash. It’s a slow accumulation of code that nobody fully understands, written by agents that nobody was watching, in a codebase that passes tests but resists modification. It’s the kind of thing that looks fine until you need to change it.

I’ve seen this pattern with offshore teams, with copy-paste coding, and now I’m watching it set up to happen again with agents. The tool changes. The dynamic doesn’t.

Boris Cherny is probably the right person to run 100 Claude Code agents simultaneously. He built the system. He knows its failure modes intimately. That’s not a workflow most people should copy without a serious plan for what happens when something goes wrong and they need to understand why.

The productivity gains from agentic coding are real and I’m not walking that back. But the next thing this space needs to build isn’t more agents. It’s better observability for the ones we already have.

#AIEngineering #ClaudeCode #SoftwareEngineering #AgenticAI #MLOps
