LLMs write plausible code, not correct code, and that distinction matters for engineers in production
Plausible Is Not the Same as Correct
There is a distinction circulating in engineering circles right now that I think deserves more attention than it's getting. It comes from a piece by @KatanaLarp that's been making the rounds on X, and the framing is sharp enough that I want to expand on it here.
LLMs don’t write correct code. They write plausible code.
That gap, small as it sounds, is where production systems go to die.
What Plausible Actually Means
Plausible code compiles. It runs. On the happy path, with clean inputs and low load, it produces output that looks right. You read it and think, yeah, that looks like something I’d write.
That’s by design. These models are trained to predict the next token in a sequence, optimizing for what good code looks like statistically, not for whether the code is actually good. Those are different objectives, and pretending otherwise is how you end up with subtle data corruption at 2am on a Tuesday when traffic spikes.
The @KatanaLarp piece focuses on SQL specifically, which is a great domain to examine because SQL bugs are so quiet. A query can return results that are slightly wrong in ways that only surface when you’re joining across millions of rows, or when a transaction boundary is placed one statement off, or when an index assumption baked into the query stops holding under a different data distribution. The code looked fine in review. It ran fine in staging. It failed slowly, in production, in ways that were hard to trace.
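To make that failure mode concrete, here is a minimal sketch of a fan-out join bug, the kind that passes review because it "looks like correct SQL." The schema and values are hypothetical, invented for illustration, not taken from the cited piece:

```python
import sqlite3

# Hypothetical schema: orders, and shipments that can split one order
# into multiple parcels (so the join can fan out).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE shipments (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    -- Order 1 shipped in two parcels: one shipment row per parcel.
    INSERT INTO shipments (order_id) VALUES (1), (1), (2);
""")

# Plausible: "total revenue for shipped orders". Compiles, runs,
# and is correct for any order with exactly one shipment.
plausible = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN shipments s ON s.order_id = o.id
""").fetchone()[0]

# Correct: the join duplicates order 1, so test for shipment
# existence instead of joining before the aggregate.
correct = conn.execute("""
    SELECT SUM(amount)
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM shipments s WHERE s.order_id = o.id)
""").fetchone()[0]

print(plausible, correct)  # 250.0 vs 150.0
```

The wrong query only diverges from the right one when an order has more than one shipment, which is exactly the kind of data-distribution assumption that holds in staging and breaks in production.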
Where Engineers Go Wrong
The failure mode I see most often isn't engineers blindly trusting AI output. It's more subtle than that. Engineers read the generated code, it looks familiar, and they apply pattern-matching review instead of semantic review. They ask "does this look like correct code" instead of "is this correct code." The model has already biased them toward the former.
This is not a knock on engineers. It’s a predictable consequence of how these tools integrate into workflows. When you’re moving fast and the code looks right, the mental cost of deep verification is high. The model has offloaded drafting but not judgment, and judgment is the hard part.
What Good Looks Like in Practice
There’s a useful thread from @BharukaShraddha on structuring Claude Code projects that gets at part of the solution without quite naming it this way. The idea of putting local CLAUDE.md files near “sharp edges” of a codebase (auth, persistence, migrations) is essentially a way of encoding semantic constraints that the model otherwise won’t know to respect. You’re telling the model: here is where correctness requirements are unusually strict. Don’t just write plausible code here.
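As a sketch of what that might look like in practice, a local CLAUDE.md dropped next to a migrations directory might read something like the following. The file contents here are hypothetical, written to illustrate the idea, not quoted from @BharukaShraddha's thread:

```markdown
# db/migrations/CLAUDE.md (hypothetical example)

Correctness requirements in this directory are stricter than elsewhere:

- Migrations are append-only. Never edit a migration that has been applied.
- Every schema change must stay backward compatible with the previous release.
- Wrap multi-statement migrations in an explicit transaction.
- Do not assume an index exists from a column's name; check db/schema.sql first.
```

The point is not the specific rules but the placement: the constraints live next to the code they govern, so the model sees them exactly when it is about to generate something plausible there.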
That’s a reasonable engineering response to a real limitation. It doesn’t fix the underlying problem, but it narrows the blast radius.
The broader principle is that human verification has to be proportional to risk, not proportional to how confident the generated code looks. Confident-looking code from an LLM should probably make you more careful, not less, because the model has no mechanism for expressing appropriate uncertainty about edge cases it can’t see.
What I Actually Think
I’ve watched teams adopt AI coding tools and gradually stop asking “why does this work” in code review. That’s the skill erosion that worries me. Not that the tools are bad, but that they’re training engineers to accept plausibility as a proxy for correctness, and that habit will outlast any single tool.
OpenAI just shipped Codex Security, framed as an agent that finds vulnerabilities and proposes fixes. That’s useful. But a security agent catching bugs downstream doesn’t change the fact that the bug was accepted into the codebase by an engineer who should have caught it upstream. The tooling is racing to patch the gap, but the gap keeps getting wider.
The answer isn’t to use AI coding tools less. It’s to be precise about what they’re actually doing. They’re drafting. You’re engineering. Don’t confuse the two.
#AIEngineering #SoftwareEngineering #LLMs #CodingWithAI #ProductionSystems
