The agent that built it is the worst one to check it

TL;DR

Claude builds something, says it's done, and it looks right. Ask it "are you sure?" and it re-reads with the same mind that made the mistake and says yes — grading its own homework. The fix isn't a faster builder; it's an independent reviewer of a different make. Let Claude build and Codex try to break it: different training, different blind spots, so between them far less slips through. Split by function — one builds, one judges — and never let the author sign off on its own work.

Claude builds something — a piece of logic, a migration, a paragraph of copy. It finishes, says "done," and it looks right. You ask it, "Are you sure this is correct?" It re-reads its own work and says yes. You ship. It breaks on exactly the case it never thought to question — and there's the trap: the reason it never questioned that case is the same reason it couldn't catch it on review. It was grading its own homework, with the very pencil that made the mistake.

An agent checking its own output brings back the exact assumptions that produced it. If Claude misread the requirement the first time, it misreads it the same way on review — confidently. A single model has correlated failure modes: ask it twice and you get the same miss twice, dressed up as a second opinion. "Check your work" feels like a safeguard, but a reviewer that shares the builder's mind isn't a second opinion at all. It's the first opinion, repeated louder.

✕ Claude checks its own work

Claude built it with assumption X

▼

Reviews using the same assumption X

▼

"Looks right" — the flaw stays invisible

✓ Codex checks it

Claude built it with assumption X

▼

Codex reads without assumption X

▼

The flaw stands out — caught before shipping

02A second agent, with a different mind

Add a second agent whose role is not to build, but to disagree — to review, to audit, to try to break what Claude made. This is where Codex earns its place: a different maker, different training, different habits, different blind spots. Where two agents of the same make miss the same things, two of different makes miss different things — so between them, far less slips through. The point isn't Codex specifically; it's the difference. The second agent isn't a spare pair of hands; it's the pair of eyes the first one structurally cannot have.

Claude, the builder

Produces the work — and carries the assumptions that made it. It is not asked to judge its own output.

Codex, the reviewer

A different-make agent whose only job is to find what's wrong. It reads, raises findings, and never merges its opinion as fact.

You reconcile

Triage the findings — not all are real — decide what's true, and hold the gate: nothing merges until the other set of eyes has passed it.

03Split by function, not by area

This is a different cut from splitting one big task across many agents for speed. That split is by area — each agent its own non-overlapping patch of work, so two never touch the same spot. This split is by function: Claude produces, Codex judges the same thing. The reviewer reads only and raises findings; the builder fixes and never approves its own fix. The single rule that makes the pairing work: whoever wrote it doesn't get to sign off on it. The moment the author is also the approver, you're back to grading homework — and the blind spot wins. It flips cleanly, too: the day Codex is the one that wrote something, Claude takes the reviewer's chair. The reviewer is always the one who didn't hold the pen.

04When the second pair of eyes earns its keep

It isn't free. A reviewer costs a round of coordination, and most throwaway work doesn't need one. It earns its keep when correctness matters more than raw speed, when the work is hard to eyeball, and especially when something that "looked right" has burned you before. That last one is the tell: if you keep shipping things that passed your glance and broke anyway, you don't need a faster builder — you need an independent checker the builder can't talk over.

05The prompts: cast the reviewer, then set it loose

Two prompts carry this. The first casts Codex as the reviewer — paste it at the top of a session you keep separate from the builder:

You're the reviewer here, not the builder. You didn't write this and you won't rewrite
it. Your only job is to find what's wrong. Read it against the spec below, raise
findings as a list — each with where it is and why it's a problem — and don't fix
anything or sign anything off. When unsure, flag it rather than assume it's fine.
[paste the spec / the requirement it's meant to meet]

Then the work itself — framed to hunt for problems, not for praise:

Here's the change to review: [paste the diff / output].
Try to break it: which input or edge case makes it fail, what did the author assume
that isn't guaranteed, where might it quietly do the wrong thing? Be specific — and if
you genuinely can't find a problem, tell me what you checked.

06Putting it to work

Give Codex a review-only role and the standard to check against — the spec, the requirement, what good looks like. A reviewer with no yardstick only nitpicks.
Keep the two of different makes. Same model reviewing same model shares the very blind spot you're trying to escape — the difference is the whole point, not the brand.
Frame it adversarially: ask Codex to find what's wrong or try to break this — not "is this good?" The second framing fishes for approval; the first fishes for problems.
Triage the findings yourself. Codex surfaces; you decide what's real and what's noise — a different-make reviewer is confident in different places, not always right.
Loop the roles: Claude patches, Codex re-checks, and nothing merges until the other set of eyes has passed it.

A team of two with split roles catches a whole class of errors a lone agent is blind to — cheaply, before they reach anyone else. The cost is a little coordination; the payoff is that you stop shipping confident-but-wrong work. It comes back to the thread under all of this: an agent can't see its own blind spots. A second perspective isn't something Claude will grow on its own — it's something you add, and the surest way to add it is a second agent that doesn't think like the first.

The agent that built it is the worst one to check it

01The author can't see its own blind spot

✕ Claude checks its own work

✓ Codex checks it

02A second agent, with a different mind

03Split by function, not by area

04When the second pair of eyes earns its keep

05The prompts: cast the reviewer, then set it loose

06Putting it to work

When the whole team has agents: rules in one head don't scale to a team

Ask one agent for ten ideas and you get one idea ten times

01The author can't see its own blind spot

✕ Claude checks its own work

✓ Codex checks it

02A second agent, with a different mind

03Split by function, not by area

04When the second pair of eyes earns its keep

05The prompts: cast the reviewer, then set it loose

06Putting it to work

When the whole team has agents: rules in one head don't scale to a team

Ask one agent for ten ideas and you get one idea ten times

Get new pieces by email