Forty minutes. That's how long the agent ran before I opened the result and saw it was wrong — wrong from somewhere early, then carrying that wrongness through everything that followed.
My first reflex: blame the agent. It isn't good enough for this.
But when I traced it back, the agent had followed the exact process I gave it. The break wasn't in its ability. It was in the way I'd let the leash out — a way that gave that early mistake room to grow with nobody there to stop it. And it wasn't the first time. I'd hit this same shape before; I just hadn't named it yet.
01Broken autonomy has a shape — it isn't bad luck
When you sit beside an agent and review each step, errors rarely travel far: you see it veer and you pull it back on the spot. Autonomy removes that exact pull-it-back-now moment. And once that moment is gone, errors don't dissolve in random ways — they collapse into a few very stable shapes.
I count three. Almost every time autonomy bites back, it lands in one of them.
02One — it fills the gap with a guess that sounds completely reasonable
Every task you hand an agent has something left unsaid. When you're sitting beside it, the agent hits that fuzzy spot and stalls — or you catch it about to take a wrong turn. When it runs alone, there's no one to ask, so it picks the default that sounds most reasonable and keeps going.
The problem: "most reasonable" to an agent usually means most common, not most-right-for-your-case. It doesn't fail loudly. It fails smoothly — one tilted foundational assumption, and everything built on top of it is immaculate except the footing.
03Two — it drifts, and nobody sees the moment it started
When the agent is a long chain — each step feeding the next — a small deviation at step two doesn't sit still. It becomes the input to step three, then step four. By the time you look at the final result, you're staring at a mistake multiplied five times over, so far from its source it's hard to trace back.
The danger here isn't the size of the original error — it's usually tiny. It's the distance between where it happened and where you looked. The longer the chain, the wider that distance, and the later you find out.
04Three — a brilliant first run sets a trap for the next
The first time the agent runs something on its own, it comes back beautiful. You raise your trust. Next time you hand it something bigger, brief it less, review it looser. Then a run comes back mediocre — and now you're stuck: you've promised (your boss, your team, yourself) that this one runs itself.
What stings isn't the average output. It's the gap between what you expected and what you got — and the fact that you're the one who has to explain that gap. The root: you anchored your trust on the best sample, not the average one.
Notice none of the three is an "agent is dumb" failure. All three are structural — and structural failures get fixed by rebuilding the setup, not by blaming the tool.
05Each shape has one sentence that blocks it
The good news about "has a shape": what you can predict, you can pre-empt. Each shape has one preventive move you make before you let go — and none of them slows you down by much.
Before you let it run free: "List every assumption you're relying on for this task. For anything you're unsure of, stop and ask — don't pick a default." A fuzzy spot said out loud is one you can block; one sitting silently in the agent's head is not.
"After each major step, pause for one line: what the last step produced, what the next one intends to do." A checkpoint mid-chain catches the deviation while it's still small — far cheaper than reading the final output and tracing back six steps.
Don't raise your trust on one beautiful result. Let it run the same kind of task three or four times before you lock in an autonomy level — you're measuring the floor, not the ceiling. Trust built on the average survives a bad run; trust built on the peak doesn't.
These three sentences don't shrink your autonomy. They make it more trustworthy — because now it stands on the exact three spots that used to make it fall.
06Blaming the agent skips the one place you can actually fix
The next time an agent runs for forty minutes and comes back wrong, the question worth asking isn't "is the agent good enough?" It's "which of the three shapes did it just fall into?"
Because if the fault is in the agent, all you can do is wait for a better model. But if the fault is in how you set it up — and it almost always is — that's the place you can fix, today, with a single sentence placed up front.