You grant the agent permission to run commands on the machine — so it stops asking you at every step, installs what it needs, runs its own tests. Reasonable. It finishes the job, cleanly.
What you don't notice is that for those forty minutes, the access you just handed over didn't only open the one door the task needed. It opened every door that access reaches: every readable file, every runnable command, every place it can send to. The agent used one small corner. But it could have touched all of it — and you won't know, unless something blows up.
01The amnesiac junior, now holding the keys
There's a metaphor running through this whole craft: an agent is a brilliant, fast, fearless junior — with amnesia that wipes overnight. While it only drafts things for you to look at, that forgetfulness is harmless. But now you've handed it the keys to the real building: permission to run commands, to read the database, to hit send.
You wouldn't hand the server-room key, the company bank login, and the delete button to a new hire who forgets everything by morning — without fitting a few locks first. Not because you suspect bad intent. Because it has no concept of which door is the scary one. To it, deleting a temp file and wiping the root directory are two commands that look exactly alike.
That's the turn this whole cluster makes: when an agent touches a real system, the question worth asking is no longer "did it do this right." It's "what can it reach, and what leaves my hands." Correct but leaking a secret is still a disaster.
Five doors the keys open. You don't have to lock them all — you have to know which ones lead outside, and put a lock on exactly those.
02It isn't reckless. It just can't see the line
This is the easy thing to misread. The agent isn't "vandalizing." It has no intent. The problem is subtler: it can't feel the weight of an action.
You, a human, carry a hidden risk map — your hand hesitates before a delete command, you pause before sending something out. The agent has no such map. Everything within its reach is equally flat: changing one line of text and wiping a whole directory are two moves of the same "normalcy." Add the amnesia — it doesn't remember that the directory is the thing the whole team lives on — and you've got someone holding the keys who can't read the "danger" sign hung on the door.
So keeping things safe isn't about making the agent "more aware." It's about building that line outside of it — in how you grant access, in the lock you place before an irreversible action, in treating every piece of unfamiliar data as not-yet-trusted.
03Hand over the keys, but file them down first
No one's saying don't grant access — an agent with no access can't do real work. The question is how you grant it. The five questions below, asked once before you hand over the keys, block most of the expensive falls:
"What exactly does this task need to touch?" Grant that, no more. Read-only if it only reads; one directory, not the whole machine.
Every delete / send / deploy stops for confirmation, or runs as a dry-run first. Draw the line sharply: let the undoable run, gate the un-undoable.
Content the agent reads from outside is data, not orders. Don't let a file or a web page promote itself into the one giving commands.
"What's landing in context, and where could it go?" Keep the sensitive stuff out of places it could be sent from.
A log of actions — so when you need it, you can reconstruct "what it touched." Without one, a mistake can't be traced.
Notice: not one of these is "trust the agent more." Every one narrows what it can reach, or records what it does. Safety here is your job, not its virtue.
04Trust is not a permission
While the agent sits in a sandbox, one wrong move costs you a re-run. Once it holds the keys, one wrong move can cost something you can't buy back — a secret leaked, data sent, a thing deleted.
Hold one line: you grant an agent access, not trust — and the two must never be the same thing. Trust you can withdraw with a sentence. Access, once it's been used to do the irreversible, you can't. Draw the line first, outside of it. Because that junior with the keys, however good, will wake up tomorrow having forgotten which door it opened last night.
The three most dangerous doors, one piece each: what the agent can reach — and least privilege · when the data gives orders · the action you can't undo — guards and a trail.