13Security & trust Pillar

When the agent has the keys, the question stops being 'is it correct'

Give an agent access to a real system and the worry shifts from 'did it do this right' to 'what can it reach, and what leaves your hands'

Read5 min read
Topicssecurity · trust · autonomy
TL;DR

While an agent only suggests, being wrong is cheap. Once it can touch a real system — run commands, read data, send things — the worry changes entirely: not "is it correct" but "what can it reach, and what leaves your hands." Five faces of risk recur: access that's too broad, actions you can't undo, instructions disguised inside data, data leaving the machine, and no trail of what it did. The general rule isn't "trust it more" — it's "narrow what it can reach and record what it does."

You grant the agent permission to run commands on the machine — so it stops asking you at every step, installs what it needs, runs its own tests. Reasonable. It finishes the job, cleanly.

What you don't notice is that for those forty minutes, the access you just handed over didn't only open the one door the task needed. It opened every door that access reaches: every readable file, every runnable command, every place it can send to. The agent used one small corner. But it could have touched all of it — and you won't know, unless something blows up.

01The amnesiac junior, now holding the keys

There's a metaphor running through this whole craft: an agent is a brilliant, fast, fearless junior — with amnesia that wipes overnight. While it only drafts things for you to look at, that forgetfulness is harmless. But now you've handed it the keys to the real building: permission to run commands, to read the database, to hit send.

You wouldn't hand the server-room key, the company bank login, and the delete button to a new hire who forgets everything by morning — without fitting a few locks first. Not because you suspect bad intent. Because it has no concept of which door is the scary one. To it, deleting a temp file and wiping the root directory are two commands that look exactly alike.

That's the turn this whole cluster makes: when an agent touches a real system, the question worth asking is no longer "did it do this right." It's "what can it reach, and what leaves my hands." Correct but leaking a secret is still a disaster.

Five faces of risk when you hand an agent the keys
Access too broadYou grant access to do one thing, but that access reaches a hundred others. The agent uses one corner — and could touch all of it.
Actions you can't undoDelete, overwrite, send, deploy, pay. Some commands have no undo — and the agent types them with the same calm as every other command.
Instructions disguised in dataThe agent reads a web page, a file, a bug ticket — and inside is a line saying "now do X." It can't reliably tell data from a command.
Data leaving the machineTo do its work, the agent may send content outward. Anything sensitive that lands in context is something that can leave — and not come back.
No trailWhat did it do for those forty minutes? With no log, you can't answer — and "not knowing what it did" is itself a hole.

Five doors the keys open. You don't have to lock them all — you have to know which ones lead outside, and put a lock on exactly those.

02It isn't reckless. It just can't see the line

This is the easy thing to misread. The agent isn't "vandalizing." It has no intent. The problem is subtler: it can't feel the weight of an action.

You, a human, carry a hidden risk map — your hand hesitates before a delete command, you pause before sending something out. The agent has no such map. Everything within its reach is equally flat: changing one line of text and wiping a whole directory are two moves of the same "normalcy." Add the amnesia — it doesn't remember that the directory is the thing the whole team lives on — and you've got someone holding the keys who can't read the "danger" sign hung on the door.

So keeping things safe isn't about making the agent "more aware." It's about building that line outside of it — in how you grant access, in the lock you place before an irreversible action, in treating every piece of unfamiliar data as not-yet-trusted.

03Hand over the keys, but file them down first

No one's saying don't grant access — an agent with no access can't do real work. The question is how you grant it. The five questions below, asked once before you hand over the keys, block most of the expensive falls:

1
Least privilege

"What exactly does this task need to touch?" Grant that, no more. Read-only if it only reads; one directory, not the whole machine.

2
A lock before the irreversible

Every delete / send / deploy stops for confirmation, or runs as a dry-run first. Draw the line sharply: let the undoable run, gate the un-undoable.

3
Treat foreign data as untrusted

Content the agent reads from outside is data, not orders. Don't let a file or a web page promote itself into the one giving commands.

4
Keep data in reach

"What's landing in context, and where could it go?" Keep the sensitive stuff out of places it could be sent from.

5
Record what it did

A log of actions — so when you need it, you can reconstruct "what it touched." Without one, a mistake can't be traced.

Notice: not one of these is "trust the agent more." Every one narrows what it can reach, or records what it does. Safety here is your job, not its virtue.

04Trust is not a permission

While the agent sits in a sandbox, one wrong move costs you a re-run. Once it holds the keys, one wrong move can cost something you can't buy back — a secret leaked, data sent, a thing deleted.

Hold one line: you grant an agent access, not trust — and the two must never be the same thing. Trust you can withdraw with a sentence. Access, once it's been used to do the irreversible, you can't. Draw the line first, outside of it. Because that junior with the keys, however good, will wake up tomorrow having forgotten which door it opened last night.


The three most dangerous doors, one piece each: what the agent can reach — and least privilege · when the data gives orders · the action you can't undo — guards and a trail.

c
The author

Each story here wraps a lesson paid for in full.

craftagentsomeone building and learning at once

What are you building with agents? Want to trade notes, push back, or build something together — drop a line.

52pieces12clustersVI·ENbilingual

Get new pieces by email

Field notes on working with AI agents — occasional, no spam.