13 · Security & trust · Deep dive ①

An agent works with everything it can reach — so shrink what it can reach

Staying safe isn't telling it 'don't touch that' — it's putting 'that' out of reach in the first place

Read: 4 min
Topics: security · access · autonomy
TL;DR

An agent works with everything in its reach, and to it everything in that reach is equally flat — none of it is "don't touch." So the fence "tell it not to touch that" is the weakest one: it leans on the agent remembering and complying, and the agent forgets. The strong fence puts "that" out of reach from the start — read-only by default, scoped to one place, and secrets kept out of what it can see.

You ask the agent to clean up a pile of old junk files. To let it move fast, you grant it permission to delete files. It clears exactly what needed clearing, neatly.

Then a beat later it lands on you: the "delete files" permission you just handed over doesn't stop at that junk pile. It reaches every file that account can touch. The agent only used the permission for the job you asked — this time. But you just handed scissors to someone who can't tell a loose thread from the cord holding something heavy.

01 To an agent, everything in reach is equally flat

The root isn't that the agent is reckless. It has no intent to break things. The root is that it can't feel where the off-limits zone is.

You carry a hidden risk map: you know this folder is junk, that one is what the whole team lives on, and your hand hesitates before the second. The agent has no such map. Everything in its reach sits on one flat plane: a junk file and a foundational file look identical through the eye of a "may delete" permission. It doesn't see the "danger" sign — because to it the sign doesn't exist.

Add the amnesia. Even if you say "don't touch folder X" in one sentence, a different branch of thought a few steps later can forget that instruction entirely. A fence built out of "remember this for me" is a fence built on exactly the thing the agent is worst at.

02 "Don't touch" is a lock made of words. Lock it with a wall

So the thinking has to flip. Don't try to make the agent more aware of the forbidden zone. Put the forbidden zone out of reach, so that even if it forgets, misreads, or gets talked into something by a stray line of data, that door simply won't open.

Grant broad, then say "don't touch"

Access that reaches everything, plus a "don't touch X"
The fence lives in the agent's memory, the very thing that gets wiped on each branch
One lapse, one misread, and the forbidden door is wide open

Shrink what it can reach

Grant only what the job needs, so X isn't in reach at all
The fence lives in how access was granted, not in anyone's memory
Whether it forgets or not, the result is the same: the forbidden door doesn't exist for it

The core difference: the first fence lives in the agent's head (where everything evaporates overnight); the second lives in the configuration (where it stays put even when no one's thinking about it).
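To make "the fence lives in the configuration" concrete, here is a minimal sketch, assuming the agent deletes files only through a tool you hand it. The helper `make_scoped_delete` and the folder layout are hypothetical, made up for illustration. A real setup would more likely enforce the same boundary with OS permissions or a sandbox; the point here is only where the check lives.

```python
from pathlib import Path
import tempfile


def make_scoped_delete(allowed_root: Path):
    """Return a delete function that can only reach files under allowed_root."""
    root = allowed_root.resolve()

    def delete(path: Path) -> None:
        # Resolve symlinks and ".." first, so the check sees the real target.
        target = path.resolve()
        if not target.is_relative_to(root):
            raise PermissionError(f"{target} is outside the granted scope {root}")
        target.unlink()

    return delete


# Hypothetical layout: a junk folder the agent may clean,
# next to a file it must never reach.
workspace = Path(tempfile.mkdtemp())
junk = workspace / "junk"
junk.mkdir()
(junk / "old.log").write_text("stale")
important = workspace / "team-data.db"
important.write_text("everything the team lives on")

delete = make_scoped_delete(junk)
delete(junk / "old.log")  # inside the scope: removed

try:
    delete(junk / ".." / "team-data.db")  # tries to climb out of the scope
except PermissionError as err:
    print("refused:", err)

print(important.exists())  # True
```

Nothing in this tool depends on the agent remembering an instruction. The refusal comes from how the tool was built, so it holds on every call.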

03 Three moves to file the reach down

Shrinking doesn't mean tying the agent down to uselessness. It means granting the right size. These three moves apply to nearly every grant:

1. Read-only by default

Most work only needs to read, not change. Start with read-only; open write/delete only when the job truly demands it, and only where it demands it.

2. Scope it to one place

"What exactly does this need to reach?" Give it one folder, not the whole drive; one project, not the whole account. A narrower reach is a narrower blast radius.

3. Keep secrets out of its sight

Real passwords, keys, and tokens shouldn't sit in the context the agent reads. What it can't see is what it can't accidentally use, or accidentally send.
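The third move can be sketched as an allowlist: rather than trying to spot and redact secrets, pass through only the values that were explicitly approved. This is a minimal sketch under assumptions; the function `visible_env` and every variable name and value below are hypothetical.

```python
def visible_env(env: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """Return only the variables explicitly allowed into the agent's context."""
    return {name: value for name, value in env.items() if name in allowed}


# Hypothetical environment; names and values are made up for illustration.
full_env = {
    "PROJECT_DIR": "/work/reports",
    "LOG_LEVEL": "info",
    "DB_PASSWORD": "not-a-real-password",
    "API_TOKEN": "not-a-real-token",
}

# Anything not named here never reaches the agent.
agent_env = visible_env(full_env, allowed={"PROJECT_DIR", "LOG_LEVEL"})
print(agent_env)  # {'PROJECT_DIR': '/work/reports', 'LOG_LEVEL': 'info'}
```

If the agent runs as a separate process, a filtered dictionary like this could be passed as the `env` argument of `subprocess.run`, so the secrets are absent from the process itself and not just hidden from the prompt.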

All three answer the same question: "does this really need to be in reach?" The default answer is "no, until the job demands it." Granting as you go is far safer than granting broad and clawing back.
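The "no, until the job demands it" default can be sketched as a small grant object. This is an illustrative sketch, not a real library: the `Grant` class, its `allows` and `widen` methods, and the paths are all assumptions. It starts read-only, scopes every check to listed folders, and opens write/delete only where it is widened.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Grant:
    """What the agent may reach. Nothing is writable unless added explicitly."""
    readable: set[PurePosixPath] = field(default_factory=set)
    writable: set[PurePosixPath] = field(default_factory=set)

    def allows(self, action: str, path: str) -> bool:
        target = PurePosixPath(path)
        # Pure paths do not collapse "..", so refuse it outright
        # instead of letting a path climb out of its folder.
        if ".." in target.parts:
            return False
        if action == "read":
            roots = self.readable | self.writable
        elif action in {"write", "delete"}:
            roots = self.writable
        else:
            return False  # unknown actions are denied by default
        return any(target.is_relative_to(root) for root in roots)

    def widen(self, path: str) -> None:
        """Open write/delete for one folder, once the job demands it."""
        self.writable.add(PurePosixPath(path))


grant = Grant(readable={PurePosixPath("/project")})
print(grant.allows("read", "/project/notes.txt"))       # True
print(grant.allows("delete", "/project/junk/a.tmp"))    # False

grant.widen("/project/junk")  # the cleanup job needs delete, but only here
print(grant.allows("delete", "/project/junk/a.tmp"))    # True
print(grant.allows("delete", "/project/src/main.py"))   # False
```

The widening step is the "occasional mid-task" cost of granting narrow: one explicit call, limited to one folder, instead of a broad permission that has to be clawed back later.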

04 The cost of one over-reach

Why bother to this degree, when most runs pass without incident? Because of the asymmetry. Granting narrow costs you a few extra seconds each time, and occasionally one mid-task widening. Granting broad sails through ninety-nine times — then on the hundredth, one misread or one line of data-borne coaxing turns that broad access into something deleted, overwritten, sent.

You're not choosing between "loose enough to work" and "tight enough to be safe." Scoping the reach gives you both: the agent still runs free inside its scope, and what's outside the scope needs no watching, because it was never in reach. This is just the real-world version of reading the blast radius before you edit — except here you draw that radius up front, with the very key you hand over. The junior with the scissors doesn't need you to name which thread not to cut. It needs you to hand it only the loose end.

The author: craftagent, someone building and learning at once. Each story here wraps a lesson paid for in full.
