Skill · Prompt

Probe with a small task before you trust it in a new domain

A prompt that makes the agent do one small slice and show how it thinks — before you hand over the whole task in a field it hasn't proven itself in.

TypePrompt

LanguageĐọc bản tiếng Việt →

Your agent is good at the familiar — you've watched it a hundred times. Then you hand it a task in a completely new domain and assume it's just as good there. Same agent, but good at code review doesn't mean good at legal analysis or reading numbers from an unfamiliar industry. You find out only after it's done the whole big task — wrong in a way you didn't see coming.

People who've done this a while don't trust a new domain straight off. They run one small slice first, watch how the agent approaches it, and only then set how much to trust it. Not out of suspicion — because the same model behaves differently by domain, and a small slice tells you where it differs before that difference gets expensive.

✕ Hand over the big task

Familiar in the old domain → assume good in the new

▼

It runs the whole thing, wrong in domain-specific ways

▼

You find out too late, on the work that mattered

✓ Probe a small slice first

One representative slice + how it thinks, where unsure

▼

You check the approach vs what the domain needs

▼

Set the trust level, then scale to the real task

Paste it the first time you hand over a new kind of task, not every time. The trick is making it show its thinking, not just the result — because what you're measuring isn't "is the answer right," it's "does its approach match what this domain actually needs." A small slice done the right way is more trustworthy than a big one that came out right by luck.

Paste into the chat

This is a new kind of task for you, so let's calibrate before I hand over the real thing:

1. I'll give you a small, representative slice first — do just that, not the whole task.
2. Show your work: not only the result, but how you reached it and where you were unsure.
3. Then stop and wait. I'll check whether your approach matches what this domain actually needs.

Don't scale up to the full task until I've seen the small one and said go. A good result on a tiny slice — reached the right way — tells me how much to trust you on the big one.

Prefer your agent install it?

Add this rule to my CLAUDE.md, verbatim:

This is a new kind of task for you, so let's calibrate before I hand over the real thing:

1. I'll give you a small, representative slice first — do just that, not the whole task.
2. Show your work: not only the result, but how you reached it and where you were unsure.
3. Then stop and wait. I'll check whether your approach matches what this domain actually needs.

Don't scale up to the full task until I've seen the small one and said go. A good result on a tiny slice — reached the right way — tells me how much to trust you on the big one.