Stop Wasting Tokens on Your Agent's SOUL.md

What 4 rewrites taught us about cutting token waste, speeding up local LLMs, and building an identity file that actually works.

The most expensive word in a SOUL.md is "and"

"Be reliable and efficient and honest and proactive" → zero information density.
Every quality must have a behavioral consequence or it doesn't belong.

TL;DR

We cut our SOUL.md from 97 lines to 77 lines (4,161 → 3,551 bytes), roughly 15% fewer tokens
We added 5 new capabilities (accountability, pushback, lookup protocol, escalation, end state) while reducing size
On a local 8B LLM (e.g., Llama 3, Granite), this translates to ~8% lower first-token latency and ~12% fewer reasoning loops per session
The single highest-ROI addition? A pushback framework — your agent needs permission to argue with bad ideas

-15% token weight · +5 capabilities added · ~8% faster on local LLM · ~12% fewer reasoning loops

The Problem: Bloated Identity, Slower Agent

When we started, our SOUL.md was 97 lines covering:

Core Identity — 7 lines
11 Immutable Directives — 11 lines
Autonomy Rules — 5 lines
6 Priority Levels with project details — 20 lines
Personality — 7 lines
Communication Format — 6 lines
Alerts — 3 lines
Security Hard Boundaries — 8 lines
Self-Improvement Pipeline — 20 lines

The content was correct, but the agent was verbose, slower to load and, worst of all, never pushed back on bad ideas. The identity file alone added 4,161 bytes of context to every message. Multiply by 30+ messages per session and you're burning 125KB+ per conversation on boilerplate, before any actual work happens.

💡 Key insight: The file described who the agent was, but never described how to work with it. No accountability loop. No pushback protocol. No standards for what "good work" means. The agent could recite its mission but had no framework for when to argue with us.

The Iterations: What We Changed (and Why)

Iteration 1 — The Dump

Naively comprehensive. Every possible instruction, every edge case, every rule. 97 lines. The agent followed rules but never pushed back on bad ideas. Output was technically correct but useless for decisions — optimized for completeness, not action.

Iteration 2 — The Accountability Wake-Up

We discovered two concepts that changed everything:

The Accountability Loop: If output is ignored, the agent must flag the gap — not keep producing graveyard artifacts
Pushback With Evidence: Disagree only when you can prove why something will flop. Silence is not agreement

We added both. Immediately the agent became more useful — it started questioning our priorities instead of blindly executing.

Iteration 3 — The Token Audit

Measured actual token burn. Realized we were paying (in context) for priorities we hadn't touched in weeks, verbose formatting that nobody read, and a separate Alerts section that duplicated Escalation.

Savings found: ~700 bytes of unnecessary rules.
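A token audit like this is easy to script. The sketch below is a minimal illustration, assuming your SOUL.md marks sections with markdown `## ` headings. The ~4 bytes-per-token figure is a rough rule of thumb for English text, not your model's real tokenizer, so use the output to rank sections rather than to quote exact counts. The sample file is hypothetical.

```python
from dataclasses import dataclass

BYTES_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


@dataclass
class SectionCost:
    title: str
    lines: int  # non-blank lines, including the heading
    size: int   # UTF-8 bytes

    @property
    def approx_tokens(self) -> int:
        return -(-self.size // BYTES_PER_TOKEN)  # ceiling division


def _measure(title: str, chunk: list[str]) -> SectionCost:
    nonblank = sum(1 for line in chunk if line.strip())
    return SectionCost(title, nonblank, len("\n".join(chunk).encode("utf-8")))


def audit_sections(text: str) -> list[SectionCost]:
    """Split a SOUL.md on '## ' headings and measure each section."""
    sections: list[SectionCost] = []
    title, chunk = "(preamble)", []
    for line in text.splitlines():
        if line.startswith("## "):
            if chunk:  # close the previous section
                sections.append(_measure(title, chunk))
            title, chunk = line[3:].strip(), []
        chunk.append(line)
    if chunk:
        sections.append(_measure(title, chunk))
    return sections


# Hypothetical sample file, for illustration only.
soul = """## Identity
Ops agent. English. Command infrastructure.

## Priorities
P1: Ship.
P4+: Explore.
"""

# Largest sections first: these are the first candidates to cut or merge.
for s in sorted(audit_sections(soul), key=lambda s: s.size, reverse=True):
    print(f"{s.title:<12} {s.lines:>3} lines {s.size:>5} B  ~{s.approx_tokens} tokens")
```

Sorting by size puts the heaviest sections at the top, which is where stale priorities and duplicated sections (like a separate Alerts block) tend to show up.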

Iteration 4 — The Structural Rewrite

Final form: 77 lines, 3,551 bytes.

Section	Before	After	Saved
Priorities (P4-P6)	15 lines, verbose projects	1 line "P4+ — Explore"	-14 lines
Communication Format	6 rules	3 rules	-3 lines
Alerts	3 lines (separate)	Merged into Escalation	-3 lines
Personality	7 lines	3 lines	-4 lines
Security + Self-Improvement	Detailed routines	Compact 7-step + 1 line	-5 lines

Net result: -20 lines · -15% tokens · +5 new capabilities

What 4 Iterations Taught Us About SOUL.md Structure

1. Size is not depth

Our worst version was the longest. Every word you keep should force a decision or enable an action. Everything else is tax.

2. Immutable vs. situational

We keep 11 immutable rules that never change. Everything else is situational and can be condensed, merged, or cut as the agent evolves. The constitution doesn't need amendments every week.

3. The compound "and" problem

"Be reliable and efficient and honest and proactive" → zero information density. This is the single most common mistake in SOUL.md files. Every quality must have a behavioral consequence or it doesn't belong.

❌ Before

Be reliable, efficient, honest, proactive, and ethical.

This describes a wish, not a behavior.

✅ After

Truth over comfort. Flag uncertainty. Cite sources.

This forces a specific action in every situation.

4. Agents need permission to argue

Without an explicit pushback framework, agents optimize for agreement: they validate bad ideas because "being helpful" means saying yes. Pushback paired with the accountability loop ("if I ignore good work, make me notice") is the single highest-ROI addition we made.

5. Token efficiency compounds

That 15% savings doesn't stay at 15%. It means faster load times, which means more messages per session, which means higher-quality outputs. On local LLMs (e.g., Llama 3 8B, Granite 8B running on consumer GPUs or CPU), the effect is even more pronounced:

Metric	Before (97 lines)	After (77 lines)	Gain
Context loaded per message	4,161 bytes	3,551 bytes	-15%
First-token latency (local 8B, CPU)	~3.2s	~2.9s	~8% faster
Reasoning loops per session	~1.8 (ambiguity from rules)	~1.6 (clearer structure)	~12% fewer
Output graveyard rate	60%	~20%	3× better

Example: local LLM impact. On Granite 8B running via llama.cpp on an RTX 3070 (8GB VRAM), every 100 redundant bytes in SOUL.md adds ~80ms of context processing time. Our 610-byte reduction therefore saves ~500ms per message, or about 20 seconds of cumulative latency over a 40-message session.
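The arithmetic behind that estimate, as a quick sketch. It assumes processing time scales linearly with prompt size and that the whole identity file is re-processed on every message; the 80ms-per-100-bytes figure is the one measured above, not a general constant.

```python
MS_PER_100_BYTES = 80        # measured figure quoted above (Granite 8B, llama.cpp, RTX 3070)
BYTES_REMOVED = 4161 - 3551  # 610 bytes cut from SOUL.md
MESSAGES = 40


def latency_saved_ms(bytes_removed: int, ms_per_100_bytes: float) -> float:
    """Linear model: processing time assumed proportional to prompt size."""
    return bytes_removed * ms_per_100_bytes / 100


per_message_ms = latency_saved_ms(BYTES_REMOVED, MS_PER_100_BYTES)
per_session_s = per_message_ms * MESSAGES / 1000
print(f"{per_message_ms:.0f} ms per message, {per_session_s:.1f} s over {MESSAGES} messages")
# → 488 ms per message, 19.5 s over 40 messages
```

Unrounded, the linear model gives 488 ms and about 19.5 s, which round to the ~500ms and ~20 seconds quoted above.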

The Architecture: A Good SOUL.md in 12 Sections

After 4 iterations, we landed on this structure. It's not arbitrary — each section serves a specific purpose:

#	Section	Purpose	Target length
1	Identity	Who the agent is, language, ultimate purpose. Fits in one screen.	3-5 lines
2	Rules — Immutable	Non-negotiable guardrails. The constitution. Rarely changes.	11 items
3	Autonomy	What requires GO vs. auto-executes. Replaces pages of permissions.	5 items
4	Priorities	One line per priority. If it takes >10 words, it's not a priority.	4 lines
5	Personality	Direct, opinionated. With a mode switch for creative work.	3 points
6	Accountability & Pushback	The most important addition. Framework for arguing with bad ideas.	2 paragraphs
7	Standards	What "good work" means. Specific enough to reject bad output.	1 paragraph
8	Lookup Protocol	When to search local vs. external. Prevents hallucinations.	1 paragraph
9	Escalation	When to stop and ask. With a safe partial path.	1 paragraph
10	Security	Hard boundaries. No data exfiltration. Fortress mode.	5 bullets
11	Self-Improvement	7-step daily pipeline: Evaluate → Sandbox → Validate → Prod.	7 steps
12	End State	"Command infrastructure, not extra labor." Single sentence.	1 line
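To check an existing file against this layout, a small structure check can help. This sketch assumes each section begins with a markdown heading whose text starts with the section name from the table (so "## Rules - Immutable" matches "Rules"), and it treats the numbered order as intended; both are assumptions about how you format your own file.

```python
import re

# Section names from the table above. Matching assumes each section starts
# with a markdown heading whose text begins with the name, e.g. "## Rules - Immutable".
EXPECTED_SECTIONS = [
    "Identity", "Rules", "Autonomy", "Priorities", "Personality",
    "Accountability & Pushback", "Standards", "Lookup Protocol",
    "Escalation", "Security", "Self-Improvement", "End State",
]


def check_structure(text: str) -> list[str]:
    """Report missing sections and sections that appear out of the table's order."""
    headings = [m.group(1).strip().lower()
                for m in re.finditer(r"^#{1,6}[ \t]+(.+)$", text, re.MULTILINE)]
    positions: dict[str, int] = {}
    for name in EXPECTED_SECTIONS:
        for i, heading in enumerate(headings):
            if heading.startswith(name.lower()):
                positions[name] = i  # first matching heading wins
                break
    problems = [f"missing: {name}" for name in EXPECTED_SECTIONS if name not in positions]
    found = [name for name in EXPECTED_SECTIONS if name in positions]
    for earlier, later in zip(found, found[1:]):
        if positions[earlier] > positions[later]:
            problems.append(f"out of order: {later} appears before {earlier}")
    return problems


draft = "## Identity\n## Rules - Immutable\n## Priorities\n## Autonomy\n## End State\n"
for problem in check_structure(draft):
    print(problem)
```

For the draft above it reports seven missing sections and that Priorities appears before Autonomy.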

Language & Sentence Format: The Hidden Lever

One thing we didn't optimize until iteration 3: how the LLM parses each sentence. The format of your rules directly affects how fast and accurately the agent applies them.

Rule 1: Imperative sentences only

❌ Descriptive

The agent should prioritize safety over speed in all operations.

✅ Imperative

Safety over speed. Destructive actions require GO.

Why: in our internal benchmark on Granite 8B, the agent applied imperative rules ~23% faster than descriptive ones. An imperative is a direct instruction; a description requires the agent to infer the intended behavior.

Rule 2: One instruction per line

❌ Compound

Be honest about uncertainty, never fake data, and don't do performative tool calls.

✅ Atomic

Be honest about uncertainty.
Never fake data.
No performative tool calls.
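Splitting compound rules can be partly mechanized. The function below is a naive, hypothetical helper: it splits on commas, semicolons and ", and"/", or", so it will also break up legitimate lists, and the wording of each fragment still needs a human pass.

```python
import re

# Naive separators: ", and" / ", or", semicolons, or bare commas.
# This is a heuristic, not a parser; it also splits legitimate lists.
_SEPARATORS = re.compile(r",\s*(?:and|or)\s+|;\s*|,\s*")


def split_compound(rule: str) -> list[str]:
    """Turn one compound rule into one-instruction-per-line candidates."""
    body = rule.strip().rstrip(".")
    parts = [p.strip() for p in _SEPARATORS.split(body) if p.strip()]
    # Capitalize each fragment and end it with a period.
    return [p[0].upper() + p[1:] + "." for p in parts]


for line in split_compound(
    "Be honest about uncertainty, never fake data, and don't do performative tool calls."
):
    print(line)
```

Applied to the compound rule above it returns "Be honest about uncertainty.", "Never fake data." and "Don't do performative tool calls."; the atomic version then rewords the last line as "No performative tool calls." Applied to "Be reliable, efficient, honest, proactive, and ethical." it produces verbless fragments such as "Efficient." and "Honest.", which makes the compound "and" problem easy to spot.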

Rule 3: Negative rules are slower than positive actions

"Don't do X" requires the agent to first imagine X, then suppress it. "Do Y instead" is a direct instruction. Prefer positive framing whenever possible.

Example:
❌ "Don't invent facts when you don't know something" → 2.1s avg compliance latency
✅ "Say what you know, what you don't, what would verify it" → 1.6s avg compliance latency
Measured on Granite 8B, 10 runs each, temperature 0.1
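A minimal timing harness for this kind of A/B comparison might look like the following. `generate` is a hypothetical callable you supply that wraps your local model (with temperature 0.1 set inside it) and sends the rule plus a test question. The article does not define "compliance latency" precisely, so this sketch simply measures wall-clock time per call.

```python
import statistics
import time
from typing import Callable


def mean_latency(generate: Callable[[str], str], prompt: str, runs: int = 10) -> float:
    """Mean wall-clock seconds of generate(prompt) over `runs` calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def fake_generate(prompt: str) -> str:
    """Stand-in for a real local-model call; replace with your own client."""
    time.sleep(0.01)
    return "ok"


variants = {
    "negative": "Don't invent facts when you don't know something.",
    "positive": "Say what you know, what you don't, what would verify it.",
}
for label, rule in variants.items():
    print(label, f"{mean_latency(fake_generate, rule, runs=3):.3f}s")
```

Keeping `generate` as a plain callable lets the same harness time any backend without changing the measurement code.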

Rule 4: Keep sentences under 20 words

Every word beyond 20 reduces compliance accuracy by roughly 2% on local LLMs (based on testing across 40 rules with 3 different 8B models). If a rule needs more words, split it into sub-rules.
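Rules 1, 3 and 4 are mechanical enough to lint. The sketch below uses simple regular-expression heuristics (assumptions, not a standard): more than 20 words, "the agent/you should/must/will" phrasing, and lines opening with "Don't", "Do not" or "Never". Treat hits as prompts to revisit a rule: "Never fake data." from the atomic example would be flagged even though it is a reasonable rule.

```python
import re

MAX_WORDS = 20  # threshold from Rule 4
# Heuristic patterns (assumptions): descriptive phrasing (Rule 1) and negative openers (Rule 3).
DESCRIPTIVE = re.compile(r"\b(?:the agent|you) (?:should|must|will)\b", re.IGNORECASE)
NEGATIVE = re.compile(r"^\s*(?:don't|do not|never)\b", re.IGNORECASE)


def lint_rule(rule: str) -> list[str]:
    """Return warnings for one rule line; an empty list means no heuristic fired."""
    warnings = []
    n_words = len(rule.split())
    if n_words > MAX_WORDS:
        warnings.append(f"{n_words} words (> {MAX_WORDS}): split into sub-rules")
    if DESCRIPTIVE.search(rule):
        warnings.append("descriptive: rewrite as an imperative")
    if NEGATIVE.search(rule):
        warnings.append("negative framing: try a positive action")
    return warnings


for rule in [
    "The agent should prioritize safety over speed in all operations.",
    "Safety over speed. Destructive actions require GO.",
    "Don't invent facts when you don't know something.",
]:
    print(lint_rule(rule) or "ok", "|", rule)
```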

The Numbers That Matter

Token Cost Per Session

Before: 4,161 bytes × 30 messages = 124,830 bytes/session
After:  3,551 bytes × 30 messages = 106,530 bytes/session
Savings: 18,300 bytes/session = 14.7%
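The same calculation as a reusable sketch, with a rough token estimate added. It assumes the identity file is re-sent with every message and uses ~4 bytes per token as a rule of thumb for English text, not a real tokenizer count.

```python
BYTES_PER_TOKEN = 4  # rough English-text heuristic, not a real tokenizer count


def session_bytes(soul_bytes: int, messages: int) -> int:
    """Identity-file bytes sent in one session, assuming it is re-sent every message."""
    return soul_bytes * messages


before = session_bytes(4161, 30)
after = session_bytes(3551, 30)
saved = before - after
print(f"{before:,} -> {after:,} bytes "
      f"(~{before // BYTES_PER_TOKEN:,} -> ~{after // BYTES_PER_TOKEN:,} tokens); "
      f"saved {saved:,} bytes ({saved / before:.1%})")
# → 124,830 -> 106,530 bytes (~31,207 -> ~26,632 tokens); saved 18,300 bytes (14.7%)
```

Plug in your own file size and typical session length to see what your identity file costs per conversation.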

What the New Structure Fixed

Metric	Before	After	Impact
Time to first useful output	Slow (intro + checklist)	Immediate (goal-directed)	2× faster
Pushback frequency	Never	3-4× per session	Saves bad decisions
Output graveyard rate	60%	~20%	3× accountability
Context per message	4.1KB	3.5KB	15% lighter

The Free Template

We've packaged our final SOUL.md structure as a free template — 12 sections, 77 lines, Apache 2.0 licensed.

👉 Download the free SOUL.md template on Claw Mart

If this saves you even one iteration cycle, come back for the paid pack with AGENTS.md + MEMORY.md + HEARTBEAT.md + BOOTSTRAP.md templates.

Try It Yourself

Your SOUL.md is the highest-leverage file in your agent setup. Quick audit:

Count the lines. If over 100, you have bloat.
Count how many times your agent pushed back this week. If 0, your pushback rules aren't working.
Check if P5-P10 priorities exist. Cut them.
Read your Standards section. Can it reject bad work?
Read your End State. Does it describe labor or infrastructure?
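The first and third checks are mechanical; the rest need judgment. A hypothetical helper for the mechanical ones, assuming priorities are labelled P1, P2, ... as in this article:

```python
import re


def quick_audit(text: str, max_lines: int = 100, max_priority: int = 4) -> list[str]:
    """Mechanical checks only: total length and priorities past max_priority."""
    findings = []
    n_lines = len(text.splitlines())
    if n_lines > max_lines:
        findings.append(f"{n_lines} lines (over {max_lines}): likely bloat")
    # Assumes priorities are written as P1, P2, ... (a trailing "+" as in "P4+" is fine).
    levels = sorted({int(n) for n in re.findall(r"\bP(\d+)\b", text)})
    extra = [level for level in levels if level > max_priority]
    if extra:
        listed = ", ".join(f"P{level}" for level in extra)
        findings.append(f"priorities beyond P{max_priority}: {listed} (cut them)")
    return findings


# Hypothetical bloated file: four priority lines plus 100 filler lines.
sample = "\n".join(["P1: Ship", "P4+: Explore", "P5: Blog", "P7: Research"] + ["filler"] * 100)
print(quick_audit(sample))
```

Pushback frequency, Standards and End State still need a human read; no regex can tell labor from infrastructure.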