Stop Wasting Tokens on Your Agent's SOUL.md

What 4 rewrites taught us about cutting token waste, speeding up local LLMs, and building an identity file that actually works.

The most expensive word in a SOUL.md is "and"

"Be reliable and efficient and honest and proactive" → zero information density.
Every quality must have a behavioral consequence or it doesn't belong.

TL;DR

-15% token weight · +5 capabilities added · ~8% faster on local LLM · ~12% fewer reasoning loops

The Problem: Bloated Identity, Slower Agent

When we started, our SOUL.md was 97 lines covering:

The content was correct. But the agent was verbose, slower to load, and — worst of all — it never pushed back on bad ideas. Every message consumed 4,161 bytes of context just from the identity file. Multiply by 30+ messages per session, and you're burning 125KB+ per conversation on boilerplate — before any actual work happens.

💡 Key insight: The file described who the agent was, but never described how to work with it. No accountability loop. No pushback protocol. No standards for what "good work" means. The agent could recite its mission but had no framework for when to argue with us.

The Iterations: What We Changed (and Why)

Iteration 1 — The Dump

Naively comprehensive. Every possible instruction, every edge case, every rule. 97 lines. The agent followed rules but never pushed back on bad ideas. Output was technically correct but useless for decisions — optimized for completeness, not action.

Iteration 2 — The Accountability Wake-Up

We discovered two concepts that changed everything: an accountability loop and an explicit pushback protocol.

We added both. Immediately the agent became more useful — it started questioning our priorities instead of blindly executing.

Iteration 3 — The Token Audit

Measured actual token burn. Realized we were paying (in context) for priorities we hadn't touched in weeks, verbose formatting that nobody read, and a separate Alerts section that duplicated Escalation.

Savings found: ~700 bytes of unnecessary rules.
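
A token audit like this can be approximated with a few lines of Python. The sketch below is illustrative only: it assumes the SOUL.md marks sections with `## ` headings (the article does not say how the file is laid out), and the two-section sample string is hypothetical. It reports UTF-8 bytes per section so the heaviest sections surface first.

```python
def section_sizes(text: str) -> dict[str, int]:
    """Return UTF-8 byte counts per section, keyed by heading text.

    Assumes sections start with a '## ' heading (an assumption, not
    something the article specifies).
    """
    sizes: dict[str, int] = {}
    current = "(preamble)"  # anything before the first '## ' heading
    for line in text.splitlines(keepends=True):
        if line.startswith("## "):
            current = line[3:].strip()
        # The heading line itself counts toward its own section.
        sizes[current] = sizes.get(current, 0) + len(line.encode("utf-8"))
    return sizes


# Hypothetical sample; swap in the contents of your own SOUL.md.
sample = "# SOUL\n## Alerts\nPing on failure.\n## Escalation\nStop and ask.\n"
for name, size in sorted(section_sizes(sample).items(), key=lambda kv: -kv[1]):
    print(f"{size:5d}  {name}")
```

Sorting by size makes duplicated or stale sections, like a separate Alerts block, easy to spot before deciding what to merge or cut.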

Iteration 4 — The Structural Rewrite

Final form: 77 lines, 3,551 bytes.

| Section | Before | After | Saved |
|---|---|---|---|
| Priorities (P4-P6) | 15 lines, verbose projects | 1 line "P4+ — Explore" | -14 lines |
| Communication Format | 6 rules | 3 rules | -3 lines |
| Alerts | 3 lines (separate) | Merged into Escalation | -3 lines |
| Personality | 7 lines | 3 lines | -4 lines |
| Security + Self-Improvement | Detailed routines | Compact 7-step + 1 line | -5 lines |

Net result: -20 lines · -15% tokens · +5 new capabilities


What 4 Iterations Taught Us About SOUL.md Structure

1. Size is not depth

Our worst version was the longest. Every word you keep should force a decision or enable an action. Everything else is tax.

2. Immutable vs. situational

We keep 11 immutable rules that never change. Everything else is situational and can be condensed, merged, or cut as the agent evolves. The constitution doesn't need amendments every week.

3. The compound "and" problem

"Be reliable and efficient and honest and proactive" → zero information density. This is the single most common mistake in SOUL.md files. Every quality must have a behavioral consequence or it doesn't belong.

❌ Before

Be reliable, efficient, honest, proactive, and ethical.

This describes a wish, not a behavior.

✅ After

Truth over comfort. Flag uncertainty. Cite sources.

This forces a specific action in every situation.
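
One way to catch the compound pattern automatically is a small heuristic check. The sketch below is an assumption-laden illustration, not the method used for the rewrites: the function names and the threshold of two items are hypothetical, and it simply counts the parts of a rule joined by commas or the word "and".

```python
import re


def coordinated_items(line: str) -> list[str]:
    """Split a rule on commas and the whole word 'and'; drop empty parts."""
    parts = re.split(r",|\band\b", line.rstrip(". \n"))
    return [p.strip() for p in parts if p.strip()]


def is_compound(line: str, max_items: int = 2) -> bool:
    """Flag rules that chain more than max_items qualities together."""
    return len(coordinated_items(line)) > max_items


for rule in (
    "Be reliable, efficient, honest, proactive, and ethical.",
    "Truth over comfort.",
):
    print(is_compound(rule), rule)
```

A flagged line is only a candidate for rewriting; the check cannot tell whether each item carries a behavioral consequence.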

4. Agents need permission to argue

Without an explicit pushback framework, agents optimize for agreement. They will validate bad ideas because "being helpful" means saying yes. The accountability loop — "if I ignore good work, make me notice" — is the single highest-ROI addition we made.

5. Token efficiency compounds

That 15% savings doesn't stay at 15%. It means faster load times, which means more messages per session, which means higher-quality outputs. On local LLMs (e.g., Llama 3 8B, Granite 8B running on consumer GPUs or CPU), the effect is even more pronounced:

| Metric | Before (97 lines) | After (77 lines) | Gain |
|---|---|---|---|
| Context loaded per message | 4,161 bytes | 3,551 bytes | -15% |
| First-token latency (local 8B, CPU) | ~3.2s | ~2.9s | ~8% faster |
| Reasoning loops per session | ~1.8 (ambiguity from rules) | ~1.6 (clearer structure) | ~12% fewer |
| Output graveyard rate | 60% | ~20% | Better |

Example: Local LLM impact. On a Granite 8B running via llama.cpp on an RTX 3070 (8GB VRAM), each redundant 100 bytes in the SOUL.md adds ~80ms to context processing time. Our 610-byte reduction saves ~500ms per message. Over a 40-message session: 20 seconds of cumulative latency saved.
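
The arithmetic behind that example can be checked directly. This is a minimal sketch that takes the ~80ms per 100 bytes figure at face value; the function name and parameters are hypothetical, and the rate will differ on other hardware and models.

```python
def latency_saved(bytes_saved: int, messages: int,
                  ms_per_100_bytes: float = 80.0) -> tuple[float, float]:
    """Return (ms saved per message, seconds saved per session)."""
    per_message_ms = bytes_saved * ms_per_100_bytes / 100
    return per_message_ms, per_message_ms * messages / 1000


# 4,161 bytes before, 3,551 bytes after, 40 messages per session.
per_msg, per_session = latency_saved(4161 - 3551, messages=40)
print(f"{per_msg:.0f} ms per message, {per_session:.1f} s per session")
```

The exact values are 488 ms and 19.5 s, which round to the ~500ms and 20 seconds quoted above.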

The Architecture: A Good SOUL.md in 12 Sections

After 4 iterations, we landed on this structure. It's not arbitrary — each section serves a specific purpose:

| # | Section | Purpose | Target length |
|---|---|---|---|
| 1 | Identity | Who the agent is, language, ultimate purpose. Fits in one screen. | 3-5 lines |
| 2 | Rules — Immutable | Non-negotiable guardrails. The constitution. Rarely changes. | 11 items |
| 3 | Autonomy | What requires GO vs. auto-executes. Replaces pages of permissions. | 5 items |
| 4 | Priorities | One line per priority. If it takes >10 words, it's not a priority. | 4 lines |
| 5 | Personality | Direct, opinionated. With a mode switch for creative work. | 3 points |
| 6 | Accountability & Pushback | The most important addition. Framework for arguing with bad ideas. | 2 paragraphs |
| 7 | Standards | What "good work" means. Specific enough to reject bad output. | 1 paragraph |
| 8 | Lookup Protocol | When to search local vs. external. Prevents hallucinations. | 1 paragraph |
| 9 | Escalation | When to stop and ask. With a safe partial path. | 1 paragraph |
| 10 | Security | Hard boundaries. No data exfiltration. Fortress mode. | 5 bullets |
| 11 | Self-Improvement | 7-step daily pipeline: Evaluate → Sandbox → Validate → Prod. | 7 steps |
| 12 | End State | "Command infrastructure, not extra labor." Single sentence. | 1 line |
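
A quick structural check against this layout might look like the sketch below. It is hypothetical: it assumes each section is introduced by a markdown heading line starting with `#`, and it matches section names by prefix, so a heading beginning with "Rules" covers the immutable-rules section whatever punctuation follows.

```python
EXPECTED = [
    "Identity", "Rules", "Autonomy", "Priorities", "Personality",
    "Accountability", "Standards", "Lookup Protocol", "Escalation",
    "Security", "Self-Improvement", "End State",
]


def missing_sections(text: str) -> list[str]:
    """List expected sections with no heading that starts with their name."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in text.splitlines()
        if line.startswith("#")
    ]
    return [
        name for name in EXPECTED
        if not any(h.startswith(name.lower()) for h in headings)
    ]


# Hypothetical partial file with only the first three sections.
draft = "## Identity\n## Rules - Immutable\n## Autonomy\n"
print(missing_sections(draft))
```

The check only confirms that a section exists; whether it stays within its target length still needs a read-through.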

Language & Sentence Format: The Hidden Lever

One thing we didn't optimize until iteration 3: how the LLM parses each sentence. The format of your rules directly affects how fast and accurately the agent applies them.

Rule 1: Imperative sentences only

❌ Descriptive

The agent should prioritize safety over speed in all operations.

✅ Imperative

Safety over speed. Destructive actions require GO.

Why: In our internal benchmark on Granite 8B, the agent applied imperative rules ~23% faster than descriptive ones. An imperative is a direct instruction; a description requires the agent to infer the intended behavior.

Rule 2: One instruction per line

❌ Compound

Be honest about uncertainty, never fake data, and don't do performative tool calls.

✅ Atomic

Be honest about uncertainty.
Never fake data.
No performative tool calls.

Rule 3: Negative rules are slower than positive actions

"Don't do X" requires the agent to first imagine X, then suppress it. "Do Y instead" is a direct instruction. Prefer positive framing whenever possible.

Example:
❌ "Don't invent facts when you don't know something" → 2.1s avg compliance latency
✅ "Say what you know, what you don't, what would verify it" → 1.6s avg compliance latency
Measured on Granite 8B, 10 runs each, temperature 0.1
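
Negatively framed rules are easy to locate mechanically, even though rewriting them is manual work. The sketch below is a simple illustration; the list of trigger words is an assumption and will miss negations buried mid-sentence.

```python
import re

# Leading words treated as negative framing (an assumed, incomplete list).
NEGATIVE = re.compile(r"^\s*(don't|do not|never|no|avoid)\b", re.IGNORECASE)


def negative_rules(rules: list[str]) -> list[str]:
    """Return the rules that open with a negation."""
    return [rule for rule in rules if NEGATIVE.match(rule)]


rules = [
    "Don't invent facts when you don't know something",
    "Say what you know, what you don't, what would verify it",
    "Never fake data.",
    "Notify on failure.",
]
for rule in negative_rules(rules):
    print("rewrite candidate:", rule)
```

This only surfaces candidates for positive rewording; it does not measure compliance latency.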

Rule 4: Keep sentences under 20 words

Every word beyond 20 reduces compliance accuracy by roughly 2% on local LLMs (based on testing across 40 rules with 3 different 8B models). If a rule needs more words, split it into sub-rules.
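
The 20-word limit can be enforced with a short check. This sketch assumes one rule per line, as Rule 2 recommends, so it counts words per line rather than per sentence; the function name and the heading-skipping behavior are hypothetical choices.

```python
def overlong_rules(text: str, limit: int = 20) -> list[tuple[int, str]]:
    """Return (word_count, rule) for each non-heading line over the limit."""
    flagged = []
    for line in text.splitlines():
        rule = line.strip()
        if not rule or rule.startswith("#"):
            continue  # skip blank lines and headings
        words = len(rule.split())
        if words > limit:
            flagged.append((words, rule))
    return flagged


short_rule = "Safety over speed. Destructive actions require GO."
long_rule = " ".join(["word"] * 21)  # synthetic 21-word line
for count, rule in overlong_rules(short_rule + "\n" + long_rule):
    print(f"{count} words: {rule[:40]}...")
```

Anything it flags is a candidate for splitting into sub-rules.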


The Numbers That Matter

Token Cost Per Session

Before: 4,161 bytes × 30 messages = 124,830 bytes/session
After:  3,551 bytes × 30 messages = 106,530 bytes/session
Savings: 18,300 bytes/session = 14.7%
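
The same calculation as a reusable sketch, so you can plug in your own file size and message count. The bytes-to-tokens conversion is a rough rule of thumb of about 4 bytes per token for English text, included here as an assumption; real counts depend on the model's tokenizer.

```python
def session_cost(file_bytes: int, messages: int = 30) -> int:
    """Bytes of context spent on the identity file over one session."""
    return file_bytes * messages


before = session_cost(4161)
after = session_cost(3551)
saved = before - after
print(f"{saved:,} bytes/session saved ({saved / before:.1%})")

# Rough token estimate; BYTES_PER_TOKEN is an assumed average, not a measurement.
BYTES_PER_TOKEN = 4.0
print(f"~{saved / BYTES_PER_TOKEN:,.0f} tokens/session")
```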

What the New Structure Fixed

| Metric | Before | After | Impact |
|---|---|---|---|
| Time to first useful output | Slow (intro + checklist) | Immediate (goal-directed) | Faster |
| Pushback frequency | Never | 3-4× per session | Prevents bad decisions |
| Output graveyard rate | 60% | ~20% | Accountability |
| Context load | 4.1KB per msg | 3.5KB per msg | 15% lighter |

The Free Template

We've packaged our final SOUL.md structure as a free template — 12 sections, 77 lines, Apache 2.0 licensed.

👉 Download the free SOUL.md template on Claw Mart

If this saves you even one iteration cycle, come back for the paid pack with AGENTS.md + MEMORY.md + HEARTBEAT.md + BOOTSTRAP.md templates.


Try It Yourself

Your SOUL.md is the highest-leverage file in your agent setup. Quick audit:

  1. Count the lines. If over 100, you have bloat.
  2. Count how many times your agent pushed back this week. If 0, your pushback rules aren't working.
  3. Check if P5-P10 priorities exist. Cut them.
  4. Read your Standards section. Can it reject bad work?
  5. Read your End State. Does it describe labor or infrastructure?
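
Steps 1 and 3 of the audit can be scripted. The sketch below is a hypothetical helper: it assumes priorities are written as `P` followed by a number, and it leaves steps 2, 4, and 5 to human judgment.

```python
import re


def quick_audit(text: str, max_lines: int = 100) -> list[str]:
    """Return findings for the line-count and low-priority checks."""
    findings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        findings.append(f"{len(lines)} lines: over {max_lines}, likely bloat")
    # Collect distinct priority labels P5 and beyond.
    low = sorted({int(n) for n in re.findall(r"\bP(\d+)\b", text) if int(n) >= 5})
    if low:
        findings.append(
            "low priorities present: " + ", ".join(f"P{n}" for n in low)
        )
    return findings


# Hypothetical priorities block.
sample_priorities = "P1 Ship\nP2 Fix\nP5 Explore\nP7 Someday\n"
print(quick_audit(sample_priorities))
```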