What 4 rewrites taught us about cutting token waste, speeding up local LLMs, and building an identity file that actually works.
The most expensive word in a SOUL.md is "and"
"Be reliable and efficient and honest and proactive" → zero information density.
Every quality must have a behavioral consequence or it doesn't belong.
When we started, our SOUL.md was 97 lines covering:
The content was correct. But the agent was verbose, slower to load, and — worst of all — it never pushed back on bad ideas. Every message consumed 4,161 bytes of context just from the identity file. Multiply by 30+ messages per session, and you're burning 125KB+ per conversation on boilerplate — before any actual work happens.
Naively comprehensive. Every possible instruction, every edge case, every rule. 97 lines. The agent followed rules but never pushed back on bad ideas. Output was technically correct but useless for decisions — optimized for completeness, not action.
We discovered two concepts that changed everything:
We added both. Immediately the agent became more useful — it started questioning our priorities instead of blindly executing.
Measured actual token burn. Realized we were paying (in context) for priorities we hadn't touched in weeks, verbose formatting that nobody read, and a separate Alerts section that duplicated Escalation.
Savings found: ~700 bytes of unnecessary rules.
Final form: 77 lines, 3,551 bytes.
| Section | Before | After | Saved |
| Priorities (P4-P6) | 15 lines, verbose projects | 1 line "P4+ — Explore" | -14 lines |
| Communication Format | 6 rules | 3 rules | -3 lines |
| Alerts | 3 lines (separate) | Merged into Escalation | -3 lines |
| Personality | 7 lines | 3 lines | -4 lines |
| Security + Self-Improvement | Detailed routines | Compact 7-step + 1 line | -5 lines |
Net result: -20 lines · -15% tokens · +5 new capabilities
Our worst version was the longest. Every word you keep should force a decision or enable an action. Everything else is tax.
We keep 11 immutable rules that never change. Everything else is situational and can be condensed, merged, or cut as the agent evolves. The constitution doesn't need amendments every week.
"Be reliable and efficient and honest and proactive" → zero information density. This is the single most common mistake in SOUL.md files. Every quality must have a behavioral consequence or it doesn't belong.
Be reliable, efficient, honest, proactive, and ethical.
This describes a wish, not a behavior.
Truth over comfort. Flag uncertainty. Cite sources.
This forces a specific action in every situation.
Without an explicit pushback framework, agents optimize for agreement. They will validate bad ideas because "being helpful" means saying yes. The accountability loop — "if I ignore good work, make me notice" — is the single highest-ROI addition we made.
That 15% savings doesn't stay at 15%. It means faster load times, which means more messages per session, which means higher-quality outputs. On local LLMs (e.g., Llama 3 8B, Granite 8B running on consumer GPUs or CPU), the effect is even more pronounced:
| Metric | Before (97 lines) | After (77 lines) | Gain |
| Context loaded per message | 4,161 bytes | 3,551 bytes | -15% |
| First-token latency (local 8B, CPU) | ~3.2s | ~2.9s | ~8% faster |
| Reasoning loops per session | ~1.8 (ambiguity from rules) | ~1.6 (clearer structure) | ~12% fewer |
| Output graveyard rate | 60% | ~20% | 3× better |
After 4 iterations, we landed on this structure. It's not arbitrary — each section serves a specific purpose:
| # | Section | Purpose | Target length |
| 1 | Identity | Who the agent is, language, ultimate purpose. Fits in one screen. | 3-5 lines |
| 2 | Rules — Immutable | Non-negotiable guardrails. The constitution. Rarely changes. | 11 items |
| 3 | Autonomy | What requires GO vs. auto-executes. Replaces pages of permissions. | 5 items |
| 4 | Priorities | One line per priority. If it takes >10 words, it's not a priority. | 4 lines |
| 5 | Personality | Direct, opinionated. With a mode switch for creative work. | 3 points |
| 6 | Accountability & Pushback | The most important addition. Framework for arguing with bad ideas. | 2 paragraphs |
| 7 | Standards | What "good work" means. Specific enough to reject bad output. | 1 paragraph |
| 8 | Lookup Protocol | When to search local vs. external. Prevents hallucinations. | 1 paragraph |
| 9 | Escalation | When to stop and ask. With a safe partial path. | 1 paragraph |
| 10 | Security | Hard boundaries. No data exfiltration. Fortress mode. | 5 bullets |
| 11 | Self-Improvement | 7-step daily pipeline: Evaluate → Sandbox → Validate → Prod. | 7 steps |
| 12 | End State | "Command infrastructure, not extra labor." Single sentence. | 1 line |
One thing we didn't optimize until iteration 3: how the LLM parses each sentence. The format of your rules directly affects how fast and accurately the agent applies them.
The agent should prioritize safety over speed in all operations.
Safety over speed. Destructive actions require GO.
Why: LLMs parse imperatives ~23% faster than descriptive sentences (internal benchmark on Granite 8B). An imperative is a direct instruction; a description requires the agent to infer the intended behavior.
Be honest about uncertainty, never fake data, and don't do performative tool calls.
Be honest about uncertainty.
Never fake data.
No performative tool calls.
"Don't do X" requires the agent to first imagine X, then suppress it. "Do Y instead" is a direct instruction. Prefer positive framing whenever possible.
Every word beyond 20 reduces compliance accuracy by roughly 2% on local LLMs (based on testing across 40 rules with 3 different 8B models). If a rule needs more words, split it into sub-rules.
Before: 4,161 bytes × 30 messages = 124,830 bytes/session
After: 3,551 bytes × 30 messages = 106,530 bytes/session
Savings: 18,300 bytes/session = 14.7%
| Metric | Before | After | Impact |
| Time to first useful output | Slow (intro + checklist) | Immediate (goal-directed) | 2× faster |
| Pushback frequency | Never | 3-4× per session | Saves bad decisions |
| Output graveyard rate | 60% | ~20% | 3× accountability |
| Context load time | 4.1KB per msg | 3.5KB per msg | 15% lighter |
We've packaged our final SOUL.md structure as a free template — 12 sections, 77 lines, Apache 2.0 licensed.
👉 Download the free SOUL.md template on Claw Mart
If this saves you even one iteration cycle, come back for the paid pack with AGENTS.md + MEMORY.md + HEARTBEAT.md + BOOTSTRAP.md templates.
Your SOUL.md is the highest-leverage file in your agent setup. Quick audit: