In building AI systems, we often obsess over model capabilities, prompt engineering, and token efficiency. But there's a hidden cost that compounds silently: context pollution. And most AI assistants today don't let you do much about it.
The Problem: Everything is Always Loaded
Imagine this scenario. You're using Claude Code (or similar assistants) and you want to write a blog post using the write-blog skill. That skill talks to the Blogger API via an MCP (Model Context Protocol). Simple enough.
But here's the catch: Claude Code loads every single skill and every single MCP into your context window by default. This means when you're writing a blog, the write-linkedin-post skill is also in context. And it loads the LinkedIn MCP. Along with tools for email, databases, APIs you might never touch.
Your context window is now polluted with information irrelevant to your current task. These tools sit there, consuming tokens, complicating the model's decision-making, and increasing the likelihood of hallucinations. The model has to process and reason about tools it doesn't need.
The Cost
Let's quantify this. Say each unused skill and MCP adds 500-1000 tokens to your context. You have 20 skills loaded. That's 10,000-20,000 tokens per invocation. Across 100 invocations in a week, you're burning 1-2 million tokens unnecessarily. That's not just wasted money—it's wasted latency, wasted reasoning capacity, and wasted reliability.
And it gets worse. More context = more tokens to process = more opportunities for the model to confuse itself and hallucinate.
Context Engineering: The Discipline
Context engineering is the discipline of being intentional about every bit of context you feed an AI system. It asks: "Does the model actually need this information to do the job?"
Most of the time, the answer is no.
For the write-blog task, you need:
- The
write-blogskill - The Blogger MCP
- Nothing else
The write-linkedin-post skill, the email MCP, the database tools? They're noise. They're distractions. They cost tokens and increase confusion.
Custom Harness
Claude Code is itself a harness—but a monolithic one that loads everything. What we need are custom harnesses that are task-specific and lean.
A custom harness is a lightweight layer that sits between your workflow and the AI system. Instead of loading everything, it loads only what's needed for that specific task.
Think of it like this:
Without a custom harness:
The model has to navigate a crowded context window to figure out what matters.
With a custom harness:
The model gets a clean, focused context. Only the tools that matter. Only the system instructions that apply.
The Flexibility
Here's what makes a custom harness powerful: you control the context per workflow. You're not locked into one system configuration.
When you invoke the write-blog workflow:
- Load:
write-blogskill, Blogger MCP, minimal system context - Don't load:
write-linkedin-post, email MCP, database tools, etc.
When you invoke the write-linkedin-post workflow:
- Load:
write-linkedin-postskill, LinkedIn MCP - Don't load:
write-blog, Blogger MCP, email tools, etc.
Each custom harness is tailored. With focused context, your cost per invocation drops by 50-70%. Your latency improves. Your reliability improves because the model isn't distracted by tools it shouldn't be using.
Context Precision vs Context Pollution
The tension here is real. Monolithic assistants like Claude Code, Cursor, or Codex try to be everything for everyone. They load all tools, hoping you'll use a subset. It's convenient until it's not.
A custom harness trades convenience for precision. You lose the "everything everywhere" model. You gain control. You gain efficiency. You gain clarity.
Here's what improves:
| Aspect | Without Harness | With Harness |
| Tokens per invocation | 10,000-20,000 wasted | Only essential tokens |
| Model confusion | High (too many options) | Low (clear focus) |
| Cost per workflow | Expensive | Optimized |
| Hallucination risk | Higher (noisy context) | Lower (clean context) |
| Latency | Slower (more to process) | Faster (less noise) |
The Conclusion
Building AI systems at scale requires more than just good models—it requires good context engineering. Most AI assistants today don't give you this control. They throw everything at the model and hope for the best.
Custom harnesses give you that control. They let you be intentional about context. They let you optimize per workflow instead of settling for a one-size-fits-all approach.
The result? Faster, cost-effective, more reliable AI systems. And as AI becomes more integrated into engineering workflows, this kind of precision becomes table stakes.
Context engineering isn't a nice-to-have anymore. It's foundational.

Comments
Post a Comment