Context Engineering & AI Harness: The Secret to Building Cost-Effective AI Systems

In building AI systems, we often obsess over model capabilities, prompt engineering, and token efficiency. But there's a hidden cost that compounds silently: context pollution. And most AI assistants today don't let you do much about it.

The Problem: Everything is Always Loaded

Imagine this scenario. You're using Claude Code (or a similar assistant) and you want to write a blog post using the write-blog skill. That skill talks to the Blogger API via an MCP (Model Context Protocol) server. Simple enough.

But here's the catch: by default, Claude Code loads reference data for every skill and every MCP, such as each skill's description and the description of each MCP tool, into your context window. This means that when you're writing a blog post, the write-linkedin-post skill is also in context. So is the LinkedIn MCP, along with tools for email, databases, and APIs you might never touch. This metadata is small compared to the tokens a codebase or documentation set consumes, but it is these small drops that make the ocean.

Your context window is now polluted with information irrelevant to your current task. These tools sit there, consuming tokens, complicating the model's decision-making, and increasing the likelihood of hallucinations. The model has to process and reason about tools it doesn't need.

The Cost

Let's quantify this. Say each unused skill or MCP reference adds 50-100 tokens to your context, and you have 200 skills and MCPs loaded. That's 10,000-20,000 tokens per invocation. Across 100 invocations in a week, you're burning 1-2 million tokens unnecessarily. That's not just wasted money; it's wasted latency, wasted reasoning capacity, and wasted reliability.
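
The arithmetic above is easy to check in a few lines of Python. This is a minimal sketch that uses only the figures from the paragraph (50-100 tokens per entry, 200 entries, 100 invocations a week). The function name weekly_overhead is illustrative, and the numbers are estimates rather than measurements.

```python
def weekly_overhead(tokens_per_entry: int, entries: int, invocations: int) -> tuple[int, int]:
    """Return (tokens per invocation, tokens per week) spent on unused metadata."""
    per_invocation = tokens_per_entry * entries
    return per_invocation, per_invocation * invocations


# Figures from the text: 50-100 tokens per entry, 200 entries, 100 invocations a week.
low = weekly_overhead(50, 200, 100)
high = weekly_overhead(100, 200, 100)

print(f"per invocation: {low[0]:,} to {high[0]:,} tokens")
print(f"per week: {low[1]:,} to {high[1]:,} tokens")
```

Plug in your own counts to see where you land; the per-entry size is the number most worth measuring on a real setup.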

And it gets worse. More context = more tokens to process = more opportunities for the model to confuse itself and hallucinate.

Context Engineering: The Discipline

Context engineering is the discipline of being intentional about every bit of context you feed an AI system. It asks: "Does the model actually need this information to do the job?"

Most of the time, the answer is no.

For the write-blog task, you need:

The write-blog skill
The Blogger MCP
Nothing else

The write-linkedin-post skill, the email MCP, the database tools? They're noise. They're distractions. They cost tokens and increase confusion.

Custom Harness

Claude Code is itself a harness—but a monolithic one that loads everything. What we need are custom harnesses that are task-specific and lean.

A custom harness is a lightweight layer that sits between your workflow and the AI system. Instead of loading everything, it loads only what's needed for that specific task.

Think of it like this:

Without a custom harness:

User Request → [All Skills] + [All MCPs] + [System Prompt] → Claude → Response

The model has to navigate a crowded context window to figure out what matters.

With a custom harness:

User Request → Custom Harness (loads only write-blog + Blogger MCP) → Claude → Response

The model gets a clean, focused context. Only the tools that matter. Only the system instructions that apply.
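
Here is one way the filtering step of such a harness might look. It is a sketch under assumptions, not how Claude Code or any other assistant works internally: the ToolEntry shape, the catalog entries, and the build_context function are all hypothetical. A real harness would typically pass tool definitions through the model API's tool parameter instead of pasting them into the prompt as text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    """Metadata for one skill or MCP tool (hypothetical shape)."""
    name: str
    description: str


# Hypothetical catalog standing in for everything a monolithic assistant would load.
CATALOG = [
    ToolEntry("write-blog", "Draft and publish a blog post."),
    ToolEntry("blogger-mcp", "Create and update posts through the Blogger API."),
    ToolEntry("write-linkedin-post", "Draft a LinkedIn post."),
    ToolEntry("linkedin-mcp", "Publish posts through the LinkedIn API."),
    ToolEntry("email-mcp", "Send and read email."),
]


def build_context(system_prompt: str, allowed: set[str], catalog: list[ToolEntry]) -> str:
    """Assemble a prompt that contains only the allowed catalog entries."""
    known = {entry.name for entry in catalog}
    missing = allowed - known
    if missing:
        # Fail loudly instead of silently running a workflow without its tools.
        raise KeyError(f"not in catalog: {sorted(missing)}")
    lines = [system_prompt, "", "Available tools:"]
    lines += [f"- {e.name}: {e.description}" for e in catalog if e.name in allowed]
    return "\n".join(lines)


context = build_context(
    "You help the user write and publish blog posts.",
    {"write-blog", "blogger-mcp"},
    CATALOG,
)
print(context)
```

The allowlist is the whole design: anything not named for the task never reaches the model, so it cannot cost tokens or be picked by mistake.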

The Flexibility

Here's what makes a custom harness powerful: you control the context per workflow. You're not locked into one system configuration.

When you invoke the write-blog workflow:

Load: write-blog skill, Blogger MCP, minimal system context
Don't load: write-linkedin-post, email MCP, database tools, etc.

When you invoke the write-linkedin-post workflow:

Load: write-linkedin-post skill, LinkedIn MCP
Don't load: write-blog, Blogger MCP, email tools, etc.
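
The two load lists above could live in a small manifest, one entry per workflow. The sketch below is illustrative only: the WORKFLOWS and ALL_TOOLS names, the tool identifiers, and the plan function are assumptions made for this example, not part of any existing tool.

```python
# Hypothetical per-workflow manifests; the names mirror the examples in the text.
WORKFLOWS: dict[str, frozenset[str]] = {
    "write-blog": frozenset({"write-blog", "blogger-mcp"}),
    "write-linkedin-post": frozenset({"write-linkedin-post", "linkedin-mcp"}),
}

# Everything installed, i.e. what a load-everything setup would put in context.
ALL_TOOLS = frozenset({
    "write-blog", "blogger-mcp",
    "write-linkedin-post", "linkedin-mcp",
    "email-mcp", "database-tools",
})


def plan(workflow: str) -> tuple[list[str], list[str]]:
    """Return (to_load, to_skip) for a workflow, both sorted for stable output."""
    try:
        load = WORKFLOWS[workflow]
    except KeyError:
        raise ValueError(f"unknown workflow: {workflow!r}") from None
    return sorted(load), sorted(ALL_TOOLS - load)


for name in WORKFLOWS:
    load, skip = plan(name)
    print(f"{name}: load {load}, skip {skip}")
```

Keeping the manifest as plain data means adding a workflow is a one-line change, and the skip list makes it obvious what each workflow is deliberately leaving out.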

Each custom harness is tailored. With focused context, your cost per invocation can drop by an estimated 50-70%, depending on how much of each prompt was unused tool metadata. Your latency improves. Your reliability improves because the model isn't distracted by tools it shouldn't be using.
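
How much you actually save depends on how large the task's own content is next to the metadata you remove. The sketch below makes that relationship explicit. Every number in it is illustrative (15,000 tokens of metadata, of which 150 are needed), the function name input_token_reduction is an assumption, and it counts input tokens only, ignoring output tokens and any prompt caching.

```python
def input_token_reduction(task_tokens: int, all_metadata: int, needed_metadata: int) -> float:
    """Fraction of input tokens removed by loading only the needed metadata."""
    before = task_tokens + all_metadata
    after = task_tokens + needed_metadata
    return (before - after) / before


# Illustrative numbers only: 15,000 tokens of metadata, of which 150 are needed.
for task_tokens in (5_000, 10_000, 15_000):
    share = input_token_reduction(task_tokens, 15_000, 150)
    print(f"task tokens {task_tokens:>6,}: {share:.1%} fewer input tokens")
```

The smaller the task's own content, the larger the relative saving, so short, frequent workflows benefit the most from a lean harness.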

Context Precision vs Context Pollution

The tension here is real. Monolithic assistants like Claude Code, Cursor, or Codex try to be everything for everyone. They load all tools, hoping you'll use a subset. It's convenient until it's not.

A custom harness trades convenience for precision. You lose the "everything everywhere" model. You gain control. You gain efficiency. You gain clarity.

Here's what improves:

Aspect	Without Harness	With Harness
Tokens per invocation	10,000-20,000 wasted	Only essential tokens
Model confusion	High (too many options)	Low (clear focus)
Cost per workflow	Expensive	Optimized
Hallucination risk	Higher (noisy context)	Lower (clean context)
Latency	Slower (more to process)	Faster (less noise)

The Conclusion

Building AI systems at scale requires more than just good models—it requires good context engineering. Most AI assistants today don't give you this control. They throw everything at the model and hope for the best.

Custom harnesses give you that control. They let you be intentional about context. They let you optimize per workflow instead of settling for a one-size-fits-all approach.

The result? Faster, more cost-effective, more reliable AI systems. And as AI becomes more integrated into engineering workflows, this kind of precision becomes table stakes.

Context engineering isn't a nice-to-have anymore. It's foundational.

Experiments at work
