
Context Engineering & AI Harness: The Secret to Building Cost-Effective AI Systems

In building AI systems, we often obsess over model capabilities, prompt engineering, and token efficiency. But there's a hidden cost that compounds silently: context pollution. And most AI assistants today don't let you do much about it.

The Problem: Everything is Always Loaded

Imagine this scenario. You're using Claude Code (or a similar assistant) and you want to write a blog post using the write-blog skill. That skill talks to the Blogger API through an MCP (Model Context Protocol) server. Simple enough.

But here's the catch: by default, Claude Code makes every installed skill and every configured MCP available in your context window. So while you're writing a blog post, the write-linkedin-post skill is in context too, and so is the LinkedIn MCP, along with tools for email, databases, and APIs you might never touch.

Your context window is now polluted with information irrelevant to your current task. These tools sit there, consuming tokens, complicating the model's decision-making, and increasing the likelihood of hallucinations. The model has to process and reason about tools it doesn't need.

The Cost

Let's quantify this. Say each unused skill or MCP adds 500-1000 tokens to your context, and you have 20 of them loaded. That's 10,000-20,000 tokens per invocation. Across 100 invocations in a week, you're burning 1-2 million tokens unnecessarily. That's not just wasted money. It's added latency, wasted reasoning capacity, and reduced reliability.

And it gets worse. More context = more tokens to process = more opportunities for the model to confuse itself and hallucinate.
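The arithmetic in this section is easy to check with a few lines of Python. This is only a sketch: the function name wasted_tokens is made up for illustration, and the inputs are the rough estimates from the text (20 unused tools, 500-1000 tokens each, 100 invocations a week), not measurements.

```python
def wasted_tokens(unused_tools: int, tokens_per_tool: int, invocations: int) -> int:
    """Tokens spent on tool definitions that the task never uses."""
    return unused_tools * tokens_per_tool * invocations


# Estimates taken from the text above, not measured values.
low = wasted_tokens(unused_tools=20, tokens_per_tool=500, invocations=100)
high = wasted_tokens(unused_tools=20, tokens_per_tool=1000, invocations=100)

print(f"{low:,} to {high:,} tokens per week")
```

Swap in your own tool count and per-tool token estimate to see how quickly the overhead grows.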

Context Engineering: The Discipline

Context engineering is the discipline of being intentional about every bit of context you feed an AI system. It asks: "Does the model actually need this information to do the job?"

Most of the time, the answer is no.

For the write-blog task, you need:

  • The write-blog skill
  • The Blogger MCP
  • Nothing else

The write-linkedin-post skill, the email MCP, the database tools? They're noise. They're distractions. They cost tokens and increase confusion.

Custom Harness

Claude Code is itself a harness—but a monolithic one that loads everything. What we need are custom harnesses that are task-specific and lean.

A custom harness is a lightweight layer that sits between your workflow and the AI system. Instead of loading everything, it loads only what's needed for that specific task.

Think of it like this:

Without a custom harness:

User Request → [All Skills] + [All MCPs] + [System Prompt] → Claude → Response

The model has to navigate a crowded context window to figure out what matters.

With a custom harness:

User Request → Custom Harness (loads only write-blog + Blogger MCP) → Claude → Response

The model gets a clean, focused context. Only the tools that matter. Only the system instructions that apply.
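To make the second flow concrete, here is a minimal sketch of what such a harness could look like in Python. Everything in it is hypothetical: the Tool class, the INVENTORY and HARNESSES tables, the build_context function, and the shape of the returned payload are illustrative names, not part of Claude Code or any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    kind: str        # "skill" or "mcp"
    definition: str  # text that would be placed in the context window


# Hypothetical inventory: everything a monolithic setup would load.
INVENTORY = {
    tool.name: tool
    for tool in [
        Tool("write-blog", "skill", "Instructions for drafting a blog post."),
        Tool("blogger-mcp", "mcp", "Tool definitions for the Blogger API."),
        Tool("write-linkedin-post", "skill", "Instructions for a LinkedIn post."),
        Tool("linkedin-mcp", "mcp", "Tool definitions for the LinkedIn API."),
        Tool("email-mcp", "mcp", "Tool definitions for sending email."),
    ]
}

# One entry per workflow: the only tools that workflow may load.
HARNESSES = {
    "write-blog": ["write-blog", "blogger-mcp"],
    "write-linkedin-post": ["write-linkedin-post", "linkedin-mcp"],
}


def build_context(workflow: str, user_request: str) -> dict:
    """Assemble a request payload containing only the workflow's tools."""
    if workflow not in HARNESSES:
        raise KeyError(f"no harness defined for workflow {workflow!r}")
    tools = [INVENTORY[name] for name in HARNESSES[workflow]]
    return {
        "system": f"You are running the {workflow} workflow.",
        "tools": [tool.definition for tool in tools],
        "tool_names": [tool.name for tool in tools],
        "request": user_request,
    }


context = build_context("write-blog", "Draft a post about context engineering.")
print(context["tool_names"])
```

The design choice here is an allowlist per workflow: anything not named in the workflow's entry never reaches the model, so adding a new skill to the inventory cannot silently grow the context of existing workflows.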

The Flexibility

Here's what makes a custom harness powerful: you control the context per workflow. You're not locked into one system configuration.

When you invoke the write-blog workflow:

  • Load: write-blog skill, Blogger MCP, minimal system context
  • Don't load: write-linkedin-post, email MCP, database tools, etc.

When you invoke the write-linkedin-post workflow:

  • Load: write-linkedin-post skill, LinkedIn MCP
  • Don't load: write-blog, Blogger MCP, email tools, etc.

Each custom harness is tailored. With focused context, your cost per invocation can drop by an estimated 50-70%, depending on how much of your context was unused tooling to begin with. Latency improves, and so does reliability, because the model isn't distracted by tools it shouldn't be using.
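How much a harness saves depends on how large the rest of the prompt is. The sketch below works that out under stated assumptions: savings_fraction is a hypothetical helper, the 750 tokens per tool is the midpoint of the 500-1000 estimate used earlier, and the base context sizes (system prompt plus conversation) are invented purely for illustration. It counts input tokens only and ignores effects such as prompt caching.

```python
def savings_fraction(
    base_tokens: int, tokens_per_tool: int, total_tools: int, kept_tools: int
) -> float:
    """Fraction of input tokens saved by loading kept_tools instead of total_tools."""
    if not 0 <= kept_tools <= total_tools:
        raise ValueError("kept_tools must be between 0 and total_tools")
    monolithic = base_tokens + tokens_per_tool * total_tools
    focused = base_tokens + tokens_per_tool * kept_tools
    return 1 - focused / monolithic


# Hypothetical base context sizes; 20 tools installed, 2 kept by the harness.
for base in (2_000, 8_000, 30_000):
    fraction = savings_fraction(base, tokens_per_tool=750, total_tools=20, kept_tools=2)
    print(f"base={base:>6,} tokens -> {fraction:.0%} fewer input tokens")
```

With these made-up inputs the saving ranges from about 79% when the base context is small to about 30% when it is large, so it is worth measuring your own prompts before quoting a number.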

Context Precision vs Context Pollution

The tension here is real. Monolithic assistants like Claude Code, Cursor, or Codex try to be everything for everyone. They load all tools, hoping you'll use a subset. It's convenient until it's not.

A custom harness trades convenience for precision. You lose the "everything everywhere" model. You gain control. You gain efficiency. You gain clarity.

Here's what improves:

Aspect | Without Harness | With Harness
Tokens per invocation | 10,000-20,000 wasted | Only essential tokens
Model confusion | High (too many options) | Low (clear focus)
Cost per workflow | Expensive | Optimized
Hallucination risk | Higher (noisy context) | Lower (clean context)
Latency | Slower (more to process) | Faster (less noise)

The Conclusion

Building AI systems at scale requires more than just good models—it requires good context engineering. Most AI assistants today don't give you this control. They throw everything at the model and hope for the best.

Custom harnesses give you that control. They let you be intentional about context. They let you optimize per workflow instead of settling for a one-size-fits-all approach.

The result? Faster, cheaper, more reliable AI systems. And as AI becomes more integrated into engineering workflows, this kind of precision becomes table stakes.

Context engineering isn't a nice-to-have anymore. It's foundational.
