
Building AI Apps - The basics you need to know

So you want to build an AI assistant? Maybe for customer support, maybe to search through documents, or maybe just to automate some boring stuff at work. You keep hearing words like "prompting", "RAG", and "fine-tuning" thrown around in meetings. But what do they actually mean, and which one do you need?

Let me walk you through building a customer support AI assistant and show you how each of these concepts fits in.

/**
 * @disclaimer
 * This is an educational post based on my learnings.
 * I'll share actual experiments in future posts.
 * Your use case might be different, so take what applies to you.
*/

What even is an LLM?

Before we jump into the techniques, let's start with the basics. What is an LLM?

Think of a Large Language Model (LLM) as someone who has read most of the internet but has zero memory of your conversation once it ends. Models like GPT-4, Claude, or Llama have been trained on massive amounts of text data. They know a lot about general things but nothing specific about your business.

Here's the important bit - LLMs work with text at their core. Yes, modern LLMs like GPT-4 and Claude can now accept images, audio, and other media as input. But here's what happens behind the scenes - everything gets converted into tokens (small chunks of content mapped to numbers that the model actually processes).

The LLM doesn't "see" an image the way you do. It processes it as tokens. Same with audio. Same with everything else. At the end of the day, it's all tokenized data that the model processes as text-like information.
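To make the idea of tokenization concrete, here's a deliberately toy sketch. Real tokenizers (BPE, SentencePiece, etc.) split text into subword pieces using learned vocabularies of tens of thousands of entries - this tiny word-level version only illustrates the text-to-numbers idea:

```javascript
// Toy illustration only -- real tokenizers split on learned subwords,
// not whole words, and have vocabularies of 50k+ entries.
const toyVocab = { "where": 1, "is": 2, "my": 3, "order": 4, "?": 5 };

function toyTokenize(text) {
  // Lowercase, separate punctuation, then map each piece to a number
  return text
    .toLowerCase()
    .replace(/([?!.])/g, " $1")
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => toyVocab[word] ?? 0); // 0 = unknown token
}

console.log(toyTokenize("Where is my order?")); // [1, 2, 3, 4, 5]
```

The model never sees "Where is my order?" - it sees something like `[1, 2, 3, 4, 5]`, just at a much larger scale.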

Your entire job when building AI apps is to convert the real world into the right format that the LLM can process. Keep this in mind as we go through the rest of the post.

What is Prompting?

Prompting is basically the instructions you give the LLM to tell it how to behave.

Let's say you're building that customer support assistant. You want it to be friendly, keep answers short, and always ask for an order number before helping. Here's what your prompt might look like:

You are a helpful customer support assistant for an e-commerce company.
Always:
- Be friendly and empathetic
- Keep responses under 3 sentences when possible
- Ask for the order number if the customer mentions an issue

When responding to shipping questions, check the shipping status first.

That's it. You just wrote a prompt. The LLM now knows how to behave.

Why does this matter? Because prompting is the fastest way to shape behavior. No data collection, no model training. You write instructions and it works immediately. Use it to set personality, define workflows, and give context about your business.
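In code, that prompt typically rides along as a "system" message in the widely used chat-messages format. The actual send step is omitted here because it depends on your provider's SDK:

```javascript
// The support-assistant prompt from above, wired into the common
// chat-messages shape. Sending it is provider-specific, so that's left out.
const systemPrompt = `You are a helpful customer support assistant for an e-commerce company.
Always:
- Be friendly and empathetic
- Keep responses under 3 sentences when possible
- Ask for the order number if the customer mentions an issue`;

function buildMessages(userQuestion) {
  return [
    { role: "system", content: systemPrompt }, // behavior instructions
    { role: "user", content: userQuestion },   // the customer's question
  ];
}
```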

What is RAG?

RAG stands for Retrieval Augmented Generation. Fancy name for a simple concept - fetch information from your knowledge base and give it to the LLM.

Here's a scenario. A customer asks "What's your return policy for electronics?"

Your LLM doesn't know your company's return policy. It might make something up or give generic advice. Not good.

So what do you do? RAG to the rescue.

How RAG works:

  1. Customer asks the question
  2. Your system searches through your knowledge base - help docs, policies, FAQs (usually stored in something called a vector database)
  3. It finds the top 3 most relevant documents
  4. These documents get added to the prompt
  5. LLM reads your actual policy and answers correctly

Here's what that looks like in code:

// Conceptual example -- assumes search results have a `text` field
const query = "return policy for electronics";
const results = await vectorDB.search(query, { topK: 3 });

// Join the retrieved documents into one block of context
const context = results.map((doc) => doc.text).join("\n\n");

const prompt = `
Using the following company policies:
${context}

Answer the customer's question: ${query}
`;

Why is RAG useful? Because your knowledge base changes all the time. New products launch, policies update, FAQs change. With RAG, you just update the documents. No need to retrain any model.

But there's a catch. Someone has to actually update those documents. If your return policy changed yesterday and nobody updated the knowledge base, the LLM will give outdated info.
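Updating the knowledge base means re-chunking and re-indexing the changed documents. Real pipelines split on sentences or tokens, but the core idea is fixed-size windows with some overlap so no chunk loses its surrounding context. A minimal sketch of that chunking step:

```javascript
// Minimal chunker -- production pipelines split on sentences or tokens,
// but the idea is the same: fixed-size windows with overlap for context.
function chunkText(text, size, overlap) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```

Each chunk then gets embedded and stored in the vector database alongside its original text, ready to be searched at question time.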

What is Fine-tuning?

Fine-tuning means retraining the LLM on your specific data to teach it your patterns, tone, or domain knowledge.

Back to our customer support assistant. Let's say your company uses very specific jargon like "SKU-level inventory check" or "tier-2 escalation protocol". Even with good prompts, the LLM sounds generic or doesn't understand these terms properly.

What you do is collect 1,000 examples of real support conversations - questions and ideal answers - and fine-tune a model on them:

// Training data format
const trainingData = [
  {
    prompt: "Customer: My order says 'processing' for 5 days",
    completion: "I understand the concern. Let me check your order status. Can you provide your order number? I'll perform a SKU-level inventory check to see if there's a warehouse delay."
  },
  // ... 1000 more examples
];

After fine-tuning, your model naturally speaks in your company's language and follows your support patterns. No need for super long prompts.

Why fine-tune? Because it bakes knowledge right into the model. Great for teaching tone, style, or domain-specific stuff that's hard to explain in a prompt.

But here's the problem. Fine-tuning is expensive, takes time, and needs to be re-done when things change. If your return policy updates tomorrow, the fine-tuned model won't know about it. You'd have to retrain.

When should you fine-tune? Only when you have lots of training data and need consistent behavior that you can't get with just prompting.
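If you do go down this road, most fine-tuning APIs expect the training data as JSONL - one JSON object per line. The exact field names vary by provider, so check your platform's docs; this helper just shows the shape, using the training data format from above:

```javascript
// Convert training examples to JSONL (one JSON object per line).
// Field names vary by provider -- "prompt"/"completion" is one common shape.
function toJsonl(examples) {
  return examples
    .map((ex) => JSON.stringify({ prompt: ex.prompt, completion: ex.completion }))
    .join("\n");
}
```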

What are MCP Servers?

MCP stands for Model Context Protocol. MCP Servers let your LLM fetch live data from external systems like databases, APIs, or internal tools right when it needs them.

Here's a new scenario. Customer asks "Where's my order?"

RAG won't help here. Your knowledge base doesn't have real-time tracking info. Fine-tuning won't help either - the model only knows what it saw during training, not what happened to order #12345 this morning.

You need MCP.

How MCP works:

  1. LLM recognizes it needs live data
  2. It calls your MCP server: get_order_status(order_id="12345")
  3. MCP server queries your database or API
  4. Returns: {"status": "Out for delivery", "eta": "Today by 8 PM"}
  5. LLM responds: "Great news! Your order is out for delivery and should arrive by 8 PM today."

Here's what that looks like:

// MCP Server example (conceptual)
mcpServer.tool("get_order_status", async (orderId) => {
    // Query the live database with a parameterized query --
    // never interpolate user input directly into SQL
    const order = await db.query(
        "SELECT * FROM orders WHERE id = ?",
        [orderId]
    );

    return {
        status: order.status,
        eta: order.estimated_delivery
    };
});

Why does MCP matter? Because unlike RAG (which uses static documents) or fine-tuning (which bakes in old knowledge), MCP gives your LLM access to real-time data. Current inventory levels, user account info, live API data - anything that changes frequently.

But wait, there's more. MCP isn't just about reading data. It can also do actions. Your LLM can:

  • Create a support ticket in your system
  • Send an email to the customer
  • Update order status in the database
  • Trigger a refund process
  • Schedule a callback

It's a two-way street. The LLM can not only fetch information but also make things happen in your systems.
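An "action" tool looks much like the read-only one, with one big difference: validate before you act. A conceptual sketch, where `ticketSystem` is a stand-in for your real ticketing API:

```javascript
// Conceptual action tool. `ticketSystem` is a stub standing in for a real
// ticketing API -- the validation-before-action pattern is the point here.
const ticketSystem = {
  async create({ orderId, summary }) {
    // Stand-in for a real API call
    return { id: `TKT-${orderId}`, orderId, summary, status: "open" };
  },
};

async function createSupportTicket({ orderId, summary }) {
  // Guard rails matter for tools that write -- validate inputs first
  if (!orderId || !summary) {
    throw new Error("orderId and summary are required");
  }
  const ticket = await ticketSystem.create({ orderId, summary });
  return { ticketId: ticket.id, status: ticket.status };
}
```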

Let me put it this way:

  • RAG: "Here's what we wrote about returns last month"
  • Fine-tuning: "We've trained you on thousands of past conversations"
  • MCP: "Go check the database right now for this customer's order, and if they're eligible, issue a refund"

See the difference?

What ties all of this together?

Remember what I said at the start? LLMs work with text at their core. Everything eventually becomes text (or tokens).

Let's see how this applies to what we learned:

  • Prompting: Text instructions you write
  • RAG: Text documents you inject as context
  • Fine-tuning: Text examples you use for training
  • MCP: Real-time data gets converted to text/JSON before the LLM processes it

Your entire job when building AI apps is converting the real world into the right format. Database results? Format as structured text. Images? Modern LLMs can take them directly, but they get tokenized. Code outputs? Capture as text. API responses? JSON that becomes tokens.
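For example, instead of dumping a raw database row into the prompt, format it as labeled lines the model can't misread. Field names here are illustrative:

```javascript
// Turn a database row into clear, structured text before handing it to the
// LLM. Labeled lines beat a raw JSON dump for readability. Fields are made up.
function formatOrderForLLM(order) {
  return [
    `Order ID: ${order.id}`,
    `Status: ${order.status}`,
    `ETA: ${order.eta}`,
  ].join("\n");
}
```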

The clearer and more structured your input is, the better your AI assistant works. Simple as that.

So which one do you need?

Let me give you a quick guide for the customer support assistant we've been talking about:

  What you need to do                            | Use this
  -----------------------------------------------|------------
  Set personality and basic rules                | Prompting
  Answer questions from company policies/FAQs    | RAG
  Teach company-specific tone and jargon         | Fine-tuning
  Check order status, inventory, live user data  | MCP Servers

Here's the thing though. Most real-world AI apps don't use just one. They use all of them together:

  1. Start with prompting to define how your assistant behaves
  2. Add RAG when you need to answer from your knowledge base
  3. Use MCP for anything that needs real-time data
  4. Fine-tune only if you have lots of data and really need it
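To see how the pieces above combine on a single request, here's a sketch of assembling one call: fixed instructions (prompting), retrieved policy text (RAG), and a list of live-data tools the model can call (MCP). The names are illustrative, not a specific SDK:

```javascript
// One request path combining prompting + RAG + MCP. The shape of the final
// request object varies by provider -- this just shows how the inputs meet.
function assembleRequest({ systemPrompt, retrievedDocs, toolNames, question }) {
  return {
    system: systemPrompt, // prompting: fixed behavior instructions
    prompt: `Company policies:\n${retrievedDocs.join("\n---\n")}\n\nCustomer question: ${question}`, // RAG context
    tools: toolNames,     // MCP: tools the model may call for live data
  };
}
```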

Wrapping up

So now you know what's out there in the AI app building world. Prompting, RAG, Fine-tuning, and MCP Servers. Each has its place.

The key takeaway? LLMs process everything as tokens. Whether it's text, images, or real-time data, your job is to get it into the right format. Do that well and your AI assistant will work well.

Stay tuned for future posts!
