So you want to build an AI assistant? Maybe for customer support, maybe to search through documents, or maybe just to automate some boring stuff at work. You keep hearing words like "prompting", "RAG", "fine-tuning" thrown around in meetings. But what do they actually mean and which one do you need?
Let me walk you through building a customer support AI assistant and show you how each of these concepts fits in.
/**
* @disclaimer
* This is an educational post based on my learnings.
* I'll share actual experiments in future posts.
* Your use case might be different, so take what applies to you.
*/
What even is an LLM?
Before we jump into the techniques, let's start with the basics. What is an LLM?
Think of a Large Language Model (LLM) as someone who has read most of the internet but has zero memory of your conversation once it ends. Models like GPT-4, Claude, or Llama have been trained on massive amounts of text data. They know a lot about general things but nothing specific about your business.
Here's the important bit - LLMs work with text at their core. Yes, modern LLMs like GPT-4 and Claude can now accept images, audio, and other media as input. But here's what happens behind the scenes - they all get converted into tokens (which are just numerical representations of text and content).
The LLM doesn't "see" an image the way you do. It processes it as tokens. Same with audio. Same with everything else. At the end of the day, it's all tokenized data that the model processes as text-like information.
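To make the idea concrete, here's a toy word-level tokenizer. Real LLMs use subword tokenizers (like BPE) with vocabularies built during training, so this is only an illustration of the principle: text in, numbers out.

```javascript
// Toy tokenizer: maps each word to an integer ID.
// Real models use subword tokenizers (e.g. BPE), but the core idea
// is the same - the model only ever sees numbers.
function makeTokenizer() {
  const vocab = new Map();
  return function tokenize(text) {
    return text
      .toLowerCase()
      .split(/\s+/)
      .filter(Boolean)
      .map((word) => {
        if (!vocab.has(word)) vocab.set(word, vocab.size);
        return vocab.get(word);
      });
  };
}

const tokenize = makeTokenizer();
console.log(tokenize("Where is my order")); // [0, 1, 2, 3]
console.log(tokenize("My order is late"));  // [2, 3, 1, 4] - repeated words reuse IDs
```

Notice that "my", "order", and "is" get the same IDs both times. That consistency is what lets the model learn patterns across text.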
Your entire job when building AI apps is to convert the real world into the right format that the LLM can process. Keep this in mind as we go through the rest of the post.
What is Prompting?
Prompting is basically the instructions you give the LLM to tell it how to behave.
Let's say you're building that customer support assistant. You want it to be friendly, keep answers short, and always ask for an order number before helping. Here's what your prompt might look like:
You are a helpful customer support assistant for an e-commerce company.
Always:
- Be friendly and empathetic
- Keep responses under 3 sentences when possible
- Ask for the order number if the customer mentions an issue
When responding to shipping questions, check the shipping status first.
That's it. You just wrote a prompt. The LLM now knows how to behave.
Why does this matter? Because prompting is the fastest way to shape behavior. No data collection, no model training. You write instructions and it works immediately. Use it to set personality, define workflows, and give context about your business.
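In practice, that prompt usually travels as a "system" message alongside the customer's question. Here's a minimal sketch of assembling the request - the message shape follows the common OpenAI-style chat format, so adjust it to whatever provider or SDK you actually use:

```javascript
// Sketch: packaging a system prompt with the user's question.
// The { role, content } message shape is the common chat-API convention;
// your provider's SDK may differ slightly.
const SYSTEM_PROMPT = `You are a helpful customer support assistant for an e-commerce company.
Always:
- Be friendly and empathetic
- Keep responses under 3 sentences when possible
- Ask for the order number if the customer mentions an issue`;

function buildMessages(userQuestion) {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: userQuestion },
  ];
}

const messages = buildMessages("My package hasn't arrived yet");
// `messages` would then be sent to your LLM provider's chat endpoint.
```

The system message sets behavior once; every user message after that gets interpreted through it.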
What is RAG?
RAG stands for Retrieval Augmented Generation. Fancy name for a simple concept - fetch information from your knowledge base and give it to the LLM.
Here's a scenario. A customer asks "What's your return policy for electronics?"
Your LLM doesn't know your company's return policy. It might make something up or give generic advice. Not good.
So what do you do? RAG to the rescue.
How RAG works:
- Customer asks the question
- Your system searches through your knowledge base - help docs, policies, FAQs (usually stored in something called a vector database)
- It finds the top 3 most relevant documents
- These documents get added to the prompt
- LLM reads your actual policy and answers correctly
Here's what that looks like in code:
// Conceptual example
const query = "return policy for electronics";
const relevantDocs = await vectorDB.search(query, { topK: 3 });

const prompt = `
Using the following company policies:
${relevantDocs.join("\n\n")}

Answer the customer's question: ${query}
`;
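What does that search step actually do? Under the hood, a vector database compares the query's embedding against document embeddings and returns the closest matches. Here's a toy in-memory version - it assumes the documents and the query have already been embedded as vectors (real systems call an embedding model for that step, and the tiny 2-number vectors here are made up for illustration):

```javascript
// Toy retrieval: cosine similarity + top-K, which is roughly what
// a vector DB's search() does at scale.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function search(queryVec, docs, topK) {
  return docs
    .map((d) => ({ ...d, score: cosine(queryVec, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const docs = [
  { text: "Returns accepted within 30 days", vector: [0.9, 0.1] },
  { text: "Shipping takes 3-5 business days", vector: [0.1, 0.9] },
  { text: "Electronics returns need original packaging", vector: [0.8, 0.3] },
];

// Query vector pointing toward the "returns" region of the toy space
const results = search([1, 0], docs, 2);
```

The two return-policy documents score highest and make it into the prompt; the shipping document gets filtered out.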
Why is RAG useful? Because your knowledge base changes all the time. New products launch, policies update, FAQs change. With RAG, you just update the documents. No need to retrain any model.
But there's a catch. Someone has to actually update those documents. If your return policy changed yesterday and nobody updated the knowledge base, the LLM will give outdated info.
What is Fine-tuning?
Fine-tuning means retraining the LLM on your specific data to teach it your patterns, tone, or domain knowledge.
Back to our customer support assistant. Let's say your company uses very specific jargon like "SKU-level inventory check" or "tier-2 escalation protocol". Even with good prompts, the LLM sounds generic or doesn't understand these terms properly.
What you do is collect 1,000 examples of real support conversations - questions and ideal answers - and fine-tune a model on them:
// Training data format
const trainingData = [
  {
    prompt: "Customer: My order says 'processing' for 5 days",
    completion: "I understand the concern. Let me check your order status. Can you provide your order number? I'll perform a SKU-level inventory check to see if there's a warehouse delay."
  },
  // ... 1000 more examples
];
After fine-tuning, your model naturally speaks in your company's language and follows your support patterns. No need for super long prompts.
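Most fine-tuning APIs expect those examples as JSONL - one JSON object per line. The exact field names vary by provider, so treat this as a sketch of the conversion step, not the definitive format:

```javascript
// Sketch: turning collected conversations into JSONL for upload.
// Field names (prompt/completion) vary by fine-tuning provider.
const examples = [
  {
    prompt: "Customer: My order says 'processing' for 5 days",
    completion: "Let me run a SKU-level inventory check. What's your order number?",
  },
];

function toJSONL(rows) {
  return rows.map((row) => JSON.stringify(row)).join("\n");
}

const jsonl = toJSONL(examples);
// Each line of `jsonl` is one training example, ready to upload.
```

JSON.stringify also takes care of escaping quotes and newlines inside the conversations, which is easy to get wrong by hand.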
Why fine-tune? Because it bakes knowledge right into the model. Great for teaching tone, style, or domain-specific stuff that's hard to explain in a prompt.
But here's the problem. Fine-tuning is expensive, takes time, and needs to be re-done when things change. If your return policy updates tomorrow, the fine-tuned model won't know about it. You'd have to retrain.
When should you fine-tune? Only when you have lots of training data and need consistent behavior that you can't get with just prompting.
What are MCP Servers?
MCP stands for Model Context Protocol. MCP Servers let your LLM fetch live data from external systems like databases, APIs, or internal tools right when it needs them.
Here's a new scenario. Customer asks "Where's my order?"
RAG won't help here - your knowledge base doesn't have real-time tracking info. Fine-tuning won't help either - the model can't know about orders placed after it was trained.
You need MCP.
How MCP works:
- LLM recognizes it needs live data
- It calls your MCP server: get_order_status(order_id="12345")
- MCP server queries your database or API
- Returns: {"status": "Out for delivery", "eta": "Today by 8 PM"}
- LLM responds: "Great news! Your order is out for delivery and should arrive by 8 PM today."
Here's what that looks like:
// MCP Server example (conceptual)
mcpServer.tool("get_order_status", async (orderId) => {
  // Query the live database with a parameterized query
  // (placeholder syntax varies by driver: ?, $1, etc.)
  const order = await db.query(
    "SELECT * FROM orders WHERE id = ?",
    [orderId]
  );
  return {
    status: order.status,
    eta: order.estimated_delivery
  };
});
Why does MCP matter? Because unlike RAG (which uses static documents) or fine-tuning (which bakes in old knowledge), MCP gives your LLM access to real-time data. Current inventory levels, user account info, live API data - anything that changes frequently.
But wait, there's more. MCP isn't just about reading data. It can also do actions. Your LLM can:
- Create a support ticket in your system
- Send an email to the customer
- Update order status in the database
- Trigger a refund process
- Schedule a callback
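An action tool looks just like the read tool from earlier, except its handler changes state instead of only reading it. Here's a sketch with a tiny in-memory registry standing in for a real MCP server so the dispatch logic is visible - the tool name, ticket fields, and registry functions are all hypothetical (real MCP handlers are also typically async; this is kept synchronous for brevity):

```javascript
// Sketch: an action tool. A minimal registry stands in for a real
// MCP server; create_ticket and its fields are made-up examples.
const tools = new Map();
const ticketStore = [];

function registerTool(name, handler) {
  tools.set(name, handler);
}

function callTool(name, args) {
  const handler = tools.get(name);
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}

registerTool("create_ticket", ({ customerId, issue }) => {
  const ticket = { id: ticketStore.length + 1, customerId, issue, status: "open" };
  ticketStore.push(ticket); // side effect: state changes in your system
  return ticket;
});

// When the LLM decides action is needed, it emits a call like:
// callTool("create_ticket", { customerId: "c-42", issue: "Missing item" });
```

The important shift: the return value goes back to the LLM as context, but the side effect (a new ticket) lives in your system.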
It's not just read-only. The LLM can not only fetch information but also make things happen in your systems.
Let me put it this way:
- RAG: "Here's what we wrote about returns last month"
- Fine-tuning: "We've trained you on thousands of past conversations"
- MCP: "Go check the database right now for this customer's order, and if they're eligible, issue a refund"
See the difference?
What ties all of this together?
Remember what I said at the start? LLMs work with text at their core. Everything eventually becomes text (or tokens).
Let's see how this applies to what we learned:
- Prompting: Text instructions you write
- RAG: Text documents you inject as context
- Fine-tuning: Text examples you use for training
- MCP: Real-time data gets converted to text/JSON before the LLM processes it
Your entire job when building AI apps is converting the real world into the right format. Database results? Format as structured text. Images? Modern LLMs can take them directly, but they get tokenized. Code outputs? Capture as text. API responses? JSON that becomes tokens.
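Here's what "format database results as structured text" can look like in practice - a small formatter that turns a row into clean, labeled lines before it goes into the prompt (the order fields here are made up for illustration):

```javascript
// Sketch: converting a database row into structured text for the prompt.
// The row shape (id, status, eta) is a made-up example.
function formatOrderForPrompt(order) {
  return [
    `Order #${order.id}`,
    `Status: ${order.status}`,
    `ETA: ${order.eta}`,
  ].join("\n");
}

const formatted = formatOrderForPrompt({
  id: "12345",
  status: "Out for delivery",
  eta: "Today by 8 PM",
});
// `formatted` is now clean, labeled text instead of a raw object dump.
```

Labeled lines like this are much easier for the model to use reliably than a raw `JSON.stringify` of the whole row with internal field names.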
The clearer and more structured your input is, the better your AI assistant works. Simple as that.
So which one do you need?
Let me give you a quick guide for the customer support assistant we've been talking about:
| What you need to do | Use this |
|---|---|
| Set personality and basic rules | Prompting |
| Answer questions from company policies/FAQs | RAG |
| Teach company-specific tone and jargon | Fine-tuning |
| Check order status, inventory, live user data | MCP Servers |
Here's the thing though. Most real-world AI apps don't use just one. They use all of them together:
- Start with prompting to define how your assistant behaves
- Add RAG when you need to answer from your knowledge base
- Use MCP for anything that needs real-time data
- Fine-tune only if you have lots of data and really need it
Wrapping up
So now you know what's out there in the AI app building world. Prompting, RAG, Fine-tuning, and MCP Servers. Each has its place.
The key takeaway? LLMs process everything as tokens. Whether it's text, images, or real-time data, your job is to get it into the right format. Do that well and your AI assistant will work well.
Stay tuned for future posts!