Key AI Concepts (2)#

In this chapter, we cover key concepts you’ll frequently encounter when turning a pre-trained LLM into a real product: fine-tuning, RAG, function calling, and MCP.


Fine-tuning#

Fine-tuning is the process of further training a pre-trained model to better fit a specific purpose. Thanks to open-source culture, training datasets and pre-trained models are often publicly available, and you can fine-tune them to improve performance or specialize in a particular domain.

For example, if you build an AI chatbot for an e-commerce store selling clothing and use a public model as-is, it may handle general conversation but fail to answer product or policy questions because it doesn’t know your internal information. In such cases, fine-tuning is one possible option.

There are approaches that retrain the entire model (full fine-tuning) and more efficient approaches that update only a subset of parameters (parameter-efficient fine-tuning, PEFT). The former is expensive and can push the model in unintended directions, so in practice, PEFT-style methods are widely studied and used.
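To make the PEFT idea concrete, here is a minimal sketch using LoRA adapters from the Hugging Face peft library. The base model name and the hyperparameters are placeholders rather than recommendations; the point is that the original weights stay frozen and only a small set of added parameters is trained.

```python
# Minimal LoRA (PEFT) setup sketch using Hugging Face transformers + peft.
# Model name and hyperparameters are placeholders, not production advice.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "gpt2"  # placeholder; substitute the pre-trained model you start from

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA injects small trainable matrices into selected layers; the original
# weights are frozen, so only a small fraction of parameters is updated.
lora_config = LoraConfig(
    r=8,             # rank of the LoRA update matrices
    lora_alpha=16,   # scaling factor
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # shows how few parameters are actually trained
# From here, train peft_model on your domain data with a normal training loop or Trainer.
```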


RAG (Retrieval-Augmented Generation)#

RAG is a method where, before generating an answer, the LLM first retrieves relevant external materials and attaches them as reference context, then generates the response. (Here, a prompt is the question/instruction given to the AI, and context is background information or documents provided for the AI to reference.) In other words, RAG lets the model consult external documents (knowledge), improving freshness, accuracy, and the ability to cite evidence.

As ChatGPT became widely adopted, a commonly discussed limitation was that LLMs struggle to guarantee accurate, well-grounded answers for up-to-date information outside their training window or for internal organizational/domain knowledge. Services built on pre-trained models may produce shaky answers or fail to provide sources when asked about data the model never learned. RAG emerged as one of the representative approaches to address this gap.

To summarize, there are two major ways to improve a model’s output for content beyond the pre-trained scope: fine-tuning and RAG. Fine-tuning updates the model itself, while RAG keeps the model unchanged and instead feeds it retrieved documents as context.

Using an analogy: fine-tuning is like giving a customer support agent additional training, while RAG is like giving the agent a laptop with internet search. RAG can add latency because retrieval and context construction happen before generation, but it can incorporate fresh information and domain knowledge without the cost and risk of retraining the model. Since each approach has trade-offs, you should choose based on the situation and purpose—and they are not mutually exclusive, so combining both is also possible.
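To make the retrieve-then-generate flow concrete, here is a toy sketch. The keyword-overlap retriever and the call_llm function are simplified stand-ins; a real system would typically use embeddings and a vector database for retrieval and an actual LLM API for generation.

```python
# Toy RAG sketch: retrieve the most relevant documents, then pass them to the
# model as context. The keyword-overlap scoring and call_llm() are simplified
# stand-ins for an embedding-based retriever and a real LLM API.
documents = [
    "Refund policy: items can be returned within 30 days with a receipt.",
    "Shipping: standard delivery takes 3-5 business days.",
    "Sizing guide: our jackets run one size small; consider sizing up.",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(query_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call."""
    raise NotImplementedError

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question, documents))
    prompt = (
        "Answer the question using only the context below, and cite the context you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```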


Function Calling#

If RAG is “find and attach reference materials,” function calling (also called tool calling in some contexts) is “actually execute the necessary work and bring back the result.” The LLM decides to call predefined tools (functions) during response generation, receives their outputs, and then produces the final answer. (Here, “tools” can be API calls, database queries, calculation functions, or internal system actions wired up by the developer.)
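Concretely, “predefined tools” are usually described to the model as a name, a description, and a parameter schema. The sketch below uses the OpenAI-style tools format as one example; other providers use similar but not identical shapes.

```python
# Sketch of how tools (functions) are typically declared for an LLM:
# each tool has a name, a description, and a JSON Schema for its parameters.
# This follows the OpenAI-style "tools" format; other providers differ in detail.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. Seoul"},
                },
                "required": ["city"],
            },
        },
    },
]
# The application sends this list alongside the conversation; the model may
# respond with a request to call one of these tools instead of a final answer.
```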

When to use RAG vs. Function Calling#

  • RAG: when you need to answer based on documents/knowledge, such as policies, rules, or explanations
  • Function Calling: when you need retrieval/calculation/execution, such as looking up current values or performing an action

In one sentence:

  • RAG = answer by reading (reference documents)
  • Function Calling = answer by executing (call systems)

Example 1) Monthly sales summary (work/business)#

If the user asks, “Summarize this month’s sales,” the LLM can (as sketched in the code after this list):

  • Call a DB/BI query tool → fetch this month’s sales data
  • Call a calculation/aggregation tool → compute totals/averages/changes
  • Summarize the result in natural language → output a readable report
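A hypothetical sketch of that flow is shown below. The tool functions and the LLM call are placeholders, not real APIs; and where a real function-calling setup lets the model decide which tools to invoke, this sketch hard-codes the sequence simply to show the data flow.

```python
# Hypothetical sketch of the monthly sales summary flow. query_sales_db(),
# aggregate(), and call_llm() are illustrative placeholders only.
def query_sales_db(month: str) -> list[dict]:
    """Placeholder for a DB/BI tool that returns per-order sales rows."""
    raise NotImplementedError

def aggregate(rows: list[dict]) -> dict:
    """Placeholder for a calculation tool: totals, averages, month-over-month change."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder for the LLM call that turns numbers into a readable report."""
    raise NotImplementedError

def monthly_sales_summary(month: str) -> str:
    rows = query_sales_db(month)      # step 1: fetch this month's sales data
    stats = aggregate(rows)           # step 2: compute totals/averages/changes
    return call_llm(                  # step 3: summarize in natural language
        f"Write a short sales report for {month} based on these figures: {stats}"
    )
```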

Example 2) Everyday/work-friendly examples#

  • Scheduling: “Book a meeting tomorrow at 3pm” → call a calendar tool → confirm and respond
  • Customer support: “What’s the refund policy?” → (RAG) retrieve and quote policy docs → summarize
    “Process the refund too” → (function calling) call order/payment system → report the result
  • Real-time info: “What’s the weather in Seoul today?” → call a weather API → summarize current/forecast
  • Document writing: “Fill this contract draft using our template” → call a templating/document tool → generate draft and ask for review

Why it matters#

Function calling reduces cases where a model answers “plausibly” without real grounding, because responses can be built from actual system data or external API results. Combined with RAG, you can use documents to support policies/manuals/knowledge (RAG) and use tools to handle real-time lookups, calculations, and operations (function calling), enabling more practical AI applications.


MCP (Model Context Protocol)#

To use function calling broadly in production, you ultimately face the question: “How do we connect tools in a consistent way?” MCP is a standard protocol designed to solve that problem, providing a consistent specification for connecting LLMs to tools and data sources.

An analogy: in the past, device ports were all different and required many adapters—but once USB became a standard, the peripheral ecosystem exploded. Similarly, MCP standardizes how LLMs connect to external capabilities, making it easier to build “connectable tools” and attach them quickly.

In practice, APIs for services like ChatGPT and Gemini have supported function calling for quite some time. However, the “protocol” details (how to define functions and how to feed results back) varied across providers, and for a long time function calling received relatively little attention, so adoption was slower than it is now.

With MCP, tool-connection patterns are becoming standardized and an ecosystem of connectable tools is forming more quickly. As a result, function calling is evolving from a vendor-specific feature into a baseline interface for attaching services/systems to an LLM. And as MCP makes it easier to attach more and richer tools, LLM-powered products can go beyond simple Q&A to handle lookups, calculations, operations, and automation, significantly improving what the overall service can do.

A (simplified) function-calling flow#

  • (User) “What’s the weather today?” (The request is sent to the LLM together with the available functions: getWeather(), getStock().)
  • (LLM) “I need the result of getWeather().”
  • (System) getWeather() result: “Sunny, 10% chance of rain, 27°C”
  • (LLM) “Today will be sunny, with a 10% chance of rain, and a temperature of 27°C.”

The key point is that function calling enables communication with external systems (lookup/calculation/processing) that RAG alone cannot fully solve.
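Here is the same flow sketched against one concrete interface, the OpenAI Python SDK’s chat-completions tool calling, assuming an API key is configured in the environment. The model name is just a placeholder, and other providers implement the same loop with different field names.

```python
# The simplified flow above, sketched with one concrete provider interface
# (OpenAI Python SDK, chat completions tool calling). Assumes OPENAI_API_KEY is set.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> dict:
    """Stand-in for a real weather API call."""
    return {"condition": "sunny", "rain_chance": "10%", "temperature_c": 27}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get today's weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Seoul today?"}]

# 1) The model sees the question plus the available tools and may ask for a call.
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
assistant_msg = first.choices[0].message

if assistant_msg.tool_calls:
    call = assistant_msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather(**args)                      # 2) the system executes the tool

    # 3) Feed the tool result back so the model can write the final answer.
    messages.append(assistant_msg)
    messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```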

How does MCP relate to Function Calling?#

  • Function calling: the action of the LLM deciding to “call a tool (function)”
  • MCP: the standard for “how those tools are connected and exposed”

If function calling is “making a phone call,” MCP is closer to “the phone network standard.”

MCP Client / MCP Server#

To connect an LLM to the outside world via MCP, you typically need an MCP client and an MCP server.

  • MCP client: receives the user request, packages it with the connected MCP servers’ capability specs so the LLM can decide what to do, executes the selected tool calls, and returns the results to the LLM
    • Examples: tools like Claude and Cursor can embed an MCP client
  • MCP server: exposes specific external capabilities (e.g., document search/edit, data queries, operational workflows) via the MCP protocol
    • These can be provided officially by a service, or implemented and deployed by individuals/teams

In many cases, users work inside an app that includes an MCP client, then discover and connect MCP servers as needed. For example, one approach is browsing MCP servers from places like https://smithery.ai/ and connecting the ones that match your goals.
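As a sketch of what an MCP server looks like in code, the example below uses the FastMCP helper from the official MCP Python SDK (the mcp package). The server name and the search_docs tool are placeholders for whatever capability you actually want to expose.

```python
# Minimal MCP server sketch using the FastMCP helper from the official Python
# SDK (the `mcp` package). The tool body is a placeholder; a real server would
# wrap an actual system such as a document store or database.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-docs-server")

@mcp.tool()
def search_docs(query: str) -> str:
    """Search internal documents and return the best-matching snippet."""
    # Placeholder logic; a real implementation would query a document store.
    return f"No real index here; you searched for: {query}"

if __name__ == "__main__":
    # Serve over stdio so an MCP client (e.g. a desktop LLM app) can launch and connect to it.
    mcp.run(transport="stdio")
```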

Example flow (conceptual)#

For example, if a user asks the Claude app (an LLM client app with an embedded MCP client) to “connect a Notion-related MCP server and summarize a specific Notion document,” the flow might look like this (a client-side code sketch follows the list):

  • (User) “Find chatbot-related tasks assigned to me in Notion and summarize them”
  • (App/client) packages the request so the LLM can decide, including the connected MCP servers’ capability specs
  • (LLM) “I need to call the Notion MCP server’s document discovery/read capability.”
  • (Client) calls that capability on the MCP server and retrieves the result
  • (LLM) writes a summary based on the result
  • (App) outputs the final answer to the user
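For readers who want to see the client side in code, here is a sketch using the official MCP Python SDK. The server command and the search_pages tool name are placeholders standing in for an actual Notion-related MCP server, and the LLM’s decision step is only indicated in comments.

```python
# Sketch of the client side of the flow above, using the official MCP Python SDK.
# The server command and tool name are placeholders; the LLM step is shown as comments.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder: launch some MCP server over stdio (command/args depend on the server).
server_params = StdioServerParameters(command="python", args=["notion_mcp_server.py"])

async def main() -> None:
    async with stdio_client(server_params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()

            # 1) Fetch the server's capability specs; the client hands these to
            #    the LLM so it can decide which tool to call.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # 2) Suppose the LLM chose a (hypothetical) "search_pages" tool;
            #    the client executes the call and collects the result.
            result = await session.call_tool(
                "search_pages", arguments={"query": "chatbot tasks assigned to me"}
            )
            print(result)
            # 3) The LLM would then summarize the result for the user.

if __name__ == "__main__":
    asyncio.run(main())
```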

In short, you can quickly combine existing MCP clients/servers to use capabilities, implement an MCP client in your own backend to attach external tools, or implement an MCP server to expose your own data/capabilities to others.

© 2026 Ted Kim. All Rights Reserved.