
Tool Calling

KAITO supports tool calling, allowing you to integrate external tools into your inference service. This feature enables the model to call APIs or execute functions based on the input it receives, enhancing its capabilities beyond text generation.

Requirements

Supported Inference Runtimes

Currently, tool calling is only supported with the vLLM inference runtime.

Supported Models & Chat Templates

Proper chat templates are required for tool calling. The following models and their corresponding chat templates are supported:

Model Family   Chat Template                   Tool Parser in vLLM
Phi 4          tool-chat-phi4-mini.jinja       phi4_mini_json
Llama 3        tool-chat-llama3.1-json.jinja   llama3_json
Mistral        tool-chat-mistral.jinja         mistral

Inference Configurations

Create the following ConfigMap before deploying the KAITO Workspace, with tool-call-parser and chat-template set to the appropriate values for your model:

apiVersion: v1
kind: ConfigMap
metadata:
  name: tool-calling-inference-config
data:
  inference_config.yaml: |
    # Maximum number of steps to find the max available seq len fitting in the GPU memory.
    max_probe_steps: 6

    vllm:
      cpu-offload-gb: 0
      swap-space: 4
      tool-call-parser: "phi4_mini_json" | "llama3_json" | "mistral"
      chat-template: "/workspace/chat_templates/<chat_template_name>.jinja"

In the Workspace configuration, set .inference.config to the name of the ConfigMap you created. For example:

apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-phi-4-mini-tool-call
resource:
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: phi-4
inference:
  preset:
    name: phi-4-mini-instruct
  config: tool-calling-inference-config

For more details on the inference configuration, refer to the vLLM tool calling documentation.

Examples

Port-forward the inference service to your local machine:

kubectl port-forward svc/workspace-phi-4-mini-tool-call 8000
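
Before running the examples, you can verify that the endpoint is reachable by listing the models served through the OpenAI-compatible API. This is a minimal sketch assuming the port-forward above is active and the openai Python package is installed:

from openai import OpenAI

# Query the port-forwarded endpoint for the models it serves.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
print([model.id for model in client.models.list().data])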

Named Function Calling

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

def get_weather(location: str, unit: str):
    return f"Getting the weather for {location} in {unit}..."

tool_functions = {"get_weather": get_weather}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
}]

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
    tools=tools,
    tool_choice="auto"
)

tool_call = response.choices[0].message.tool_calls[0].function
print(f"Function called: {tool_call.name}")
print(f"Arguments: {tool_call.arguments}")
print(f"Result: {tool_functions[tool_call.name](**json.loads(tool_call.arguments))}")

Expected output:

Function called: get_weather
Arguments: {"location": "San Francisco, CA", "unit": "fahrenheit"}
Result: Getting the weather for San Francisco, CA in fahrenheit...
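
The example above stops after extracting the tool call and invoking the local function. To have the model turn the tool result into a natural-language reply, the result can be sent back as a "tool" message in a follow-up request. The following sketch continues the script above (reusing client, tools, tool_functions, response, and tool_call); the exact wording of the final answer depends on the model:

# Append the assistant's tool call and the tool result, then ask the model for a final answer.
messages = [
    {"role": "user", "content": "What's the weather like in San Francisco?"},
    response.choices[0].message,  # assistant turn containing the tool call
    {
        "role": "tool",
        "tool_call_id": response.choices[0].message.tool_calls[0].id,
        "content": tool_functions[tool_call.name](**json.loads(tool_call.arguments)),
    },
]

final = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)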

Model Context Protocol (MCP)

With the right client framework, an inference workload provisioned by KAITO can also call external tools using the Model Context Protocol (MCP). This allows the model to integrate and share data with external tools, systems, and data sources.

In the following example, we will use uv to create a Python virtual environment and install the necessary dependencies for AutoGen to call the DeepWiki MCP service and ask questions about the KAITO project.

mkdir kaito-mcp
cd kaito-mcp
# Create and activate a virtual environment
uv venv && source .venv/bin/activate
# Install dependencies
uv pip install "autogen-ext[openai]" "autogen-agentchat" "autogen-ext[mcp]"

Create a Python script test.py with the following content:

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_core import CancellationToken
from autogen_core.models import ModelFamily, ModelInfo
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import (StreamableHttpMcpToolAdapter,
                                   StreamableHttpServerParams)
from openai import OpenAI


async def main() -> None:
    # Create server params for the remote MCP service
    server_params = StreamableHttpServerParams(
        url="https://mcp.deepwiki.com/mcp",
        timeout=30.0,
        terminate_on_close=True,
    )

    # Get the ask_question tool from the server
    adapter = await StreamableHttpMcpToolAdapter.from_server_params(server_params, "ask_question")

    model = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy").models.list().data[0].id
    model_info: ModelInfo = {
        "vision": False,
        "function_calling": True,
        "json_output": True,
        "family": ModelFamily.UNKNOWN,
        "structured_output": True,
        "multiple_system_messages": True,
    }

    # Create an agent that can use the ask_question tool
    model_client = OpenAIChatCompletionClient(base_url="http://localhost:8000/v1", api_key="dummy", model=model, model_info=model_info)
    agent = AssistantAgent(
        name="deepwiki",
        model_client=model_client,
        tools=[adapter],
        system_message="You are a helpful assistant.",
    )

    await Console(
        agent.run_stream(task="In the GitHub repository 'kaito-project/kaito', how many preset models are there?", cancellation_token=CancellationToken())
    )


if __name__ == "__main__":
    asyncio.run(main())

To run the script, execute the following command in your terminal:

uv run test.py

Expected output:

---------- TextMessage (user) ----------
In the GitHub repository 'kaito-project/kaito', how many preset models are there?
---------- ToolCallRequestEvent (deepwiki) ----------
[FunctionCall(id='chatcmpl-tool-4e22b15c32d34430b80078a3acc41f0d', arguments='{"repoName": "kaito-project/kaito", "question": "How many preset models are there?"}', name='ask_question')]
Unknown SSE event: ping
---------- ToolCallExecutionEvent (deepwiki) ----------
[FunctionExecutionResult(content='[{"type": "text", "text": "There are 16 preset models in the Kaito project. These models are defined in the `supported_models.yaml` file and registered programmatically within the codebase. ...", "annotations": null, "meta": null}]', name='ask_question', call_id='chatcmpl-tool-4e22b15c32d34430b80078a3acc41f0d', is_error=False)]
---------- ToolCallSummaryMessage (deepwiki) ----------
[{"type": "text", "text": "There are 16 preset models in the Kaito project. These models are defined in the `supported_models.yaml` file and registered programmatically within the codebase. ...", "annotations": null, "meta": null}]