
Quick Start

After installing the Headlamp-KAITO plugin, you can quickly deploy and interact with AI models through Headlamp's intuitive interface.

Prerequisites

Before starting, ensure you have:

  • Headlamp installed with the Headlamp-KAITO plugin enabled
  • A Kubernetes cluster with the KAITO controller deployed
  • Sufficient GPU resources (Standard_NC24ads_A100_v4 or Standard_NC96ads_A100_v4 instances recommended)
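
You can sanity-check these prerequisites from the command line before continuing. A minimal sketch, assuming KAITO was installed with its default Helm chart (the controller namespace is an assumption and may differ in your setup):

# The Workspace CRD should be registered with the cluster
kubectl get crd workspaces.kaito.sh

# The KAITO controller pod should be Running (namespace assumed)
kubectl get pods -n kaito-workspace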

Step 1: Explore the Model Catalog

Model Catalog Features

The model catalog provides a list of KAITO Preset models with filtering and search capabilities:

Feature           Description
Search            Filter models by name
Category Filter   Filter by company (Meta, Microsoft, etc.)

Step 2: Deploy Your First Model

Deploying a Model

  1. Select a model from the catalog based on your requirements
  2. Click "Deploy" to open the YAML editor dialog
  3. Review the generated Workspace YAML, which includes:

    • instanceType (automatically selected based on model size)
    • preset.name (the model identifier)
    • presetOptions (for models requiring access tokens; see the sketch at the end of this step)

  4. Modify the YAML if needed (namespace, resource requests, etc.)

    • Note: The editor requires at least one edit before you can apply the YAML, even if that is just removing a trailing space.
  5. Click "Apply" to deploy the Workspace resource to Kubernetes. After a few minutes, a visual indicator shows whether the workspace was created successfully.

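Some presets are gated and require an access token supplied through presetOptions. The sketch below shows what such a Workspace might look like, applied via a heredoc; the workspace name, preset name, Secret name, and the modelAccessSecret field are illustrative, so consult the KAITO preset documentation for your model:

kubectl apply -f - <<'EOF'
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-gated-example        # illustrative name
resource:
  instanceType: 'Standard_NC24ads_A100_v4'
  labelSelector:
    matchLabels:
      apps: gated-example
inference:
  preset:
    name: llama-3.1-8b-instruct        # example token-gated preset; check the catalog
    presetOptions:
      modelAccessSecret: hf-token      # assumes a pre-created Secret holding your token
EOF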

Check Workspace Status

Navigate to Kaito Workspaces via the left sidebar. The workspace list displays critical status information:

Column                Description
Resource Ready        GPU nodes provisioned
Inference Ready       Model pods running
Job Started           Deployment job active
Workspace Succeeded   Overall success status
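
These columns are also visible from the command line; a quick sketch:

# Watch the status columns update as provisioning progresses
kubectl get workspace -A -w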

Workspaces Detail Features

Click into any of your workspaces to see the following features:

Feature             Description
Workspace Details   View workspace name, creation details, and annotations
Resources           View count, instance type, preferred nodes, and node selector
Inference           View preset name and image, config details, and adapters
Status              See real-time deployment and health status
Conditions          View all status conditions and their messages for troubleshooting
Events              View recent events and logs for each workspace
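
The same conditions and events can also be inspected with kubectl; a minimal sketch (the workspace name is illustrative):

kubectl describe workspace workspace-phi-3-5-mini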

Step 3: Chat with Deployed Model

Once your workspace shows "Inference Ready", you can interact with the model through the chat interface.

Starting a Chat Session

Starting a Chat Session from the Chat Page

  1. Navigate to Chat in the left sidebar

  2. Select a workspace from the dropdown, then select a model from that workspace

  3. Click "Go"

  4. Configure model settings if desired, then chat with your model!

Starting a Chat Session from the Workspaces Page

  1. Click into your workspace to view its details
  2. Click the Chat icon in the upper right
  3. Configure model settings if desired, then chat with your model!

Chat Interface Features

The ChatUI component provides a full-featured chat experience:

Feature             Description                    Implementation
Message Streaming   Real-time response display     streamText() with textStream
Model Selection     Choose from available models   Autocomplete with /v1/models data
Message History     Conversation persistence       messages state array
Markdown Support    Rich text formatting           ReactMarkdown component
Error Handling      Fallback responses             Try-catch with connection timeout logic
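
Under the hood, the chat talks to the workspace's OpenAI-compatible API. For reference, here is a hedged sketch of an equivalent streaming request from the CLI, using the temporary curl pod pattern and the $CLUSTERIP variable from the Legacy CLI section below (the model name is from the phi-3.5 example):

kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -N -X POST http://$CLUSTERIP/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-3.5-mini-instruct", "messages": [{"role": "user", "content": "Hello!"}], "stream": true}'

The -N flag disables curl's output buffering so streamed chunks print as they arrive.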

Legacy CLI Method (Optional)

For users familiar with kubectl, you can also deploy models using YAML:

phi-3.5-workspace.yaml
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-phi-3-5-mini
resource:
  instanceType: 'Standard_NC24ads_A100_v4'
  labelSelector:
    matchLabels:
      apps: phi-3-5
inference:
  preset:
    name: phi-3.5-mini-instruct

Apply this configuration to your cluster:

kubectl apply -f phi-3.5-workspace.yaml

Monitor Deployment

Track the workspace status to see when the model has been deployed successfully:

kubectl get workspace workspace-phi-3-5-mini

When the WORKSPACESUCCEEDED column becomes True, the model has been deployed successfully:

NAME                     INSTANCE                   RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
workspace-phi-3-5-mini   Standard_NC24ads_A100_v4   True            True                          True                 4h15m
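
If you are scripting the deployment, kubectl wait can block until the workspace reports success; a sketch, assuming the condition type matches the WORKSPACESUCCEEDED column name:

kubectl wait workspace/workspace-phi-3-5-mini \
  --for=condition=WorkspaceSucceeded=True --timeout=30m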

Test the Model

Find the inference service's cluster IP and test it using a temporary curl pod:

# Get the service endpoint
kubectl get svc workspace-phi-3-5-mini
export CLUSTERIP=$(kubectl get svc workspace-phi-3-5-mini -o jsonpath="{.spec.clusterIPs[0]}")

List the available models:

kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -s http://$CLUSTERIP/v1/models | jq

You should see output similar to:

{
  "object": "list",
  "data": [
    {
      "id": "phi-3.5-mini-instruct",
      "object": "model",
      "created": 1733370094,
      "owned_by": "vllm",
      "root": "/workspace/vllm/weights",
      "parent": null,
      "max_model_len": 16384
    }
  ]
}
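
If you would rather not launch a temporary pod for each request, port-forwarding the service is an alternative; a sketch, assuming the inference service exposes port 80 (as the ClusterIP calls above imply):

# Forward the service to localhost, then query it directly
kubectl port-forward svc/workspace-phi-3-5-mini 8080:80 &
curl -s http://localhost:8080/v1/models | jq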

Make an Inference Call

Now make an inference call using the model:

kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3.5-mini-instruct",
    "prompt": "What is kubernetes?",
    "max_tokens": 50,
    "temperature": 0
  }'
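
The response follows the OpenAI completions schema, so you can extract just the generated text by piping through jq; a minimal sketch:

kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -s -X POST http://$CLUSTERIP/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-3.5-mini-instruct", "prompt": "What is kubernetes?", "max_tokens": 50, "temperature": 0}' \
  | jq -r '.choices[0].text'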

Next Steps

Congratulations! You've successfully deployed and tested your first model with the Headlamp-KAITO plugin.

After completing this quick start:

  • Explore advanced features in Core Features
  • Customize model settings using the settings dialog (⚙️ icon in chat)

Additional Resources