# Quick Start
After installing KAITO, you can quickly deploy a phi-3.5-mini-instruct inference service to get started.
## Prerequisites
- A Kubernetes cluster with KAITO installed (see Installation)
- `kubectl` configured to access your cluster
## Deploy Your First Model
Let's start by deploying a phi-3.5-mini-instruct model using a workspace configuration:
Save the following as `phi-3.5-workspace.yaml`:

```yaml
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-phi-3-5-mini
resource:
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: phi-3-5
inference:
  preset:
    name: phi-3.5-mini-instruct
```
Apply this configuration to your cluster:
```bash
kubectl apply -f phi-3.5-workspace.yaml
```
## Monitor Deployment
Track the workspace status to see when the model has been deployed successfully:
```bash
kubectl get workspace workspace-phi-3-5-mini
```
When the `WORKSPACESUCCEEDED` column becomes `True`, the model has been deployed successfully:

```
NAME                     INSTANCE                   RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
workspace-phi-3-5-mini   Standard_NC24ads_A100_v4   True            True                          True                 4h15m
```
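If you prefer to poll the workspace programmatically rather than reading the table, a small sketch like the following can inspect the JSON from `kubectl get workspace workspace-phi-3-5-mini -o json`. Note the `WorkspaceSucceeded` condition type is an assumption inferred from the column name above; confirm the exact condition types on your cluster before relying on this.

```python
import json

def workspace_succeeded(workspace_json: str) -> bool:
    """Return True when the workspace reports a succeeded condition.

    Assumes the Workspace status uses a Kubernetes-style `conditions`
    list containing a `WorkspaceSucceeded` type (an assumption based on
    the WORKSPACESUCCEEDED column shown above).
    """
    ws = json.loads(workspace_json)
    for cond in ws.get("status", {}).get("conditions", []):
        if cond.get("type") == "WorkspaceSucceeded":
            return cond.get("status") == "True"
    return False

# Example with a minimal, hypothetical status payload:
sample = json.dumps(
    {"status": {"conditions": [{"type": "WorkspaceSucceeded", "status": "True"}]}}
)
print(workspace_succeeded(sample))  # True
```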
## Test the Model
Find the inference service's cluster IP and test it using a temporary curl pod:
```bash
# Get the service endpoint
kubectl get svc workspace-phi-3-5-mini
export CLUSTERIP=$(kubectl get svc workspace-phi-3-5-mini -o jsonpath="{.spec.clusterIPs[0]}")

# List available models
kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -s http://$CLUSTERIP/v1/models | jq
```
You should see output similar to:
```json
{
  "object": "list",
  "data": [
    {
      "id": "phi-3.5-mini-instruct",
      "object": "model",
      "created": 1733370094,
      "owned_by": "vllm",
      "root": "/workspace/vllm/weights",
      "parent": null,
      "max_model_len": 16384
    }
  ]
}
```
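The `/v1/models` response follows the OpenAI-style list format, so the available model ids can also be extracted with a few lines of Python, roughly equivalent to the `jq` pipe above:

```python
import json

def model_ids(models_response: str) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [m["id"] for m in json.loads(models_response)["data"]]

# Using a trimmed version of the sample response shown above:
sample = '{"object": "list", "data": [{"id": "phi-3.5-mini-instruct", "object": "model"}]}'
print(model_ids(sample))  # ['phi-3.5-mini-instruct']
```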
## Make an Inference Call
Now make an inference call using the model:
```bash
kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3.5-mini-instruct",
    "prompt": "What is kubernetes?",
    "max_tokens": 50,
    "temperature": 0
  }'
```
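The same call can be made from a small Python client, since the endpoint is OpenAI-compatible. This is a sketch using only the standard library; reaching the service from outside the cluster is an assumption that requires something like `kubectl port-forward svc/workspace-phi-3-5-mini 8080:80` first (the local port 8080 is an arbitrary choice).

```python
import json
import urllib.request

def completion_request(model: str, prompt: str, max_tokens: int = 50,
                       temperature: float = 0.0) -> bytes:
    """Build the same JSON body the curl command above sends."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode()

def complete(base_url: str, model: str, prompt: str) -> str:
    """POST to the OpenAI-compatible /v1/completions endpoint and
    return the first choice's text."""
    req = urllib.request.Request(
        base_url + "/v1/completions",
        data=completion_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# With a port-forward running (assumed setup, see lead-in):
# print(complete("http://localhost:8080", "phi-3.5-mini-instruct",
#                "What is kubernetes?"))
```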
## Next Steps
🎉 Congratulations! You've successfully deployed and tested your first model with KAITO.
- Learn More: Explore the full range of supported models in the presets documentation
- Advanced Usage: Learn about workspace configurations
- Contributing: See how to contribute new models