AIKit Integration with KAITO

AIKit provides a streamlined way to package and deploy large language models (LLMs) as container images.

This document demonstrates how to integrate AIKit-built models with KAITO workspaces for efficient AI model deployment on Kubernetes, including CPU-based inference and custom model creation with a variety of supported formats such as GGUF, GPTQ, and EXL2.

For more detailed information about AIKit, please refer to the AIKit documentation. For any AIKit-related issues, please open an issue in the AIKit repository.

Overview

AIKit enables you to:

  • 📦 Package AI models as OCI container images with minimal configuration
  • 🤏 Minimize image size with a custom distroless-based image, resulting in fewer vulnerabilities and a smaller attack surface
  • 🏃 Run models with a variety of inference backends for tasks such as text and image generation
  • 🖥️ Run on AMD64 and ARM64 CPUs, with GPU-accelerated inferencing on NVIDIA GPUs
  • 🪄 Integrate seamlessly with KAITO's infrastructure management and deployment workflows
note

While AIKit and KAITO integrate well, they are separate projects. AIKit focuses on model packaging and deployment, while KAITO provides infrastructure management and Kubernetes deployment workflows via controllers. There may be differences in what model formats are supported by each project.

Deploying AIKit Models to KAITO

Cluster Setup

This guide uses a kind cluster for local development and testing, making it easy to get started.

If you already have a Kubernetes cluster set up, you can skip this section.

  • Download and install kind

  • Create a kind cluster:

kind create cluster --name kaito
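
Once the cluster is ready, you can verify that kubectl is pointing at it (kind names the context kind-<cluster-name>):

kubectl cluster-info --context kind-kaito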

KAITO Workspace Configuration

Create a KAITO workspace configuration file to deploy your model. Here's a complete example:

aikit-workspace.yaml
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-llama-3point2-3b
resource:
  labelSelector:
    matchLabels:
      apps: llama-3point2-3b
  preferredNodes:
    - kaito-control-plane
inference:
  template:
    spec:
      containers:
        - name: llama-3point2-3b
          image: ghcr.io/sozercan/llama3.2:3b
          args:
            - "run"
            - "--address=:5000"

Memory Requirements

Before deploying models, check the model's memory requirements to avoid Out of Memory (OOM) errors. Add appropriate resources.requests.memory and resources.limits.memory to your container spec based on the model requirements.
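
For example, a minimal sketch of memory requests and limits on the container spec above (the 4Gi values are illustrative for a small quantized model; size them to your model's actual requirements):

containers:
  - name: llama-3point2-3b
    image: ghcr.io/sozercan/llama3.2:3b
    resources:
      requests:
        memory: "4Gi"
      limits:
        memory: "4Gi"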

For GGUF models:

  • 7B models generally require at least 8GB of RAM
  • 13B models generally require at least 16GB of RAM
  • 70B models generally require at least 64GB of RAM

You can use gguf-parser-go to get a more accurate estimate of the memory requirements for a given GGUF model and quantization.

Label the nodes with the label from the workspace's labelSelector so it can schedule pods on them:

kubectl label nodes kaito-control-plane apps=llama-3point2-3b

Deploy the workspace using:

kubectl apply -f aikit-workspace.yaml
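
You can watch the workspace until it reports ready (the exact status columns depend on your KAITO version):

kubectl get workspace workspace-llama-3point2-3b -w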

AIKit provides a number of pre-built and curated models that can be used directly. Please refer to Pre-made Models for available options.

tip

Alternatively, if you are on a supported cloud provider and want the nodes auto-provisioned for you, you can define an instanceType for KAITO to provision nodes, including CPU and GPU nodes.

You can specify the instance type based on your cloud provider's offerings. For example, on Azure you can specify Standard_D2ads_v5, a CPU SKU, like this:

resource:
  instanceType: "Standard_D2ads_v5"
  labelSelector:
    matchLabels:
      apps: llama-3point2-3b

After workspace deployment succeeds, please refer to Quick Start for monitoring the workspace and testing model inference.
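
As a quick smoke test, you can port-forward the inference endpoint and send an OpenAI-compatible request, which AIKit serves. This is a sketch: the service name assumes KAITO names the service after the workspace, the port comes from the --address=:5000 argument above, and the model name llama-3.2-3b-instruct is an assumption based on AIKit's pre-made image.

kubectl port-forward svc/workspace-llama-3point2-3b 5000:5000

curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-3b-instruct", "messages": [{"role": "user", "content": "Hello!"}]}'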

Custom Model Creation and Integration

AIKit provides a simple way to create custom models with no tools other than Docker!

Here's an example of how to create a custom model and integrate it with KAITO:

export IMAGE_NAME="your-registry/your-model:latest"

docker buildx build -t $IMAGE_NAME --push \
  --build-arg="model=huggingface://TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf" \
  "https://raw.githubusercontent.com/sozercan/aikit/main/models/aikitfile.yaml"

After building the image, you can use it in your KAITO workspace configuration by updating the image field.
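
For instance, with the image built above, the inference container would point at your registry (your-registry/your-model:latest is the placeholder from the build step; the container name is arbitrary):

containers:
  - name: llama-2-7b-chat
    image: your-registry/your-model:latest
    args:
      - "run"
      - "--address=:5000"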

For more information on creating custom models, refer to the AIKit documentation.

info

AIKit currently supports a subset of backends from LocalAI (such as llama.cpp, diffusers, and exllamav2). Please see the Inference Supported Backends section for details and updates.