Version: v0.5.1

Usage

Detailed usage for Kaito-supported models can be found HERE. Users who want to deploy their own containerized models can provide a pod template in the inference field of the workspace custom resource (see the API definitions for details). The controller then creates a deployment workload that uses all provisioned GPU nodes. Note that the controller currently does NOT handle automatic model upgrades: it creates inference workloads from the preset configurations only if those workloads do not already exist.
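As a rough illustration of the custom pod template option described above, the following Python sketch builds a workspace manifest and prints it as JSON. It is a sketch under stated assumptions, not a definitive manifest: the field names (`resource.instanceType`, `resource.labelSelector`, `inference.template`) and the `kaito.sh/v1alpha1` API version are assumptions that should be verified against the API definitions for your Kaito version. The helper `build_workspace`, the workspace name, the instance type, and the container image are hypothetical placeholders.

```python
import json


def build_workspace(name, instance_type, image, gpu_count=1):
    """Return a Workspace manifest (as a dict) that uses a custom pod template.

    Field names are assumptions; check them against the Kaito API definitions.
    """
    labels = {"apps": name}
    return {
        "apiVersion": "kaito.sh/v1alpha1",  # assumed API group/version
        "kind": "Workspace",
        "metadata": {"name": name},
        # GPU nodes to provision for this workspace (assumed field names).
        "resource": {
            "instanceType": instance_type,
            "labelSelector": {"matchLabels": labels},
        },
        # A pod template is supplied in place of a preset model configuration.
        "inference": {
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "model",
                            "image": image,  # hypothetical user-built image
                            "resources": {
                                "limits": {"nvidia.com/gpu": gpu_count}
                            },
                        }
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    workspace = build_workspace(
        name="workspace-custom-llm",              # placeholder name
        instance_type="Standard_NC12s_v3",        # placeholder GPU SKU
        image="example.azurecr.io/my-model:0.1",  # placeholder image
    )
    print(json.dumps(workspace, indent=2))
```

Because Kubernetes accepts JSON manifests, the printed output could be saved to a file and applied with `kubectl apply -f`, assuming the field names match the installed CRD.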

The number of supported models in Kaito is growing! Please check this document to see how to add a new supported model.

Starting with version v0.3.0, Kaito supports model fine-tuning and the use of fine-tuned adapters in the inference service. Refer to the tuning document and inference document for more information.