
Presets

Curated Models

The following Hugging Face models are curated by the KAITO team with first-class support, including validated configurations and optimized inference settings.

| Model Name | Description | License |
| --- | --- | --- |
| deepseek-ai/DeepSeek-R1-0528 | https://huggingface.co/deepseek-ai/DeepSeek-R1-0528 | MIT |
| deepseek-ai/DeepSeek-R1-Distill-Llama-8B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B | MIT |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | MIT |
| deepseek-ai/DeepSeek-V3-0324 | https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 | MIT |
| google/gemma-3-4b-it | https://huggingface.co/google/gemma-3-4b-it | Gemma |
| google/gemma-3-27b-it | https://huggingface.co/google/gemma-3-27b-it | Gemma |
| google/gemma-4-26B-A4B-it | https://huggingface.co/google/gemma-4-26B-A4B-it | Apache-2.0 |
| google/gemma-4-31B-it | https://huggingface.co/google/gemma-4-31B-it | Apache-2.0 |
| google/gemma-4-E2B-it | https://huggingface.co/google/gemma-4-E2B-it | Apache-2.0 |
| google/gemma-4-E4B-it | https://huggingface.co/google/gemma-4-E4B-it | Apache-2.0 |
| meta-llama/Llama-3.1-8B-Instruct | https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct | Llama 3.1 |
| meta-llama/Llama-3.3-70B-Instruct | https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 |
| microsoft/Phi-3-medium-4k-instruct | https://huggingface.co/microsoft/Phi-3-medium-4k-instruct | MIT |
| microsoft/Phi-3-medium-128k-instruct | https://huggingface.co/microsoft/Phi-3-medium-128k-instruct | MIT |
| microsoft/Phi-3-mini-4k-instruct | https://huggingface.co/microsoft/Phi-3-mini-4k-instruct | MIT |
| microsoft/Phi-3-mini-128k-instruct | https://huggingface.co/microsoft/Phi-3-mini-128k-instruct | MIT |
| microsoft/Phi-3.5-mini-instruct | https://huggingface.co/microsoft/Phi-3.5-mini-instruct | MIT |
| microsoft/Phi-4-mini-instruct | https://huggingface.co/microsoft/Phi-4-mini-instruct | MIT |
| microsoft/phi-4 | https://huggingface.co/microsoft/phi-4 | MIT |
| MiniMaxAI/MiniMax-M2.7 | https://huggingface.co/MiniMaxAI/MiniMax-M2.7 | Other |
| mistralai/Ministral-3-3B-Instruct-2512 | https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512 | Apache-2.0 |
| mistralai/Ministral-3-8B-Instruct-2512 | https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512 | Apache-2.0 |
| mistralai/Ministral-3-14B-Instruct-2512 | https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512 | Apache-2.0 |
| mistralai/Mistral-7B-Instruct-v0.3 | https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 | Apache-2.0 |
| mistralai/Mistral-7B-v0.3 | https://huggingface.co/mistralai/Mistral-7B-v0.3 | Apache-2.0 |
| mistralai/Mistral-Large-3-675B-Instruct-2512 | https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512 | Apache-2.0 |
| mistralai/Mistral-Medium-3.5-128B | https://huggingface.co/mistralai/Mistral-Medium-3.5-128B | Other |
| mistralai/Mistral-Small-4-119B-2603 | https://huggingface.co/mistralai/Mistral-Small-4-119B-2603 | Apache-2.0 |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | https://huggingface.co/mistralai/Mistral-Small-4-119B-2603-NVFP4 | Apache-2.0 |
| moonshotai/Kimi-K2.5 | https://huggingface.co/moonshotai/Kimi-K2.5 | Modified MIT |
| moonshotai/Kimi-K2.6 | https://huggingface.co/moonshotai/Kimi-K2.6 | Modified MIT |
| nvidia/Nemotron-Cascade-2-30B-A3B | https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B | NVIDIA Open |
| nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 | NVIDIA Nemotron |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | NVIDIA Nemotron |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | NVIDIA Nemotron |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2 | https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2 | NVIDIA Open |
| nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 | https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 | NVIDIA Open |
| openai/gpt-oss-20b | https://huggingface.co/openai/gpt-oss-20b | Apache-2.0 |
| openai/gpt-oss-120b | https://huggingface.co/openai/gpt-oss-120b | Apache-2.0 |
| Qwen/Qwen2.5-Coder-7B-Instruct | https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct | Apache-2.0 |
| Qwen/Qwen2.5-Coder-32B-Instruct | https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct | Apache-2.0 |
| Qwen/Qwen3-8B-AWQ | https://huggingface.co/Qwen/Qwen3-8B-AWQ | Apache-2.0 |
| Qwen/Qwen3.5-2B | https://huggingface.co/Qwen/Qwen3.5-2B | Apache-2.0 |
| Qwen/Qwen3.5-4B | https://huggingface.co/Qwen/Qwen3.5-4B | Apache-2.0 |
| Qwen/Qwen3.5-9B | https://huggingface.co/Qwen/Qwen3.5-9B | Apache-2.0 |
| Qwen/Qwen3.5-122B-A10B | https://huggingface.co/Qwen/Qwen3.5-122B-A10B | Apache-2.0 |
| Qwen/Qwen3.5-122B-A10B-GPTQ-Int4 | https://huggingface.co/Qwen/Qwen3.5-122B-A10B-GPTQ-Int4 | Apache-2.0 |
| Qwen/Qwen3.5-397B-A17B-GPTQ-Int4 | https://huggingface.co/Qwen/Qwen3.5-397B-A17B-GPTQ-Int4 | Apache-2.0 |
| Qwen/Qwen3.6-27B | https://huggingface.co/Qwen/Qwen3.6-27B | Apache-2.0 |
| Qwen/Qwen3.6-35B-A3B | https://huggingface.co/Qwen/Qwen3.6-35B-A3B | Apache-2.0 |
| Qwen/Qwen3.6-35B-A3B-FP8 | https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8 | Apache-2.0 |

Generic Hugging Face Models

NOTE: Support for generic Hugging Face models is best-effort only. If your target model doesn't work in KAITO, please file an issue at https://github.com/kaito-project/kaito/issues/.

Starting with KAITO v0.9.0, generic Hugging Face models are supported on a best-effort basis. To run one, set inference.preset.name in the KAITO Workspace or InferenceSet configuration to the model's Hugging Face model card ID. Any model whose architecture is supported by vLLM can be run this way.

KAITO retrieves the model metadata from the Hugging Face website and generates a model preset configuration by analyzing that data. When the vLLM inference workload is created, KAITO downloads the model weights directly from Hugging Face.

The following example creates a Hugging Face inference workload using the model card ID Qwen/Qwen3-0.6B from https://huggingface.co/Qwen/Qwen3-0.6B:

apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: qwen3-06b
resource:
  instanceType: Standard_NC24ads_A100_v4
  labelSelector:
    matchLabels:
      apps: qwen3-06b
inference:
  preset:
    name: Qwen/Qwen3-0.6B
    presetOptions:
      modelAccessSecret: hf-token # Reference to Secret name
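
The modelAccessSecret field above refers to a Kubernetes Secret by name. As a minimal sketch of what that Secret might look like, the manifest below defines one named hf-token. The key name HF_TOKEN is an assumption that is not stated on this page, and the token value is a placeholder. Check the KAITO documentation for the key your version expects.

```yaml
# Sketch only: a Secret holding a Hugging Face access token.
# It must live in the same namespace as the Workspace that references it.
apiVersion: v1
kind: Secret
metadata:
  name: hf-token # matches modelAccessSecret in the Workspace above
type: Opaque
stringData:
  # Assumption: the key name HF_TOKEN is not confirmed by this page.
  HF_TOKEN: "<your-hugging-face-access-token>" # placeholder, replace before applying
```

Using stringData lets the token be written as plain text; Kubernetes base64-encodes it when storing it. Gated or private models need such a token for the download to succeed.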