
AI Runway

Deploy and manage large language models on Kubernetes — no YAML required.

A web UI and unified ModelDeployment CRD on top of the leading Kubernetes inference providers. Browse HuggingFace, pick a model, click deploy.


Quick Start

Pick the flavor that matches how you work — both take less than a minute.

Run locally

Download the latest release, then launch the dashboard binary against your current kubeconfig.

./airunway
# open http://localhost:3001
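If you script the local launch, it can help to wait until the dashboard actually answers before opening a browser. The following is a minimal sketch, not part of AI Runway: the port comes from the command above, while the function name `wait_for_dashboard`, the retry count, and the delay are illustrative assumptions. It also assumes `curl` is installed.

```shell
#!/usr/bin/env sh
# Sketch: wait until the local dashboard answers on its port.
# The port (3001) is taken from the Quick Start; the function name,
# retry count, and delay are illustrative assumptions.

# Poll a URL until it responds or the attempts run out.
# Usage: wait_for_dashboard [url] [attempts] [delay_seconds]
wait_for_dashboard() {
  url="${1:-http://localhost:3001}"
  attempts="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body.
    # Any HTTP response (even an error status) counts as "up".
    if curl -s -o /dev/null "$url"; then
      echo "dashboard is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "dashboard did not respond at $url" >&2
  return 1
}
```

One possible use is to start the binary in the background with `./airunway &` and then call `wait_for_dashboard` with no arguments.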

Deploy to Kubernetes

Apply the controller manifest and, optionally, the dashboard manifest to your cluster, then port-forward the dashboard service to reach it locally.

kubectl apply -f https://raw.githubusercontent.com/kaito-project/airunway/main/deploy/controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kaito-project/airunway/main/deploy/dashboard.yaml
kubectl port-forward -n airunway-system svc/airunway 3001:80
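The three commands above can be wrapped in a small script that waits for the workloads before port-forwarding. This is a sketch under stated assumptions: the URLs, the `airunway-system` namespace, and the `svc/airunway` service come from the commands above, while the function names, the 120-second timeout, and the assumption that the manifests create Deployments in that namespace are illustrative and not confirmed by the project.

```shell
#!/usr/bin/env sh
# Sketch: apply the AI Runway manifests, wait for workloads, then port-forward.
# BASE_URL, the namespace, and the service name come from the commands above;
# the function names and the timeout are illustrative assumptions.

BASE_URL="https://raw.githubusercontent.com/kaito-project/airunway/main/deploy"
NAMESPACE="airunway-system"

# Print the manifest URL for a component ("controller" or "dashboard").
manifest_url() {
  printf '%s/%s.yaml\n' "$BASE_URL" "$1"
}

# Apply both manifests, then wait for every Deployment in the namespace.
# Assumption: the manifests create Deployments in $NAMESPACE.
install_airunway() {
  kubectl apply -f "$(manifest_url controller)" || return 1
  kubectl apply -f "$(manifest_url dashboard)" || return 1
  kubectl wait --for=condition=Available deployment --all \
    -n "$NAMESPACE" --timeout=120s
}

# Forward the dashboard Service to a local port (default 3001).
open_dashboard() {
  kubectl port-forward -n "$NAMESPACE" svc/airunway "${1:-3001}:80"
}
```

Calling `install_airunway && open_dashboard` reproduces the sequence above; skip the dashboard line in `install_airunway` if you only want the controller.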

Highlights

One-Click Deploy

Browse models, check GPU fit, and deploy from the web UI — no YAML required.

Unified CRD

A single ModelDeployment API works across every supported provider and engine.

Multiple Engines

vLLM, SGLang, TensorRT-LLM, and llama.cpp — picked automatically per workload.

Live Monitoring

Real-time status, log streaming, and Prometheus metrics built into the dashboard.

Cost Estimation

Surface GPU pricing and capacity guidance before you commit to a deployment.

Gateway Integration

Automatically detects a Gateway API Inference Extension setup and exposes a unified inference endpoint.

Supported Providers

Bring the inference stack you already know — AI Runway gives them a common API.

NVIDIA Dynamo

GPU-accelerated inference with aggregated or disaggregated serving.

KubeRay

Ray-based distributed inference.

KAITO

vLLM (GPU) and llama.cpp (CPU/GPU) support.

LLM-D

vLLM (GPU) with aggregated or disaggregated serving.

See it in action

A two-minute tour of deploying a model end to end.

Join the community

Issues, ideas, and pull requests are all welcome.
