Development Guide

Prerequisites

Go 1.23+ (for controller development)
Bun 1.0+ (for Web UI development)
Access to a Kubernetes cluster
Helm CLI (for provider installation)
kubectl configured with cluster access

Quick Start

Web UI Development

# Install dependencies
bun install

# Start development servers (frontend + backend)
bun run dev

# Development mode:
#   Frontend: http://localhost:5173 (Vite dev server, proxies API to backend)
#   Backend:  http://localhost:3001
#
# Production mode (compiled binary):
#   Single server: http://localhost:3001 (frontend embedded in backend)

Controller Development

# Build the controller binary
make controller-build

# Run controller tests
make controller-test

# Run controller locally (uses your kubeconfig)
make controller-run

# Regenerate CRDs and deepcopy code after editing *_types.go files
make controller-generate

# Build the Docker container image
make controller-docker-build CONTROLLER_IMG=<YOUR IMAGE>

# Defaults: PUSH=false and PLATFORM=linux/amd64

# Optional: push instead of load, or target a different platform
make controller-docker-build CONTROLLER_IMG=<YOUR IMAGE> PUSH=true PLATFORM=linux/amd64,linux/arm64

# Install CRDs into the cluster
make controller-install

# Deploy controller to cluster
make controller-deploy CONTROLLER_IMG=<YOUR IMAGE>

Important: After editing controller/api/v1alpha1/*_types.go files, always run:

cd controller && make manifests generate

Building a Single Binary

The project can be compiled to a standalone executable that includes both the backend API and embedded frontend assets:

# Compile to single binary (includes frontend)
bun run compile

# Run the binary (serves both API and frontend on port 3001)
./dist/airunway

# Check version info
curl http://localhost:3001/api/health/version

The compile process:

Builds the frontend with Vite
Generates native Bun file imports in backend/src/embedded-assets.ts
Injects build-time constants (version, git commit, build time) via --define
Compiles everything into a single executable using bun build --compile --minify --sourcemap

The binary is self-contained: it serves the embedded frontend assets with zero-copy file serving, so no separate static file server is needed. The backend runs Hono on Bun.

Cross-Compilation

Build for multiple platforms:

# Build for all platforms
make compile-all

# Or individual targets
make compile-linux     # linux-x64, linux-arm64
make compile-darwin    # darwin-x64, darwin-arm64
make compile-windows   # windows-x64

# With explicit version
VERSION=v1.0.0 bun run compile

Supported targets:

linux-x64, linux-arm64
darwin-x64, darwin-arm64
windows-x64

Controller Development

The controller is a Go-based Kubernetes operator built with Kubebuilder.

Project Structure

controller/
├── api/v1alpha1/           # CRD type definitions
│   ├── modeldeployment_types.go
│   └── inferenceproviderconfig_types.go
├── cmd/                    # Main entrypoint
├── config/                 # Kustomize manifests
│   ├── crd/                # Generated CRD YAMLs
│   ├── rbac/               # RBAC manifests
│   └── manager/            # Controller deployment
├── internal/
│   ├── controller/         # Reconciliation logic
│   └── webhook/            # Validation webhooks
└── Makefile                # Build commands

CRDs

AI Runway defines two CRDs:

ModelDeployment (namespaced) - User-facing API for deploying models
InferenceProviderConfig (cluster-scoped) - Provider registration

After editing *_types.go files, regenerate code:

cd controller && make manifests generate

Reconciliation Flow

Core controller reconciliation steps:

Receive ModelDeployment event
Check for pause annotation (airunway.ai/reconcile-paused: "true") — skip if paused
Select engine — use explicit spec.engine.type or auto-select from provider capabilities (filtered by GPU/CPU, serving mode, and engine GPU requirements)
Validate spec (engine/resource compatibility, required fields)
Select provider — use explicit spec.provider.name or run auto-selection algorithm (CEL rules now see the resolved engine)
Set status — status.engine, status.provider, conditions
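The first steps of the core flow can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the controller's actual code: the `Capability` type, its fields, and the `selectEngine` helper are hypothetical names. Only the annotation key and the rule "use the explicit engine, otherwise auto-select from provider capabilities" come from the steps above.

```go
package main

import (
	"errors"
	"fmt"
)

// pausedAnnotation is the annotation key named in the reconciliation steps.
const pausedAnnotation = "airunway.ai/reconcile-paused"

// Capability is a hypothetical, simplified view of one engine a provider offers.
type Capability struct {
	Engine      string
	SupportsGPU bool
	SupportsCPU bool
	Modes       []string
}

// isPaused reports whether reconciliation should be skipped for an object.
func isPaused(annotations map[string]string) bool {
	return annotations[pausedAnnotation] == "true"
}

// selectEngine returns the explicit engine if one is set. Otherwise it returns
// the first capability that matches the requested compute type and serving mode.
func selectEngine(explicit string, caps []Capability, wantGPU bool, mode string) (string, error) {
	if explicit != "" {
		return explicit, nil
	}
	for _, c := range caps {
		if wantGPU && !c.SupportsGPU {
			continue
		}
		if !wantGPU && !c.SupportsCPU {
			continue
		}
		for _, m := range c.Modes {
			if m == mode {
				return c.Engine, nil
			}
		}
	}
	return "", errors.New("no engine matches the requested resources and mode")
}

func main() {
	// Illustrative capabilities only; real values come from InferenceProviderConfig.
	caps := []Capability{
		{Engine: "llamacpp", SupportsCPU: true, Modes: []string{"aggregated"}},
		{Engine: "vllm", SupportsGPU: true, Modes: []string{"aggregated"}},
	}
	fmt.Println(isPaused(map[string]string{pausedAnnotation: "true"}))
	engine, err := selectEngine("", caps, true, "aggregated")
	fmt.Println(engine, err)
}
```

The real selection also considers engine GPU requirements, which this sketch leaves out for brevity.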

The core controller stops here. Provider controllers then take over with their own sequence:

Filter — only reconcile ModelDeployments where status.provider.name matches
Validate compatibility — check engine/mode support for this provider
Transform — convert ModelDeployment spec to provider-specific resource
Create/Update — apply provider resource with owner references
Sync status — map provider resource status back to ModelDeployment (phase, replicas, endpoint)
Handle deletion — clean up provider resources via finalizers (5-minute timeout)

Observability

Controller metrics:

airunway_modeldeployment_total{namespace, phase}
airunway_reconciliation_duration_seconds{provider}
airunway_reconciliation_errors_total{provider, error_type}
airunway_provider_selection{provider, reason}
airunway_deployment_replicas{name, namespace, state}
airunway_deployment_phase{name, namespace, phase}

Events emitted:

Normal   ProviderSelected    Selected provider 'dynamo': default → dynamo (GPU inference default)
Normal   ResourceCreated     Created DynamoGraphDeployment 'my-llm'
Warning  SecretNotFound      Secret 'hf-token-secret' not found in namespace 'default'
Warning  ProviderError       Provider resource in error state: insufficient GPUs
Warning  DriftDetected       Provider resource was modified directly, reconciling
Warning  FinalizerTimeout    Finalizer removed after timeout, provider resource may be orphaned

Running Locally

# Install CRDs first
make controller-install

# Run controller (uses your kubeconfig)
make controller-run

Testing

# Run unit tests
make controller-test

# Run with verbose output
cd controller && go test -v ./...

Test categories:

Unit tests — manifest transformation per provider, status mapping, provider selection algorithm, schema validation
Integration tests — controller reconciliation with mock K8s API, owner references, finalizer behavior, drift detection, webhook validation
E2E tests — full deployment lifecycle per provider, error recovery, controller restart resilience

Version Compatibility Matrix

AI Runway Controller	Kubernetes	KAITO Operator	Dynamo Operator	KubeRay Operator
v0.1.x	1.26-1.30	v0.3.x	v1.0.x	v1.1.x

Provider	Minimum Version	CRD API Version	Notes
KAITO	v0.3.0	kaito.sh/v1beta1	Requires GPU operator for GPU workloads
Dynamo	v1.0.0	nvidia.com/v1alpha1	Requires NVIDIA GPU operator; CRDs are bundled in the platform chart
KubeRay	v1.1.0	ray.io/v1	Optional: KubeRay autoscaler for scaling

Finalizer Handling

The controller uses finalizers to ensure provider resource cleanup on deletion:

Controller attempts cleanup for 5 minutes
After timeout, removes finalizer with warning event
Orphaned provider resources may remain (logged for manual cleanup)

Manual escape (immediate — use when deletion is stuck):

kubectl patch modeldeployment my-llm --type=merge \
  -p '{"metadata":{"finalizers":[]}}'

Provider Development

Provider controllers are independent operators in providers/<name>/:

# Build a provider binary (from provider directory)
cd providers/kaito && make build
cd providers/dynamo && make build
cd providers/kuberay && make build
cd providers/llmd && make build

# Build provider Docker image
cd providers/kaito && make docker-build IMG=<YOUR IMAGE>
cd providers/llmd && make docker-build IMG=<YOUR IMAGE>

# Defaults: PUSH=false and PLATFORM=linux/amd64

# Optional: push instead of load, or target a different platform
cd providers/llmd && make docker-build IMG=<YOUR IMAGE> PUSH=true PLATFORM=linux/amd64,linux/arm64

# Deploy provider to cluster
cd providers/kaito && make deploy IMG=<YOUR IMAGE>
cd providers/llmd && make deploy IMG=<YOUR IMAGE>

# Generate deploy manifest
cd providers/kaito && make generate-deploy-manifests

Environment Variables

Frontend (.env)

VITE_API_URL=http://localhost:3001
VITE_DEFAULT_NAMESPACE=airunway-system
VITE_DEFAULT_HF_SECRET=hf-token-secret

Backend (.env)

PORT=3001
DEFAULT_NAMESPACE=airunway-system
CORS_ORIGIN=http://localhost:5173
AUTH_ENABLED=false

Authentication

AI Runway supports optional authentication using Kubernetes OIDC tokens from your kubeconfig.

Enabling Authentication

Set the AUTH_ENABLED environment variable:

AUTH_ENABLED=true ./dist/airunway

Login Flow

Run the login command:
```
airunway login
```
This extracts your OIDC token from kubeconfig and opens the browser with a magic link.

Alternatively, specify the server URL:

airunway login --server https://airunway.example.com

Use a specific kubeconfig context:
```
airunway login --context my-cluster
```

How It Works

The CLI extracts the OIDC id-token from your kubeconfig
Opens your browser with a URL containing the token in the fragment (#token=...)
The frontend saves the token to localStorage
All API requests include the token in the Authorization: Bearer header
The backend validates tokens using Kubernetes TokenReview API
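The same token handling can be sketched from the point of view of a Go client, for example a script that calls the API directly. This is an illustration, not the frontend's code (the frontend is a browser app). `tokenFromLoginURL` and `newAPIRequest` are hypothetical helper names, and the example URL is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// tokenFromLoginURL extracts the token from a login URL whose fragment
// has the form #token=<value>, as described in the steps above.
func tokenFromLoginURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	vals, err := url.ParseQuery(u.Fragment)
	if err != nil {
		return "", err
	}
	tok := vals.Get("token")
	if tok == "" {
		return "", fmt.Errorf("no token in URL fragment")
	}
	return tok, nil
}

// newAPIRequest builds a GET request that carries the token in the
// Authorization: Bearer header. It does not send the request.
func newAPIRequest(baseURL, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	tok, err := tokenFromLoginURL("http://localhost:3001/#token=example-token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	req, err := newAPIRequest("http://localhost:3001", "/api/deployments", tok)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Authorization"))
}
```

Because the token travels in the URL fragment, it is not sent to the server as part of the page request; only the explicit Authorization header carries it.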

Public Routes (No Auth Required)

These routes are accessible without authentication:

GET /api/health - Health check
GET /api/cluster/status - Cluster connection status
GET /api/settings - Settings (includes auth.enabled for frontend)

CLI Commands

airunway                    # Start server (default)
airunway serve              # Start server
airunway login              # Login with kubeconfig credentials
airunway login --server URL # Login to specific server
airunway login --context X  # Use specific kubeconfig context
airunway logout             # Clear stored credentials
airunway version            # Show version
airunway help               # Show help

Project Commands

Root

bun run dev           # Start both frontend and backend
bun run build         # Build all packages
bun run compile       # Build single binary (frontend + backend) to dist/airunway
bun run lint          # Lint all packages

Controller (Go)

make controller-build       # Build Go controller binary
make controller-test        # Run controller tests
make controller-run         # Run controller locally
make controller-generate    # Regenerate CRDs and deepcopy code
make controller-install     # Install CRDs into cluster
make controller-deploy      # Deploy controller to cluster

Frontend

bun run dev:frontend    # Start Vite dev server
bun run build:frontend  # Build for production

Backend

bun run dev:backend     # Start with watch mode
bun run build:backend   # Compile TypeScript
bun run compile         # Build single binary executable

The backend pins TypeScript to 5.3.3 to keep Bun/import-meta compilation behavior stable. Do not widen that version without validating bun run build:backend and bun run compile.

Backend Testing

cd backend
bun test                           # Run all backend tests
bun test src/routes/autoscaler.test.ts  # Run a specific test file
bun test --watch                   # Watch mode
bun test --coverage                # With coverage report

Test organization:

src/routes/*.test.ts — Route-level tests using Hono's app.request() (exercises full middleware stack)
src/services/*.test.ts — Service unit tests with mocked dependencies
src/lib/*.test.ts — Utility/library unit tests
src/test/helpers.ts — Shared test utilities (mockServiceMethod, withTimeout)
src/test/fixtures.ts — Reusable mock data for K8s resources

How mocking works: Tests import the Hono app directly and use app.request() to invoke routes in-process (no HTTP server needed). K8s-dependent services are mocked via property replacement on singleton instances. Tests that may hit K8s use withTimeout to gracefully skip when no cluster is available.

CI pipelines: The test.yml workflow runs all tests in an environment without a Kubernetes cluster (K8s-dependent tests gracefully skip via timeout). The e2e-backend.yml workflow runs the same tests against a real Kind cluster with KAITO and the controller deployed, where K8s-dependent tests execute fully.

Headlamp Plugin

cd plugins/headlamp
bun install             # Install plugin dependencies
bun run build           # Build plugin
bun run start           # Development mode with auto-rebuild
bun run test            # Run tests
bun run test:watch      # Watch mode for tests
bun run lint            # Lint code
bun run tsc             # Type check only

Makefile Commands

make setup              # Install deps, build, and deploy to Headlamp
make dev                # Build and deploy for development
make build              # Build only
make deploy             # Deploy to Headlamp plugins directory
make clean              # Remove build artifacts

Prerequisites for Headlamp Plugin

Headlamp Desktop (v0.20+) or Headlamp running in-cluster
AI Runway backend deployed or running locally

Configuring Backend URL

The plugin discovers the backend in this order:

Plugin Settings: Configure in Headlamp → Settings → Plugins → AIRunway
In-Cluster: Auto-discovers airunway.<namespace>.svc
Default: Falls back to http://localhost:3001
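The discovery order is a simple first-match fallback. The sketch below expresses that precedence in Go for illustration only; the plugin itself is a TypeScript project, and `resolveBackendURL` and its parameters are hypothetical names. The in-cluster URL is passed in as a parameter because its exact scheme and port are not specified here.

```go
package main

import "fmt"

// defaultBackendURL is the documented fallback.
const defaultBackendURL = "http://localhost:3001"

// resolveBackendURL applies the documented precedence:
//  1. pluginSetting: URL configured in the Headlamp plugin settings ("" if unset)
//  2. inClusterURL: URL of the auto-discovered in-cluster service ("" if none)
//  3. the localhost default
func resolveBackendURL(pluginSetting, inClusterURL string) string {
	if pluginSetting != "" {
		return pluginSetting
	}
	if inClusterURL != "" {
		return inClusterURL
	}
	return defaultBackendURL
}

func main() {
	fmt.Println(resolveBackendURL("", ""))
	fmt.Println(resolveBackendURL("https://airunway.example.com", ""))
}
```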

Testing with Headlamp Desktop

Build and deploy the plugin:
```
cd plugins/headlamp
make setup
```
Start AI Runway backend:
```
cd ../..
bun run dev:backend
```
Open Headlamp Desktop - the plugin should appear in the sidebar

Kubernetes Setup

Create HuggingFace Token Secret

kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN="your-token" \
  -n airunway-system

Install NVIDIA Dynamo (via Helm)

export NAMESPACE=dynamo-system
export RELEASE_VERSION=1.1.1

# The Dynamo platform chart bundles its CRDs
helm upgrade --install dynamo-platform \
  https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz \
  --namespace ${NAMESPACE} \
  --create-namespace \
  --set-json global.grove.install=true

Adding a New Provider

Providers are independent out-of-tree Go operators in providers/<name>/. Each provider watches ModelDeployment resources and creates provider-specific resources.

There are two provider patterns:

Shim Providers (Adapter Pattern)

Use this when wrapping an existing inference operator that has its own CRD (e.g., KAITO Workspace, DynamoGraphDeployment, RayService). The provider translates ModelDeployment → upstream CRD and syncs status back.

ModelDeployment → Provider Controller → Upstream CRD → Upstream Operator → Pods/Services
                                             ↑ status sync

Create provider directory:

providers/<name>/
├── cmd/main.go          # Provider entrypoint
├── controller.go        # Reconciliation logic
├── transformer.go       # ModelDeployment → upstream CRD conversion
├── status.go            # Upstream CRD → ModelDeployment status mapping
├── config.go            # InferenceProviderConfig self-registration
├── config/              # Kustomize deployment manifests
├── Dockerfile           # Container image
├── go.mod               # Independent Go module
└── go.sum

Implement the provider controller (see existing providers for examples):
- controller.go: Reconcile ModelDeployment resources where status.provider.name matches
- transformer.go: Convert ModelDeployment spec to upstream CRD resources
- status.go: Map upstream CRD status back to ModelDeployment status
- config.go: Define InferenceProviderConfigSpec with capabilities and desired-state selectionRules. Emit provider display metadata as annotations such as airunway.ai/display-name, airunway.ai/description, airunway.ai/default-namespace, and airunway.ai/documentation-url; providers may also mirror capabilities in airunway.ai/capabilities for compatibility. Set airunway.ai/installation annotations for Helm/manual installation metadata; airunway.ai/documentation can remain as a backward-compatible documentation fallback (see CRD Reference).

Native Providers (No Upstream CRD)

Use this when there is no upstream operator — the provider directly manages Kubernetes resources (Deployments, Services) from the ModelDeployment spec. No transformer or intermediate CRD is needed.

ModelDeployment → Provider Controller → Deployments/Services → Pods
                                             ↑ status sync

This works because the status.provider.resourceKind and resourceName fields are free-form strings — they can point at a Deployment just as easily as a Workspace. The core controller never inspects what the provider creates.

When to use this pattern:

Building a new inference runtime with no pre-existing CRD
A lightweight provider that runs vLLM/SGLang containers directly via Deployments
A "generic" provider where an upstream CRD adds no value

Directory structure (no transformer.go needed):

providers/<name>/
├── cmd/main.go          # Provider entrypoint
├── controller.go        # Reconciliation logic (creates Deployments/Services directly)
├── status.go            # Deployment/Pod → ModelDeployment status mapping
├── config.go            # InferenceProviderConfig self-registration
├── config/              # Kustomize deployment manifests
├── Dockerfile
├── go.mod
└── go.sum

Example reconciliation (simplified):

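// Assumed imports: context, time, appsv1 "k8s.io/api/apps/v1", corev1 "k8s.io/api/core/v1",
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1", ctrl "sigs.k8s.io/controller-runtime",
// its pkg/client and pkg/controller/controllerutil packages, and the project's v1alpha1 API package.
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    md := &v1alpha1.ModelDeployment{}
    if err := r.Get(ctx, req.NamespacedName, md); err != nil {
        // The object may have been deleted after the event was queued.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Build the desired objects directly from the ModelDeployment spec (no intermediate CRD)
    wantDeploy := r.buildDeployment(md) // vLLM container with model args
    wantSvc := r.buildService(md)

    // Apply desired state inside the mutate function so that updates, not only creates, take effect.
    deploy := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: wantDeploy.Name, Namespace: wantDeploy.Namespace}}
    if _, err := controllerutil.CreateOrUpdate(ctx, r.Client, deploy, func() error {
        deploy.Spec = wantDeploy.Spec
        return nil
    }); err != nil {
        return ctrl.Result{}, err
    }
    svc := &corev1.Service{ObjectMeta: metav1.ObjectMeta{Name: wantSvc.Name, Namespace: wantSvc.Namespace}}
    if _, err := controllerutil.CreateOrUpdate(ctx, r.Client, svc, func() error {
        // Copy only mutable fields; the API server assigns spec.clusterIP.
        svc.Spec.Selector, svc.Spec.Ports = wantSvc.Spec.Selector, wantSvc.Spec.Ports
        return nil
    }); err != nil {
        return ctrl.Result{}, err
    }

    // Sync status from the Deployment and Service
    md.Status.Phase = phaseFromDeployment(deploy)
    md.Status.Provider.ResourceName = deploy.Name
    md.Status.Provider.ResourceKind = "Deployment"
    md.Status.Replicas = replicasFromDeployment(deploy)
    md.Status.Endpoint = endpointFromService(svc)
    if err := r.Status().Update(ctx, md); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}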

The config.go for a native provider should define InferenceProviderConfigSpec.capabilities plus any selectionRules, and set display/default/documentation metadata through annotations such as airunway.ai/display-name, airunway.ai/description, airunway.ai/documentation-url, and airunway.ai/default-namespace. It can omit the airunway.ai/installation annotation when there is no upstream operator to install, or include it if the provider itself is installed via Helm. See CRD Reference and installation metadata for the annotation schemas.

Common Steps (Both Patterns)

Add Makefile targets in the root Makefile:

make <name>-provider-build         # Build provider binary
make <name>-provider-docker-build  # Build Docker image
make <name>-provider-deploy        # Deploy to cluster

Adding a New Model

Edit backend/src/data/models.json:

{
  "models": [
    {
      "id": "org/model-name",
      "name": "Model Display Name",
      "description": "Brief description",
      "size": "7B",
      "task": "chat",
      "contextLength": 32768,
      "supportedEngines": ["vllm", "sglang"],
      "minGpuMemory": "16GB"
    }
  ]
}
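To check a new entry before committing it, the catalog shape above can be mirrored in a small Go program. This is a sketch under stated assumptions: the `Model` and `Catalog` types and the `parseCatalog` helper are hypothetical, and the validation rules (non-empty `id`, `name`, and `supportedEngines`) are illustrative assumptions, not the backend's actual checks. Only the JSON field names come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Model mirrors one entry of the "models" array shown above.
type Model struct {
	ID               string   `json:"id"`
	Name             string   `json:"name"`
	Description      string   `json:"description"`
	Size             string   `json:"size"`
	Task             string   `json:"task"`
	ContextLength    int      `json:"contextLength"`
	SupportedEngines []string `json:"supportedEngines"`
	MinGPUMemory     string   `json:"minGpuMemory"`
}

// Catalog mirrors the top-level object of models.json.
type Catalog struct {
	Models []Model `json:"models"`
}

// parseCatalog decodes the catalog and applies a few assumed sanity checks.
func parseCatalog(data []byte) (Catalog, error) {
	var c Catalog
	if err := json.Unmarshal(data, &c); err != nil {
		return Catalog{}, err
	}
	for i, m := range c.Models {
		if m.ID == "" || m.Name == "" {
			return Catalog{}, fmt.Errorf("models[%d]: id and name are required", i)
		}
		if len(m.SupportedEngines) == 0 {
			return Catalog{}, fmt.Errorf("models[%d]: supportedEngines is empty", i)
		}
	}
	return c, nil
}

func main() {
	data := []byte(`{"models":[{"id":"org/model-name","name":"Model Display Name",
		"description":"Brief description","size":"7B","task":"chat",
		"contextLength":32768,"supportedEngines":["vllm","sglang"],"minGpuMemory":"16GB"}]}`)
	c, err := parseCatalog(data)
	if err != nil {
		fmt.Println("invalid catalog:", err)
		return
	}
	fmt.Println(len(c.Models), c.Models[0].ID, c.Models[0].SupportedEngines)
}
```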

Testing API Endpoints

# Health check
curl http://localhost:3001/api/health

# Cluster status
curl http://localhost:3001/api/cluster/status

# List models
curl http://localhost:3001/api/models

# List deployments
curl http://localhost:3001/api/deployments

# Create deployment (Dynamo/KubeRay)
curl -X POST http://localhost:3001/api/deployments \
  -H "Content-Type: application/json" \
  -d '{
    "name": "test-deployment",
    "namespace": "airunway-system",
    "provider": "dynamo",
    "modelId": "Qwen/Qwen3-0.6B",
    "engine": "vllm",
    "mode": "aggregated",
    "replicas": 1,
    "hfTokenSecret": "hf-token-secret",
    "enforceEager": true
  }'

# Create deployment (KAITO with premade model)
curl -X POST http://localhost:3001/api/deployments \
  -H "Content-Type: application/json" \
  -d '{
    "name": "kaito-deployment",
    "namespace": "kaito-workspace",
    "provider": "kaito",
    "modelSource": "premade",
    "premadeModel": "llama3.2-1b",
    "computeType": "cpu"
  }'

# Create deployment (KAITO with HuggingFace GGUF - direct mode)
curl -X POST http://localhost:3001/api/deployments \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gemma-deployment",
    "namespace": "kaito-workspace",
    "provider": "kaito",
    "modelSource": "huggingface",
    "modelId": "bartowski/gemma-3-1b-it-GGUF",
    "ggufFile": "gemma-3-1b-it-Q8_0.gguf",
    "ggufRunMode": "direct",
    "computeType": "cpu"
  }'

# Create deployment (KAITO with vLLM for GPU inference)
curl -X POST http://localhost:3001/api/deployments \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vllm-deployment",
    "namespace": "kaito-workspace",
    "provider": "kaito",
    "modelSource": "vllm",
    "modelId": "Qwen/Qwen3-0.6B",
    "hfTokenSecret": "hf-token-secret",
    "resources": { "gpu": 1 }
  }'

Accessing Deployed Models

After deployment is running:

# Port-forward to the service (check deployment details for exact service name)
# Dynamo/KubeRay deployments expose port 8000
kubectl port-forward svc/<deployment>-frontend 8000:8000 -n airunway-system

# KAITO deployments with vLLM expose port 8000
kubectl port-forward svc/<deployment-name> 8000:8000 -n kaito-workspace

# KAITO deployments with llama.cpp (premade/GGUF) expose port 5000
kubectl port-forward svc/<deployment-name> 5000:5000 -n kaito-workspace

# Test the model (OpenAI-compatible API)
# For vLLM (port 8000):
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# For llama.cpp (port 5000):
curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2-1b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

GPU End-to-End Testing

make gpu-e2e runs a real-GPU end-to-end suite that deploys each inference provider through a ModelDeployment, drives it to Running, and asserts that inference actually serves through the inference gateway. Unlike the CPU/mocker e2e lanes, it requires real GPU hardware and an already-provisioned cluster — it never creates or deletes the cluster.

The harness (scripts/gpu-e2e.sh) builds and pushes the controller and provider images, installs any missing upstream operator, deploys everything, then runs the Go suite under test/e2e/gpu/. Providers covered: Dynamo, vLLM, KAITO (KubeRay is not yet supported).

Cluster preconditions

The harness does not install any of the following. The only component it installs is a missing upstream operator, via setup-<p>:

GPU nodes with the NVIDIA GPU Operator and NFD enabled, so nodes advertise nvidia.com/gpu and the nvidia.com/gpu.present=true label.
An RWX-capable StorageClass. The Dynamo model-cache PVC defaults to ReadWriteMany; Azure Disk classes are ReadWriteOnce and will leave the PVC Pending. The default is azurefile-premium; override with --storage-class.
The inference gateway (Gateway API CRDs + GAIE + Istio + BBR + a Gateway named inference-gateway). On a fresh cluster make -C providers/dynamo setup-dynamo installs it; otherwise it must already be present and Programmed. The suite fails fast if it is missing.
Pull access to the pushed images. The manager manifests carry no imagePullSecret, so the images must be public or the nodes must have pull access. New registry repositories often default to private — make them public once.

Running it

# All three providers, building+pushing images to your registry:
make gpu-e2e GPU_E2E_ARGS="--provider all --registry <your-registry>"

# A single provider:
make gpu-e2e GPU_E2E_ARGS="--provider vllm --registry <your-registry>"

# Re-test without rebuilding (requires an explicit, already-pushed tag):
make gpu-e2e GPU_E2E_ARGS="--provider dynamo --skip-build \
    --registry <your-registry> --img-tag <tag>"

# Run the Go suite directly against an already-deployed cluster (no rebuild):
go test -C test/e2e/gpu -tags=e2e -v -run 'TestGPUProviders/vllm' ./

Flags are passed to the script via GPU_E2E_ARGS; pass them inside the quotes, not as bare make arguments. See scripts/gpu-e2e.sh --help for the full list. Key flags: --provider, --registry (required when building), --img-tag, --storage-class, --skip-install, --skip-build, --keep.

Environment knobs

The script forwards these to the Go suite; you can also set them directly when running go test:

Variable	Meaning
`GPU_E2E_STORAGE_CLASS`	RWX StorageClass injected into the Dynamo fixture and asserted on (default `azurefile-premium`). Set by `--storage-class`.
`GPU_E2E_KEEP`	When `true`, leave `ModelDeployment`s running after the test for inspection. Set by `--keep`.
`GPU_E2E_RESULTS_DIR`	Optional override for where per-case result bundles are written (default `test/e2e/gpu/gpu-e2e-results/<timestamp>/`).
`GPU_E2E_RUN_TS`	Optional fixed timestamp for the results directory name.

Outcomes

Each case ends as PASS, FAIL, or SKIP. A SKIP means the cluster lacks the capacity to schedule that case (more GPUs requested than any node has, or no GPU free before the scheduling deadline) — it does not fail the run. Only a genuine error (a broken deployment, failed inference, or orphaned resources after delete) is a FAIL. Per-case logs and a result marker are written under the results directory.

Troubleshooting

Controller not reconciling

Check controller logs: kubectl logs -n airunway-system deploy/airunway-controller-manager
Verify CRDs are installed: kubectl get crd modeldeployments.airunway.ai
Check RBAC permissions for the controller service account

ModelDeployment stuck in Pending

Check if any InferenceProviderConfig resources exist: kubectl get inferenceproviderconfigs
Verify at least one provider has status.ready: true
Check controller logs for provider selection errors

Backend can't connect to cluster

Verify kubectl is configured: kubectl cluster-info
Check KUBECONFIG environment variable
Ensure proper RBAC permissions

Provider not detected as installed

Check CRD exists:
- Dynamo: kubectl get crd dynamographdeployments.nvidia.com
- KubeRay: kubectl get crd rayservices.ray.io
- KAITO: kubectl get crd workspaces.kaito.sh
Check operator deployment:
- Dynamo: kubectl get deployments -n dynamo-system
- KubeRay: kubectl get deployments -n ray-system
- KAITO: kubectl get deployments -n kaito-workspace

KAITO deployment stuck in Pending

Check KAITO workspace status: kubectl describe workspace <name> -n kaito-workspace
Verify node labels match labelSelector (default: kubernetes.io/os: linux)
For vLLM mode, ensure GPU nodes are available
Check events: kubectl get events -n kaito-workspace --sort-by=.lastTimestamp

Metrics not available

Metrics require AI Runway to run in-cluster
Check deployment pods are running: kubectl get pods -n <namespace>
Verify metrics endpoint is exposed (port 8000 for vLLM, port 5000 for llama.cpp)

Frontend can't reach backend

Check CORS_ORIGIN matches frontend URL
Verify backend is running on correct port
Check browser console for errors

Headlamp Plugin Issues

Plugin not appearing in Headlamp

Verify plugin was built: cd plugins/headlamp && bun run build
Check plugin deployment location:
- macOS: ~/.config/Headlamp/plugins/airunway-headlamp-plugin
- Linux: ~/.config/Headlamp/plugins/airunway-headlamp-plugin
- Windows: %APPDATA%/Headlamp/plugins/airunway-headlamp-plugin
Restart Headlamp after deploying the plugin

Plugin can't connect to backend

Check backend URL in Headlamp → Settings → Plugins → AIRunway
Verify backend is running: curl http://localhost:3001/api/health
For in-cluster deployments, ensure the service is accessible
Check browser dev tools (Network tab) for connection errors

Plugin shows "Connection Failed" banner

The plugin auto-discovers the backend; ensure it's running
In-cluster: Deploy AI Runway backend to airunway-system namespace
Local development: Start backend with bun run dev:backend

Type errors after shared package changes

Rebuild the shared package: cd shared && bun run build
Rebuild the plugin: cd plugins/headlamp && bun run build
Clear TypeScript cache: rm -rf plugins/headlamp/node_modules/.cache
