Architecture

System Overview

AI Runway is a fully decoupled platform. The core value lives in the Kubernetes controller and CRDs. The UI is an optional, swappable layer that communicates exclusively through a REST API. Any frontend (Headlamp, a custom CLI, or the bundled React UI) can drive the same backend.

AI Runway Architecture

Components at a Glance

Note: The UI layer shown above includes the Frontend layer and the Backend layer.

Component	Language	Role	Required?
Controller	Go (Kubebuilder)	Core operator — manages CRDs, provider selection, lifecycle	✅ Yes
Providers / Runtime adapters	Go / provider-specific	Provider-registered controllers or configs for KAITO, Dynamo, KubeRay, llm-d, Direct vLLM	✅ Yes (1+)
Backend API	TypeScript (Hono/Bun)	REST API — proxies K8s operations, auth, model catalog	Optional
React Frontend	React/TypeScript	Bundled Web UI	❌ Swappable
Headlamp Plugin	React/TypeScript	Alternative UI inside Headlamp dashboard	❌ Swappable
Shared Types	TypeScript	Shared API client & type contracts (`@airunway/shared`)	Library
kubectl / API	—	Direct CRD access via Kubernetes API	Always available

Component Architecture Diagram

 ┌──────────────────────────────────────────────────────────────────────────┐
 │                     FRONTEND LAYER (swappable)                          │
 │                                                                          │
 │  Any of these can be used — or replaced — independently:                │
 │                                                                          │
 │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────────┐  │
 │  │  React UI    │  │  Headlamp    │  │  Any Custom UI / CLI         │  │
 │  │  (bundled)   │  │  Plugin      │  │  (dashboard, portal, etc.)   │  │
 │  └──────┬───────┘  └──────┬───────┘  └──────────────┬───────────────┘  │
 │         │                 │                          │                   │
 └─────────┼─────────────────┼──────────────────────────┼───────────────────┘
           │                 │                          │
           │        REST API (JSON over HTTP)           │
           │         /api/deployments                   │
           │         /api/models                        │
           │         /api/runtimes                      │
           │         /api/health ...                    │
           ▼                 ▼                          ▼
 ┌──────────────────────────────────────────────────────────────────────────┐
 │                      BACKEND API LAYER                                   │
 │                                                                          │
 │  ┌────────────────────────────────────────────────────────────────────┐  │
 │  │  Hono REST API  (Bun runtime)                                     │  │
 │  │                                                                    │  │
 │  │  • Auth middleware (K8s TokenReview)                               │  │
 │  │  • Model catalog & HuggingFace integration                        │  │
 │  │  • Deployment CRUD, metrics, logs                                 │  │
 │  │  • Runtime installation (Helm)                                    │  │
 │  │  • GPU validation, cost estimation                                │  │
 │  │  • Provider-agnostic — reads InferenceProviderConfig CRDs         │  │
 │  └──────────────────────────┬─────────────────────────────────────────┘  │
 └─────────────────────────────┼────────────────────────────────────────────┘
                               │
                  Kubernetes API (client-go / @kubernetes/client-node)
                               │
 ┌─────────────────────────────┼────────────────────────────────────────────┐
 │                    KUBERNETES CLUSTER                                    │
 │                             │                                            │
 │              ┌──────────────▼──────────────┐                            │
 │              │    AI Runway Controller     │  (core operator)           │
 │              │    • Validates specs        │                            │
 │              │    • Selects providers (CEL)│                            │
 │              │    • Manages targets       │                            │
 │              └──────┬───────────────┬──────┘                            │
 │                     │ watches       │ reconciles registered targets      │
 │                     ▼               ▼                                    │
 │  ┌──────────────────────┐  ┌─────────────────────────────────────────┐  │
 │  │  ModelDeployment     │  │ Providers / runtimes registered by      │  │
 │  │  (CRD)               │  │ InferenceProviderConfig                 │  │
 │  │                      │  │                                         │  │
 │  │  InferenceProvider   │  │ KAITO | Dynamo | KubeRay | llm-d |      │  │
 │  │  Config (CRD)        │  │ Direct vLLM (provider-supplied)         │  │
 │  └──────────────────────┘  │                                         │  │
 │                            │ Targets include Workspace, DynamoGraph, │  │
 │                            │ RayService, llm-d resources, or a       │  │
 │                            │ plain Kubernetes Deployment             │  │
 │                            └─────────────────────────────────────────┘  │
 │                                                                          │
 │              ┌────────────────────────────────────────────┐              │
 │              │        Inference Pods (GPU/CPU)            │              │
 │              │  Running vLLM, sglang, TRT-LLM, llama.cpp │              │
 │              └────────────────────────────────────────────┘              │
 └──────────────────────────────────────────────────────────────────────────┘

Provider and Runtime Registration

Providers and direct runtimes register capabilities, selection rules, and installation metadata through InferenceProviderConfig resources. Some providers reconcile provider-specific CRDs such as KAITO Workspace, Dynamo graph resources, KubeRay RayService, or llm-d resources. Direct runtimes such as Direct vLLM can instead target a plain Kubernetes Deployment; this repository includes the Direct vLLM provider controller and shim manifests under providers/vllm/.

Why the Frontend Is Fully Decoupled

REST-only contract — The frontend communicates with the backend exclusively via HTTP/JSON. There is no shared state, no server-side rendering, and no session affinity.
Shared type library — @airunway/shared provides a typed API client and TypeScript types that any frontend can import. The Headlamp plugin already does this.
Backend is optional — The core platform (controller + CRDs) works without the backend/frontend. Users can manage ModelDeployment resources directly via kubectl, Terraform, GitOps, or any Kubernetes API client.
Swappable frontends — The bundled React UI, the Headlamp plugin, or any custom UI can all drive the same backend API simultaneously. No code changes needed.
Auth is delegated — Authentication uses Kubernetes TokenReview; the frontend simply passes a bearer token. Any UI that can obtain a K8s token works.

Gateway API Integration

AI Runway optionally integrates with the Gateway API Inference Extension to provide a unified inference gateway. When Gateway API Custom Resources are detected in the cluster, the controller automatically creates an InferencePool and HTTPRoute for each ModelDeployment, allowing all models to be called through a single Gateway endpoint using body-based routing on the model field.

The feature is auto-detected at startup and silently disabled if the required CRDs are not present. See Gateway Integration for full details.

Documentation

For detailed documentation on specific topics, see:

Document	Description
Controller Architecture	Reconciliation model, status ownership, drift detection, owner references, finalizers, update semantics, validation webhook, RBAC
CRD Reference	ModelDeployment and InferenceProviderConfig CRD specifications
Providers	Provider selection algorithm, capability matrix, provider abstraction, KAITO details
Web UI Architecture	Backend API, authentication flow, data models, frontend architecture, backend services
Headlamp Plugin	Headlamp dashboard plugin architecture and design
Observability	Prometheus metrics and Kubernetes events
Versioning & Upgrades	API versioning strategy, controller upgrades, compatibility matrix
Gateway Integration	Gateway API Inference Extension setup and usage
Design Decisions	Alternatives considered, testing strategy, known limitations, out of scope
API Reference	REST API endpoint documentation
Development Guide	Setup, build, and testing instructions

System Overview​

Components at a Glance​

Component Architecture Diagram​

Provider and Runtime Registration​

Why the Frontend Is Fully Decoupled​

Gateway API Integration​

Documentation​