Architecture
System Overview
AI Runway is a fully decoupled platform. The core value lives in the Kubernetes controller and CRDs. The UI is an optional, swappable layer that communicates exclusively through a REST API. Any frontend (Headlamp, a custom CLI, or the bundled React UI) can drive the same backend.

Components at a Glance
Note: The UI layer comprises both the frontend layer and the backend API layer.
| Component | Language | Role | Required? |
|---|---|---|---|
| Controller | Go (Kubebuilder) | Core operator — manages CRDs, provider selection, lifecycle | ✅ Yes |
| Providers / Runtime adapters | Go / provider-specific | Provider-registered controllers or configs for KAITO, Dynamo, KubeRay, llm-d, Direct vLLM | ✅ Yes (1+) |
| Backend API | TypeScript (Hono/Bun) | REST API — proxies K8s operations, auth, model catalog | Optional |
| React Frontend | React/TypeScript | Bundled Web UI | ❌ Swappable |
| Headlamp Plugin | React/TypeScript | Alternative UI inside Headlamp dashboard | ❌ Swappable |
| Shared Types | TypeScript | Shared API client & type contracts (@airunway/shared) | Library |
| kubectl / API | — | Direct CRD access via Kubernetes API | Always available |
Component Architecture Diagram
┌──────────────────────────────────────────────────────────────────────────┐
│                        FRONTEND LAYER (swappable)                        │
│                                                                          │
│  Any of these can be used or replaced independently:                     │
│                                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────────┐    │
│  │   React UI   │  │   Headlamp   │  │     Any Custom UI / CLI      │    │
│  │  (bundled)   │  │    Plugin    │  │  (dashboard, portal, etc.)   │    │
│  └──────┬───────┘  └──────┬───────┘  └──────────────┬───────────────┘    │
│         │                 │                         │                    │
└─────────┼─────────────────┼─────────────────────────┼────────────────────┘
          │                 │                         │
          │         REST API (JSON over HTTP)         │
          │             /api/deployments              │
          │                /api/models                │
          │               /api/runtimes               │
          │              /api/health ...              │
          ▼                 ▼                         ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                            BACKEND API LAYER                             │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                    Hono REST API (Bun runtime)                     │  │
│  │                                                                    │  │
│  │  • Auth middleware (K8s TokenReview)                               │  │
│  │  • Model catalog & HuggingFace integration                         │  │
│  │  • Deployment CRUD, metrics, logs                                  │  │
│  │  • Runtime installation (Helm)                                     │  │
│  │  • GPU validation, cost estimation                                 │  │
│  │  • Provider-agnostic: reads InferenceProviderConfig CRDs           │  │
│  └──────────────────────────┬─────────────────────────────────────────┘  │
└─────────────────────────────┼────────────────────────────────────────────┘
                              │
    Kubernetes API (client-go / @kubernetes/client-node)
                              │
┌─────────────────────────────┼────────────────────────────────────────────┐
│                            KUBERNETES CLUSTER                            │
│                             │                                            │
│              ┌──────────────▼──────────────┐                             │
│              │    AI Runway Controller     │ (core operator)             │
│              │  • Validates specs          │                             │
│              │  • Selects providers (CEL)  │                             │
│              │  • Manages targets          │                             │
│              └──────┬───────────────┬──────┘                             │
│                     │ watches       │ reconciles registered targets      │
│                     ▼               ▼                                    │
│  ┌──────────────────────┐  ┌─────────────────────────────────────────┐   │
│  │  ModelDeployment     │  │ Providers / runtimes registered by      │   │
│  │  (CRD)               │  │ InferenceProviderConfig                 │   │
│  │                      │  │                                         │   │
│  │  InferenceProvider   │  │ KAITO | Dynamo | KubeRay | llm-d |      │   │
│  │  Config (CRD)        │  │ Direct vLLM (provider-supplied)         │   │
│  └──────────────────────┘  │                                         │   │
│                            │ Targets include Workspace, DynamoGraph, │   │
│                            │ RayService, llm-d resources, or a       │   │
│                            │ plain Kubernetes Deployment             │   │
│                            └─────────────────────────────────────────┘   │
│                                                                          │
│              ┌────────────────────────────────────────────┐              │
│              │          Inference Pods (GPU/CPU)          │              │
│              │  Running vLLM, sglang, TRT-LLM, llama.cpp  │              │
│              └────────────────────────────────────────────┘              │
└──────────────────────────────────────────────────────────────────────────┘
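Because the contract between the frontend and backend layers is plain JSON over HTTP, a client needs very little to talk to the backend. The TypeScript sketch below builds request descriptions for the endpoints named in the diagram. The base URL, the `RequestSpec` shape, and the `buildRequest` helper are illustrative assumptions rather than part of AI Runway, and response schemas are deliberately not modeled here (see the API Reference for those).

```typescript
// Endpoints named in the architecture diagram.
const ENDPOINTS = ["/api/deployments", "/api/models", "/api/runtimes", "/api/health"] as const;
type Endpoint = (typeof ENDPOINTS)[number];

// Hypothetical, minimal description of an HTTP request (not an AI Runway type).
interface RequestSpec {
  url: string;
  method: "GET";
  headers: Record<string, string>;
}

// Build a GET request for one endpoint, attaching a Kubernetes bearer token.
function buildRequest(baseUrl: string, endpoint: Endpoint, token: string): RequestSpec {
  const trimmed = baseUrl.replace(/\/+$/, ""); // avoid "//" when joining
  return {
    url: `${trimmed}${endpoint}`,
    method: "GET",
    headers: {
      Accept: "application/json",
      Authorization: `Bearer ${token}`,
    },
  };
}

// Example: the base URL and token are placeholders.
const spec = buildRequest("https://airunway.example.com/", "/api/deployments", "example-token");
console.log(spec.url);
```

The resulting object can be passed to `fetch(spec.url, { method: spec.method, headers: spec.headers })` in any runtime that provides `fetch`, which is all a custom UI or CLI would need to act as another frontend.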
Provider and Runtime Registration
Providers and direct runtimes register capabilities, selection rules, and installation metadata through InferenceProviderConfig resources. Some providers reconcile provider-specific CRDs such as KAITO Workspace, Dynamo graph resources, KubeRay RayService, or llm-d resources. Direct runtimes such as Direct vLLM can instead target a plain Kubernetes Deployment; this repository includes the Direct vLLM provider controller and shim manifests under providers/vllm/.
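One way to picture registration and selection is as a list of registered providers, each carrying capabilities and a rule, from which the controller picks the best match for a request. The TypeScript sketch below models that idea with plain predicate functions standing in for CEL expressions. Every name in it (`DeploymentRequest`, `ProviderRegistration`, `priority`, `selectProvider`, and the example providers and rules) is hypothetical; the actual InferenceProviderConfig schema and selection algorithm are described in the CRD Reference and Providers documents.

```typescript
// Hypothetical request fields; the real ModelDeployment spec may differ.
interface DeploymentRequest {
  engine: string;
  gpuCount: number;
}

// Hypothetical stand-in for what an InferenceProviderConfig registers.
interface ProviderRegistration {
  name: string;
  engines: string[]; // capabilities
  matches: (req: DeploymentRequest) => boolean; // stands in for a CEL rule
  priority: number; // higher wins among matching providers
}

// Return the highest-priority provider that supports the engine and whose rule matches.
function selectProvider(
  req: DeploymentRequest,
  providers: ProviderRegistration[],
): ProviderRegistration | undefined {
  return providers
    .filter((p) => p.engines.includes(req.engine) && p.matches(req))
    .sort((a, b) => b.priority - a.priority)[0];
}

// Example registrations; names, rules, and priorities are invented for illustration.
const providers: ProviderRegistration[] = [
  { name: "provider-a", engines: ["vllm"], matches: () => true, priority: 1 },
  { name: "provider-b", engines: ["vllm"], matches: (r) => r.gpuCount > 1, priority: 5 },
];

const single = selectProvider({ engine: "vllm", gpuCount: 1 }, providers);
const multi = selectProvider({ engine: "vllm", gpuCount: 4 }, providers);
console.log(single?.name, multi?.name);
```

Keeping the rule as data attached to each registration, rather than hard-coding it in the selector, mirrors the design described above: adding a provider means adding a registration, not changing the controller.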
Why the Frontend Is Fully Decoupled
- REST-only contract: The frontend communicates with the backend exclusively via HTTP/JSON. There is no shared state, no server-side rendering, and no session affinity.
- Shared type library: @airunway/shared provides a typed API client and TypeScript types that any frontend can import. The Headlamp plugin already does this.
- Backend is optional: The core platform (controller + CRDs) works without the backend/frontend. Users can manage ModelDeployment resources directly via kubectl, Terraform, GitOps, or any Kubernetes API client.
- Swappable frontends: The bundled React UI, the Headlamp plugin, or any custom UI can all drive the same backend API simultaneously. No code changes needed.
- Auth is delegated: Authentication uses Kubernetes TokenReview; the frontend simply passes a bearer token. Any UI that can obtain a K8s token works.
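The delegated-auth flow can be sketched in a few lines of TypeScript: extract the bearer token, wrap it in a Kubernetes `TokenReview` object (`authentication.k8s.io/v1`), and read the result from its `status`. This is a minimal illustration of the mechanism, not the backend's actual middleware; the helper names (`bearerToken`, `buildTokenReview`, `authenticatedUser`) are assumptions, and the API server response is simulated rather than fetched.

```typescript
// Kubernetes TokenReview (authentication.k8s.io/v1), trimmed to the fields used here.
interface TokenReview {
  apiVersion: "authentication.k8s.io/v1";
  kind: "TokenReview";
  spec: { token: string };
  status?: {
    authenticated?: boolean;
    user?: { username?: string; groups?: string[] };
    error?: string;
  };
}

// Extract the token from an Authorization header, or null if absent or malformed.
function bearerToken(header: string | undefined): string | null {
  const match = /^Bearer\s+(\S+)$/i.exec(header ?? "");
  return match?.[1] ?? null;
}

// Build the TokenReview object to submit to the Kubernetes API server.
function buildTokenReview(token: string): TokenReview {
  return { apiVersion: "authentication.k8s.io/v1", kind: "TokenReview", spec: { token } };
}

// Interpret the API server's answer: the username when authenticated, otherwise null.
function authenticatedUser(review: TokenReview): string | null {
  if (review.status?.authenticated !== true) return null;
  return review.status?.user?.username ?? null;
}

const token = bearerToken("Bearer abc123");
const request = token ? buildTokenReview(token) : null;

// Simulated response; a real one comes from POSTing `request` to
// /apis/authentication.k8s.io/v1/tokenreviews on the API server.
const response: TokenReview = {
  ...buildTokenReview("abc123"),
  status: { authenticated: true, user: { username: "jane@example.com" } },
};
console.log(authenticatedUser(response));
```

Because the check is made by the API server, the UI never needs its own user store: any client that can present a token the cluster accepts is authenticated.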
Gateway API Integration
AI Runway optionally integrates with the Gateway API Inference Extension to provide a unified inference gateway. When the required Gateway API CRDs are detected in the cluster, the controller automatically creates an InferencePool and HTTPRoute for each ModelDeployment, so that all models can be called through a single Gateway endpoint using body-based routing on the model field.
The feature is auto-detected at startup and silently disabled if the required CRDs are not present. See Gateway Integration for full details.
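To make body-based routing concrete, the TypeScript sketch below reads the `model` field from a JSON request body and looks up a backend for it. In a real cluster this routing is performed by the gateway, not by application code; the `RouteTable` type, the helper names, and the model and pool names are all hypothetical and serve only to illustrate the idea.

```typescript
// Hypothetical routing table: model name -> backend pool name.
type RouteTable = Map<string, string>;

// Read the `model` field from a JSON request body; null if missing or not a string.
function modelFromBody(body: string): string | null {
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed === "object" && parsed !== null && "model" in parsed) {
      const model = (parsed as { model: unknown }).model;
      return typeof model === "string" ? model : null;
    }
    return null;
  } catch {
    return null; // body is not valid JSON
  }
}

// Pick the backend pool for a request body, or null when no route matches.
function routeByModel(body: string, routes: RouteTable): string | null {
  const model = modelFromBody(body);
  return model === null ? null : (routes.get(model) ?? null);
}

// Example: model and pool names are placeholders.
const routes: RouteTable = new Map([
  ["model-a", "pool-a"],
  ["model-b", "pool-b"],
]);
const body = JSON.stringify({ model: "model-b", prompt: "Hello" });
console.log(routeByModel(body, routes));
```

The practical consequence for clients is that they always call the same endpoint and only change the `model` value in the payload to reach a different deployment.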
Documentation
For detailed documentation on specific topics, see:
| Document | Description |
|---|---|
| Controller Architecture | Reconciliation model, status ownership, drift detection, owner references, finalizers, update semantics, validation webhook, RBAC |
| CRD Reference | ModelDeployment and InferenceProviderConfig CRD specifications |
| Providers | Provider selection algorithm, capability matrix, provider abstraction, KAITO details |
| Web UI Architecture | Backend API, authentication flow, data models, frontend architecture, backend services |
| Headlamp Plugin | Headlamp dashboard plugin architecture and design |
| Observability | Prometheus metrics and Kubernetes events |
| Versioning & Upgrades | API versioning strategy, controller upgrades, compatibility matrix |
| Gateway Integration | Gateway API Inference Extension setup and usage |
| Design Decisions | Alternatives considered, testing strategy, known limitations, out of scope |
| API Reference | REST API endpoint documentation |
| Development Guide | Setup, build, and testing instructions |