
Architecture

System Overview

AI Runway is a fully decoupled platform. The core value lives in the Kubernetes controller and CRDs. The UI is an optional, swappable layer that communicates exclusively through a REST API. Any frontend (Headlamp, a custom CLI, or the bundled React UI) can drive the same backend.
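As a rough illustration of the REST-only contract, the sketch below shows how any frontend might describe a call to the backend. The `ApiRequest` type, the `buildApiRequest` helper, and the `http://backend.example` base URL are hypothetical names chosen for this example and are not part of `@airunway/shared`; only the `/api/...` path style and the bearer-token scheme come from this page.

```typescript
// Hypothetical request description; not part of @airunway/shared.
interface ApiRequest {
  url: string;
  method: "GET" | "POST" | "DELETE";
  headers: Record<string, string>;
}

// Builds a JSON-over-HTTP request description for the backend API.
// The caller supplies the base URL and a Kubernetes token; the token is
// forwarded unchanged as a bearer token.
function buildApiRequest(
  baseUrl: string,
  path: string,
  token: string,
  method: ApiRequest["method"] = "GET",
): ApiRequest {
  // Avoid double or missing slashes between base URL and path.
  const trimmedBase = baseUrl.replace(/\/+$/, "");
  const normalizedPath = path.charAt(0) === "/" ? path : "/" + path;
  return {
    url: trimmedBase + normalizedPath,
    method,
    headers: {
      Accept: "application/json",
      Authorization: "Bearer " + token,
    },
  };
}

// Example: describe a request that lists deployments (assumed base URL).
const listDeployments = buildApiRequest(
  "http://backend.example/",
  "api/deployments",
  "my-k8s-token",
);
```

The resulting object can be handed to any HTTP client, for example `fetch(listDeployments.url, { method: listDeployments.method, headers: listDeployments.headers })`. Because nothing here depends on a particular UI framework, the same helper would work from a React app, a Headlamp plugin, or a CLI.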

Figure: AI Runway Architecture

Components at a Glance

Note: The UI layer shown above includes the Frontend layer and the Backend layer.

| Component | Language | Role | Required? |
| --- | --- | --- | --- |
| Controller | Go (Kubebuilder) | Core operator: manages CRDs, provider selection, lifecycle | ✅ Yes |
| Providers / Runtime adapters | Go / provider-specific | Provider-registered controllers or configs for KAITO, Dynamo, KubeRay, llm-d, Direct vLLM | ✅ Yes (1+) |
| Backend API | TypeScript (Hono/Bun) | REST API: proxies K8s operations, auth, model catalog | Optional |
| React Frontend | React/TypeScript | Bundled Web UI | ❌ Swappable |
| Headlamp Plugin | React/TypeScript | Alternative UI inside Headlamp dashboard | ❌ Swappable |
| Shared Types | TypeScript | Shared API client & type contracts (@airunway/shared) | Library |
| kubectl / API | | Direct CRD access via Kubernetes API | Always available |

Component Architecture Diagram

┌──────────────────────────────────────────────────────────────────────────┐
│                        FRONTEND LAYER (swappable)                        │
│                                                                          │
│   Any of these can be used, or replaced, independently:                  │
│                                                                          │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────────┐   │
│   │  React UI    │  │  Headlamp    │  │  Any Custom UI / CLI         │   │
│   │  (bundled)   │  │  Plugin      │  │  (dashboard, portal, etc.)   │   │
│   └──────┬───────┘  └──────┬───────┘  └──────────────┬───────────────┘   │
│          │                 │                         │                   │
└──────────┼─────────────────┼─────────────────────────┼───────────────────┘
           │                 │                         │
           │         REST API (JSON over HTTP)         │
           │             /api/deployments              │
           │             /api/models                   │
           │             /api/runtimes                 │
           │             /api/health ...               │
           ▼                 ▼                         ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                            BACKEND API LAYER                             │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                    Hono REST API (Bun runtime)                     │  │
│  │                                                                    │  │
│  │  • Auth middleware (K8s TokenReview)                               │  │
│  │  • Model catalog & HuggingFace integration                         │  │
│  │  • Deployment CRUD, metrics, logs                                  │  │
│  │  • Runtime installation (Helm)                                     │  │
│  │  • GPU validation, cost estimation                                 │  │
│  │  • Provider-agnostic: reads InferenceProviderConfig CRDs           │  │
│  └───────────────────────────┬────────────────────────────────────────┘  │
└──────────────────────────────┼───────────────────────────────────────────┘
                               │
            Kubernetes API (client-go / @kubernetes/client-node)
                               │
┌──────────────────────────────┼───────────────────────────────────────────┐
│                            KUBERNETES CLUSTER                            │
│                              │                                           │
│               ┌──────────────▼──────────────┐                            │
│               │    AI Runway Controller     │  (core operator)           │
│               │  • Validates specs          │                            │
│               │  • Selects providers (CEL)  │                            │
│               │  • Manages targets          │                            │
│               └──────┬───────────────┬──────┘                            │
│                      │ watches       │ reconciles registered targets     │
│                      ▼               ▼                                   │
│  ┌──────────────────────┐  ┌─────────────────────────────────────────┐   │
│  │  ModelDeployment     │  │ Providers / runtimes registered by      │   │
│  │  (CRD)               │  │ InferenceProviderConfig                 │   │
│  │                      │  │                                         │   │
│  │  InferenceProvider   │  │ KAITO | Dynamo | KubeRay | llm-d |      │   │
│  │  Config (CRD)        │  │ Direct vLLM (provider-supplied)         │   │
│  └──────────────────────┘  │                                         │   │
│                            │ Targets include Workspace, DynamoGraph, │   │
│                            │ RayService, llm-d resources, or a       │   │
│                            │ plain Kubernetes Deployment             │   │
│                            └─────────────────────────────────────────┘   │
│                                                                          │
│              ┌────────────────────────────────────────────┐              │
│              │          Inference Pods (GPU/CPU)          │              │
│              │  Running vLLM, sglang, TRT-LLM, llama.cpp  │              │
│              └────────────────────────────────────────────┘              │
└──────────────────────────────────────────────────────────────────────────┘

Provider and Runtime Registration

Providers and direct runtimes register capabilities, selection rules, and installation metadata through InferenceProviderConfig resources. Some providers reconcile provider-specific CRDs such as KAITO Workspace, Dynamo graph resources, KubeRay RayService, or llm-d resources. Direct runtimes such as Direct vLLM can instead target a plain Kubernetes Deployment; this repository includes the Direct vLLM provider controller and shim manifests under providers/vllm/.
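To make the idea of capability-based registration more concrete, here is a minimal sketch of matching a request against registered providers. Everything in it is an assumption made for illustration: the `ProviderRegistration` and `DeploymentRequest` shapes, their field names, the priority tie-break, and the provider names do not mirror the real InferenceProviderConfig schema, and plain predicates stand in for the CEL selection rules the controller evaluates.

```typescript
// Hypothetical, simplified view of what a provider registers.
// Field names are illustrative and do not mirror the real CRD schema.
interface ProviderRegistration {
  name: string;
  engines: string[]; // inference engines the provider can run
  supportsGpu: boolean;
  priority: number; // assumed tie-breaker: higher wins
}

// Hypothetical summary of what a deployment asks for.
interface DeploymentRequest {
  engine: string;
  needsGpu: boolean;
}

// Returns the highest-priority provider whose registered capabilities
// cover the request, or undefined when no provider matches.
function selectProvider(
  providers: ProviderRegistration[],
  request: DeploymentRequest,
): ProviderRegistration | undefined {
  const candidates = providers.filter(
    (p) =>
      p.engines.indexOf(request.engine) !== -1 &&
      (p.supportsGpu || !request.needsGpu),
  );
  // filter() returned a fresh array, so sorting does not mutate the input.
  candidates.sort((a, b) => b.priority - a.priority);
  return candidates[0];
}

// Example registry with made-up providers.
const registry: ProviderRegistration[] = [
  { name: "provider-a", engines: ["vllm"], supportsGpu: true, priority: 10 },
  { name: "provider-b", engines: ["vllm", "llamacpp"], supportsGpu: false, priority: 5 },
];

const chosen = selectProvider(registry, { engine: "vllm", needsGpu: true });
```

In this example only `provider-a` can satisfy a GPU request for the `vllm` engine, so it is selected. The point of the sketch is the shape of the decision (filter by declared capabilities, then rank), not the specific rules, which live in each provider's InferenceProviderConfig.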

Why the Frontend Is Fully Decoupled

  1. REST-only contract: The frontend communicates with the backend exclusively via HTTP/JSON. There is no shared state, no server-side rendering, and no session affinity.
  2. Shared type library: @airunway/shared provides a typed API client and TypeScript types that any frontend can import. The Headlamp plugin already does this.
  3. Backend is optional: The core platform (controller + CRDs) works without the backend/frontend. Users can manage ModelDeployment resources directly via kubectl, Terraform, GitOps, or any Kubernetes API client.
  4. Swappable frontends: The bundled React UI, the Headlamp plugin, and any custom UI can all drive the same backend API simultaneously, with no code changes.
  5. Auth is delegated: Authentication uses Kubernetes TokenReview; the frontend simply passes a bearer token. Any UI that can obtain a K8s token works.
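The delegated-auth point can be sketched as three small steps: pull the bearer token out of the request, wrap it in a TokenReview object, and read the API server's verdict. The object shapes below follow the Kubernetes `authentication.k8s.io/v1` TokenReview API; the helper names (`extractBearerToken`, `buildTokenReview`, `resolveUser`) are hypothetical, and how the backend actually submits the review to the API server is not shown here.

```typescript
// Request body for the Kubernetes TokenReview API.
interface TokenReviewRequest {
  apiVersion: "authentication.k8s.io/v1";
  kind: "TokenReview";
  spec: { token: string };
}

// Subset of the TokenReview status returned by the API server.
interface TokenReviewStatus {
  authenticated?: boolean;
  user?: { username?: string; groups?: string[] };
  error?: string;
}

// Extracts the token from an "Authorization: Bearer <token>" header value.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const parts = header.trim().split(/\s+/);
  const [scheme, token] = parts;
  if (parts.length !== 2 || !scheme || !token) return null;
  return scheme.toLowerCase() === "bearer" ? token : null;
}

// Wraps a token in the object that would be submitted to the API server.
function buildTokenReview(token: string): TokenReviewRequest {
  return {
    apiVersion: "authentication.k8s.io/v1",
    kind: "TokenReview",
    spec: { token },
  };
}

// Returns the authenticated username, or null if the token was rejected.
function resolveUser(status: TokenReviewStatus): string | null {
  if (status.authenticated !== true) return null;
  return status.user?.username ?? null;
}
```

Because the verdict comes from the cluster itself, a frontend never needs its own user database: any token the API server accepts is accepted by the backend.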

Gateway API Integration

AI Runway optionally integrates with the Gateway API Inference Extension to provide a unified inference gateway. When the required Gateway API CRDs are detected in the cluster, the controller automatically creates an InferencePool and HTTPRoute for each ModelDeployment, so that all models can be called through a single Gateway endpoint using body-based routing on the model field.

The feature is auto-detected at startup and silently disabled if the required CRDs are not present. See Gateway Integration for full details.
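Body-based routing means the gateway picks a backend by inspecting the request payload rather than the URL. The sketch below illustrates only that idea: the `RouteTable` type, the `routeByModel` function, and the model and pool names are hypothetical, and in a real cluster this decision is made by the Gateway implementation, not by application code.

```typescript
// Hypothetical mapping from model name to the pool that serves it.
type RouteTable = Record<string, string>;

// Returns the pool for the `model` field of a JSON request body,
// or null when the body is invalid or names an unknown model.
function routeByModel(rawBody: string, routes: RouteTable): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const model = (parsed as { model?: unknown }).model;
  if (typeof model !== "string") return null;

  // Guard against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(routes, model)) return null;
  const pool = routes[model];
  return typeof pool === "string" ? pool : null;
}

// Example: two deployed models behind one endpoint (made-up names).
const routes: RouteTable = {
  "model-a": "model-a-pool",
  "model-b": "model-b-pool",
};

const target = routeByModel('{"model": "model-b", "prompt": "hello"}', routes);
```

With this table, a request whose body names `model-b` resolves to `model-b-pool`, while a body that omits `model` or names an unregistered model resolves to `null`. Clients therefore only need to know one endpoint and the model name.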

Documentation

For detailed documentation on specific topics, see:

| Document | Description |
| --- | --- |
| Controller Architecture | Reconciliation model, status ownership, drift detection, owner references, finalizers, update semantics, validation webhook, RBAC |
| CRD Reference | ModelDeployment and InferenceProviderConfig CRD specifications |
| Providers | Provider selection algorithm, capability matrix, provider abstraction, KAITO details |
| Web UI Architecture | Backend API, authentication flow, data models, frontend architecture, backend services |
| Headlamp Plugin | Headlamp dashboard plugin architecture and design |
| Observability | Prometheus metrics and Kubernetes events |
| Versioning & Upgrades | API versioning strategy, controller upgrades, compatibility matrix |
| Gateway Integration | Gateway API Inference Extension setup and usage |
| Design Decisions | Alternatives considered, testing strategy, known limitations, out of scope |
| API Reference | REST API endpoint documentation |
| Development Guide | Setup, build, and testing instructions |