Controller Architecture

The AI Runway controller follows a two-tier reconciliation model, inspired by the Kubernetes Container Runtime Interface (CRI) and Cluster API provider patterns:

CRI Pattern:
   kubelet ──► CRI Interface ──► containerd/CRI-O/dockershim ──► containers

AI Runway Provider Pattern:
   core ──► Provider Interface ──► kaito-provider/dynamo-provider ──► provider CRs

Just as dockershim was an adapter that made Docker work with the CRI interface, kaito-provider is an adapter that makes KAITO (which doesn't know about AI Runway) work with the AI Runway provider interface.

Lessons from Dockershim

Dockershim was deprecated and removed from Kubernetes in v1.24. Our design explicitly avoids these mistakes:

Dockershim Problem	Our Solution
Embedded in kubelet — tight coupling meant every kubelet release had to consider Docker compatibility	Provider controllers are separate deployments with independent release cycles
Maintenance burden — Docker bugs affected core Kubernetes releases	A bug in kaito-provider only affects KAITO users, not core
Blocked innovation — new features couldn't be adopted because dockershim couldn't translate them	Providers can be updated independently; `provider.overrides` provides escape hatch
Special treatment — Docker got different treatment than other runtimes	All providers go through the same InferenceProviderConfig interface

Core Controller

The core controller (modeldeployment_controller.go) is intentionally minimal:

Validates the ModelDeployment spec
Selects a provider (if spec.provider.name is empty) using InferenceProviderConfig resources
Updates status conditions (Validated, ProviderSelected)
Does NOT create provider-specific resources

Provider Controllers (Out-of-Tree)

Provider controllers watch for ModelDeployment resources where status.provider.name matches their name:

Check compatibility with the deployment configuration
Create provider-specific resources (KAITO Workspace, DynamoGraphDeployment, RayService, etc.)
Update ModelDeployment status (phase, replicas, endpoint, conditions)

This separation allows:

Clean separation of concerns
Independent provider controller versioning
Easy addition of new providers
Provider-specific features via spec.provider.overrides

Reconciliation Data Flow

                                    ┌─────────────────┐
                                    │ User applies    │
                                    │ ModelDeployment │
                                    └────────┬────────┘
                                             │
                                             ▼
                                    ┌─────────────────┐
                                    │ airunway-    │
                                    │ core webhooks   │
                                    │ (validation)    │
                                    └────────┬────────┘
                                             │
                          ┌──────────────────┴──────────────────┐
                          │                                     │
                          ▼                                     │
               ┌─────────────────────┐                         │
               │ engine specified?   │──── yes ────┐           │
               └─────────────────────┘             │           │
                          │                         │           │
                          no                        │           │
                          │                         │           │
                          ▼                         │           │
               ┌─────────────────────┐             │           │
               │ auto-select engine  │             │           │
               │ from provider       │             │           │
               │ capabilities        │             │           │
               │ (GPU/CPU + serving  │             │           │
               │  mode filtered)     │             │           │
               └─────────┬───────────┘             │           │
                         │                         │           │
                         │ sets status.engine.type  │           │
                         │                         │           │
                         └────────┬────────────────┘           │
                                  │                             │
                                  ▼                             │
               ┌─────────────────────┐                         │
               │ provider specified? │──── yes ───────────────►│
               └─────────────────────┘                         │
                          │                                     │
                          no                                    │
                          │                                     │
                          ▼                                     │
               ┌─────────────────────┐                         │
               │ built-in provider   │                         │
               │ selection algorithm │                         │
               │ (CEL-based)         │                         │
               └─────────┬───────────┘                         │
                         │                                     │
                         │ sets status.provider.name           │
                         │                                     │
                         └─────────────────────────────────────┤
                                                               │
                                                               ▼
                                                    ┌─────────────────────┐
                                                    │ Provider controller │
                                                    │ watches & creates   │
                                                    │ provider resource   │
                                                    │ (Kaito Workspace or │
                                                    │ Dynamo DGD, etc.)   │
                                                    └─────────┬───────────┘
                                                              │
                                                              ▼
                                                    ┌─────────────────────┐
                                                    │ Provider operator   │
                                                    │ reconciles pods     │
                                                    └─────────┬───────────┘
                                                              │
                                                              ▼
                                                    ┌─────────────────────┐
                                                    │ Provider syncs      │
                                                    │ status back to      │
                                                    │ ModelDeployment     │
                                                    └─────────────────────┘

Status Ownership

Multiple controllers write to ModelDeployment.status using server-side apply with distinct field managers:

Field	Owner	Description
`status.engine.type`	Core controller	Resolved engine type
`status.engine.selectedReason`	Core controller	Why this engine was chosen
`status.provider.name`	Core controller	Selected provider name
`status.provider.selectedReason`	Core controller	Why this provider was chosen
`status.phase`	Provider controller	Deploying / Running / Failed
`status.provider.resourceName`	Provider controller	Name of created upstream resource
`status.provider.resourceKind`	Provider controller	Kind of created upstream resource
`status.replicas.*`	Provider controller	Desired, ready, available counts
`status.endpoint.*`	Provider controller	Service name and port
`conditions[Validated]`	Core webhook	Spec validation result
`conditions[EngineSelected]`	Core controller	Engine selection result
`conditions[ProviderSelected]`	Core controller	Provider selection result
`conditions[ProviderCompatible]`	Provider controller	Engine/mode compatibility check
`conditions[ResourceCreated]`	Provider controller	Upstream resource creation status
`conditions[Ready]`	Provider controller	Overall readiness
`status.gateway.*`	Core controller	Gateway endpoint, model name, readiness
`conditions[GatewayReady]`	Core controller	Gateway route active

Drift Detection

The controller enforces the ModelDeployment spec on provider resources. If someone directly edits a provider resource (e.g., kubectl edit workspace my-llm), the controller overwrites those changes on the next reconciliation.

Pause annotation — to temporarily disable reconciliation for debugging:

metadata:
  annotations:
    airunway.ai/reconcile-paused: "true"

Owner References & Garbage Collection

The controller sets ownerReferences on created provider resources:

metadata:
  ownerReferences:
    - apiVersion: airunway.ai/v1alpha1
      kind: ModelDeployment
      name: my-llm
      uid: abc-123
      controller: true
      blockOwnerDeletion: true

This ensures:

Deleting ModelDeployment automatically deletes the provider resource
Provider resources cannot be accidentally orphaned
Clear ownership hierarchy in the cluster

Namespace Requirement: Provider resources are always created in the same namespace as the ModelDeployment. Cross-namespace ownership is not supported (Kubernetes owner references require same-namespace resources).

Finalizer Handling

The controller uses finalizers to ensure cleanup. If the provider operator is unavailable:

Controller attempts cleanup for 5 minutes
After timeout, controller removes finalizer with warning event
Orphaned provider resources may remain (logged for manual cleanup)

Manual escape (immediate):

kubectl patch modeldeployment my-llm --type=merge \
  -p '{"metadata":{"finalizers":[]}}'

Update Semantics

When a user updates a ModelDeployment spec, changes are handled based on field type:

Identity fields (trigger delete + recreate):

Field	Reason
`model.id`	Changing the model fundamentally changes the deployment
`model.source`	Changing from huggingface to custom changes how model is loaded
`engine.type`	Changing inference engine requires new containers (immutable once set)
`provider.name`	Changing provider requires different resource type
`serving.mode`	Changing aggregated ↔ disaggregated restructures the entire deployment

Warning: Changing identity fields causes brief downtime as the provider resource is deleted and recreated.

Config fields (in-place update):

Field	Notes
`model.servedName`	Changes API-facing model name argument
`scaling.replicas`	Can be updated without recreation
`scaling.prefill.`, `scaling.decode.`	Worker scaling (disaggregated mode)
`env`	Environment variable changes
`resources`	Memory/CPU/GPU adjustments
`engine.args`	Engine parameter tuning
`engine.contextLength`	Context length adjustment
`image`	Rolling update to new container image
`secrets.huggingFaceToken`	Updates secret reference
`podTemplate.metadata`	Updates pod labels/annotations
`nodeSelector`, `tolerations`	Scheduling constraints
`provider.overrides`	Provider-specific configuration

Status Mapping

The controller extracts meaningful error messages from provider status:

Provider	Provider Status	Unified Phase	Message Extraction
KAITO	`WorkspaceSucceeded: True`	Running	—
KAITO	`InferenceReady: False`	Deploying	Extract condition message
KAITO	Error condition	Failed	Extract error from condition
Dynamo	`state: successful`	Running	—
Dynamo	`state: deploying`	Deploying	—
Dynamo	`state: failed`	Failed	Extract from status.message
KubeRay	`serviceStatus: Running`	Running	—
KubeRay	`serviceStatus: Pending`	Pending	—
KubeRay	`serviceStatus: Failed`	Failed	Extract from serveStatuses
llmd	`Available`	Running	—
llmd	`Progressing`	Deploying	Extract condition message
llmd.	Error condition	Failed	Extract error from condition

Label Propagation

Labels from ModelDeployment.metadata.labels are selectively propagated:

To provider resource: Only labels with airunway.ai/ prefix are copied
To pods: Use spec.podTemplate.metadata.labels for pod-level labels
Controller-managed: The controller always adds airunway.ai/managed-by: airunway

Provider Overrides

The spec.provider.overrides field provides an escape hatch for provider-specific configuration not covered by the unified API:

Dynamo overrides:

provider:
  name: dynamo
  overrides:
    routerMode: "kv"          # kv | round-robin | none (default: round-robin)
    frontend:
      replicas: 2
      resources:
        cpu: "4"
        memory: "8Gi"

KubeRay overrides:

provider:
  name: kuberay
  overrides:
    head:
      resources:
        cpu: "4"
        memory: "8Gi"
      rayStartParams:
        dashboard-host: "0.0.0.0"
        num-cpus: "0"

KAITO currently has no supported overrides. Unknown keys trigger warnings; invalid types cause reconciliation failure.

Validation Webhook

The controller includes a validating admission webhook for ModelDeployment resources. Webhook TLS uses self-signed certificates managed by cert-controller (in-process, no cert-manager dependency).

Schema validation (core webhook):

Rule	Error Message
`engine: vllm` with `gpu.count: 0`	"vLLM engine requires GPU (set resources.gpu.count > 0)"
`engine: sglang` with `gpu.count: 0`	"SGLang engine requires GPU (set resources.gpu.count > 0)"
`engine: trtllm` with `gpu.count: 0`	"TensorRT-LLM engine requires GPU (set resources.gpu.count > 0)"
`mode: disaggregated` with `spec.resources.gpu`	"Cannot specify both resources.gpu and scaling.prefill/decode"
`mode: disaggregated` without `scaling.prefill` or `scaling.decode`	"Disaggregated mode requires scaling.prefill and scaling.decode"
`mode: disaggregated` without `scaling.prefill.gpu.count`	"Disaggregated mode requires scaling.prefill.gpu.count"
`mode: disaggregated` without `scaling.decode.gpu.count`	"Disaggregated mode requires scaling.decode.gpu.count"
Missing `engine.type`	"engine.type is required"
Missing `model.id` when `source: huggingface`	"model.id is required when source is huggingface"
Provider CRD not installed	"Provider '{name}' CRD not installed in cluster"

Provider compatibility (validated by provider controllers, not core):

Rule	Error Message
`engine: sglang` with `provider: kaito`	"KAITO does not support sglang engine"
`engine: trtllm` with `provider: kaito`	"KAITO does not support trtllm engine"
`engine: llamacpp` with `provider: dynamo`	"Dynamo does not support llamacpp engine"
`engine: llamacpp` with `provider: kuberay`	"KubeRay does not support llamacpp engine"
`engine: sglang` with `provider: kuberay`	"KubeRay does not support sglang engine"
`engine: trtllm` with `provider: kuberay`	"KubeRay does not support trtllm engine"
`gpu.count: 0` with `provider: dynamo`	"Dynamo requires GPU"
`gpu.count: 0` with `provider: kuberay`	"KubeRay requires GPU"
`mode: disaggregated` with `provider: kaito`	"KAITO does not support disaggregated mode"
`engine: sglang` with `provider: llmd`	"llm-d only supports vllm engine"
`gpu.count: 0` with `provider: llmd`	"llm-d requires GPU"

Provider compatibility is surfaced in ModelDeployment.status.conditions with type ProviderCompatible: False, maintaining the "core has zero provider knowledge" principle.

Webhook unavailability: If the webhook is not available (e.g., during initial setup), schema validation occurs at reconciliation time. The controller will accept the resource and set status.phase: Pending with a descriptive message until validation passes.

RBAC

Controller ServiceAccount

The controller ServiceAccount requires permissions for ModelDeployments, InferenceProviderConfigs, and provider-specific CRDs:

rules:
  - apiGroups: ["airunway.ai"]
    resources: ["modeldeployments", "modeldeployments/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["kaito.sh"]
    resources: ["workspaces"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["nvidia.com"]
    resources: ["dynamographdeployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["ray.io"]
    resources: ["rayservices"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list"]  # For dynamic version detection

User Permissions

Users only need RBAC for ModelDeployment resources — the controller acts as a privileged intermediary that creates provider resources on their behalf. This means a user who can create ModelDeployment in namespace X can effectively create provider resources in that namespace.

Secret Handling

Controller never reads secret contents
Only passes secret references to provider resources
HuggingFace tokens stay in Kubernetes secrets

Lessons from Dockershim​

Core Controller​

Provider Controllers (Out-of-Tree)​

Reconciliation Data Flow​

Status Ownership​

Drift Detection​

Owner References & Garbage Collection​

Finalizer Handling​

Update Semantics​

Status Mapping​

Label Propagation​

Provider Overrides​

Validation Webhook​

RBAC​

Controller ServiceAccount​

User Permissions​

Secret Handling​

See Also​