# Multi-Node Distributed Inference
The Headlamp-KAITO plugin provides powerful multi-node distributed inference capabilities, enabling you to deploy large AI models across multiple GPU nodes for improved performance, scalability, and resource utilization.
## Overview
Multi-node distributed inference allows you to:
- Scale horizontally by distributing model workloads across multiple GPU nodes
- Handle models that are too large to fit in a single node's GPU memory
- Increase availability with redundancy across nodes
## Deployment Strategies

### 1. Explicit Node Selection
Select specific nodes for precise control over your deployment:
```yaml
resource:
  count: 3
  preferredNodes:
    - gpu-node-1
    - gpu-node-2
    - gpu-node-3
  instanceType: Standard_NC80adis_H100_v5
  labelSelector:
    matchLabels:
      node.kubernetes.io/instance-type: Standard_NC80adis_H100_v5
```
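For reference, the snippet above is the `resource` block of a KAITO Workspace. A complete manifest using this strategy might look like the following sketch; the API version, workspace name, node names, and inference preset are placeholders, so adjust them to your cluster and to the model you pick in the catalog.

```yaml
apiVersion: kaito.sh/v1alpha1        # assumed API version; check what your KAITO installation serves
kind: Workspace
metadata:
  name: workspace-llama-3-8b         # illustrative name
resource:
  count: 3
  preferredNodes:
    - gpu-node-1
    - gpu-node-2
    - gpu-node-3
  instanceType: Standard_NC80adis_H100_v5
  labelSelector:
    matchLabels:
      node.kubernetes.io/instance-type: Standard_NC80adis_H100_v5
inference:
  preset:
    name: llama-3-8b-instruct        # illustrative preset; use the preset for your chosen model
```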
Use cases:
- High-performance workloads requiring specific hardware
- Testing on known node configurations
### 2. Count-Based Auto-Provisioning
Specify the number of nodes while letting KAITO handle the selection:
```yaml
resource:
  count: 4
  instanceType: Standard_NC80adis_H100_v5
  labelSelector:
    matchLabels:
      apps: llama-3-8b
```
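After the Workspace is applied, it can help to confirm which nodes KAITO provisioned or matched. The commands below are a sketch: the node label comes from the `labelSelector` above, and the workspace name is a placeholder.

```shell
# Nodes carrying the label referenced by the Workspace's labelSelector
kubectl get nodes -l apps=llama-3-8b

# Resource and inference status of the Workspace (name is illustrative)
kubectl describe workspace workspace-llama-3-8b
```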
Use cases:
- Predictable scaling requirements
- Cost optimization with specific node counts
- Load balancing across available resources
### 3. Full Auto-Provisioning
Let KAITO determine the optimal configuration:
```yaml
resource:
  instanceType: Standard_NC80adis_H100_v5
  labelSelector:
    matchLabels:
      apps: llama-3-8b
```
Use cases:
- Development and testing environments
- Dynamic workloads with varying requirements
- Simplified deployment workflows
## Node Selection Interface
The plugin provides an intuitive interface for managing multi-node deployments:
### Node Selection Features
- GPU Node Filtering: Automatically shows only GPU-enabled nodes
- Instance Type Detection: Dynamically extracts instance types from selected nodes
- Node Status Indicators: Real-time health and availability status
- Taint Awareness: Displays node scheduling restrictions
- Label-Based Filtering: Advanced filtering using Kubernetes labels (see the kubectl equivalent after this list)
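If you want to preview what the node filters will match before opening the dialog, the same label-based filtering can be approximated with kubectl using well-known node labels; the instance type below is carried over from the earlier examples.

```shell
# Nodes of a specific instance type (standard node.kubernetes.io/instance-type label)
kubectl get nodes -l node.kubernetes.io/instance-type=Standard_NC80adis_H100_v5

# Narrow further by CPU architecture using the standard kubernetes.io/arch label
kubectl get nodes -l node.kubernetes.io/instance-type=Standard_NC80adis_H100_v5,kubernetes.io/arch=amd64
```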
### Quick Selectors
Pre-configured options for common scenarios:
- Standard NC24 Instances
- Standard NC96 Instances
- GPU + AMD64 architecture
- Custom label combinations (see the example below)
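As a sketch of a custom combination, a `labelSelector` can mix the standard architecture label with an application label of your own. The `apps: llama-3-8b` value mirrors the earlier examples; any label your nodes actually carry will work.

```yaml
labelSelector:
  matchLabels:
    kubernetes.io/arch: amd64   # standard well-known architecture label
    apps: llama-3-8b            # custom application label, as in the examples above
```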
## Benefits of Multi-Node Inference

### Performance Improvements
- Parallel Processing: Distribute inference requests across multiple nodes
- Reduced Latency: Requests spend less time queueing because they are served concurrently across nodes
- Higher Throughput: Aggregate processing power of multiple GPUs
### Scalability Advantages
- Horizontal Scaling: Add more nodes to handle increased load
- Model Sharding: Split large models across multiple nodes
- Dynamic Scaling: Adjust node count based on demand
### Cost Optimization
- Instance Type Flexibility: Combine smaller, more cost-effective instances instead of relying on a single large one
- Spot Instance Support: Leverage spot pricing across multiple nodes
- Resource Efficiency: Better GPU utilization across the cluster
### High Availability
- Fault Tolerance: Continue operation if individual nodes fail
- Load Distribution: Prevent single points of failure
- Graceful Degradation: Maintain service with reduced capacity
## Integration with KAITO
The multi-node features integrate seamlessly with KAITO's capabilities:
- Automatic Model Sharding: KAITO handles model distribution
- Service Discovery: Unified endpoints for distributed clusters (see the sketch after this list)
- Health Monitoring: Built-in monitoring and alerting
- Dynamic Scaling: Runtime adjustment of node configurations
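As a quick illustration of the unified endpoint, the sketch below port-forwards the Service created for a workspace and sends one request to it. The service name, port, and the OpenAI-compatible path are assumptions (they depend on your workspace name and inference runtime), so verify them against the Service and runtime your deployment actually created.

```shell
# Forward the workspace Service to localhost (service name assumed to match the workspace)
kubectl port-forward svc/workspace-llama-3-8b 8080:80

# Send a test request; /v1/chat/completions assumes an OpenAI-compatible (vLLM) runtime
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3-8b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```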
## Getting Started
1. Access the Deployment Dialog: Select a model from the catalog
2. Configure Node Selection: Choose your preferred deployment strategy
3. Review Configuration: Verify the generated YAML configuration
4. Deploy: Apply the configuration to your cluster (see the commands after this list)
5. Monitor: Track deployment status and performance metrics
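A minimal command-line version of the deploy and monitor steps, assuming the generated YAML was saved as `workspace.yaml` and the workspace is named `workspace-llama-3-8b`:

```shell
# Apply the generated Workspace manifest
kubectl apply -f workspace.yaml

# Watch the Workspace until its resource and inference conditions report ready
kubectl get workspace workspace-llama-3-8b -w

# Check where the inference pods landed across the selected GPU nodes
kubectl get pods -o wide
```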
The multi-node distributed inference feature makes it easy to deploy and manage large-scale AI workloads while maintaining the simplicity and user-friendliness that KAITO is known for.