FAQ
How do I ensure preferred nodes are correctly labeled for use in my workspace?
To use preferred nodes, make sure each node has the label specified under matchLabels in the labelSelector. For example, if your labelSelector is:
  labelSelector:
    matchLabels:
      apps: falcon-7b
then the node should have the label apps=falcon-7b.
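The matching rule can be sketched in Python: a node qualifies when every key/value pair under matchLabels appears among its labels, and any extra node labels are ignored. This is only an illustration of standard Kubernetes matchLabels semantics, not KAITO's own code; the function name and the sample label sets are hypothetical.

```python
def node_matches(node_labels: dict[str, str], match_labels: dict[str, str]) -> bool:
    """Return True when every matchLabels pair is present on the node."""
    return all(node_labels.get(key) == value for key, value in match_labels.items())


# matchLabels from the example above.
match_labels = {"apps": "falcon-7b"}

# Hypothetical node label sets, for illustration only.
labeled_node = {"kubernetes.io/os": "linux", "apps": "falcon-7b"}
unlabeled_node = {"kubernetes.io/os": "linux"}

print(node_matches(labeled_node, match_labels))    # node carries apps=falcon-7b
print(node_matches(unlabeled_node, match_labels))  # node lacks the label
```

A node whose label value differs (for example apps=falcon-40b) is treated the same as a node with no such label: it does not match.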
How to upgrade the existing deployment to use the latest model configuration?
When using hosted public models, you can manually delete the existing inference workload (Deployment or StatefulSet), and the workspace controller will create a new one with the latest preset configuration (e.g., the image version) defined in the current release.
For private models, it is recommended to create a new workspace with a new image version in the Spec.
How to update model/inference parameters to override the KAITO Preset Configuration?
KAITO provides limited support for manually overriding preset configurations for models that use the transformer runtime.
To update parameters for a deployed model, run kubectl edit against the workload, which can be either a StatefulSet or a Deployment.
For example, to enable 4-bit quantization on a falcon-7b-instruct deployment:
kubectl edit deployment workspace-falcon-7b-instruct
Within the deployment specification, locate and modify the command field.
Original:
accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16
Modified to enable 4-bit Quantization:
accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16 --load_in_4bit
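The edit above amounts to appending one flag to the existing command. A minimal Python sketch of that transformation follows, assuming the command is handled as a single shell-style string; the helper name add_flag is hypothetical and is not part of KAITO or kubectl.

```python
import shlex

# Original command from the deployment specification shown above.
ORIGINAL = (
    "accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 "
    "--gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16"
)


def add_flag(command: str, flag: str) -> str:
    """Append a boolean flag to a command string unless it is already present."""
    args = shlex.split(command)
    if flag not in args:
        args.append(flag)
    return shlex.join(args)


modified = add_flag(ORIGINAL, "--load_in_4bit")
print(modified)
```

Calling add_flag a second time with the same flag leaves the command unchanged, so the edit is safe to repeat.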
Currently, we allow users to change the following parameters manually:
- pipeline: For text-generation models, this can be either text-generation or conversational.
- load_in_4bit or load_in_8bit: Model quantization resolution.
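Before saving an edited command, it can help to confirm that only the parameters listed above were touched. The following Python sketch compares the flags of two command strings under that assumption; the names ALLOWED, parse_flags, and disallowed_changes are hypothetical, and the parser is deliberately naive (it treats any non-flag token that follows a flag as that flag's value).

```python
import shlex

# Parameters the FAQ lists as manually changeable.
ALLOWED = {"pipeline", "load_in_4bit", "load_in_8bit"}


def parse_flags(command: str) -> dict[str, str]:
    """Map each --flag to its value; flags without a value map to ""."""
    args = shlex.split(command)
    flags: dict[str, str] = {}
    i = 0
    while i < len(args):
        if args[i].startswith("--"):
            name = args[i][2:]
            has_value = i + 1 < len(args) and not args[i + 1].startswith("--")
            flags[name] = args[i + 1] if has_value else ""
            i += 2 if has_value else 1
        else:
            i += 1  # skip positional tokens such as "accelerate" or "launch"
    return flags


def disallowed_changes(original: str, modified: str) -> set[str]:
    """Return flags that were added, removed, or changed but are not in ALLOWED."""
    before, after = parse_flags(original), parse_flags(modified)
    changed = {n for n in before.keys() | after.keys() if before.get(n) != after.get(n)}
    return changed - ALLOWED


original = "accelerate launch --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16"
print(disallowed_changes(original, original + " --load_in_4bit"))
print(disallowed_changes(original, original.replace("bfloat16", "float16")))
```

The first call changes only an allowed parameter, while the second changes torch_dtype, which is not on the list and would therefore be reported.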
Should you need to customize other parameters, kindly file an issue for potential future inclusion.
What is the difference between instruct and non-instruct models?
The main distinction lies in their intended use cases:
- Instruct models: Fine-tuned versions optimized for interactive chat applications. They are typically the preferred choice for most implementations due to their enhanced performance in conversational contexts.
- Non-instruct (raw) models: Designed for further fine-tuning with your own data.