Schedule Workloads to Specific GPU Nodes
When you define a Hardware Profile, you often need to ensure that an AI inference workload is scheduled only onto nodes with a specific type of GPU (such as an NVIDIA A100 or H100), and that the workload tolerates the taints on those dedicated nodes. Administrators apply such taints to keep regular CPU workloads off the GPU nodes.
This guide shows how to configure these constraints in a Hardware Profile so that data scientists do not need to set them manually for each workload.
Use Node Selectors
Node selectors allow you to guide pods to specific nodes based on node labels.
- Find the exact Kubernetes label of the GPU nodes in your cluster. For example:
  accelerator: nvidia-a100
  nvidia.com/gpu.present: "true"
- Edit or create your Hardware Profile.
- In the Node Selectors section, add the Key-Value pair corresponding to the label:
  - Key: accelerator
  - Value: nvidia-a100
Once saved, any Inference Service that uses this Hardware Profile automatically receives this node selector, so it is scheduled only onto nodes that carry the accelerator: nvidia-a100 label.
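To make the matching rule concrete, the following minimal sketch models how a node selector filters nodes: a node is eligible only if it carries every key-value pair in the selector. The node names and the label sets are hypothetical, and the function is a simplified illustration of the rule, not the scheduler's actual implementation.

```python
def matches_node_selector(node_selector: dict, node_labels: dict) -> bool:
    """Return True when every selector key/value pair is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical nodes and labels, for illustration only.
nodes = {
    "gpu-node-1": {"accelerator": "nvidia-a100", "nvidia.com/gpu.present": "true"},
    "gpu-node-2": {"accelerator": "nvidia-h100", "nvidia.com/gpu.present": "true"},
    "cpu-node-1": {"kubernetes.io/os": "linux"},
}

# The Key-Value pair entered in the Node Selectors section.
selector = {"accelerator": "nvidia-a100"}

eligible = [name for name, labels in nodes.items()
            if matches_node_selector(selector, labels)]
print(eligible)  # ['gpu-node-1']
```

Because the comparison is an exact string match, a label value that differs in case or spelling from the selector value excludes the node.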
Use Taints and Tolerations
GPU nodes are frequently "tainted" by cluster administrators so that standard pods (like web servers or generic databases) are not scheduled on them, thereby reserving the GPU processing power for AI workloads.
If your GPU nodes have a taint like nvidia.com/gpu:NoSchedule, your Hardware Profile must include a corresponding toleration.
- Under the Tolerations section of your Hardware Profile, add a new toleration.
- Configure it to match the taint on the GPU node:
  - Key: nvidia.com/gpu
  - Operator: Exists (This tolerates any value for the key nvidia.com/gpu. Alternatively, use Equal and explicitly set the Value.)
  - Effect: NoSchedule (Matches the restrictive effect of the taint.)
By adding this toleration to the Hardware Profile, the deployed Inference Service is explicitly granted "permission" to be scheduled on the dedicated GPU nodes.
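The difference between the Exists and Equal operators can be sketched as follows. This is a simplified model of the standard Kubernetes matching rule (same key, same effect, and either Exists or an equal value). The `Taint` and `Toleration` classes are hypothetical helpers written for this example, and the `a100` taint value is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes defaults to Equal when unset
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True when the toleration matches the given taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # Exists ignores the value; an empty key matches every taint key.
        return toleration.key in ("", taint.key)
    # Equal requires both the key and the value to match.
    return toleration.key == taint.key and toleration.value == taint.value


# Hypothetical taint on a dedicated GPU node.
gpu_taint = Taint(key="nvidia.com/gpu", effect="NoSchedule", value="a100")

exists_tol = Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")
equal_tol = Toleration(key="nvidia.com/gpu", operator="Equal", value="a100", effect="NoSchedule")
wrong_value = Toleration(key="nvidia.com/gpu", operator="Equal", value="h100", effect="NoSchedule")

print(tolerates(exists_tol, gpu_taint))   # True
print(tolerates(equal_tol, gpu_taint))    # True
print(tolerates(wrong_value, gpu_taint))  # False
```

Exists is the more forgiving choice when the taint value varies between node pools. Equal is stricter and fails to match if the value on the node changes.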
Combined Configuration
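Combining a Node Selector (which restricts the workload to the labeled GPU nodes) with a Toleration (which permits scheduling onto those nodes despite their taint) makes the Hardware Profile a reusable scheduling template for clusters with mixed node types. Both parts are needed: a toleration alone does not steer a pod toward GPU nodes, and a node selector alone leaves the pod unschedulable if the selected nodes are tainted.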
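The interaction of the two settings can be checked with a small simulation. The sketch below is a simplified model under stated assumptions: the node names, the dictionary layout of the profiles, and the `can_schedule` helper are all hypothetical and do not represent the Hardware Profile schema or the real scheduler.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Every selector key/value pair must be present in the node labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified Kubernetes rule: effect, key, and operator must agree."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return toleration.get("key", "") in ("", taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(profile: dict, node: dict) -> bool:
    """A node is usable if the selector matches and every hard taint is tolerated."""
    if not selector_matches(profile.get("nodeSelector", {}), node["labels"]):
        return False
    hard = [t for t in node.get("taints", [])
            if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in profile.get("tolerations", []))
               for t in hard)


# Hypothetical cluster: one tainted A100 node and one untainted CPU node.
nodes = {
    "gpu-node-1": {"labels": {"accelerator": "nvidia-a100"},
                   "taints": [{"key": "nvidia.com/gpu", "effect": "NoSchedule"}]},
    "cpu-node-1": {"labels": {}, "taints": []},
}

selector = {"accelerator": "nvidia-a100"}
toleration = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}

profiles = {
    "selector-only": {"nodeSelector": selector},
    "toleration-only": {"tolerations": [toleration]},
    "combined": {"nodeSelector": selector, "tolerations": [toleration]},
}

placements = {name: [n for n, node in nodes.items() if can_schedule(p, node)]
              for name, p in profiles.items()}
print(placements)
```

In this model, the selector-only profile matches no node (the pod would stay pending), the toleration-only profile can still land on the CPU node, and only the combined profile is confined to the GPU node.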