The NVIDIA GPU Operator manages NVIDIA GPU resources in an OKD cluster and automates tasks related to bootstrapping GPU nodes.
Since the GPU is a special resource in the cluster, you must install some components before deploying application workloads onto the GPU.
These components include the NVIDIA drivers, which enable Compute Unified Device Architecture (CUDA), the Kubernetes device plugin, the container runtime, and other features such as automatic node labeling and monitoring.
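Once these components are running, a workload typically asks for a GPU through the extended resource that the NVIDIA device plugin advertises, `nvidia.com/gpu`. The following is a minimal sketch, not an official procedure: it builds a Pod manifest that requests one GPU and prints it as JSON. The helper `gpu_pod_manifest`, the pod name `gpu-demo`, and the image `example.com/cuda-sample:latest` are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpu_count: int = 1) -> dict:
    """Build a minimal Pod manifest that requests NVIDIA GPUs.

    Hypothetical helper for illustration; the name and image are placeholders.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested as an extended resource under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image; substitute a CUDA-capable image of your own.
    manifest = gpu_pod_manifest("gpu-demo", "example.com/cuda-sample:latest")
    print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON manifests, the printed output could be piped to `oc apply -f -` on a cluster where the Operator has already prepared the GPU nodes. Without the device plugin, no node advertises `nvidia.com/gpu` and such a pod would be expected to stay unscheduled.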
The NVIDIA GPU Operator is supported only by NVIDIA. For more information about obtaining support from NVIDIA, see Obtaining Support from NVIDIA.
There are two ways to enable GPUs with OKD Virtualization: the OKD-native way described here and the NVIDIA GPU Operator.
The NVIDIA GPU Operator is a Kubernetes Operator that enables OKD Virtualization to expose GPUs to virtualized workloads running on OKD.
It lets users provision and manage GPU-enabled virtual machines, so they can run complex artificial intelligence/machine learning (AI/ML) workloads on the same platform as their other workloads.
It also makes it easier to scale the GPU capacity of the infrastructure as GPU-based workloads grow.