Configuring virtual GPUs

About using virtual GPUs with OKD Virtualization
- Adding kernel arguments to enable the IOMMU driver
About using the NVIDIA GPU Operator
- Options for configuring mediated devices
Creating and exposing mediated devices
Removing mediated devices from the cluster
How vGPUs are assigned to nodes
Assigning a vGPU to a VM by using the CLI
Assigning a vGPU to a VM by using the web console
Additional resources

If you have graphics processing unit (GPU) cards, OKD Virtualization can automatically create virtual GPUs (vGPUs) that you can assign to virtual machines (VMs).

About using virtual GPUs with OKD Virtualization

You can create vGPUs for your virtual machines (VMs) using supported GPU cards. Refer to your hardware vendor’s documentation for functionality and support details.

You can use the NVIDIA GPU Operator to manage vGPUs for your VMs on the cluster nodes. You must add these devices to the HyperConverged custom resource (CR) so that OKD Virtualization can discover them and make them available to VMs.

A mediated device is a physical device that is divided into one or more virtual devices. vGPUs are a type of mediated device (mdev) where the performance of the physical GPU is divided among the virtual devices. You can assign mediated devices to one or more virtual machines (VMs), but the number of guests must be compatible with your GPU. Some GPUs do not support multiple guests.

Adding kernel arguments to enable the IOMMU driver

You must enable the Input-Output Memory Management Unit (IOMMU) driver before you can configure mediated devices. To enable the IOMMU driver in the kernel, create a MachineConfig object and add the kernel arguments.

Prerequisites

You have cluster administrator permissions.
Your CPU hardware is Intel or AMD.
You enabled Intel Virtualization Technology for Directed I/O extensions or AMD IOMMU in the BIOS.
You have installed the OpenShift CLI (oc).

Procedure

Create a MachineConfig object that identifies the kernel argument. The following example shows a kernel argument for an Intel CPU.
```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 100-worker-iommu
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
      - intel_iommu=on
# ...
```
- metadata.labels.machineconfiguration.openshift.io/role specifies that the new kernel argument is applied only to worker nodes.
- metadata.name specifies the ranking of this kernel argument (100) among the machine configs and its purpose.
- spec.kernelArguments specifies the kernel argument as intel_iommu=on for an Intel CPU. If you have an AMD CPU, specify the kernel argument as amd_iommu=on.

Create the new MachineConfig object:

$ oc create -f 100-worker-kernel-arg-iommu.yaml
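
The Intel and AMD variants of this manifest differ only in the kernel argument. The following is a minimal sketch of a shell function that prints the manifest for either vendor. The function name render_iommu_machineconfig is hypothetical and is not part of any product tooling; the manifest fields are copied from the example above.

```shell
# Sketch: print a MachineConfig manifest that carries the IOMMU kernel
# argument for the given CPU vendor (intel or amd).
# The function name is hypothetical; the manifest fields come from the
# example in this procedure.
render_iommu_machineconfig() {
  case "$1" in
    intel) karg="intel_iommu=on" ;;
    amd)   karg="amd_iommu=on" ;;
    *)     echo "usage: render_iommu_machineconfig intel|amd" >&2
           return 1 ;;
  esac
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 100-worker-iommu
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - ${karg}
EOF
}
```

For example, redirecting the output of render_iommu_machineconfig amd to 100-worker-kernel-arg-iommu.yaml produces a file that you can pass to the oc create command.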

Verification

Verify that the new MachineConfig object was added by entering the following command and observing the output:

$ oc get MachineConfig

Example output:

NAME                                       IGNITIONVERSION                    AGE
00-master                                   3.5.0                             164m
00-worker                                   3.5.0                             164m
01-master-container-runtime                 3.5.0                             164m
01-master-kubelet                           3.5.0                             164m
01-worker-container-runtime                 3.5.0                             164m
01-worker-kubelet                           3.5.0                             164m
100-master-chrony-configuration             3.5.0                             169m
100-master-set-core-user-password           3.5.0                             169m
100-worker-chrony-configuration             3.5.0                             169m
100-worker-iommu                            3.5.0                             14s

Verify that IOMMU is enabled at the operating system (OS) level by entering the following command:
```
$ dmesg | grep -i iommu
```
- If IOMMU is enabled, output is displayed as shown in the following example:
  
  Example output:
  Intel: [ 0.000000] DMAR: Intel(R) IOMMU Driver
  AMD: [ 0.000000] AMD-Vi: IOMMU Initialized
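
If you want to script this check, the following sketch tests kernel log text for the two messages shown in the example output. The function name iommu_enabled is hypothetical, and the matched strings are taken only from the example above; the exact wording of these messages might differ between kernel versions, so treat a negative result as inconclusive.

```shell
# Sketch: succeed if the kernel log text on stdin contains one of the
# IOMMU messages from the example output. The function name is
# hypothetical, and the patterns are assumptions based on that example.
iommu_enabled() {
  grep -Eq 'DMAR: Intel\(R\) IOMMU Driver|AMD-Vi: IOMMU Initialized'
}
```

For example, dmesg | iommu_enabled && echo "IOMMU enabled" prints a message only when one of the two lines is present.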

About using the NVIDIA GPU Operator

You can use the NVIDIA GPU Operator to provision worker nodes for running GPU-accelerated virtual machines (VMs) in OKD Virtualization.

The NVIDIA GPU Operator manages NVIDIA GPU resources in an OKD cluster and automates tasks when preparing nodes for GPU workloads.

Before you can deploy application workloads to a GPU resource, you must install components such as the NVIDIA drivers that enable the compute unified device architecture (CUDA), Kubernetes device plugin, container runtime, and other features, such as automatic node labeling and monitoring. By automating these tasks, you can quickly scale the GPU capacity of your infrastructure. The NVIDIA GPU Operator can especially facilitate provisioning complex artificial intelligence and machine learning (AI/ML) workloads.

Options for configuring mediated devices

There are two available methods for configuring mediated devices when using the NVIDIA GPU Operator. The method that Red Hat tests uses OKD Virtualization features to schedule mediated devices, while the NVIDIA method only uses the GPU Operator.

Using the NVIDIA GPU Operator to configure mediated devices

This method exclusively uses the NVIDIA GPU Operator to configure mediated devices. To use this method, refer to NVIDIA GPU Operator with OKD Virtualization in the NVIDIA documentation.

Using OKD Virtualization to configure mediated devices

This method, which is tested by Red Hat, uses OKD Virtualization’s capabilities to configure mediated devices. In this case, the NVIDIA GPU Operator is only used for installing drivers with the NVIDIA vGPU Manager. The GPU Operator does not configure mediated devices.

When using the OKD Virtualization method, you still configure the GPU Operator by following the NVIDIA documentation. However, this method differs from the NVIDIA documentation in the following ways:

You must not overwrite the default disableMDEVConfiguration: false setting in the HyperConverged custom resource (CR).

Setting this feature gate as described in the NVIDIA documentation prevents OKD Virtualization from configuring mediated devices.

You must configure your ClusterPolicy manifest so that it matches the following example:

kind: ClusterPolicy
apiVersion: nvidia.com/v1
metadata:
  name: gpu-cluster-policy
spec:
  operator:
    defaultRuntime: crio
    use_ocp_driver_toolkit: true
    initContainer: {}
  sandboxWorkloads:
    enabled: true
    defaultWorkload: vm-vgpu
  driver:
    enabled: false
  dcgmExporter: {}
  dcgm:
    enabled: true
  daemonsets: {}
  devicePlugin: {}
  gfd: {}
  migManager:
    enabled: true
  nodeStatusExporter:
    enabled: true
  mig:
    strategy: single
  toolkit:
    enabled: true
  validator:
    plugin:
      env:
        - name: WITH_WORKLOAD
          value: "true"
  vgpuManager:
    enabled: true
    repository: <vgpu_container_registry>
    image: <vgpu_image_name>
    version: <nvidia_vgpu_manager_version>
  vgpuDeviceManager:
    enabled: false
  sandboxDevicePlugin:
    enabled: false
  vfioManager:
    enabled: false

spec.driver.enabled is set to false. The driver is not required for VMs.
spec.vgpuManager.enabled is set to true. This is required if you want to use vGPUs with VMs.
spec.vgpuManager.repository is set to your registry value.
spec.vgpuManager.version is set to the version of the vGPU driver you have downloaded from the NVIDIA website and used to build the image.
spec.vgpuDeviceManager.enabled is set to false to allow OKD Virtualization to configure mediated devices instead of the NVIDIA GPU Operator.
spec.sandboxDevicePlugin.enabled is set to false to prevent discovery and advertising of the vGPU devices to the kubelet.
spec.vfioManager.enabled is set to false to prevent loading the vfio-pci driver. Instead, follow the OKD Virtualization documentation to configure PCI passthrough.

Creating and exposing mediated devices

As an administrator, you can create mediated devices and expose them to the cluster by editing the HyperConverged custom resource (CR). Before you edit the CR, explore a worker node to find the configuration values that are specific to your hardware devices.

Prerequisites

You installed the OpenShift CLI (oc).
You enabled the Input-Output Memory Management Unit (IOMMU) driver.
If your hardware vendor provides drivers, you installed them on the nodes where you want to create mediated devices.
- If you use NVIDIA cards, you installed the NVIDIA GRID driver.

Procedure

Identify the name selector and resource name values for the mediated devices by exploring a worker node:
1. Start a debugging session with the worker node by using the oc debug command. For example:
  $ oc debug node/node-11.redhat.com
2. Change the root directory of the shell process to the file system of the host node by running the following command:
  # chroot /host
3. Navigate to the mdev_bus directory and view its contents. Each subdirectory name is a PCI address of a physical GPU. For example:
  # cd /sys/class/mdev_bus && ls
  Example output:
  0000:4b:00.4
4. Go to the directory for your physical device and list the supported mediated device types as defined by the hardware vendor. For example:
  # cd 0000:4b:00.4 && ls mdev_supported_types
  Example output:
  nvidia-742 nvidia-744 nvidia-746 nvidia-748 nvidia-750 nvidia-752 nvidia-743 nvidia-745 nvidia-747 nvidia-749 nvidia-751 nvidia-753
5. Select the mediated device type that you want to use and identify its name selector value by viewing the contents of its name file. For example:
  # cat nvidia-745/name
  Example output:
  NVIDIA A2-2Q
Open the HyperConverged CR in your default editor by running the following command:
```
$ oc edit hyperconverged kubevirt-hyperconverged -n kubevirt-hyperconverged
```
Create and expose the mediated devices by updating the configuration:
1. Create mediated devices by adding them to the spec.mediatedDevicesConfiguration stanza.
2. Expose the mediated devices to the cluster by adding the mdevNameSelector and resourceName values to the spec.permittedHostDevices.mediatedDevices stanza. The resourceName value is based on the mdevNameSelector value, but you use underscores instead of spaces.
  
  Example HyperConverged CR:
  apiVersion: hco.kubevirt.io/v1
  kind: HyperConverged
  metadata:
    name: kubevirt-hyperconverged
    namespace: kubevirt-hyperconverged
  spec:
    mediatedDevicesConfiguration:
      mediatedDeviceTypes:
      - nvidia-745
      nodeMediatedDeviceTypes:
      - mediatedDeviceTypes:
        - nvidia-746
        nodeSelector:
          kubernetes.io/hostname: node-11.redhat.com
    permittedHostDevices:
      mediatedDevices:
      - mdevNameSelector: NVIDIA A2-2Q
        resourceName: nvidia.com/NVIDIA_A2-2Q
      - mdevNameSelector: NVIDIA A2-4Q
        resourceName: nvidia.com/NVIDIA_A2-4Q
  # ...
  where:
  
  mediatedDeviceTypes
  
  Specifies global settings for the cluster and is required.
  
  nodeMediatedDeviceTypes
  
  Specifies global configuration overrides for a specific node or group of nodes and is optional. Must be used with the global mediatedDeviceTypes configuration.
  
  mediatedDeviceTypes
  
  Specifies an override to the global mediatedDeviceTypes configuration for the specified nodes. Required if you use nodeMediatedDeviceTypes.
  
  nodeSelector
  
  Specifies the node selector and must include a key:value pair. Required if you use nodeMediatedDeviceTypes.
  
  mdevNameSelector
  
  Specifies the mediated devices that map to this value on the host.
  
  resourceName
  
  Specifies the matching resource name that is allocated on the node.
Save your changes and exit the editor.
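
The exploration steps in this procedure can be collected into one pass over the mdev_bus directory. The following sketch prints the PCI address, mediated device type, name selector, and a derived resource name for every supported type. The function names are hypothetical, and the nvidia.com/ prefix is an assumption that matches the NVIDIA examples in this section; other vendors use their own prefix.

```shell
# Sketch: derive a resource name from a name selector by replacing
# spaces with underscores. The "nvidia.com/" prefix is an assumption
# based on the NVIDIA examples; the function name is hypothetical.
mdev_resource_name() {
  printf 'nvidia.com/%s\n' "$(printf '%s' "$1" | tr ' ' '_')"
}

# Sketch: print one tab-separated line per supported mediated device type:
# PCI address, type, name selector, derived resource name.
# $1 is the mdev_bus directory and defaults to /sys/class/mdev_bus.
list_mdev_types() {
  bus_dir="${1:-/sys/class/mdev_bus}"
  for name_file in "$bus_dir"/*/mdev_supported_types/*/name; do
    # When nothing matches, the glob stays literal and is skipped here.
    [ -f "$name_file" ] || continue
    type_dir="$(dirname "$name_file")"
    pci_addr="$(basename "$(dirname "$(dirname "$type_dir")")")"
    type_id="$(basename "$type_dir")"
    selector="$(cat "$name_file")"
    printf '%s\t%s\t%s\t%s\n' "$pci_addr" "$type_id" "$selector" \
      "$(mdev_resource_name "$selector")"
  done
}
```

Run list_mdev_types inside the chroot /host session on the worker node. The third and fourth columns correspond to the mdevNameSelector and resourceName values in the HyperConverged CR.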

Verification

Confirm that the virtual GPU is attached to the node by running the following command:

$ oc get node <node_name> -o json \
  | jq '.status.allocatable
  | with_entries(select(.key | startswith("nvidia.com/")))
  | with_entries(select(.value != "0"))'

Removing mediated devices from the cluster

To remove a mediated device from the cluster, delete the information for that device from the HyperConverged custom resource (CR).

Prerequisites

You have installed the OpenShift CLI (oc).

Procedure

Edit the HyperConverged CR in your default editor by running the following command:
```
$ oc edit hyperconverged kubevirt-hyperconverged -n kubevirt-hyperconverged
```
Remove the device information from the spec.mediatedDevicesConfiguration and spec.permittedHostDevices stanzas of the HyperConverged CR. Removing both entries ensures that you can later create a new mediated device type on the same node. For example:
```
apiVersion: hco.kubevirt.io/v1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: kubevirt-hyperconverged
spec:
  mediatedDevicesConfiguration:
    mediatedDeviceTypes:
      - nvidia-231
  permittedHostDevices:
    mediatedDevices:
    - mdevNameSelector: GRID T4-2Q
      resourceName: nvidia.com/GRID_T4-2Q
```
- To remove the nvidia-231 device type, delete it from the mediatedDeviceTypes array.
- To remove the GRID T4-2Q device, delete the mdevNameSelector field and its corresponding resourceName field.
Save your changes and exit the editor.

How vGPUs are assigned to nodes

OKD Virtualization configures a single mdev type and the maximum number of instances of the selected mdev type for each physical device. The cluster architecture affects how devices are created and assigned to nodes.

Large cluster with multiple cards per node

On nodes with multiple cards that can support similar vGPU types, the relevant device types are created in a round-robin manner. For example:

# ...
mediatedDevicesConfiguration:
  mediatedDeviceTypes:
  - nvidia-222
  - nvidia-228
  - nvidia-105
  - nvidia-108
# ...

In this scenario, each node has two cards, both of which support the following vGPU types:

nvidia-105
# ...
nvidia-108
nvidia-217
nvidia-299
# ...

On each node, OKD Virtualization creates the following vGPUs:

16 vGPUs of type nvidia-105 on the first card.
2 vGPUs of type nvidia-108 on the second card.
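
The following sketch reproduces only the type-to-card mapping of this example. It assumes that the configured types are first filtered down to those the cards support, and that the remaining types are then handed out to cards in list order, wrapping around when there are more cards than types. This is an illustration of the scenario above, not the actual OKD Virtualization implementation, and the function name assign_types is hypothetical. The number of vGPUs created per card depends on the hardware and is not modeled.

```shell
# Sketch: assign_types "<configured types>" "<supported types>" <cards>
# Prints the type that each card receives under the assumed round-robin
# rule. Hypothetical helper; not the real scheduling code.
assign_types() {
  configured="$1"
  supported="$2"
  cards="$3"

  # Keep only configured types that the cards support, in configured order.
  usable=""
  for t in $configured; do
    case " $supported " in
      *" $t "*) usable="$usable $t" ;;
    esac
  done

  # Load the usable types into the positional parameters.
  set -- $usable
  [ "$#" -gt 0 ] || return 1

  card=1
  while [ "$card" -le "$cards" ]; do
    idx=$(( (card - 1) % $# + 1 ))   # wrap around the usable list
    eval "chosen=\${$idx}"
    printf 'card %s: %s\n' "$card" "$chosen"
    card=$((card + 1))
  done
}
```

With the lists from this example and two cards, the sketch assigns nvidia-105 to the first card and nvidia-108 to the second card.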

One node has a single card that supports more than one requested vGPU type

OKD Virtualization uses the supported type that comes first on the mediatedDeviceTypes list.

For example, the card on a node supports nvidia-223 and nvidia-224. The following mediatedDeviceTypes list is configured:

# ...
mediatedDevicesConfiguration:
  mediatedDeviceTypes:
  - nvidia-22
  - nvidia-223
  - nvidia-224
# ...

In this example, OKD Virtualization uses the nvidia-223 type.
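
The selection rule can be sketched as a first-match lookup over the configured list. The function name first_supported_type is hypothetical, and the sketch assumes that type names are compared as whole words, which is why nvidia-22 does not match nvidia-223.

```shell
# Sketch: first_supported_type "<configured types>" "<supported types>"
# Prints the first configured type that the card supports.
# Hypothetical helper that illustrates the rule described above.
first_supported_type() {
  configured="$1"
  supported="$2"
  for t in $configured; do
    # Pad with spaces so that only whole type names match.
    case " $supported " in
      *" $t "*) printf '%s\n' "$t"; return 0 ;;
    esac
  done
  return 1   # the card supports none of the configured types
}
```

Calling first_supported_type "nvidia-22 nvidia-223 nvidia-224" "nvidia-223 nvidia-224" prints nvidia-223, which matches the result in this example.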

Assigning a vGPU to a VM by using the CLI

Assign mediated devices such as virtual GPUs (vGPUs) to virtual machines (VMs).

Prerequisites

The mediated device is configured in the HyperConverged custom resource.
The virtual machine (VM) is stopped.

Procedure

Assign the mediated device to a VM by editing the spec.template.spec.domain.devices.gpus stanza of the VirtualMachine manifest.

Example virtual machine manifest:
```
apiVersion: kubevirt.io/v1
kind: VirtualMachine
spec:
  template:
    spec:
      domain:
        devices:
          gpus:
          - deviceName: nvidia.com/TU104GL_Tesla_T4
            name: gpu1
          - deviceName: nvidia.com/GRID_T4-2Q
            name: gpu2
```
- spec.template.spec.domain.devices.gpus.deviceName specifies the resource name associated with the mediated device.
- spec.template.spec.domain.devices.gpus.name specifies a name to identify the device on the VM.
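
As an alternative to editing the manifest by hand, you might apply the same change with a JSON patch. The following sketch only builds the patch text. The function name gpu_patch is hypothetical, the sketch assumes that the device name and the GPU name contain no characters that need JSON escaping, and the add operation replaces any gpus list that already exists on the VM.

```shell
# Sketch: gpu_patch <device_name> <gpu_name>
# Prints a JSON patch that sets the gpus list of a VirtualMachine to a
# single device. Hypothetical helper; the arguments are not JSON-escaped.
gpu_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/domain/devices/gpus","value":[{"deviceName":"%s","name":"%s"}]}]\n' "$1" "$2"
}
```

For example, you could pass the output of gpu_patch nvidia.com/GRID_T4-2Q gpu1 to oc patch vm <vm_name> --type=json -p while the VM is stopped.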

Verification

To verify that the device is available from the virtual machine, run the following command, substituting <device_name> with the deviceName value from the VirtualMachine manifest:
```
$ lspci -nnk | grep <device_name>
```

Assigning a vGPU to a VM by using the web console

You can assign virtual GPUs to virtual machines by using the OKD web console.

You can add hardware devices to virtual machines created from customized templates or a YAML file. You cannot add devices to pre-supplied boot source templates for specific operating systems.

Prerequisites

The vGPU is configured as a mediated device in your cluster.
- To view the devices that are connected to your cluster, click Compute → Hardware Devices from the side menu.
The VM is stopped.

Procedure

In the OKD web console, click Virtualization → VirtualMachines from the side menu.
Select the VM that you want to assign the device to.
On the Details tab, click GPU devices.
Click Add GPU device.
Enter an identifying value in the Name field.
From the Device name list, select the device that you want to add to the VM.
Click Save.

Verification

To confirm that the devices were added to the VM, click the YAML tab and review the VirtualMachine configuration. Mediated devices are added to the spec.template.spec.domain.devices stanza.
