Preparing your cluster for OKD Virtualization

Supported platforms
- OKD Virtualization on AWS bare metal
- IBM Z and IBM LinuxONE compatibility
Hardware and operating system requirements
Live migration requirements
Physical resource overhead requirements
Single-node OpenShift differences
Object maximums
Cluster high-availability options

Review this section before you install OKD Virtualization to ensure that your cluster meets the requirements.

Installation method considerations: You can use any installation method, including user-provisioned, installer-provisioned, or assisted installer, to deploy OKD. However, the installation method and the cluster topology might affect OKD Virtualization functionality, such as snapshots or live migration.
Red Hat OpenShift Data Foundation: If you deploy OKD Virtualization with Red Hat OpenShift Data Foundation, you must create a dedicated storage class for Windows virtual machine disks. See Optimizing ODF PersistentVolumes for Windows VMs for details.
IPv6: You cannot run OKD Virtualization on a single-stack IPv6 cluster.

FIPS mode

If you install your cluster in FIPS mode, no additional setup is required for OKD Virtualization.

Supported platforms

You can use the following platforms with OKD Virtualization:

On-premises bare-metal servers. See Planning a bare metal cluster for OKD Virtualization.
Amazon Web Services bare metal instances. See Installing a cluster on AWS with customizations.
IBM Cloud® Bare Metal Servers. See Deploy OKD Virtualization on IBM Cloud® Bare Metal nodes.
IBM Z® or IBM® LinuxONE (s390x architecture) systems where an OKD cluster is installed in a logical partition (LPAR). See Preparing to install on IBM Z and IBM LinuxONE.

Bare metal instances or servers offered by other cloud providers are not supported.

OKD Virtualization on AWS bare metal

You can run OKD Virtualization on an Amazon Web Services (AWS) bare-metal OKD cluster.

OKD Virtualization is also supported on Red Hat OpenShift Service on AWS (ROSA) Classic clusters, which have the same configuration requirements as AWS bare-metal clusters.

Before you set up your cluster, review the following summary of supported features and limitations:

Installing

You can install the cluster by using installer-provisioned infrastructure, ensuring that you specify bare-metal instance types for the worker nodes. For example, you can use the c5n.metal type value for a machine based on x86_64 architecture. You specify bare-metal instance types by editing the install-config.yaml file.
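The following shell function is a minimal sketch of the `install-config.yaml` edit described above. It prints a `compute` stanza that requests a bare-metal instance type for worker nodes. The pool name `worker`, the default replica count of 3, and the helper name are assumptions for illustration. Merge the stanza into the file that `openshift-install create install-config` generates; do not use it to replace the whole file.

```shell
# Hypothetical helper: print a compute stanza for install-config.yaml.
# $1: AWS bare-metal instance type (defaults to c5n.metal for x86_64)
# $2: number of worker replicas (defaults to 3, an illustrative value)
aws_metal_compute_stanza() {
  instance_type="${1:-c5n.metal}"
  replicas="${2:-3}"
  cat <<EOF
compute:
- name: worker
  platform:
    aws:
      type: ${instance_type}
  replicas: ${replicas}
EOF
}
```

For example, `aws_metal_compute_stanza c5n.metal 3` prints the stanza with three `c5n.metal` workers.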

For more information, see the OKD documentation about installing on AWS.

Accessing virtual machines (VMs)

There is no change to how you access VMs by using the virtctl CLI tool or the OKD web console.
You can expose VMs by using a NodePort or LoadBalancer service.
- The load balancer approach is preferable because OKD automatically creates the load balancer in AWS and manages its lifecycle. A security group is also created for the load balancer, and you can use annotations to attach existing security groups. When you remove the service, OKD removes the load balancer and its associated resources.
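As a sketch of the load balancer approach, the function below prints a minimal `LoadBalancer` Service manifest for one VM. It makes three assumptions:

- The VirtualMachine template sets a label `app: <vm_name>` in `spec.template.metadata.labels`. This label is a hypothetical choice, so use whatever label your VM actually carries.
- The service name suffix `-lb` is illustrative.
- The default port 22 (SSH) is illustrative.

The `virtctl expose` subcommand is another way to create such a service.

```shell
# Hypothetical helper: print a LoadBalancer Service that targets a VM's pod.
# $1: VM name, $2: namespace, $3: port (defaults to 22 for SSH)
# Assumes the VM template labels its pod with app=<vm_name>.
vm_lb_service() {
  vm="$1"; ns="$2"; port="${3:-22}"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${vm}-lb
  namespace: ${ns}
spec:
  type: LoadBalancer
  selector:
    app: ${vm}
  ports:
  - protocol: TCP
    port: ${port}
    targetPort: ${port}
EOF
}
```

You could apply the output with `vm_lb_service <vm_name> <namespace> | oc apply -f -`. Deleting the Service later removes the AWS load balancer, as described above.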

Networking

You cannot use Single Root I/O Virtualization (SR-IOV) or bridge Container Network Interface (CNI) networks, including virtual LAN (VLAN). If your application requires a flat layer 2 network or control over the IP pool, consider using OVN-Kubernetes secondary overlay networks.
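You define an OVN-Kubernetes layer 2 secondary overlay network with a NetworkAttachmentDefinition. Its CNI configuration uses the `ovn-k8s-cni-overlay` type with `layer2` topology. The sketch below prints such an object. The network name, namespace, CNI version, and MTU of 1300 are assumptions to adjust for your cluster.

```shell
# Hypothetical helper: print a NetworkAttachmentDefinition for an
# OVN-Kubernetes layer 2 secondary overlay network.
# $1: network name, $2: namespace, $3: MTU (1300 is an illustrative value)
ovnk_l2_nad() {
  name="$1"; ns="$2"; mtu="${3:-1300}"
  cat <<EOF
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "${ns}-${name}",
      "type": "ovn-k8s-cni-overlay",
      "topology": "layer2",
      "mtu": ${mtu},
      "netAttachDefName": "${ns}/${name}"
    }
EOF
}
```

The `netAttachDefName` value must match the object's own `<namespace>/<name>`. The function derives it from the same arguments so that the two cannot drift apart.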

Storage

You can use any storage solution that is certified by the storage vendor to work with the underlying platform.

AWS bare-metal and ROSA clusters might support different storage solutions. Confirm support with your storage vendor.

Using Amazon Elastic File System (EFS) or Amazon Elastic Block Store (EBS) with OKD Virtualization might cause performance and functionality limitations as shown in the following table:

Table 1. EFS and EBS performance and functionality limitations
Feature	EBS gp2 volume	EBS gp3 volume	EBS io2 volume	EFS volume	Shared storage solutions
VM live migration	Not available	Not available	Available	Available	Available
Fast VM creation by using cloning	Available	Available	Available	Not available	Available
VM backup and restore by using snapshots	Available	Available	Available	Not available	Available

Consider using CSI storage that supports ReadWriteMany (RWX) access, cloning, and snapshots. These capabilities enable live migration, fast VM creation, and VM snapshots.

Hosted control planes (HCPs)

HCPs for OKD Virtualization are not currently supported on AWS infrastructure.


IBM Z and IBM LinuxONE compatibility

You can use OKD Virtualization in an OKD cluster that is installed in a logical partition (LPAR) on an IBM Z® or IBM® LinuxONE (s390x architecture) system.

Some features are not currently available on s390x architecture, while others require workarounds or procedural changes. These lists are subject to change.

Currently unavailable features

The following features are not available or do not function on s390x architecture:

Memory hot plugging and hot unplugging
Watchdog devices
Node Health Check Operator
SR-IOV Operator
Virtual Trusted Platform Module (vTPM) devices
Red Hat OpenShift Pipelines tasks
UEFI mode for VMs
PCI passthrough
USB host passthrough
Configuring virtual GPUs
OKD Virtualization cluster checkup framework
Creating and managing Windows VMs

Functionality differences

The following features are available for use on s390x architecture but function differently or require procedural changes:

When deleting a virtual machine by using the web console, the grace period option is ignored.
When configuring the default CPU model, the spec.defaultCPUModel value is "gen15b" for an IBM Z cluster.
When hot unplugging a secondary network interface, the virtctl migrate <vm_name> command does not migrate the VM. As a workaround, restart the VM by running the following command:
```
$ virtctl restart <vm_name>
```
When configuring a downward metrics device, if you use a VM preference, the spec.preference.name value must be set to rhel.9.s390x or another available preference with the format *.s390x.
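On IBM Z, you can set the two configuration values above with JSON merge patches. The sketch below builds both patches. The second function rejects preference names that do not match `*.s390x`. The helper names are hypothetical.

```shell
# Hypothetical helper: merge patch for the HyperConverged custom resource
# that sets the cluster-wide default CPU model ("gen15b" on IBM Z).
default_cpu_model_patch() {
  printf '{"spec":{"defaultCPUModel":"%s"}}\n' "${1:-gen15b}"
}

# Hypothetical helper: merge patch that sets a VM preference, refusing
# names that do not follow the *.s390x format required on IBM Z.
s390x_preference_patch() {
  case "$1" in
    *.s390x) printf '{"spec":{"preference":{"name":"%s"}}}\n' "$1" ;;
    *) echo "preference must match *.s390x: $1" >&2; return 1 ;;
  esac
}
```

In the following usage lines, `kubevirt-hyperconverged` is the usual name of the HyperConverged resource. The namespace and VM names are placeholders:

- `oc patch hyperconverged kubevirt-hyperconverged -n <hco_namespace> --type=merge -p "$(default_cpu_model_patch)"`
- `oc patch vm <vm_name> -n <namespace> --type=merge -p "$(s390x_preference_patch rhel.9.s390x)"`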

Hardware and operating system requirements

Review the following hardware and operating system requirements for OKD Virtualization.

CPU requirements

Supported by Red Hat Enterprise Linux (RHEL) 9.

See Red Hat Ecosystem Catalog for supported CPUs.

If your worker nodes have different CPUs, live migration failures might occur because different CPUs have different capabilities. You can mitigate this issue by ensuring that your worker nodes have CPUs with the appropriate capacity and by configuring node affinity rules for your virtual machines.

See Configuring a required node affinity rule for details.
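The following sketch prints a VirtualMachine fragment with a required node affinity rule. With this rule, the VM is scheduled and live migrated only to nodes that carry a given label. The label key and value are hypothetical placeholders. For example, you might label nodes that share a CPU model yourself with `oc label node <node_name> example.com/cpu-group=group-a`.

```shell
# Hypothetical helper: print a VM spec fragment with a required node
# affinity rule. $1: node label key, $2: label value (placeholders).
vm_required_node_affinity() {
  key="$1"; value="$2"
  cat <<EOF
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: ${key}
                operator: In
                values:
                - ${value}
EOF
}
```

Merge the fragment into your VirtualMachine manifest. Every node that can host the VM must then carry the label. Check that enough labeled nodes exist to absorb VMs during drains.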

Support for AMD and Intel 64-bit architectures (x86-64-v2).
Support for Intel 64 or AMD64 CPU extensions.
Intel VT or AMD-V hardware virtualization extensions enabled.
NX (no execute) flag enabled.
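On a running node, you can check the last two requirements by looking for two kinds of flags in `/proc/cpuinfo`: the `vmx` (Intel VT) or `svm` (AMD-V) flag, and the `nx` flag. For example, you can run the check from a debug shell that you open with `oc debug node/<node_name>`.

The function below is a minimal sketch that reads a cpuinfo-style file, defaulting to `/proc/cpuinfo`. It does not check the x86-64-v2 level. A missing virtualization flag can also mean that the extensions are disabled in the firmware.

```shell
# Hypothetical helper: check a cpuinfo-style file for hardware
# virtualization (vmx or svm) and NX support.
# $1: path to the file (defaults to /proc/cpuinfo).
# Prints "ok" or the missing flag, and returns non-zero on failure.
check_virt_cpu_flags() {
  cpuinfo="${1:-/proc/cpuinfo}"
  # Take the flag list from the first "flags" line; each CPU repeats it.
  flags="$(awk -F: '/^flags/ { print $2; exit }' "$cpuinfo")"
  case " $flags " in
    *" vmx "*|*" svm "*) ;;
    *) echo "missing: vmx or svm (hardware virtualization)"; return 1 ;;
  esac
  case " $flags " in
    *" nx "*) ;;
    *) echo "missing: nx (no execute)"; return 1 ;;
  esac
  echo "ok"
}
```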

Operating system requirements

Fedora CoreOS (FCOS) installed on worker nodes.

See About RHCOS for details.

Fedora worker nodes are not supported.

Storage requirements

Supported by OKD. See Optimizing storage.

You must create a default OKD Virtualization or OKD storage class. This addresses the unique storage needs of VM workloads and provides optimized performance, reliability, and user experience. If both OKD Virtualization and OKD default storage classes exist, the OKD Virtualization class takes precedence when creating VM disks.

To mark a storage class as the default for virtualization workloads, set the annotation storageclass.kubevirt.io/is-default-virt-class to "true".
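As a sketch, you can apply the annotation with a JSON merge patch. Annotation values are strings, so `"true"` stays quoted in the patch. The helper name and the storage class name in the usage line are placeholders.

```shell
# Hypothetical helper: merge patch that sets (true) or clears (false) the
# default-virtualization-class annotation on a storage class.
virt_default_sc_patch() {
  value="${1:-true}"
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
  # Annotation values are strings, so the value stays quoted in the JSON.
  printf '{"metadata":{"annotations":{"storageclass.kubevirt.io/is-default-virt-class":"%s"}}}\n' "$value"
}
```

To apply the patch, run `oc patch storageclass <storage_class_name> --type=merge -p "$(virt_default_sc_patch)"`. Alternatively, `oc annotate storageclass <storage_class_name> storageclass.kubevirt.io/is-default-virt-class=true` sets the same annotation. Add `--overwrite` if the annotation already exists.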

If the storage provisioner supports snapshots, you must associate a VolumeSnapshotClass object with the default storage class.

About volume and access modes for virtual machine disks

If you use the storage API with known storage providers, the volume and access modes are selected automatically. However, if you use a storage class that does not have a storage profile, you must configure the volume and access modes.

For best results, use the ReadWriteMany (RWX) access mode and the Block volume mode. This is important for the following reasons:

ReadWriteMany (RWX) access mode is required for live migration.
The Block volume mode performs significantly better than the Filesystem volume mode. This is because the Filesystem volume mode uses more storage layers, including a file system layer and a disk image file. These layers are not necessary for VM disk storage.

For example, if you use Red Hat OpenShift Data Foundation, Ceph RBD volumes are preferable to CephFS volumes.
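One way to set the recommended modes for a storage class is to patch the Containerized Data Importer (CDI) `StorageProfile` object that has the same name as the storage class. The sketch below prints such a merge patch. It assumes your provisioner actually supports the chosen combination; confirm RWX with Block mode with your storage vendor.

```shell
# Hypothetical helper: merge patch for a CDI StorageProfile that requests
# the access and volume modes recommended above.
# $1: access mode (default ReadWriteMany), $2: volume mode (default Block)
storage_profile_patch() {
  access="${1:-ReadWriteMany}"
  volume="${2:-Block}"
  case "$volume" in
    Block|Filesystem) ;;
    *) echo "unsupported volume mode: $volume" >&2; return 1 ;;
  esac
  printf '{"spec":{"claimPropertySets":[{"accessModes":["%s"],"volumeMode":"%s"}]}}\n' "$access" "$volume"
}
```

To apply the patch, run `oc patch storageprofile <storage_class_name> --type=merge -p "$(storage_profile_patch)"`.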

You cannot live migrate virtual machines with the following configurations:

Storage volume with ReadWriteOnce (RWO) access mode
Passthrough features such as GPUs

Set the evictionStrategy field to None for these virtual machines. The None strategy powers down VMs during node reboots.
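One way to apply this setting is a JSON merge patch on the VirtualMachine object. The sketch below builds the patch. As a safeguard, it accepts only the KubeVirt eviction strategy names `None`, `LiveMigrate`, `LiveMigrateIfPossible`, and `External`. The helper name is hypothetical.

```shell
# Hypothetical helper: merge patch that sets the eviction strategy in a
# VM template. $1: strategy (defaults to None for non-migratable VMs).
vm_eviction_patch() {
  strategy="${1:-None}"
  case "$strategy" in
    None|LiveMigrate|LiveMigrateIfPossible|External) ;;
    *) echo "unknown eviction strategy: $strategy" >&2; return 1 ;;
  esac
  printf '{"spec":{"template":{"spec":{"evictionStrategy":"%s"}}}}\n' "$strategy"
}
```

To apply the patch, run `oc patch vm <vm_name> -n <namespace> --type=merge -p "$(vm_eviction_patch)"`. The VM name and namespace are placeholders.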

Live migration requirements

Shared storage with ReadWriteMany (RWX) access mode.

Sufficient RAM and network bandwidth.

You must ensure that there is enough memory request capacity in the cluster to support node drains that result in live migrations. You can determine the approximate required spare memory by using the following calculation:

Spare memory ≈ (maximum number of nodes that can drain in parallel) × (highest total of VM memory requests on any single node)

The default number of migrations that can run in parallel in the cluster is 5.
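The following function is a minimal sketch of this calculation. It assumes you already know the total VM memory requests on each node, in whole GiB. The function multiplies the largest per-node total by the number of nodes that can drain in parallel.

For example, take at most two parallel drains and nodes that carry 64, 96, and 80 GiB of VM memory requests. The estimate is 2 × 96 = 192 GiB.

```shell
# Hypothetical helper: approximate spare memory request capacity (GiB).
# $1:    maximum number of nodes that can drain in parallel
# $2...: total VM memory requests on each node, in whole GiB
spare_memory_gib() {
  parallel="$1"; shift
  max=0
  for mem in "$@"; do
    if [ "$mem" -gt "$max" ]; then
      max="$mem"
    fi
  done
  # Size for the busiest node multiplied by the number of parallel drains.
  echo $(( parallel * max ))
}
```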

If the virtual machine uses a host model CPU, the nodes must support the virtual machine’s host model CPU.
A dedicated Multus network for live migration is highly recommended. A dedicated network minimizes the effects of network saturation on tenant workloads during migration.
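You set both the dedicated migration network and the parallel migration limit in the `liveMigrationConfig` stanza of the HyperConverged custom resource. The sketch below builds a JSON merge patch for that stanza. The network name is a placeholder. The patch assumes that the NetworkAttachmentDefinition already exists in the namespace where the OKD Virtualization Operator is installed.

```shell
# Hypothetical helper: merge patch for the HyperConverged custom resource
# that routes live migration over a dedicated Multus network and sets the
# cluster-wide limit on parallel migrations.
# $1: NetworkAttachmentDefinition name, $2: parallel migrations (default 5)
live_migration_patch() {
  nad="$1"; parallel="${2:-5}"
  printf '{"spec":{"liveMigrationConfig":{"network":"%s","parallelMigrationsPerCluster":%d}}}\n' "$nad" "$parallel"
}
```

To apply the patch, run `oc patch hyperconverged kubevirt-hyperconverged -n <hco_namespace> --type=merge -p "$(live_migration_patch migration-network)"`.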

Physical resource overhead requirements

OKD Virtualization is an add-on to OKD and imposes additional overhead that you must account for when planning a cluster. Each cluster machine must accommodate the following overhead requirements in addition to the OKD requirements. Oversubscribing the physical resources in a cluster can affect performance.

The numbers noted in this documentation are based on Red Hat’s test methodology and setup. These numbers can vary based on your own individual setup and environments.

Memory overhead

Calculate the memory overhead values for OKD Virtualization by using the equations below.

Cluster memory overhead

Memory overhead per infrastructure node ≈ 150 MiB

Memory overhead per worker node ≈ 360 MiB

Additionally, OKD Virtualization environment resources require a total of 2179 MiB of RAM that is spread across all infrastructure nodes.
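The following function is a minimal sketch of the cluster-level figures above. It adds the per-node overheads and the fixed 2179 MiB. Treat the result as an approximation, like the per-node values it is built from.

For a cluster with three infrastructure nodes and four worker nodes, the function returns 3 × 150 + 4 × 360 + 2179 = 4069 MiB.

```shell
# Hypothetical helper: estimate cluster memory overhead in MiB.
# $1: number of infrastructure nodes, $2: number of worker nodes
cluster_mem_overhead_mib() {
  infra="$1"; workers="$2"
  # 150 MiB per infrastructure node, 360 MiB per worker node, plus
  # 2179 MiB of environment resources spread across infrastructure nodes.
  echo $(( infra * 150 + workers * 360 + 2179 ))
}
```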

Virtual machine memory overhead

Memory overhead per virtual machine ≈ (1.002 × requested memory)
              + 218 MiB (1)
              + 8 MiB × (number of vCPUs) (2)
              + 16 MiB × (number of graphics devices) (3)
              + (additional memory overhead) (4)

1	Required for the processes that run in the `virt-launcher` pod.
2	Number of virtual CPUs requested by the virtual machine.
3	Number of virtual graphics cards requested by the virtual machine.
4	Additional memory overhead: If your environment includes a Single Root I/O Virtualization (SR-IOV) network device or a Graphics Processing Unit (GPU), allocate 1 GiB additional memory overhead for each device. If Secure Encrypted Virtualization (SEV) is enabled, add 256 MiB. If Trusted Platform Module (TPM) is enabled, add 53 MiB.
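The following function is a sketch that evaluates the per-VM equation with awk, so that the 1.002 factor is applied in floating point. The argument order and the helper name are choices made for this example.

For example, take a VM that requests 4096 MiB with 2 vCPUs and one graphics device, and no SR-IOV devices, GPUs, SEV, or TPM. The function returns about 4354.19 MiB.

```shell
# Hypothetical helper: estimate one VM's memory overhead in MiB.
# $1: requested memory (MiB)        $2: number of vCPUs
# $3: number of graphics devices    $4: number of SR-IOV devices and GPUs
# $5: 1 if SEV is enabled, else 0   $6: 1 if TPM is enabled, else 0
vm_mem_overhead_mib() {
  awk -v mem="$1" -v cpus="$2" -v gfx="${3:-0}" -v dev="${4:-0}" \
      -v sev="${5:-0}" -v tpm="${6:-0}" 'BEGIN {
    extra = dev * 1024 + sev * 256 + tpm * 53   # additional memory overhead
    printf "%.2f\n", 1.002 * mem + 218 + 8 * cpus + 16 * gfx + extra
  }'
}
```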

CPU overhead

Calculate the cluster processor overhead requirements for OKD Virtualization by using the equation below. The CPU overhead per virtual machine depends on your individual setup.

Cluster CPU overhead

CPU overhead for infrastructure nodes ≈ 4 cores

OKD Virtualization increases the overall utilization of cluster level services such as logging, routing, and monitoring. To account for this workload, ensure that nodes that host infrastructure components have capacity allocated for 4 additional cores (4000 millicores) distributed across those nodes.

CPU overhead for worker nodes ≈ 2 cores + CPU overhead per virtual machine

Each worker node that hosts virtual machines must have capacity for 2 additional cores (2000 millicores) for OKD Virtualization management workloads in addition to the CPUs required for virtual machine workloads.

Virtual machine CPU overhead

If dedicated CPUs are requested, there is a 1:1 impact on the cluster CPU overhead requirement. Otherwise, there are no specific rules about how many CPUs a virtual machine requires.
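Taken together, the CPU guidelines give a rough cluster-level estimate with three parts:

- 4000 millicores for infrastructure nodes.
- 2000 millicores for each worker node that hosts VMs.
- One full core (1000 millicores) for each dedicated vCPU.

The sketch below assumes that shared, non-dedicated vCPUs add nothing to the overhead, because the text above sets no rule for them. Size those vCPUs separately for your workloads.

```shell
# Hypothetical helper: estimate cluster CPU overhead in millicores.
# $1: number of worker nodes that host VMs
# $2: total dedicated vCPUs requested by VMs (0 if none)
cluster_cpu_overhead_millicores() {
  workers="$1"; dedicated="${2:-0}"
  # 4000m across infrastructure nodes, 2000m per VM-hosting worker node,
  # plus a 1:1 impact of 1000m for each dedicated vCPU.
  echo $(( 4000 + workers * 2000 + dedicated * 1000 ))
}
```

For example, three VM-hosting worker nodes and 8 dedicated vCPUs give 4000 + 6000 + 8000 = 18000 millicores.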

Storage overhead

Use the guidelines below to estimate storage overhead requirements for your OKD Virtualization environment.

Cluster storage overhead

Aggregated storage overhead per node ≈ 10 GiB

10 GiB is the estimated on-disk storage impact for each node in the cluster when you install OKD Virtualization.

Virtual machine storage overhead

Storage overhead per virtual machine depends on specific requests for resource allocation within the virtual machine. The request could be for ephemeral storage on the node or storage resources hosted elsewhere in the cluster. OKD Virtualization does not currently allocate any additional ephemeral storage for the running container itself.

Example

For example, if you plan to host 10 virtual machines in the cluster, each with 1 GiB of RAM and 2 vCPUs, the memory impact across the cluster is 11.68 GiB. The estimated on-disk storage impact for each node in the cluster is 10 GiB, and the CPU impact for worker nodes that host virtual machine workloads is a minimum of 2 cores.

Single-node OpenShift differences

You can install OKD Virtualization on single-node OpenShift.

However, single-node OpenShift does not support the following features:

High availability
Pod disruption
Live migration
Virtual machines or templates that have an eviction strategy configured

Additional resources

Glossary of common terms for OKD storage

Object maximums

You must consider the following tested object maximums when planning your cluster:

Cluster high-availability options

You can configure one of the following high-availability (HA) options for your cluster:

Automatic high availability for installer-provisioned infrastructure (IPI) is available by deploying machine health checks.

In OKD clusters installed using installer-provisioned infrastructure and with a properly configured MachineHealthCheck resource, if a node fails the machine health check and becomes unavailable to the cluster, it is recycled. What happens next with VMs that ran on the failed node depends on a series of conditions. See Run strategies for more detailed information about the potential outcomes and how run strategies affect those outcomes.

Automatic high availability for both IPI and non-IPI is available by using the Node Health Check Operator on the OKD cluster to deploy the NodeHealthCheck controller. The controller identifies unhealthy nodes and uses a remediation provider, such as the Self Node Remediation Operator or Fence Agents Remediation Operator, to remediate the unhealthy nodes. For more information on remediation, fencing, and maintaining nodes, see the Workload Availability for Red Hat OpenShift documentation.
High availability for any platform is available by using either a monitoring system or a qualified human to monitor node availability. When a node is lost, shut it down and run oc delete node <lost_node>.

Without an external monitoring system or a qualified human monitoring node health, virtual machines lose high availability.
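For the machine health check option, the following sketch prints a MachineHealthCheck resource that watches worker machines on installer-provisioned infrastructure. The selector labels, the `maxUnhealthy` value, and the `nodeStartupTimeout` value are illustrative assumptions. Check them against your machine sets before you apply the output with `oc apply -f -`.

```shell
# Hypothetical helper: print a MachineHealthCheck for worker machines.
# $1: resource name, $2: unhealthy-condition timeout (default 300s)
worker_machine_health_check() {
  name="$1"; timeout="${2:-300s}"
  cat <<EOF
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: ${name}
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: ${timeout}
  - type: Ready
    status: Unknown
    timeout: ${timeout}
  maxUnhealthy: "40%"
  nodeStartupTimeout: 10m
EOF
}
```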
