×

Live migration is the process of moving a running virtual machine (VM) to another node in the cluster without interrupting the virtual workload. Live migration enables smooth transitions during cluster upgrades or any time a node needs to be drained for maintenance or configuration changes.

By default, live migration traffic is encrypted using Transport Layer Security (TLS).

Live migration requirements

Live migration has the following requirements:

  • The cluster must have shared storage with ReadWriteMany (RWX) access mode.

  • The cluster must have sufficient RAM and network bandwidth.

    You must ensure that there is enough memory request capacity in the cluster to support node drains that result in live migrations. You can determine the approximate required spare memory by using the following calculation:

    Product of (Maximum number of nodes that can drain in parallel) and (Highest total VM memory request allocations across nodes)

    The default number of migrations that can run in parallel in the cluster is 5.

  • If a VM uses a host model CPU, the nodes must support the CPU.

  • Configuring a dedicated Multus network for live migration is highly recommended. A dedicated network minimizes the effects of network saturation on tenant workloads during migration.

VM migration tuning

You can adjust your cluster-wide live migration settings based on the type of workload and migration scenario. This enables you to control how many VMs migrate at the same time, the network bandwidth you want to use for each migration, and how long OKD Virtualization attempts to complete the migration before canceling the process. Configure these settings in the HyperConverged custom resource (CR).

If you are migrating multiple VMs per node at the same time, set a bandwidthPerMigration limit to prevent a large or busy VM from using a large portion of the node’s network bandwidth. By default, the bandwidthPerMigration value is 0, which means unlimited.

A large VM running a heavy workload (for example, database processing), with higher memory dirty rates, requires a higher bandwidth to complete the migration.

Post copy mode, when enabled, triggers if the initial pre-copy phase does not complete within the defined timeout. During post copy, the VM CPUs pause on the source host while transferring the minimum required memory pages. Then the VM CPUs activate on the destination host, and the remaining memory pages transfer into the destination node at runtime. This can impact performance during the transfer.

Post copy mode should not be used for critical data, or with unstable networks.

Common live migration tasks

You can perform the following live migration tasks: