
You can increase the number of virtual machines (VMs) on nodes by overcommitting memory (RAM). Increasing VM workload density can be useful in the following situations:

  • You have many similar workloads.

  • You have underused workloads.

Memory overcommitment can lower workload performance on a highly utilized system.

Using wasp-agent to increase VM workload density

The wasp-agent component facilitates memory overcommitment by assigning swap resources to worker nodes. It also manages pod evictions when nodes are at risk due to high swap I/O traffic or high utilization.

The wasp-agent component is deployed automatically if memoryOvercommitPercentage is set to more than 100 when you first create the HyperConverged custom resource (CR).
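
For example, a HyperConverged CR that triggers this deployment might look like the following minimal sketch. The object name and the openshift-cnv namespace match the commands used later in this procedure; the hco.kubevirt.io/v1beta1 API version is an assumption:

  apiVersion: hco.kubevirt.io/v1beta1
  kind: HyperConverged
  metadata:
    name: kubevirt-hyperconverged
    namespace: openshift-cnv
  spec:
    higherWorkloadDensity:
      memoryOvercommitPercentage: 150  # values greater than 100 trigger wasp-agent deployment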

Swap resources can be assigned only to virtual machine workloads (VM pods) of the Burstable Quality of Service (QoS) class. VM pods of the Guaranteed QoS class, and pods of any QoS class that do not belong to VMs, cannot have swap resources assigned.

For descriptions of QoS classes, see Configure Quality of Service for Pods (Kubernetes documentation).

Using spec.domain.resources.requests.memory in the VM manifest disables the memory overcommit configuration. Use spec.domain.memory.guest instead.
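
For example, the following VirtualMachine manifest is a minimal sketch of a memory configuration that remains compatible with overcommitment. The name, run strategy, and memory size are placeholders:

  apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    name: example-vm            # placeholder name
  spec:
    runStrategy: Halted         # placeholder run strategy
    template:
      spec:
        domain:
          memory:
            guest: 4Gi          # guest-visible memory; placeholder size
          devices: {}
          # Do not set resources.requests.memory here; doing so disables
          # the memory overcommit configuration for this VM.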

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You are logged in to the cluster with the cluster-admin role.

  • A memory overcommit ratio is defined.

  • The node belongs to a worker pool.

The wasp-agent component deploys an Open Container Initiative (OCI) hook to enable swap usage for containers at the node level. Because the hook operates at such a low level, the DaemonSet object must be privileged.

Procedure
  1. Configure the kubelet service to permit swap usage:

    1. Create or edit a KubeletConfig file with the parameters shown in the following example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: custom-config
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: ''  # MCP
            #machine.openshift.io/cluster-api-machine-role: worker # machine
            #node-role.kubernetes.io/worker: '' # node
        kubeletConfig:
          failSwapOn: false
    2. Wait for the worker nodes to sync with the new configuration by running the following command:

      $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
  2. Provision swap by creating a MachineConfig object:

    1. Create a MachineConfig file with the parameters shown in the following example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 90-worker-swap
      spec:
        config:
          ignition:
            version: 3.5.0
          systemd:
            units:
              - contents: |
                  [Unit]
                  Description=Provision and enable swap
                  ConditionFirstBoot=no
                  ConditionPathExists=!/var/tmp/swapfile
      
                  [Service]
                  Type=oneshot
                  Environment=SWAP_SIZE_MB=5000
                  ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \
                  sudo chmod 600 /var/tmp/swapfile && \
                  sudo mkswap /var/tmp/swapfile && \
                  sudo swapon /var/tmp/swapfile && \
                  free -h"
      
                  [Install]
                  RequiredBy=kubelet-dependencies.target
                enabled: true
                name: swap-provision.service
              - contents: |
                  [Unit]
                  Description=Restrict swap for system slice
                  ConditionFirstBoot=no
      
                  [Service]
                  Type=oneshot
                  ExecStart=/bin/sh -c "sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\""
      
                  [Install]
                  RequiredBy=kubelet-dependencies.target
                enabled: true
                name: cgroup-system-slice-config.service

      To have enough swap space for the worst-case scenario, make sure that the amount of swap space provisioned is at least equal to the amount of overcommitted RAM. Calculate the amount of swap space to provision on a node by using the following formula:

      NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)

      Example:

      NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1)
                     = 16 GB * (1.5 - 1)
                     = 16 GB * (0.5)
                     =  8 GB
    2. Wait for the worker nodes to sync with the new configuration by running the following command:

      $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
  3. Enable memory overcommitment in OKD Virtualization by using the web console or the CLI.

    • Web console

      1. In the OKD web console, go to Virtualization → Overview → Settings → General settings → Memory density.

      2. Set Enable memory density to on.

    • CLI

      • Configure your OKD Virtualization to enable higher memory density and set the overcommit rate:

        $ oc -n openshift-cnv patch HyperConverged/kubevirt-hyperconverged --type='json' -p='[
          {
            "op": "replace",
            "path": "/spec/higherWorkloadDensity/memoryOvercommitPercentage",
            "value": 150
          }
        ]'

        Successful output:

        hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
Verification
  1. To verify the deployment of wasp-agent, run the following command:

    $ oc rollout status ds wasp-agent -n openshift-cnv

    If the deployment is successful, the following message is displayed:

    daemon set "wasp-agent" successfully rolled out
  2. To verify that swap is correctly provisioned, complete the following steps:

    1. View a list of worker nodes by running the following command:

      $ oc get nodes -l node-role.kubernetes.io/worker
    2. Select a node from the list and display its memory usage by running the following command:

      $ oc debug node/<selected_node> -- free -m

      Replace <selected_node> with the node name.

      If swap is provisioned, an amount greater than zero is displayed in the Swap: row.

      Table 1. Example output

                     total        used        free      shared  buff/cache   available
      Mem:           31846       23155        1044        6014       14483        8690
      Swap:           8191        2337        5854

  3. Verify the OKD Virtualization memory overcommitment configuration by running the following command:

    $ oc -n openshift-cnv get HyperConverged/kubevirt-hyperconverged -o jsonpath='{.spec.higherWorkloadDensity}{"\n"}'

    Example output:

    {"memoryOvercommitPercentage":150}

    The returned value must match the value you previously configured.

Removing the wasp-agent component

If you no longer need memory overcommitment, you can remove the wasp-agent component and associated resources from your cluster.

Prerequisites
  • You are logged in to the cluster with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Revert the memory overcommitment configuration:

    $ oc -n openshift-cnv patch HyperConverged/kubevirt-hyperconverged \
      --type='json' \
      -p='[{"op": "remove", "path": "/spec/higherWorkloadDensity"}]'
  2. Delete the MachineConfig that provisions swap memory:

    $ oc delete machineconfig 90-worker-swap
  3. Delete the associated KubeletConfig:

    $ oc delete kubeletconfig custom-config
  4. Wait for the worker nodes to reconcile:

    $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
Verification
  • Confirm that swap is no longer enabled on a node:

    $ oc debug node/<selected_node> -- free -m

    Ensure that the Swap: row shows 0 or that no swap space is listed as provisioned.

Pod eviction conditions used by wasp-agent

The wasp-agent component manages pod eviction when the system is heavily loaded and nodes are at risk. Eviction is triggered if one of the following conditions is met:

High swap I/O traffic

This condition is met when swap-related I/O traffic is excessively high.

Condition:

averageSwapInPerSecond > maxAverageSwapInPagesPerSecond
&&
averageSwapOutPerSecond > maxAverageSwapOutPagesPerSecond

By default, maxAverageSwapInPagesPerSecond and maxAverageSwapOutPagesPerSecond are set to 1000 pages. The default time interval for calculating the average is 30 seconds.

High swap utilization

This condition is met when swap utilization is excessively high, causing the current virtual memory usage to exceed the factored threshold. The NODE_SWAP_SPACE setting in your MachineConfig object can impact this condition.

Condition:

nodeWorkingSet + nodeSwapUsage > totalNodeMemory + totalSwapMemory × thresholdFactor

Environment variables

You can use the following environment variables to adjust the values used to calculate eviction conditions:

Environment variable                       Function

MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND       Sets the value of maxAverageSwapInPagesPerSecond.
MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND      Sets the value of maxAverageSwapOutPagesPerSecond.
SWAP_UTILIZATION_THRESHOLD_FACTOR          Sets the thresholdFactor value used to calculate high swap utilization.
AVERAGE_WINDOW_SIZE_SECONDS                Sets the time interval for calculating the average swap usage.
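
For example, assuming the wasp-agent DaemonSet runs in the openshift-cnv namespace, you could adjust these values with a command similar to the following; the numbers shown are placeholders, not recommended settings:

  $ oc -n openshift-cnv set env daemonset/wasp-agent \
      MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND=2000 \
      MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND=2000 \
      AVERAGE_WINDOW_SIZE_SECONDS=60

Changing the environment variables on the DaemonSet triggers a rolling restart of the wasp-agent pods.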