
You can increase the number of virtual machines (VMs) on nodes by overcommitting memory (RAM). Increasing VM workload density can be useful in the following situations:

  • You have many similar workloads.

  • You have underused workloads.

Memory overcommitment can lower workload performance on a highly utilized system.

Using wasp-agent to increase VM workload density

The wasp-agent component facilitates memory overcommitment by assigning swap resources to worker nodes. It also manages pod evictions when nodes are at risk due to high swap I/O traffic or high utilization.

The wasp-agent component is deployed automatically if memoryOvercommitPercentage is set to more than 100 when you first create the HyperConverged custom resource (CR).
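
For example, a HyperConverged CR that triggers this deployment might look like the following minimal sketch. The object name and the openshift-cnv namespace match the commands used later in this procedure; the hco.kubevirt.io/v1beta1 API version is an assumption:

  apiVersion: hco.kubevirt.io/v1beta1
  kind: HyperConverged
  metadata:
    name: kubevirt-hyperconverged
    namespace: openshift-cnv
  spec:
    higherWorkloadDensity:
      memoryOvercommitPercentage: 150  # values greater than 100 trigger wasp-agent deployment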

Swap resources can be assigned only to virtual machine workloads (VM pods) of the Burstable Quality of Service (QoS) class. VM pods of the Guaranteed QoS class, and pods of any QoS class that do not belong to VMs, cannot have swap resources assigned.

For descriptions of QoS classes, see Configure Quality of Service for Pods (Kubernetes documentation).

Using spec.domain.resources.requests.memory in the VM manifest disables the memory overcommit configuration. Use spec.domain.memory.guest instead.
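
For example, the following VirtualMachine manifest is a minimal sketch of a memory configuration that remains compatible with overcommitment. The name, run strategy, and memory size are placeholders:

  apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    name: example-vm            # placeholder name
  spec:
    runStrategy: Halted         # placeholder run strategy
    template:
      spec:
        domain:
          memory:
            guest: 4Gi          # guest-visible memory; placeholder size
          devices: {}
          # Do not set resources.requests.memory here; doing so disables
          # the memory overcommit configuration for this VM.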

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You are logged in to the cluster with the cluster-admin role.

  • A memory overcommit ratio is defined.

  • The node belongs to a worker pool.

The wasp-agent component deploys an Open Container Initiative (OCI) hook to enable swap usage for containers at the node level. Because the hook operates at such a low level, the DaemonSet object must be privileged.

Procedure
  1. Configure the kubelet service to permit swap usage:

    1. Create or edit a KubeletConfig file with the parameters shown in the following example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: custom-config
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: ''  # MCP
            #machine.openshift.io/cluster-api-machine-role: worker # machine
            #node-role.kubernetes.io/worker: '' # node
        kubeletConfig:
          failSwapOn: false
    2. Wait for the worker nodes to sync with the new configuration by running the following command:

      $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
  2. Provision swap by creating a MachineConfig object:

    1. Create a MachineConfig file with the parameters shown in the following example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 90-worker-swap
      spec:
        config:
          ignition:
            version: 3.5.0
          systemd:
            units:
              - contents: |
                  [Unit]
                  Description=Provision and enable swap
                  ConditionFirstBoot=no
                  ConditionPathExists=!/var/tmp/swapfile
      
                  [Service]
                  Type=oneshot
                  Environment=SWAP_SIZE_MB=5000
                  ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \
                  sudo chmod 600 /var/tmp/swapfile && \
                  sudo mkswap /var/tmp/swapfile && \
                  sudo swapon /var/tmp/swapfile && \
                  free -h"
      
                  [Install]
                  RequiredBy=kubelet-dependencies.target
                enabled: true
                name: swap-provision.service
              - contents: |
                  [Unit]
                  Description=Restrict swap for system slice
                  ConditionFirstBoot=no
      
                  [Service]
                  Type=oneshot
                  ExecStart=/bin/sh -c "sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\""
      
                  [Install]
                  RequiredBy=kubelet-dependencies.target
                enabled: true
                name: cgroup-system-slice-config.service

      To have enough swap space for the worst-case scenario, make sure that the amount of swap space provisioned is at least equal to the amount of overcommitted RAM. Calculate the amount of swap space to provision on a node by using the following formula:

      NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)

      Example:

      NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1)
                     = 16 GB * (1.5 - 1)
                     = 16 GB * (0.5)
                     =  8 GB
    2. Wait for the worker nodes to sync with the new configuration by running the following command:

      $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
  3. Enable memory overcommitment in OKD Virtualization by using the web console or the CLI.

    • Web console

      1. In the OKD web console, go to Virtualization → Overview → Settings → General settings → Memory density.

      2. Set Enable memory density to on.

    • CLI

      • Configure your OKD Virtualization to enable higher memory density and set the overcommit rate:

        $ oc -n openshift-cnv patch HyperConverged/kubevirt-hyperconverged --type='json' -p='[
          {
            "op": "replace",
            "path": "/spec/higherWorkloadDensity/memoryOvercommitPercentage",
            "value": 150
          }
        ]'

        Successful output:

        hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
Verification
  1. To verify the deployment of wasp-agent, run the following command:

    $ oc rollout status ds wasp-agent -n openshift-cnv

    If the deployment is successful, the following message is displayed:

    daemon set "wasp-agent" successfully rolled out
  2. To verify that swap is correctly provisioned, complete the following steps:

    1. View a list of worker nodes by running the following command:

      $ oc get nodes -l node-role.kubernetes.io/worker
    2. Select a node from the list and display its memory usage by running the following command:

      $ oc debug node/<selected_node> -- free -m

      Replace <selected_node> with the node name.

      If swap is provisioned, an amount greater than zero is displayed in the Swap: row.

      Table 1. Example output

                     total        used        free      shared  buff/cache   available
      Mem:           31846       23155        1044        6014       14483        8690
      Swap:           8191        2337        5854

  3. Verify the OKD Virtualization memory overcommitment configuration by running the following command:

    $ oc -n openshift-cnv get HyperConverged/kubevirt-hyperconverged -o jsonpath='{.spec.higherWorkloadDensity}{"\n"}'

    Example output:

    {"memoryOvercommitPercentage":150}

    The returned value must match the value you previously configured.

Removing the wasp-agent component

If you no longer need memory overcommitment, you can remove the wasp-agent component and associated resources from your cluster.

Prerequisites
  • You are logged in to the cluster with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Revert the memory overcommitment configuration:

    $ oc -n openshift-cnv patch HyperConverged/kubevirt-hyperconverged \
      --type='json' \
      -p='[{"op": "remove", "path": "/spec/higherWorkloadDensity"}]'
  2. Delete the MachineConfig that provisions swap memory:

    $ oc delete machineconfig 90-worker-swap
  3. Delete the associated KubeletConfig:

    $ oc delete kubeletconfig custom-config
  4. Wait for the worker nodes to reconcile:

    $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
Verification
  • Confirm that swap is no longer enabled on a node:

    $ oc debug node/<selected_node> -- free -m

    Ensure that the Swap: row shows 0 or that no swap space is listed as provisioned.

Pod eviction conditions used by wasp-agent

The wasp-agent component manages pod eviction when the system is heavily loaded and nodes are at risk. Eviction is triggered if one of the following conditions is met:

High swap I/O traffic

This condition is met when swap-related I/O traffic is excessively high.

Condition:

averageSwapInPerSecond > maxAverageSwapInPagesPerSecond
&&
averageSwapOutPerSecond > maxAverageSwapOutPagesPerSecond

By default, maxAverageSwapInPagesPerSecond and maxAverageSwapOutPagesPerSecond are set to 1000 pages. The default time interval for calculating the average is 30 seconds.

High swap utilization

This condition is met when swap utilization is excessively high, causing the current virtual memory usage to exceed the factored threshold. The NODE_SWAP_SPACE setting in your MachineConfig object can impact this condition.

Condition:

nodeWorkingSet + nodeSwapUsage > totalNodeMemory + totalSwapMemory × thresholdFactor

Environment variables

You can use the following environment variables to adjust the values used to calculate eviction conditions:

Environment variable                       Function

MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND       Sets the value of maxAverageSwapInPagesPerSecond.
MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND      Sets the value of maxAverageSwapOutPagesPerSecond.
SWAP_UTILIZATION_THRESHOLD_FACTOR          Sets the thresholdFactor value used to calculate high swap utilization.
AVERAGE_WINDOW_SIZE_SECONDS                Sets the time interval for calculating the average swap usage.
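
For example, assuming the wasp-agent DaemonSet runs in the openshift-cnv namespace, you could adjust these values with a command similar to the following; the numbers shown are placeholders, not recommended settings:

  $ oc -n openshift-cnv set env daemonset/wasp-agent \
      MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND=2000 \
      MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND=2000 \
      AVERAGE_WINDOW_SIZE_SECONDS=60

Changing the environment variables on the DaemonSet triggers a rolling restart of the wasp-agent pods.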