×

You can increase the number of virtual machines (VMs) on nodes by overcommitting memory (RAM). Increasing VM workload density can be useful in the following situations:

  • You have many similar workloads.

  • You have underused workloads.

Memory overcommitment can lower workload performance on a highly utilized system.

Using wasp-agent to increase VM workload density

The wasp-agent component facilitates memory overcommitment by assigning swap resources to worker nodes. It also manages pod evictions when nodes are at risk due to high swap I/O traffic or high utilization.

Swap resources can be only assigned to virtual machine workloads (VM pods) of the Burstable Quality of Service (QoS) class. VM pods of the Guaranteed QoS class and pods of any QoS class that do not belong to VMs cannot swap resources.

For descriptions of QoS classes, see Configure Quality of Service for Pods (Kubernetes documentation).

Using spec.domain.resources.requests.memory in the VM manifest disables the memory overcommit configuration. Use spec.domain.memory.guest instead.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You are logged into the cluster with the cluster-admin role.

  • A memory overcommit ratio is defined.

  • The node belongs to a worker pool.

The wasp-agent component deploys an Open Container Initiative (OCI) hook to enable swap usage for containers on the node level. The low-level nature requires the DaemonSet object to be privileged.

Procedure
  1. Configure the kubelet service to permit swap usage:

    1. Create or edit a KubeletConfig file with the parameters shown in the following example:

      Example of a KubeletConfig file
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: custom-config
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: ''  # MCP
            #machine.openshift.io/cluster-api-machine-role: worker # machine
            #node-role.kubernetes.io/worker: '' # node
        kubeletConfig:
          failSwapOn: false
    2. Wait for the worker nodes to sync with the new configuration by running the following command:

      $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
  2. Provision swap by creating a MachineConfig object. For example:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 90-worker-swap
    spec:
      config:
        ignition:
          version: 3.5.0
        systemd:
          units:
            - contents: |
                [Unit]
                Description=Provision and enable swap
                ConditionFirstBoot=no
                ConditionPathExists=!/var/tmp/swapfile
    
                [Service]
                Type=oneshot
                Environment=SWAP_SIZE_MB=5000
                ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \
                sudo chmod 600 /var/tmp/swapfile && \
                sudo mkswap /var/tmp/swapfile && \
                sudo swapon /var/tmp/swapfile && \
                free -h"
    
                [Install]
                RequiredBy=kubelet-dependencies.target
              enabled: true
              name: swap-provision.service
            - contents: |
                [Unit]
                Description=Restrict swap for system slice
                ConditionFirstBoot=no
    
                [Service]
                Type=oneshot
                ExecStart=/bin/sh -c "sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\""
    
                [Install]
                RequiredBy=kubelet-dependencies.target
              enabled: true
              name: cgroup-system-slice-config.service

    To have enough swap space for the worst-case scenario, make sure to have at least as much swap space provisioned as overcommitted RAM. Calculate the amount of swap space to be provisioned on a node by using the following formula:

    NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)
    Example
    NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1)
                   = 16 GB * (1.5 - 1)
                   = 16 GB * (0.5)
                   =  8 GB
  3. Create a privileged service account by running the following commands:

    $ oc adm new-project wasp
    $ oc create sa -n wasp wasp
    $ oc create clusterrolebinding wasp --clusterrole=cluster-admin --serviceaccount=wasp:wasp
    $ oc adm policy add-scc-to-user -n wasp privileged -z wasp
  4. Wait for the worker nodes to sync with the new configuration by running the following command:

    $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
  5. Determine the pull URL for the wasp agent image by running the following command:

    $ oc get csv -n openshift-cnv -l=operators.coreos.com/kubevirt-hyperconverged.openshift-cnv -ojson | jq '.items[0].spec.relatedImages[] | select(.name|test(".*wasp-agent.*")) | .image'
  6. Deploy wasp-agent by creating a DaemonSet object as shown in the following example:

    kind: DaemonSet
    apiVersion: apps/v1
    metadata:
      name: wasp-agent
      namespace: wasp
      labels:
        app: wasp
        tier: node
    spec:
      selector:
        matchLabels:
          name: wasp
      template:
        metadata:
          annotations:
            description: >-
              Configures swap for workloads
          labels:
            name: wasp
        spec:
          containers:
            - env:
                - name: SWAP_UTILIZATION_THRESHOLD_FACTOR
                  value: "0.8"
                - name: MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND
                  value: "1000000000"
                - name: MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND
                  value: "1000000000"
                - name: AVERAGE_WINDOW_SIZE_SECONDS
                  value: "30"
                - name: VERBOSITY
                  value: "1"
                - name: FSROOT
                  value: /host
                - name: NODE_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: spec.nodeName
              image: >-
                quay.io/openshift-virtualization/wasp-agent:v4.20 (1)
              imagePullPolicy: Always
              name: wasp-agent
              resources:
                requests:
                  cpu: 100m
                  memory: 50M
              securityContext:
                privileged: true
              volumeMounts:
                - mountPath: /host
                  name: host
                - mountPath: /rootfs
                  name: rootfs
          hostPID: true
          hostUsers: true
          priorityClassName: system-node-critical
          serviceAccountName: wasp
          terminationGracePeriodSeconds: 5
          volumes:
            - hostPath:
                path: /
              name: host
            - hostPath:
                path: /
              name: rootfs
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 10%
          maxSurge: 0
    1 Replace the image value with the image URL from the previous step.
  7. Deploy alerting rules by creating a PrometheusRule object. For example:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        tier: node
        wasp.io: ""
      name: wasp-rules
      namespace: wasp
    spec:
      groups:
        - name: alerts.rules
          rules:
            - alert: NodeHighSwapActivity
              annotations:
                description: High swap activity detected at {{ $labels.instance }}. The rate
                  of swap out and swap in exceeds 200 in both operations in the last minute.
                  This could indicate memory pressure and may affect system performance.
                runbook_url: https://github.com/openshift-virtualization/wasp-agent/tree/main/docs/runbooks/NodeHighSwapActivity.md
                summary: High swap activity detected at {{ $labels.instance }}.
              expr: rate(node_vmstat_pswpout[1m]) > 200 and rate(node_vmstat_pswpin[1m]) >
                200
              for: 1m
              labels:
                kubernetes_operator_component: kubevirt
                kubernetes_operator_part_of: kubevirt
                operator_health_impact: warning
                severity: warning
  8. Add the cluster-monitoring label to the wasp namespace by running the following command:

    $ oc label namespace wasp openshift.io/cluster-monitoring="true"
  9. Enable memory overcommitment in OKD Virtualization by using the web console or the CLI.

    • Web console

      1. In the OKD web console, go to VirtualizationOverviewSettingsGeneral settingsMemory density.

      2. Set Enable memory density to on.

    • CLI

      • Configure your OKD Virtualization to enable higher memory density and set the overcommit rate:

        $ oc -n openshift-cnv patch HyperConverged/kubevirt-hyperconverged --type='json' -p='[ \
          { \
          "op": "replace", \
          "path": "/spec/higherWorkloadDensity/memoryOvercommitPercentage", \
          "value": 150 \
          } \
        ]'
        Successful output
        hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
Verification
  1. To verify the deployment of wasp-agent, run the following command:

    $ oc rollout status ds wasp-agent -n wasp

    If the deployment is successful, the following message is displayed:

    Example output
    daemon set "wasp-agent" successfully rolled out
  2. To verify that swap is correctly provisioned, complete the following steps:

    1. View a list of worker nodes by running the following command:

      $ oc get nodes -l node-role.kubernetes.io/worker
    2. Select a node from the list and display its memory usage by running the following command:

      $ oc debug node/<selected_node> -- free -m (1)
      1 Replace <selected_node> with the node name.

      If swap is provisioned, an amount greater than zero is displayed in the Swap: row.

      Table 1. Example output

      total

      used

      free

      shared

      buff/cache

      available

      Mem:

      31846

      23155

      1044

      6014

      14483

      8690

      Swap:

      8191

      2337

      5854

  3. Verify the OKD Virtualization memory overcommitment configuration by running the following command:

    $ oc -n openshift-cnv get HyperConverged/kubevirt-hyperconverged -o jsonpath='{.spec.higherWorkloadDensity}{"\n"}'
    Example output
    {"memoryOvercommitPercentage":150}

    The returned value must match the value you had previously configured.

Removing the wasp-agent component

If you no longer need memory overcommitment, you can remove the wasp-agent component and associated resources from your cluster.

Prerequisites
  • You are logged in to the cluster with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Remove the wasp-agent DaemonSet:

    $ oc delete daemonset wasp-agent -n wasp
  2. If deployed, remove the alerting rules:

    $ oc delete prometheusrule wasp-rules -n wasp
  3. Optionally, delete the wasp namespace if no other resources depend on it:

    $ oc delete namespace wasp
  4. Revert the memory overcommitment configuration:

    $ oc -n openshift-cnv patch HyperConverged/kubevirt-hyperconverged \
      --type='json' \
      -p='[{"op": "remove", "path": "/spec/higherWorkloadDensity"}]'
  5. Delete the MachineConfig that provisions swap memory:

    $ oc delete machineconfig 90-worker-swap
  6. Delete the associated KubeletConfig:

    $ oc delete kubeletconfig custom-config
  7. Wait for the worker nodes to reconcile:

    $ oc wait mcp worker --for condition=Updated=True --timeout=-1s
Verification
  • Confirm that the wasp-agent DaemonSet is removed:

    $ oc get daemonset -n wasp

    No wasp-agent should be listed.

  • Confirm that swap is no longer enabled on a node:

    $ oc debug node/<selected_node> -- free -m

    Ensure that the Swap: row shows 0 or that no swap space shows as provisioned.

Pod eviction conditions used by wasp-agent

The wasp agent manages pod eviction when the system is heavily loaded and nodes are at risk. Eviction is triggered if one of the following conditions is met:

High swap I/O traffic

This condition is met when swap-related I/O traffic is excessively high.

Condition
averageSwapInPerSecond > maxAverageSwapInPagesPerSecond
&&
averageSwapOutPerSecond > maxAverageSwapOutPagesPerSecond

By default, maxAverageSwapInPagesPerSecond and maxAverageSwapOutPagesPerSecond are set to 1000 pages. The default time interval for calculating the average is 30 seconds.

High swap utilization

This condition is met when swap utilization is excessively high, causing the current virtual memory usage to exceed the factored threshold. The NODE_SWAP_SPACE setting in your MachineConfig object can impact this condition.

Condition
nodeWorkingSet + nodeSwapUsage < totalNodeMemory + totalSwapMemory × thresholdFactor

Environment variables

You can use the following environment variables to adjust the values used to calculate eviction conditions:

Environment variable

Function

MAX_AVERAGE_SWAP_IN_PAGES_PER_SECOND

Sets the value of maxAverageSwapInPagesPerSecond.

MAX_AVERAGE_SWAP_OUT_PAGES_PER_SECOND

Sets the value of maxAverageSwapOutPagesPerSecond.

SWAP_UTILIZATION_THRESHOLD_FACTOR

Sets the thresholdFactor value used to calculate high swap utilization.

AVERAGE_WINDOW_SIZE_SECONDS

Sets the time interval for calculating the average swap usage.