Configuring higher VM workload density

Using wasp-agent to configure higher VM workload density

To increase the number of virtual machines (VMs), you can configure a higher VM workload density in your cluster by overcommitting the amount of memory (RAM).

Configuring higher workload density is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The following workloads are especially suited for higher workload density:

Many similar workloads
Underused workloads

While overcommitted memory can lead to a higher workload density, it can also lower workload performance of a highly utilized system.

Using `wasp-agent` to configure higher VM workload density

The wasp-agent component enables an OKD cluster to assign swap resources to virtual machine (VM) workloads. Swap usage is only supported on worker nodes.

Swap resources can be only assigned to virtual machine workloads (VM pods) of the Burstable Quality of Service (QoS) class. VM pods of the Guaranteed QoS class and pods of any QoS class that do not belong to VMs cannot swap resources.

For descriptions of QoS classes, see Configure Quality of Service for Pods (Kubernetes documentation).

Prerequisites

The oc tool is available.
You are logged into the cluster with the cluster-admin role.
A memory over-commit ratio is defined.
The node belongs to a worker pool.

Procedure

Create a privileged service account by entering the following commands:
```
$ oc adm new-project wasp
```
```
$ oc create sa -n wasp wasp
```
```
$ oc create clusterrolebinding wasp --clusterrole=cluster-admin --serviceaccount=wasp:wasp
```
```
$ oc adm policy add-scc-to-user -n wasp privileged -z wasp
```
The wasp-agent component deploys an OCI hook to enable swap usage for containers on the node level. The low-level nature requires the DaemonSet object to be privileged.

Deploy wasp-agent by creating a DaemonSet object as follows:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: wasp-agent
  namespace: wasp
  labels:
    app: wasp
    tier: node
spec:
  selector:
    matchLabels:
      name: wasp
  template:
    metadata:
      annotations:
        description: >-
          Configures swap for workloads
      labels:
          name: wasp
    spec:
      serviceAccountName: wasp
      hostPID: true
      hostUsers: true
      terminationGracePeriodSeconds: 5
      containers:
        - name: wasp-agent
          image: >-
            registry.redhat.io/container-native-virtualization/wasp-agent-rhel9:v4.16
          imagePullPolicy: Always
          env:
          - name: "FSROOT"
            value: "/host"
          resources:
            requests:
              cpu: 100m
              memory: 50M
          securityContext:
            privileged: true
          volumeMounts:
          - name: host
            mountPath: "/host"
      volumes:
      - name: host
        hostPath:
          path: "/"
      priorityClassName: system-node-critical
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10%
      maxSurge: 0
status: {}

Configure the kubelet service to permit swap:

Create a KubeletConfiguration file as shown in the example:

Example of a KubeletConfiguration file

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ''  # MCP
      #machine.openshift.io/cluster-api-machine-role: worker # machine
      #node-role.kubernetes.io/worker: '' # node
  kubeletConfig:
    failSwapOn: false
    evictionSoft:
      memory.available: "1Gi"
    evictionSoftGracePeriod:
      memory.available: "10s"

If the cluster is already using an existing KubeletConfiguration file, add the following to the spec section:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-config
# ...
spec
# ...
    kubeletConfig:
      evictionSoft:
        memory.available: 1Gi
      evictionSoftGracePeriod:
        memory.available: 1m30s
      failSwapOn: false

Run the following command:

$ oc wait mcp worker --for condition=Updated=True

Create a MachineConfig object to provision swap as follows:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 90-worker-swap
spec:
  config:
    ignition:
      version: 3.4.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Provision and enable swap
          ConditionFirstBoot=no

          [Service]
          Type=oneshot
          Environment=SWAP_SIZE_MB=5000
          ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/tmp/swapfile count=${SWAP_SIZE_MB} bs=1M && \
          sudo chmod 600 /var/tmp/swapfile && \
          sudo mkswap /var/tmp/swapfile && \
          sudo swapon /var/tmp/swapfile && \
          free -h && \
          sudo systemctl set-property --runtime system.slice MemorySwapMax=0 IODeviceLatencyTargetSec=\"/ 50ms\""

          [Install]
          RequiredBy=kubelet-dependencies.target
        enabled: true
        name: swap-provision.service

To have enough swap space for the worst-case scenario, make sure to have at least as much swap space provisioned as overcommitted RAM. Calculate the amount of swap space to be provisioned on a node using the following formula:

NODE_SWAP_SPACE = NODE_RAM * (MEMORY_OVER_COMMIT_PERCENT / 100% - 1)

Example:

NODE_SWAP_SPACE = 16 GB * (150% / 100% - 1)
                = 16 GB * (1.5 - 1)
                = 16 GB * (0.5)
                =  8 GB

Deploy alerting rules as follows:

apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: wasp-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: wasp.rules
    rules:
    - alert: NodeSwapping
      annotations:
        description: Node {{ $labels.instance }} is swapping at a rate of {{ printf "%.2f" $value }} MB/s
        runbook_url: https://github.com/openshift-virtualization/wasp-agent/tree/main/runbooks/alerts/NodeSwapping.md
        summary: A node is swapping memory pages
      expr: |
        # In MB/s
        irate(node_memory_SwapFree_bytes{job="node-exporter"}[5m]) / 1024^2 > 0
      for: 1m
      labels:
        severity: critical

Configure OKD Virtualization to use memory overcommit either by using the OKD web console or by editing the HyperConverged custom resource (CR) file in the CLI.
- Web console
  
  In the OKD web console, go to Virtualization → Overview → Settings → General settings → Memory density.
  
  Set Enable memory density to on.
- CLI
  
  Configure your OKD Virtualization to enable higher memory density and set the overcommit rate:
  
  $ oc -n openshift-cnv patch HyperConverged/kubevirt-hyperconverged --type='json' -p='[ \ { \ "op": "replace", \ "path": "/spec/higherWorkloadDensity/memoryOvercommitPercentage", \ "value": 150 \ } \ ]'
  
  Successful output
  
  hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
After applying all configurations, the swap feature is fully available only after all MachineConfigPool rollouts are complete.

Verification

To verify the deployment of wasp-agent, run the following command:
```
$  oc rollout status ds wasp-agent -n wasp
```
If the deployment is successful, the following message is displayed:
```
daemon set "wasp-agent" successfully rolled out
```
To verify that swap is correctly provisioned, do the following:
1. Run the following command:
  $ oc get nodes -l node-role.kubernetes.io/worker
2. Select a node from the provided list and run the following command:
  $ oc debug node/<selected-node> -- free -m
  If swap is provisioned correctly, an amount greater than zero is displayed, similar to the following:
  
  total
  
  used
  
  free
  
  shared
  
  buff/cache
  
  available
  
  Mem:
  
  31846
  
  23155
  
  1044
  
  6014
  
  14483
  
  8690
  
  Swap:
  
  8191
  
  2337
  
  5854
Verify the OKD Virtualization memory overcommitment configuration by running the following command:
```
$ oc -n openshift-cnv get HyperConverged/kubevirt-hyperconverged -o jsonpath='{.spec.higherWorkloadDensity}{"\n"}'
```
Example output
```
{"memoryOvercommitPercentage":150}
```
The returned value must match the value you had previously configured.

	total	used	free	shared	buff/cache	available
Mem:	31846	23155	1044	6014	14483	8690
Swap:	8191	2337	5854

Using wasp-agent to configure higher VM workload density

Using `wasp-agent` to configure higher VM workload density