High availability for pod-level bonds on SR-IOV networks

For workloads that use pod-level bonding with SR-IOV virtual functions (VFs), an underlying physical function (PF) might still report an up state after an upstream switch failure. This creates a silent failure: the attached VFs remain up and pods continue to send traffic to a dead endpoint, which causes packet loss.

The PF Status Relay Operator solves this issue by using Link Aggregation Control Protocol (LACP) as an active health check. In this configuration, each PF is placed in its own single-member LACP bond with the upstream switch. When the Operator detects an LACP failure on a PF’s bond, it changes the link state of the attached VFs from auto to disable. This action triggers the pod’s active-backup bond to fail over to its backup network path, which maintains high availability.
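
The relay behavior described above can be summarized as a small decision function. The following is a minimal sketch for illustration only, not the Operator's implementation. The function name `relay_vf_link_state` is hypothetical. The handling of VFs configured with `enable`, and the return to `auto` when LACP recovers, are assumptions based on the prerequisites and the example log output later in this document.

```shell
# Hypothetical helper for illustration; not part of the Operator or the oc CLI.
# Usage: relay_vf_link_state <lacp_state: up|down> <configured linkState: auto|enable|disable>
relay_vf_link_state() {
  if [ "$2" = "enable" ]; then
    # VFs configured with linkState "enable" are ignored, so they stay as they are.
    echo "enable"
  elif [ "$1" = "up" ]; then
    # LACP is healthy: the VF follows the PF link state (assumed from the log output).
    echo "auto"
  else
    # LACP failed: force the VF link down so the pod-level bond fails over.
    echo "disable"
  fi
}

relay_vf_link_state up auto      # prints: auto
relay_vf_link_state down auto    # prints: disable
relay_vf_link_state down enable  # prints: enable
```

The key point is that the VF link state is driven by the LACP state rather than by the physical link state, which is what removes the silent failure.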

Configuring LACP state monitoring for SR-IOV networks is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Installing the PF Status Relay Operator using the CLI

Install the PF Status Relay Operator to enable OKD to use Link Aggregation Control Protocol (LACP) as an active health check on physical functions (PFs).

Prerequisites

You configured LACP on your upstream switch.
You configured pod-level bonding for your SR-IOV networks.
You installed the OpenShift CLI (oc).
You have cluster-admin privileges.

Procedure

Create the openshift-pf-status-relay-operator namespace by entering the following command:

$ cat << EOF| oc create -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-pf-status-relay-operator
  annotations:
    workload.openshift.io/allowed: management
EOF

Create an OperatorGroup custom resource (CR) by entering the following command:

$ cat << EOF| oc create -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: pf-status-relay-operators
  namespace: openshift-pf-status-relay-operator
spec:
  targetNamespaces:
  - openshift-pf-status-relay-operator
EOF

Create a Subscription CR for the PF Status Relay Operator by entering the following command:

$ cat << EOF| oc create -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: pf-status-relay-operator-subscription
  namespace: openshift-pf-status-relay-operator
spec:
  channel: stable
  name: pf-status-relay-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

Verification

To verify that the Operator is installed, enter the following command and then check that the output shows Succeeded for the Operator:
```
$ oc get csv -n openshift-pf-status-relay-operator -o custom-columns=Name:.metadata.name,Phase:.status.phase
```
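
If you want to script this check, the two-column output of the preceding command can be piped into a small filter. The following is a sketch under stated assumptions: the helper name `all_csvs_succeeded` is hypothetical, and it assumes the output has the header row and the `Name` and `Phase` columns requested by the `custom-columns` option above.

```shell
# Hypothetical helper: reads the "Name Phase" table from stdin, skips the
# header row, and succeeds only if at least one row exists and every row
# reports the Succeeded phase.
all_csvs_succeeded() {
  awk 'NR > 1 { rows++; if ($2 != "Succeeded") bad++ }
       END { exit (rows > 0 && bad == 0) ? 0 : 1 }'
}

# Usage, against a live cluster:
#   oc get csv -n openshift-pf-status-relay-operator \
#     -o custom-columns=Name:.metadata.name,Phase:.status.phase \
#     | all_csvs_succeeded && echo "Operator installed"
```

The function returns a nonzero exit status while the installation is still in progress, so it can be used in a retry loop.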

Installing the PF Status Relay Operator using the web console

Install the PF Status Relay Operator to enable OKD to use Link Aggregation Control Protocol (LACP) as an active health check on physical functions (PFs).

Prerequisites

You configured LACP on your upstream switch.
You configured pod-level bonding for your SR-IOV networks.
You have cluster-admin privileges.

Procedure

Install the PF Status Relay Operator:
1. In the OKD web console, click Ecosystem → Software Catalog.
2. Select PF Status Relay Operator from the list of available Operators, and then click Install.
3. On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
4. Click Install.

Verification

Verify that the PF Status Relay Operator shows the Status as Succeeded on the Installed Operators dashboard.

Configuring the PF Status Relay Operator for LACP state monitoring on SR-IOV networks

Use the PF Status Relay Operator to enable Link Aggregation Control Protocol (LACP) state monitoring for workloads that use pod-level bonding with SR-IOV networks. The Operator monitors the LACP state on physical functions (PFs) and changes the link state of the attached virtual functions (VFs) when it detects an upstream failure. Pods that use these VFs can then detect the failure and fail over to a backup network path in a timely manner, which maintains high availability for your workloads.

The following scenario demonstrates how to configure and verify LACP state monitoring for SR-IOV networks. The scenario consists of the following high-level tasks:

Create host-level NIC bonds on worker nodes and configure LACP.
Define SR-IOV network policies to create virtual functions (VFs) on the bonded interfaces.
Deploy the PF Status Relay Operator to monitor the LACP state of the PFs.
Verify that pods using these VFs automatically fail over to a backup network path if the upstream switch fails.

This scenario uses SR-IOV network cards with two ports on each node, worker-0 and worker-1, with both ports connected to a shared switch to support LACP bonding.

Prerequisites

Nodes must have a NIC that supports SR-IOV.
The SR-IOV Network Operator is installed.
The PF Status Relay Operator is installed.
The physical switch ports connected to the worker nodes are configured for LACP with a fast polling rate.
The linkState is set to auto or disable for the SR-IOV VFs that you want to monitor. The Operator ignores VFs with the linkState set to enable. The default value for SR-IOV VFs is linkState: auto.

Procedure

Create the project namespace by creating a namespace.yaml file such as the following example:

Example namespace.yaml file

apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: sriov-operator-tests
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: privileged
    security.openshift.io/scc.podSecurityLabelSync: "false"
  name: sriov-operator-tests (1)

1	The namespace where you deploy the high-availability pod.

Apply the namespace by running the following command:
```
$ oc apply -f namespace.yaml
```

Configure host-level LACP bonds:

Create a YAML file that defines the NodeNetworkConfigurationPolicy resource for the ens5f0 interface on the worker-0 node:

Example nncpBondF0Worker0.yaml file

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-bond-f0
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0 (1)
  desiredState:
    interfaces:
      - name: example-bond-f0
        description: example-bond-f0
        type: bond
        state: up
        mtu: 9216
        link-aggregation:
          mode: 802.3ad (2)
          options:
            miimon: '100'
            lacp_rate: 'fast' (3)
            min_links: '1'
          port:
            - ens5f0 (4)
      - name: ens5f0
        type: ethernet
        state: up
        mtu: 9216

1	The node where the bonded interface is created.
2	You must set the LACP mode to `802.3ad` to enable LACP on the bond.
3	You must set the LACP rate to `fast` on the interface and on the switch. The `fast` rate sends LACP packets every second.
4	The PF that you want to include in the bond.

Create a YAML file that defines the NodeNetworkConfigurationPolicy resource for the ens5f1 interface on the worker-0 node:

Example nncpBondF1Worker0.yaml file

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-bond-f1
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0 (1)
  desiredState:
    interfaces:
      - name: example-bond-f1
        description: example-bond-f1
        type: bond
        state: up
        mtu: 9216
        link-aggregation:
          mode: 802.3ad (2)
          options:
            miimon: '100'
            lacp_rate: 'fast' (3)
            min_links: '1'
          port:
            - ens5f1 (4)
      - name: ens5f1
        type: ethernet
        state: up
        mtu: 9216

1	The node where the bonded interface is created.
2	You must set the LACP mode to `802.3ad` to enable LACP on the bond.
3	You must set the LACP rate to `fast` on the interface and on the switch. The `fast` rate sends LACP packets every second.
4	The PF that you want to include in the bond.

Apply the resources by running the following commands:

$ oc apply -f nncpBondF0Worker0.yaml
$ oc apply -f nncpBondF1Worker0.yaml
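
Before continuing, you might want to confirm that both policies were applied to the node. The following is a minimal sketch, not a required step. The helper name `wait_for_nncp` is hypothetical, and it assumes that the NodeNetworkConfigurationPolicy resource reports an `Available` condition when the configuration succeeds and that the `nncp` short name is available in your cluster.

```shell
# Hypothetical helper: blocks until the named NodeNetworkConfigurationPolicy
# reports the Available condition, or until the timeout (default 2m) expires.
wait_for_nncp() {
  oc wait nncp "$1" --for=condition=Available --timeout="${2:-2m}"
}

# Usage, against a live cluster:
#   wait_for_nncp example-bond-f0
#   wait_for_nncp example-bond-f1 5m
```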

Create SR-IOV network VFs for the bonded interfaces:

Create a YAML file that defines the SriovNetworkNodePolicy resource for the ens5f0 interface on the worker-0 node:

Example sriovnetworkpolicy-port1.yaml file

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriovnetpolicy-port-0
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - ens5f0 (1)
  nodeSelector:
    kubernetes.io/hostname: worker-0 (2)
  numVfs: 10 (3)
  priority: 99
  resourceName: resourceport0 (4)

1	The PF to create the VFs from.
2	The node where the VFs are created.
3	The number of VFs to create on the PF.
4	The resource name used by pods to request these VFs.

Create a YAML file that defines the SriovNetworkNodePolicy resource for the ens5f1 interface on the worker-0 node:

Example sriovnetworkpolicy-port2.yaml file

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriovnetpolicy-port-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - ens5f1 (1)
  nodeSelector:
    kubernetes.io/hostname: worker-0 (2)
  numVfs: 10 (3)
  priority: 99
  resourceName: resourceport1 (4)

1	The PF to create the VFs from.
2	The node where the VFs are created.
3	The number of VFs to create on the PF.
4	The resource name used by pods to request these VFs.

Apply the resources by running the following commands:

$ oc apply -f sriovnetworkpolicy-port1.yaml
$ oc apply -f sriovnetworkpolicy-port2.yaml
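
Creating VFs can take several minutes, so it can help to check the node's SR-IOV synchronization status before you configure the Operator. The following sketch assumes that the SR-IOV Network Operator maintains a SriovNetworkNodeState resource named after the node and records the result in `status.syncStatus`. The helper name `sriov_sync_status` is hypothetical.

```shell
# Hypothetical helper: prints the SR-IOV sync status recorded for a node.
# The policies are fully applied when the printed value is Succeeded.
sriov_sync_status() {
  oc get sriovnetworknodestates "$1" \
    -n openshift-sriov-network-operator \
    -o jsonpath='{.status.syncStatus}'
}

# Usage, against a live cluster:
#   sriov_sync_status worker-0
```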

Configure the PF Status Relay Operator:

Create a YAML file that defines the PFLACPMonitor resource. This example file configures the Operator to monitor the LACP status of ens5f0 and ens5f1 bonded interfaces on the worker-0 node:

Example pflacpmonitor.yaml file

apiVersion: pfstatusrelay.openshift.io/v1alpha1
kind: PFLACPMonitor
metadata:
  namespace: openshift-pf-status-relay-operator
  labels:
    app.kubernetes.io/name: pf-status-relay-operator
  name: pflacpmonitor-worker-0
spec:
  interfaces:
    - ens5f0 (1)
    - ens5f1
  pollingInterval: 1000 (2)
  nodeSelector:
    kubernetes.io/hostname: worker-0 (3)

1	The list of PFs to monitor.
2	The polling interval in milliseconds to check the LACP status on the monitored interfaces. The minimum value is `1000`.
3	The node for the target interfaces.

Use only one PFLACPMonitor custom resource to monitor each network interface on a node. If you create multiple resources that target the same interface, the PF Status Relay Operator will not process the conflicting configurations.

Apply the PFLACPMonitor resource by running the following command:
```
$ oc apply -f pflacpmonitor.yaml
```

Verification

Check the logs of the PF Status Relay Operator to verify that it is monitoring the LACP state:

$ oc logs -n openshift-pf-status-relay-operator <pf_status_relay_operator_pod_name>

Example output

{"time":"2025-07-24T13:35:54.653201692Z","level":"INFO","msg":"lacp is up","interface":"ens5f0"}
{"time":"2025-07-24T13:35:54.65347273Z","level":"INFO","msg":"vf link state was set","id":0,"state":"auto","interface":"ens5f0"}
...
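
On a busy node the log can be long, so you might prefer to filter it down to the LACP state messages. The following sketch uses only `sed` and assumes that the log lines keep the JSON key order shown in the example output, with `msg` before `interface`. The helper name `lacp_events` is hypothetical.

```shell
# Hypothetical helper: reads the Operator's JSON log lines from stdin and
# prints "<interface> <msg>" for every line whose msg starts with "lacp is".
lacp_events() {
  sed -n 's/.*"msg":"\(lacp is [^"]*\)".*"interface":"\([^"]*\)".*/\2 \1/p'
}

# Usage, against a live cluster:
#   oc logs -n openshift-pf-status-relay-operator <pf_status_relay_operator_pod_name> | lacp_events
```

For the first line of the example output above, the filter prints `ens5f0 lacp is up`. Lines that report VF link state changes are dropped.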

Apply the SriovNetwork resources to make the VFs available for use within the sriov-operator-tests namespace:

Create a YAML file that defines the SriovNetwork resource for the VFs created on ens5f0:

Example sriovnetwork-port1.yaml file

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriovnetwork-port0
  namespace: openshift-sriov-network-operator
spec:
  capabilities: '{ "mac": true }'
  networkNamespace: sriov-operator-tests
  resourceName: resourceport0

Create a YAML file that defines the SriovNetwork resource for the VFs created on ens5f1:

Example sriovnetwork-port2.yaml file

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriovnetwork-port1
  namespace: openshift-sriov-network-operator
spec:
  capabilities: '{ "mac": true }'
  networkNamespace: sriov-operator-tests
  resourceName: resourceport1

Apply the resources by running the following commands:

$ oc apply -f sriovnetwork-port1.yaml
$ oc apply -f sriovnetwork-port2.yaml

Define a high-availability pod that uses the SR-IOV VFs:

Create a YAML file that defines the NetworkAttachmentDefinition resource to create an active-backup bond using the two SR-IOV networks:

Example nad-bond.yaml file
```
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: nad-bond-1
  namespace: sriov-operator-tests
spec:
  config: |-
    {"type": "bond", "cniVersion": "0.3.1", "name": "bond-net1",
    "mode": "active-backup", "failOverMac": 1, "linksInContainer": true, "miimon": "100", "mtu": 1450,
    "links": [{"name": "net1"},{"name": "net2"}], "capabilities": {"ips": true}, "ipam": {"type": "static"}}
```
- linksInContainer: true creates the bond inside the pod’s network namespace.
- mode: active-backup configures the bond to use active-backup mode.
- links specifies the pod-level interfaces to include in the bond.

The PF Status Relay Operator provides LACP state monitoring for pod-level bonding with the mode: active-backup configuration only.

Apply the NetworkAttachmentDefinition resource by running the following command:
```
$ oc apply -f nad-bond.yaml
```

Create a YAML file that defines the Pod resource that uses the VFs from the bonded interfaces in active-backup mode:

Example client-bond.yaml file

apiVersion: v1
kind: Pod
metadata:
  name: client-bond
  namespace: sriov-operator-tests
  annotations:
    k8s.v1.cni.cncf.io/networks: |- (1)
      [{
          "name": "sriovnetwork-port0",
          "interface": "net1",
          "mac": "<mac_address>"
        },{
          "name": "sriovnetwork-port1",
          "interface": "net2",
          "mac": "<mac_address>"
        },{
          "name": "nad-bond-1",
          "interface": "bond0",
          "ips": ["192.168.10.254/24","2001:100::254/64"],
          "mac": "<mac_address>"
      }]
spec:
  nodeName: worker-0
  containers:
    - name: client-bond
      image: quay.io/nginx/nginx-unprivileged
      imagePullPolicy: IfNotPresent
      command: ["/bin/sh", "-c", "sleep 3650d"]
      securityContext:
        privileged: true

1	The annotation requests three networks: two SR-IOV VFs, `net1` and `net2`, and one bond, `bond0`, which uses them.

Apply the Pod resource by running the following command:
```
$ oc apply -f client-bond.yaml
```
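
The bond status checks that follow require a running pod. As a sketch, you can wait for the pod to become ready with `oc wait`. The helper name `wait_for_client_bond` and the default timeout of 120 seconds are illustrative choices, not values from the Operator documentation.

```shell
# Hypothetical helper: blocks until the client-bond pod reports the Ready
# condition, or until the timeout (default 120s) expires.
wait_for_client_bond() {
  oc wait --for=condition=Ready pod/client-bond \
    -n sriov-operator-tests --timeout="${1:-120s}"
}

# Usage, against a live cluster:
#   wait_for_client_bond
```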

Check the failover mechanism:

Check the initial status of the pod-level bond by running the following command from a shell inside the client-bond pod:

sh-4.4# cat /proc/net/bonding/bond0

Example output

...

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: net1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: net1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: AA:BB:CC:DD:EE:FF
Slave queue ID: 0

Slave Interface: net2
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: BB:CC:DD:EE:FF:GG

Both net1 and net2 interfaces are up.

Exit the pod shell.
Simulate an LACP failure on your upstream physical switch. For example, you can filter LACP traffic on the switch port that you want to test. This approach keeps the physical link up while LACP polling fails. The command to do this is vendor-dependent.

Verify the failover inside the pod by logging back into the client-bond pod and checking the bond status again:

sh-4.4# cat /proc/net/bonding/bond0

Example output

...

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: net2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: net1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: AA:BB:CC:DD:EE:FF
Slave queue ID: 0

Slave Interface: net2
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: BB:CC:DD:EE:FF:GG
Slave queue ID: 0

The net1 interface is down, and the net2 interface is now the active interface.

The client-bond pod detects the link state change and switches to the backup network path.
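
To compare the bond status before and after the simulated failure without reading the full output, you can reduce it to the active interface and the state of each member. The following sketch uses `awk` and relies only on the field labels shown in the example output above. The helper name `bond_summary` is hypothetical.

```shell
# Hypothetical helper: reads /proc/net/bonding/<bond> text from stdin and
# prints the currently active interface, followed by one
# "<interface> <MII status>" line per member interface.
bond_summary() {
  awk -F': ' '
    /^Currently Active Slave:/    { print "active " $2 }
    /^Slave Interface:/           { slave = $2 }
    /^MII Status:/ && slave != "" { print slave " " $2; slave = "" }
  '
}

# Usage, against a live cluster:
#   oc exec client-bond -n sriov-operator-tests -- cat /proc/net/bonding/bond0 | bond_summary
```

For the output captured after the failure, the summary reads `active net2`, `net1 down`, and `net2 up`. The bond-level `MII Status` line is skipped because it appears before any member interface.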