For workloads that use pod-level bonding with SR-IOV virtual functions (VFs), an underlying physical function (PF) might still report an up state after an upstream switch failure. This creates a silent failure: the attached VFs remain up, and pods continue to send traffic to a dead endpoint, causing packet loss.
The PF Status Relay Operator solves this issue by using Link Aggregation Control Protocol (LACP) as an active health check. In this configuration, each PF is placed in its own single-member LACP bond with the upstream switch. When the Operator detects an LACP failure on a PF’s bond, it changes the link state of the attached VFs from auto to disable. This action triggers the pod’s active-backup bond to fail over to its backup network path, maintaining high availability.
Configuring LACP state monitoring for SR-IOV networks is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Install the PF Status Relay Operator by using the CLI to enable OKD to use Link Aggregation Control Protocol (LACP) as an active health check on physical functions (PFs).
You configured LACP on your upstream switch.
You configured pod-level bonding for your SR-IOV networks.
You installed the OpenShift CLI (oc).
You have cluster-admin privileges.
Create the openshift-pf-status-relay-operator namespace by entering the following command:
$ cat << EOF| oc create -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-pf-status-relay-operator
  annotations:
    workload.openshift.io/allowed: management
EOF
Create an OperatorGroup custom resource (CR) by entering the following command:
$ cat << EOF| oc create -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: pf-status-relay-operators
  namespace: openshift-pf-status-relay-operator
spec:
  targetNamespaces:
  - openshift-pf-status-relay-operator
EOF
Create a Subscription CR for the PF Status Relay Operator by entering the following command:
$ cat << EOF| oc create -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: pf-status-relay-operator-subscription
  namespace: openshift-pf-status-relay-operator
spec:
  channel: stable
  name: pf-status-relay-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
To verify that the Operator is installed, enter the following command and then check that the output shows Succeeded for the Operator:
$ oc get csv -n openshift-pf-status-relay-operator -o custom-columns=Name:.metadata.name,Phase:.status.phase
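If you script this check, the following Python sketch parses the two-column output of the preceding oc get csv command and reports whether a PF Status Relay Operator CSV has reached the Succeeded phase. It assumes that the CSV name starts with pf-status-relay-operator (the package name used in the Subscription); that naming is an assumption, so adjust name_prefix if your CSV is named differently.

```python
from typing import Dict


def parse_csv_phases(output: str) -> Dict[str, str]:
    """Map each ClusterServiceVersion name to its phase.

    Expects the two-column output of the oc get csv command with
    custom-columns=Name:.metadata.name,Phase:.status.phase.
    """
    rows = [line.split() for line in output.splitlines() if line.strip()]
    phases = {}
    for fields in rows[1:]:  # rows[0] is the "Name  Phase" header
        if len(fields) >= 2:
            phases[fields[0]] = fields[1]
    return phases


def operator_succeeded(output: str, name_prefix: str = "pf-status-relay-operator") -> bool:
    """True if a CSV whose name starts with name_prefix reports Succeeded.

    Assumption: the CSV name starts with the package name; adjust name_prefix if not.
    """
    return any(
        name.startswith(name_prefix) and phase == "Succeeded"
        for name, phase in parse_csv_phases(output).items()
    )
```

For example, save the command output to a file and pass its contents to operator_succeeded(); a False result means that the Operator is still installing or has failed.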
Install the PF Status Relay Operator by using the web console to enable OKD to use Link Aggregation Control Protocol (LACP) as an active health check on physical functions (PFs).
You configured LACP on your upstream switch.
You configured pod-level bonding for your SR-IOV networks.
You have cluster-admin privileges.
Install the PF Status Relay Operator:
In the OKD web console, click Ecosystem → Software Catalog.
Select PF Status Relay Operator from the list of available Operators, and then click Install.
On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
Click Install.
Verify that the PF Status Relay Operator shows the Status as Succeeded on the Installed Operators dashboard.
Use the PF Status Relay Operator to enable Link Aggregation Control Protocol (LACP) state monitoring for workloads that use pod-level bonding with SR-IOV networks. The Operator monitors the LACP state on physical functions (PFs) and changes the link state of the attached virtual functions (VFs) when it detects an upstream failure. With this approach, upstream failures that the VFs cannot detect on their own are propagated to them, so that pods fail over to a backup network path in time and your workloads remain highly available.
The following scenario demonstrates how to configure and verify LACP state monitoring for SR-IOV networks:
Create host-level NIC bonds on worker nodes and configure LACP.
Define SR-IOV network policies to create virtual functions (VFs) on the bonded PFs.
Deploy the PF Status Relay Operator to monitor the LACP state of the PFs.
Verify that pods using these VFs automatically fail over to a backup network path in case of upstream switch failure.
This scenario uses SR-IOV network cards with two ports on each node, worker-0 and worker-1, with both ports connected to a shared switch to support LACP bonding.
Nodes must have a NIC that supports SR-IOV.
The SR-IOV Network Operator is installed.
The PF Status Relay Operator is installed.
The physical switch ports connected to the worker nodes are configured for LACP with a fast polling rate.
The linkState is set to auto or disable for the SR-IOV VFs that you want to monitor. The Operator ignores VFs with the linkState set to enable. The default value for SR-IOV VFs is linkState: auto.
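The prerequisite above determines which VFs the Operator acts on. The following Python sketch is a toy model of that behavior as this document describes it: when LACP on a PF is down, monitored VFs are set to disable; when it is up, they are set to auto (as in the example log output later in this procedure); VFs with linkState enable are left alone. The function name and data shapes are hypothetical and do not reflect the Operator's source code.

```python
from typing import Dict

# linkState values the Operator acts on, per the prerequisite above.
MONITORED_STATES = {"auto", "disable"}


def relay_link_states(lacp_up: bool, vfs: Dict[int, str]) -> Dict[int, str]:
    """Toy model: return the linkState each VF of one PF ends up with.

    vfs maps a VF index to its current linkState ("auto", "disable", or "enable").
    Hypothetical helper; not the Operator's actual implementation.
    """
    target = "auto" if lacp_up else "disable"
    result = {}
    for vf_id, state in vfs.items():
        # VFs pinned to "enable" are ignored; monitored VFs follow the PF's LACP state.
        result[vf_id] = target if state in MONITORED_STATES else state
    return result


# LACP fails on the PF: VFs 0 and 1 are forced down, VF 2 (enable) is untouched.
print(relay_link_states(False, {0: "auto", 1: "auto", 2: "enable"}))
# → {0: 'disable', 1: 'disable', 2: 'enable'}
```

This is why a VF that is explicitly set to linkState: enable never fails over: the relay does not touch it.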
Create the project namespace by creating a namespace.yaml file such as the following example:
namespace.yaml file
apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: sriov-operator-tests
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: privileged
    security.openshift.io/scc.podSecurityLabelSync: "false"
  name: sriov-operator-tests (1)
(1) The namespace where you deploy the high-availability pod.
Apply the namespace by running the following command:
$ oc apply -f namespace.yaml
Configure host-level LACP bonds:
Create a YAML file that defines the NodeNetworkConfigurationPolicy resource for the ens5f0 interface on the worker-0 node:
nncpBondF0Worker0.yaml file
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-bond-f0
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0 (1)
  desiredState:
    interfaces:
    - name: example-bond-f0
      description: example-bond-f0
      type: bond
      state: up
      mtu: 9216
      link-aggregation:
        mode: 802.3ad (2)
        options:
          miimon: '100'
          lacp_rate: 'fast' (3)
          min_links: '1'
        port:
        - ens5f0 (4)
    - name: ens5f0
      type: ethernet
      state: up
      mtu: 9216
(1) The node where the bonded interface is created.
(2) You must set the bond mode to 802.3ad to enable LACP on the bond.
(3) You must set the LACP rate to fast on both the bond and the switch. With the fast rate, LACP packets are sent every second.
(4) The PF that you want to include in the bond.
Create a YAML file that defines the NodeNetworkConfigurationPolicy resource for the ens5f1 interface on the worker-0 node:
nncpBondF1Worker0.yaml file
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-bond-f1
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0 (1)
  desiredState:
    interfaces:
    - name: example-bond-f1
      description: example-bond-f1
      type: bond
      state: up
      mtu: 9216
      link-aggregation:
        mode: 802.3ad (2)
        options:
          miimon: '100'
          lacp_rate: 'fast' (3)
          min_links: '1'
        port:
        - ens5f1 (4)
    - name: ens5f1
      type: ethernet
      state: up
      mtu: 9216
(1) The node where the bonded interface is created.
(2) You must set the bond mode to 802.3ad to enable LACP on the bond.
(3) You must set the LACP rate to fast on both the bond and the switch. With the fast rate, LACP packets are sent every second.
(4) The PF that you want to include in the bond.
Apply the resources by running the following commands:
$ oc apply -f nncpBondF0Worker0.yaml
$ oc apply -f nncpBondF1Worker0.yaml
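The two NodeNetworkConfigurationPolicy files differ only in the PF name and the bond name. As a sketch, you can generate both from one template in Python and apply the result as JSON, which oc apply -f also accepts. The helper name, the example-bond-<suffix> naming, and wrapping both policies in a v1 List object are choices made for this illustration; the field values are copied from the YAML examples.

```python
import json
from typing import Any, Dict


def lacp_bond_policy(node: str, pf: str, suffix: str, mtu: int = 9216) -> Dict[str, Any]:
    """Build a single-member 802.3ad bond policy for one PF (hypothetical helper)."""
    bond = f"example-bond-{suffix}"
    if len(bond) > 15:
        # Linux limits interface names to 15 characters
        raise ValueError(f"bond name {bond!r} exceeds 15 characters")
    return {
        "apiVersion": "nmstate.io/v1",
        "kind": "NodeNetworkConfigurationPolicy",
        "metadata": {"name": bond},
        "spec": {
            "nodeSelector": {"kubernetes.io/hostname": node},
            "desiredState": {
                "interfaces": [
                    {
                        "name": bond,
                        "description": bond,
                        "type": "bond",
                        "state": "up",
                        "mtu": mtu,
                        "link-aggregation": {
                            "mode": "802.3ad",  # LACP
                            "options": {"miimon": "100", "lacp_rate": "fast", "min_links": "1"},
                            "port": [pf],  # exactly one PF per bond
                        },
                    },
                    {"name": pf, "type": "ethernet", "state": "up", "mtu": mtu},
                ]
            },
        },
    }


if __name__ == "__main__":
    pairs = (("ens5f0", "f0"), ("ens5f1", "f1"))
    policies = [lacp_bond_policy("worker-0", pf, sfx) for pf, sfx in pairs]
    # A v1 List lets one file carry both policies.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

For example, redirect the printed JSON to a file and pass that file to oc apply -f. Keeping one template avoids the two bonds drifting apart, for example one bond accidentally left with the slow LACP rate.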
Create SR-IOV network VFs for the bonded interfaces:
Create a YAML file that defines the SriovNetworkNodePolicy resource for the ens5f0 interface on the worker-0 node:
sriovnetworkpolicy-port1.yaml file
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriovnetpolicy-port-0
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
    - ens5f0 (1)
  nodeSelector:
    kubernetes.io/hostname: worker-0 (2)
  numVfs: 10 (3)
  priority: 99
  resourceName: resourceport0 (4)
(1) The PF to create the VFs from.
(2) The node where the VFs are created.
(3) The number of VFs to create on the PF.
(4) The resource name used by pods to request these VFs.
Create a YAML file that defines the SriovNetworkNodePolicy resource for the ens5f1 interface on the worker-0 node:
sriovnetworkpolicy-port2.yaml file
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriovnetpolicy-port-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
    - ens5f1 (1)
  nodeSelector:
    kubernetes.io/hostname: worker-0 (2)
  numVfs: 10 (3)
  priority: 99
  resourceName: resourceport1 (4)
(1) The PF to create the VFs from.
(2) The node where the VFs are created.
(3) The number of VFs to create on the PF.
(4) The resource name used by pods to request these VFs.
Apply the resources by running the following commands:
$ oc apply -f sriovnetworkpolicy-port1.yaml
$ oc apply -f sriovnetworkpolicy-port2.yaml
Configure the PF Status Relay Operator:
Create a YAML file that defines the PFLACPMonitor resource. This example file configures the Operator to monitor the LACP status of the ens5f0 and ens5f1 bonded interfaces on the worker-0 node:
pflacpmonitor.yaml file
apiVersion: pfstatusrelay.openshift.io/v1alpha1
kind: PFLACPMonitor
metadata:
  namespace: openshift-pf-status-relay-operator
  labels:
    app.kubernetes.io/name: pf-status-relay-operator
  name: pflacpmonitor-worker-0
spec:
  interfaces:
  - ens5f0 (1)
  - ens5f1
  pollingInterval: 1000 (2)
  nodeSelector:
    kubernetes.io/hostname: worker-0 (3)
(1) The list of PFs to monitor.
(2) The polling interval, in milliseconds, for checking the LACP status on the monitored interfaces. The minimum value is 1000.
(3) The node that hosts the target interfaces.
Use only one
Apply the PFLACPMonitor resource by running the following command:
$ oc apply -f pflacpmonitor.yaml
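To reason about how quickly a pod fails over with these settings, you can add up the intervals involved. The following Python sketch is a rough worst-case estimate, not a measured value. It assumes that the host bond declares LACP down after the IEEE 802.3ad short timeout (three times the 1-second fast LACP period), that the Operator notices the change on its next poll, and that the pod bond's miimon check then detects the VF going down. It ignores processing and switch delays; the function name is hypothetical.

```python
MIN_POLLING_INTERVAL_MS = 1000  # documented minimum for PFLACPMonitor pollingInterval


def worst_case_failover_ms(
    polling_interval_ms: int = 1000,
    lacp_fast_period_ms: int = 1000,
    short_timeout_multiplier: int = 3,
    pod_miimon_ms: int = 100,
) -> int:
    """Rough upper bound, in ms, from LACP loss to the pod bond switching links.

    Assumptions: the delays are sequential and additive; processing time is ignored.
    """
    if polling_interval_ms < MIN_POLLING_INTERVAL_MS:
        raise ValueError("pollingInterval must be at least 1000 ms")
    lacp_timeout_ms = lacp_fast_period_ms * short_timeout_multiplier  # 802.3ad short timeout
    # LACP expiry, then up to one Operator poll, then up to one pod-bond miimon check
    return lacp_timeout_ms + polling_interval_ms + pod_miimon_ms


print(worst_case_failover_ms())  # 3000 + 1000 + 100 = 4100
```

With the values used in this procedure, the estimate is about 4.1 seconds; a longer pollingInterval adds to it one for one.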
Check the logs of the PF Status Relay Operator to verify that it is monitoring the LACP state:
$ oc logs -n openshift-pf-status-relay-operator <pf_status_relay_operator_pod_name>
{"time":"2025-07-24T13:35:54.653201692Z","level":"INFO","msg":"lacp is up","interface":"ens5f0"}
{"time":"2025-07-24T13:35:54.65347273Z","level":"INFO","msg":"vf link state was set","id":0,"state":"auto","interface":"ens5f0"}
...
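The Operator writes JSON log lines such as the example output above. The following Python sketch summarizes them into the last reported LACP state per interface and the last link state set per VF. It relies only on the msg, interface, id, and state keys shown in the example. Treating any message that starts with "lacp is " as a state report is an assumption, because only "lacp is up" appears in this document; the function name is hypothetical.

```python
import json
from typing import Dict, Tuple


def summarize_relay_logs(text: str) -> Tuple[Dict[str, str], Dict[Tuple[str, int], str]]:
    """Return ({interface: LACP state}, {(interface, VF id): link state}).

    Later lines overwrite earlier ones, so the result reflects the most recent entries.
    Assumption: messages of the form "lacp is <state>" report the LACP state.
    """
    lacp: Dict[str, str] = {}
    vfs: Dict[Tuple[str, int], str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip "..." and other non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        iface = entry.get("interface")
        msg = entry.get("msg", "")
        if iface is None:
            continue
        if msg.startswith("lacp is "):
            lacp[iface] = msg[len("lacp is "):]
        elif msg == "vf link state was set" and "id" in entry and "state" in entry:
            vfs[(iface, entry["id"])] = entry["state"]
    return lacp, vfs
```

For example, redirect the oc logs output to a file, read it, and pass the text to summarize_relay_logs(). An interface that is missing from the first dictionary has not logged an LACP state yet.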
Apply the SriovNetwork resources to make the VFs available for use within the sriov-operator-tests namespace:
Create a YAML file that defines the SriovNetwork resource for the VFs created on ens5f0:
sriovnetwork-port1.yaml file
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriovnetwork-port0
  namespace: openshift-sriov-network-operator
spec:
  capabilities: '{ "mac": true }'
  networkNamespace: sriov-operator-tests
  resourceName: resourceport0
Create a YAML file that defines the SriovNetwork resource for the VFs created on ens5f1:
sriovnetwork-port2.yaml file
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriovnetwork-port1
  namespace: openshift-sriov-network-operator
spec:
  capabilities: '{ "mac": true }'
  networkNamespace: sriov-operator-tests
  resourceName: resourceport1
Apply the resources by running the following commands:
$ oc apply -f sriovnetwork-port1.yaml
$ oc apply -f sriovnetwork-port2.yaml
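Each SriovNetwork reaches its VFs only through resourceName, so a typo leaves pods that request that network without a VF. The following Python sketch cross-checks the two kinds of objects. It works on plain dictionaries shaped like the YAML above, for example the items list from oc get sriovnetworknodepolicy -n openshift-sriov-network-operator -o json; the function name is hypothetical.

```python
from typing import Any, Dict, List


def unmatched_networks(policies: List[Dict[str, Any]], networks: List[Dict[str, Any]]) -> List[str]:
    """Return names of SriovNetwork objects whose resourceName no policy provides."""
    provided = {p["spec"]["resourceName"] for p in policies}
    return [n["metadata"]["name"] for n in networks if n["spec"]["resourceName"] not in provided]


if __name__ == "__main__":
    # Values from the examples in this procedure.
    policies = [{"spec": {"resourceName": "resourceport0"}}, {"spec": {"resourceName": "resourceport1"}}]
    networks = [
        {"metadata": {"name": "sriovnetwork-port0"}, "spec": {"resourceName": "resourceport0"}},
        {"metadata": {"name": "sriovnetwork-port1"}, "spec": {"resourceName": "resourceport1"}},
    ]
    print(unmatched_networks(policies, networks))  # → []
```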
Define a high-availability pod that uses the SR-IOV VFs:
Create a YAML file that defines the NetworkAttachmentDefinition resource to create an active-backup bond by using the two SR-IOV networks:
nad-bond.yaml file
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: nad-bond-1
  namespace: sriov-operator-tests
spec:
  config: |-
    {"type": "bond", "cniVersion": "0.3.1", "name": "bond-net1",
    "mode": "active-backup", "failOverMac": 1, "linksInContainer": true, "miimon": "100", "mtu": 1450,
    "links": [{"name": "net1"},{"name": "net2"}], "capabilities": {"ips": true}, "ipam": {"type": "static"}}
linksInContainer: true specifies that the member links already exist inside the pod’s network namespace, so the bond plugin looks for them there instead of on the host.
mode: active-backup configures the bond to use active-backup mode.
links specifies the pod-level interfaces to include in the bond.
The PF Status Relay Operator provides LACP state monitoring for pod-level bonding with the
Apply the NetworkAttachmentDefinition resource by running the following command:
$ oc apply -f nad-bond.yaml
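Because the config field is a JSON document embedded in YAML, a missing quote or comma breaks the attachment without producing any YAML error. As a sketch, you can generate the string with the Python json module instead of editing it by hand; the values are copied from nad-bond.yaml, and the helper name is hypothetical.

```python
import json
from typing import List


def bond_cni_config(links: List[str], name: str = "bond-net1", mtu: int = 1450) -> str:
    """Return the bond CNI config string used in nad-bond-1 (hypothetical helper)."""
    if len(links) < 2:
        raise ValueError("active-backup needs at least one backup link")
    config = {
        "type": "bond",
        "cniVersion": "0.3.1",
        "name": name,
        "mode": "active-backup",
        "failOverMac": 1,
        "linksInContainer": True,  # members are already in the pod's network namespace
        "miimon": "100",
        "mtu": mtu,
        "links": [{"name": link} for link in links],
        "capabilities": {"ips": True},
        "ipam": {"type": "static"},
    }
    return json.dumps(config)


print(bond_cni_config(["net1", "net2"]))
```

Paste the printed string into the config field, or embed it when you generate the whole manifest programmatically.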
Create a YAML file that defines the Pod resource that uses the VFs from the bonded interfaces in active-backup mode:
client-bond.yaml file
apiVersion: v1
kind: Pod
metadata:
  name: client-bond
  namespace: sriov-operator-tests
  annotations:
    k8s.v1.cni.cncf.io/networks: |- (1)
      [{
        "name": "sriovnetwork-port0",
        "interface": "net1",
        "mac": "<mac_address>"
      },{
        "name": "sriovnetwork-port1",
        "interface": "net2",
        "mac": "<mac_address>"
      },{
        "name": "nad-bond-1",
        "interface": "bond0",
        "ips": ["192.168.10.254/24","2001:100::254/64"],
        "mac": "<mac_address>"
      }]
spec:
  nodeName: worker-0
  containers:
  - name: client-bond
    image: quay.io/nginx/nginx-unprivileged
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "sleep 3650d"]
    securityContext:
      privileged: true
(1) The annotation requests three networks: two SR-IOV VFs, net1 and net2, and one bond, bond0, which uses them.
Apply the Pod resource by running the following command:
$ oc apply -f client-bond.yaml
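The bond CNI plugin finds its member links by interface name, so the interface values that the pod assigns to the two SR-IOV networks must match the names in the links list of nad-bond-1. Multus attaches networks in the order in which they appear in the annotation, so the bond must also be listed after its members. The following Python sketch checks both conditions on the JSON only, not on the cluster; the function name is hypothetical.

```python
import json
from typing import List


def annotation_problems(annotation: str, bond_config: str, bond_network: str = "nad-bond-1") -> List[str]:
    """List mismatches between the pod's networks annotation and the bond's links."""
    networks = json.loads(annotation)
    names = [net["name"] for net in networks]
    if bond_network not in names:
        return [f"{bond_network} is not requested"]
    # Interfaces attached before the bond network, in annotation order.
    earlier = [net.get("interface") for net in networks[: names.index(bond_network)]]
    links = [link["name"] for link in json.loads(bond_config)["links"]]
    return [f"link {link} is not attached before {bond_network}" for link in links if link not in earlier]


annotation = """[{"name": "sriovnetwork-port0", "interface": "net1", "mac": "<mac_address>"},
 {"name": "sriovnetwork-port1", "interface": "net2", "mac": "<mac_address>"},
 {"name": "nad-bond-1", "interface": "bond0", "mac": "<mac_address>"}]"""
config = '{"type": "bond", "links": [{"name": "net1"}, {"name": "net2"}]}'
print(annotation_problems(annotation, config))  # → []
```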
Check the failover mechanism:
Log in to the client-bond pod by running the following command:
$ oc rsh -n sriov-operator-tests client-bond
Check the initial status of the pod-level bond by running the following command:
sh-4.4# cat /proc/net/bonding/bond0
...
Bonding Mode: transmit load balancing
Transmit Hash Policy: layer2 (0)
Primary Slave: None
Currently Active Slave: net1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
Slave Interface: net1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: AA:BB:CC:DD:EE:FF
Slave queue ID: 0
Slave Interface: net2
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: BB:CC:DD:EE:FF:GG
Both the net1 and net2 interfaces are up.
Exit the pod shell.
Simulate an LACP failure on your upstream physical switch. To simulate this scenario, you can filter LACP traffic on the switch port where you want to test the failure. This keeps the physical link up while LACP negotiation fails. The command to do this is vendor-dependent.
Verify the failover inside the pod by logging back in to the client-bond pod and checking the bond status again:
sh-4.4# cat /proc/net/bonding/bond0
...
Bonding Mode: transmit load balancing
Transmit Hash Policy: layer2 (0)
Primary Slave: None
Currently Active Slave: net2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
Slave Interface: net1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: AA:BB:CC:DD:EE:FF
Slave queue ID: 0
Slave Interface: net2
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: BB:CC:DD:EE:FF:GG
Slave queue ID: 0
The net1 interface is down, and the net2 interface is now the active interface.
The client-bond pod detects the link state change and switches to the backup network path.
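To make the before-and-after comparison repeatable, you can capture the bond status twice, for example with oc exec -n sriov-operator-tests client-bond -- cat /proc/net/bonding/bond0, and compare the captures. The following Python sketch parses only the Currently Active Slave, Slave Interface, and MII Status lines shown in the outputs above and reports whether the active interface changed and the previously active one went down. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_bonding(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (currently active slave, {slave name: MII status})."""
    active: Optional[str] = None
    slaves: Dict[str, str] = {}
    current: Optional[str] = None  # slave section being read; None while in the bond header
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            slaves[current] = value  # the bond-level MII Status line is skipped
    return active, slaves


def failed_over(before: str, after: str) -> bool:
    """True if the active slave changed and the previously active slave is now down."""
    old_active, _ = parse_bonding(before)
    new_active, new_slaves = parse_bonding(after)
    return (
        old_active is not None
        and new_active != old_active
        and new_slaves.get(old_active) == "down"
    )
```

Run failed_over() on the capture taken before you filter LACP traffic and the capture taken afterward; True corresponds to the net1-down, net2-active state described above.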