×

Labeling nodes with an SR-IOV enabled NIC

If you want to enable SR-IOV on only SR-IOV capable nodes there are a couple of ways to do this:

  1. Install the Node Feature Discovery (NFD) Operator. NFD detects the presence of SR-IOV enabled NICs and labels the nodes with node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = true.

  2. Examine the SriovNetworkNodeState CR for each node. The interfaces stanza includes a list of all of the SR-IOV devices discovered by the SR-IOV Network Operator on the worker node. Label each node with feature.node.kubernetes.io/network-sriov.capable: "true" by using the following command:

    $ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"

    You can label the nodes with whatever name you want.

Setting one sysctl flag

You can set interface-level network sysctl settings for a pod connected to a SR-IOV network device.

In this example, net.ipv4.conf.IFNAME.accept_redirects is set to 1 on the created virtual interfaces.

The sysctl-tuning-test is a namespace used in this example.

  • Use the following command to create the sysctl-tuning-test namespace:

    $ oc create namespace sysctl-tuning-test

Setting one sysctl flag on nodes with SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OKD. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain and reboot the nodes.

It can take several minutes for a configuration change to apply.

Follow this procedure to create a SriovNetworkNodePolicy custom resource (CR).

Procedure
  1. Create an SriovNetworkNodePolicy custom resource (CR). For example, save the following YAML as the file policyoneflag-sriov-node-network.yaml:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policyoneflag (1)
      namespace: openshift-sriov-network-operator (2)
    spec:
      resourceName: policyoneflag (3)
      nodeSelector: (4)
        feature.node.kubernetes.io/network-sriov.capable="true"
      priority: 10 (5)
      numVfs: 5 (6)
      nicSelector: (7)
        pfNames: ["ens5"] (8)
      deviceType: "netdevice" (9)
      isRdma: false (10)
    1 The name for the custom resource object.
    2 The namespace where the SR-IOV Network Operator is installed.
    3 The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
    4 The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
    5 Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
    6 The number of the virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
    7 The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, then you do not need to specify any other parameter because a network ID is unique.
    8 Optional: An array of one or more physical function (PF) names for the device.
    9 Optional: The driver type for the virtual functions. The only allowed value is netdevice. For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true.
    10 Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false. If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications.

    The vfio-pci driver type is not supported.

  2. Create the SriovNetworkNodePolicy object:

    $ oc create -f policyoneflag-sriov-node-network.yaml

    After applying the configuration update, all the pods in sriov-network-operator namespace change to the Running status.

  3. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
    Example output
    Succeeded

Configuring sysctl on a SR-IOV network

You can set interface specific sysctl settings on virtual interfaces created by SR-IOV by adding the tuning configuration to the optional metaPlugins parameter of the SriovNetwork resource.

The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition custom resource (CR) automatically.

Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.

To change the interface-level network net.ipv4.conf.IFNAME.accept_redirects sysctl settings, create an additional SR-IOV network with the Container Network Interface (CNI) tuning plugin.

Prerequisites
  • Install the OKD CLI (oc).

  • Log in to the OKD cluster as a user with cluster-admin privileges.

Procedure
  1. Create the SriovNetwork custom resource (CR) for the additional SR-IOV network attachment and insert the metaPlugins configuration, as in the following example CR. Save the YAML as the file sriov-network-interface-sysctl.yaml.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: onevalidflag (1)
      namespace: openshift-sriov-network-operator (2)
    spec:
      resourceName: policyoneflag (3)
      networkNamespace: sysctl-tuning-test (4)
      ipam: '{ "type": "static" }' (5)
      capabilities: '{ "mac": true, "ips": true }' (6)
      metaPlugins : | (7)
        {
          "type": "tuning",
          "capabilities":{
            "mac":true
          },
          "sysctl":{
             "net.ipv4.conf.IFNAME.accept_redirects": "1"
          }
        }
    1 A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with same name.
    2 The namespace where the SR-IOV Network Operator is installed.
    3 The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
    4 The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network.
    5 A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
    6 Optional: Set capabilities for the additional network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support.
    7 Optional: The metaPlugins parameter is used to add additional capabilities to the device. In this use case set the type field to tuning. Specify the interface-level network sysctl you want to set in the sysctl field.
  2. Create the SriovNetwork resource:

    $ oc create -f sriov-network-interface-sysctl.yaml
Verifying that the NetworkAttachmentDefinition CR is successfully created
  • Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:

    $ oc get network-attachment-definitions -n <namespace> (1)
    1 Replace <namespace> with the value for networkNamespace that you specified in the SriovNetwork object. For example, sysctl-tuning-test.
    Example output
    NAME                                  AGE
    onevalidflag                          14m

    There might be a delay before the SR-IOV Network Operator creates the CR.

Verifying that the additional SR-IOV network attachment is successful

To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:

  1. Create a Pod CR. Save the following YAML as the file examplepod.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: tunepod
      namespace: sysctl-tuning-test
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {
              "name": "onevalidflag",  (1)
              "mac": "0a:56:0a:83:04:0c", (2)
              "ips": ["10.100.100.200/24"] (3)
           }
          ]
    spec:
      containers:
      - name: podexample
        image: centos
        command: ["/bin/bash", "-c", "sleep INF"]
        securityContext:
          runAsUser: 2000
          runAsGroup: 3000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
    1 The name of the SR-IOV network attachment definition CR.
    2 Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object.
    3 Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object.
  2. Create the Pod CR:

    $ oc apply -f examplepod.yaml
  3. Verify that the pod is created by running the following command:

    $ oc get pod -n sysctl-tuning-test
    Example output
    NAME      READY   STATUS    RESTARTS   AGE
    tunepod   1/1     Running   0          47s
  4. Log in to the pod by running the following command:

    $ oc rsh -n sysctl-tuning-test tunepod
  5. Verify the values of the configured sysctl flag. Find the value net.ipv4.conf.IFNAME.accept_redirects by running the following command::

    $ sysctl net.ipv4.conf.net1.accept_redirects
    Example output
    net.ipv4.conf.net1.accept_redirects = 1

Configuring sysctl settings for pods associated with bonded SR-IOV interface flag

You can set interface-level network sysctl settings for a pod connected to a bonded SR-IOV network device.

In this example, the specific network interface-level sysctl settings that can be configured are set on the bonded interface.

The sysctl-tuning-test is a namespace used in this example.

  • Use the following command to create the sysctl-tuning-test namespace:

    $ oc create namespace sysctl-tuning-test

Setting all sysctl flag on nodes with bonded SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OKD. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes.

It might take several minutes for a configuration change to apply.

Follow this procedure to create a SriovNetworkNodePolicy custom resource (CR).

Procedure
  1. Create an SriovNetworkNodePolicy custom resource (CR). Save the following YAML as the file policyallflags-sriov-node-network.yaml. Replace policyallflags with the name for the configuration.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policyallflags (1)
      namespace: openshift-sriov-network-operator (2)
    spec:
      resourceName: policyallflags (3)
      nodeSelector: (4)
        node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = `true`
      priority: 10 (5)
      numVfs: 5 (6)
      nicSelector: (7)
        pfNames: ["ens1f0"]  (8)
      deviceType: "netdevice" (9)
      isRdma: false (10)
    1 The name for the custom resource object.
    2 The namespace where the SR-IOV Network Operator is installed.
    3 The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
    4 The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
    5 Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
    6 The number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
    7 The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, then you do not need to specify any other parameter because a network ID is unique.
    8 Optional: An array of one or more physical function (PF) names for the device.
    9 Optional: The driver type for the virtual functions. The only allowed value is netdevice. For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true.
    10 Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false. If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications.

    The vfio-pci driver type is not supported.

  2. Create the SriovNetworkNodePolicy object:

    $ oc create -f policyallflags-sriov-node-network.yaml

    After applying the configuration update, all the pods in sriov-network-operator namespace change to the Running status.

  3. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
    Example output
    Succeeded

Configuring sysctl on a bonded SR-IOV network

You can set interface specific sysctl settings on a bonded interface created from two SR-IOV interfaces. Do this by adding the tuning configuration to the optional Plugins parameter of the bond network attachment definition.

Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.

To change specific interface-level network sysctl settings create the SriovNetwork custom resource (CR) with the Container Network Interface (CNI) tuning plugin by using the following procedure.

Prerequisites
  • Install the OKD CLI (oc).

  • Log in to the OKD cluster as a user with cluster-admin privileges.

Procedure
  1. Create the SriovNetwork custom resource (CR) for the bonded interface as in the following example CR. Save the YAML as the file sriov-network-attachment.yaml.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: allvalidflags (1)
      namespace: openshift-sriov-network-operator (2)
    spec:
      resourceName: policyallflags (3)
      networkNamespace: sysctl-tuning-test (4)
      capabilities: '{ "mac": true, "ips": true }' (5)
    1 A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with same name.
    2 The namespace where the SR-IOV Network Operator is installed.
    3 The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
    4 The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network.
    5 Optional: The capabilities to configure for this additional network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support.
  2. Create the SriovNetwork resource:

    $ oc create -f sriov-network-attachment.yaml
  3. Create a bond network attachment definition as in the following example CR. Save the YAML as the file sriov-bond-network-interface.yaml.

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: bond-sysctl-network
      namespace: sysctl-tuning-test
    spec:
      config: '{
      "cniVersion":"0.4.0",
      "name":"bound-net",
      "plugins":[
        {
          "type":"bond", (1)
          "mode": "active-backup", (2)
          "failOverMac": 1, (3)
          "linksInContainer": true, (4)
          "miimon": "100",
          "links": [ (5)
            {"name": "net1"},
            {"name": "net2"}
          ],
          "ipam":{ (6)
            "type":"static"
          }
        },
        {
          "type":"tuning", (7)
          "capabilities":{
            "mac":true
          },
          "sysctl":{
            "net.ipv4.conf.IFNAME.accept_redirects": "0",
            "net.ipv4.conf.IFNAME.accept_source_route": "0",
            "net.ipv4.conf.IFNAME.disable_policy": "1",
            "net.ipv4.conf.IFNAME.secure_redirects": "0",
            "net.ipv4.conf.IFNAME.send_redirects": "0",
            "net.ipv6.conf.IFNAME.accept_redirects": "0",
            "net.ipv6.conf.IFNAME.accept_source_route": "1",
            "net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000",
            "net.ipv6.neigh.IFNAME.retrans_time_ms": "2000"
          }
        }
      ]
    }'
    1 The type is bond.
    2 The mode attribute specifies the bonding mode. The bonding modes supported are:
    • balance-rr - 0

    • active-backup - 1

    • balance-xor - 2

      For balance-rr or balance-xor modes, you must set the trust mode to on for the SR-IOV virtual function.

    3 The failover attribute is mandatory for active-backup mode.
    4 The linksInContainer=true flag informs the Bond CNI that the required interfaces are to be found inside the container. By default, Bond CNI looks for these interfaces on the host which does not work for integration with SRIOV and Multus.
    5 The links section defines which interfaces will be used to create the bond. By default, Multus names the attached interfaces as: "net", plus a consecutive number, starting with one.
    6 A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition. In this pod example IP addresses are configured manually, so in this case,ipam is set to static.
    7 Add additional capabilities to the device. For example, set the type field to tuning. Specify the interface-level network sysctl you want to set in the sysctl field. This example sets all interface-level network sysctl settings that can be set.
  4. Create the bond network attachment resource:

    $ oc create -f sriov-bond-network-interface.yaml
Verifying that the NetworkAttachmentDefinition CR is successfully created
  • Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:

    $ oc get network-attachment-definitions -n <namespace> (1)
    1 Replace <namespace> with the networkNamespace that you specified when configuring the network attachment, for example, sysctl-tuning-test.
    Example output
    NAME                          AGE
    bond-sysctl-network           22m
    allvalidflags                 47m

    There might be a delay before the SR-IOV Network Operator creates the CR.

Verifying that the additional SR-IOV network resource is successful

To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:

  1. Create a Pod CR. For example, save the following YAML as the file examplepod.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: tunepod
      namespace: sysctl-tuning-test
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {"name": "allvalidflags"}, (1)
            {"name": "allvalidflags"},
            {
              "name": "bond-sysctl-network",
              "interface": "bond0",
              "mac": "0a:56:0a:83:04:0c", (2)
              "ips": ["10.100.100.200/24"] (3)
           }
          ]
    spec:
      containers:
      - name: podexample
        image: centos
        command: ["/bin/bash", "-c", "sleep INF"]
        securityContext:
          runAsUser: 2000
          runAsGroup: 3000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
    1 The name of the SR-IOV network attachment definition CR.
    2 Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object.
    3 Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object.
  2. Apply the YAML:

    $ oc apply -f examplepod.yaml
  3. Verify that the pod is created by running the following command:

    $ oc get pod -n sysctl-tuning-test
    Example output
    NAME      READY   STATUS    RESTARTS   AGE
    tunepod   1/1     Running   0          47s
  4. Log in to the pod by running the following command:

    $ oc rsh -n sysctl-tuning-test tunepod
  5. Verify the values of the configured sysctl flag. Find the value net.ipv6.neigh.IFNAME.base_reachable_time_ms by running the following command::

    $ sysctl net.ipv6.neigh.bond0.base_reachable_time_ms
    Example output
    net.ipv6.neigh.bond0.base_reachable_time_ms = 20000