Ethernet device configuration object

You can configure an Ethernet network device by defining an SriovNetwork object.

The following YAML describes an SriovNetwork object:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: <name> (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: <sriov_resource_name> (3)
  networkNamespace: <target_namespace> (4)
  vlan: <vlan> (5)
  spoofChk: "<spoof_check>" (6)
  ipam: |- (7)
    {}
  linkState: <link_state> (8)
  maxTxRate: <max_tx_rate> (9)
  minTxRate: <min_tx_rate> (10)
  vlanQoS: <vlan_qos> (11)
  trust: "<trust_vf>" (12)
  capabilities: <capabilities> (13)
1 A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name.
2 The namespace where the SR-IOV Network Operator is installed.
3 The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
4 The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network.
5 Optional: A Virtual LAN (VLAN) ID for the additional network. The integer value must be from 0 to 4095. The default value is 0.
6 Optional: The spoof check mode of the VF. The allowed values are the strings "on" and "off".

You must enclose the value that you specify in quotes, or the SR-IOV Network Operator rejects the object.

7 A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
8 Optional: The link state of the virtual function (VF). The allowed values are enable, disable, and auto.
9 Optional: A maximum transmission rate, in Mbps, for the VF.
10 Optional: A minimum transmission rate, in Mbps, for the VF. This value must be less than or equal to the maximum transmission rate.

Intel NICs do not support the minTxRate parameter. For more information, see BZ#1772847.

11 Optional: An IEEE 802.1p priority level for the VF. The default value is 0.
12 Optional: The trust mode of the VF. The allowed values are the strings "on" and "off".

You must enclose the value that you specify in quotes, or the SR-IOV Network Operator rejects the object.

13 Optional: The capabilities to configure for this additional network. You can specify '{ "ips": true }' to enable IP address support or '{ "mac": true }' to enable MAC address support.
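The value constraints in the preceding callouts can be checked before you submit the object. The following Python sketch is not part of the SR-IOV Network Operator: check_sriov_network_spec is a hypothetical helper that takes the spec as a plain dictionary and encodes only the rules stated above. It also shows why the quotes matter for spoofChk and trust, because many YAML parsers read an unquoted on as a boolean rather than a string.

```python
def check_sriov_network_spec(spec):
    """Return a list of problems found in an SriovNetwork spec dictionary.

    Hypothetical helper: it encodes only the constraints described in the
    callouts and does not replace validation by the Operator.
    """
    problems = []

    # vlan is optional and defaults to 0; it must be an integer from 0 to 4095.
    vlan = spec.get("vlan", 0)
    if not isinstance(vlan, int) or isinstance(vlan, bool) or not 0 <= vlan <= 4095:
        problems.append("vlan must be an integer from 0 to 4095")

    # spoofChk and trust must be the strings "on" or "off", not booleans.
    for field in ("spoofChk", "trust"):
        if field in spec and spec[field] not in ("on", "off"):
            problems.append(f'{field} must be the string "on" or "off"')

    if "linkState" in spec and spec["linkState"] not in ("enable", "disable", "auto"):
        problems.append("linkState must be enable, disable, or auto")

    # minTxRate must not exceed maxTxRate when both are set.
    min_rate = spec.get("minTxRate")
    max_rate = spec.get("maxTxRate")
    if min_rate is not None and max_rate is not None and min_rate > max_rate:
        problems.append("minTxRate must be less than or equal to maxTxRate")

    return problems


# A boolean trust value, as produced by an unquoted "on", is reported.
print(check_sriov_network_spec({"vlan": 100, "trust": True}))
```

An empty list means that none of the checked constraints were violated. It does not guarantee that the Operator accepts the object.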

Configuration of IP address assignment for an additional network

The IP address management (IPAM) Container Network Interface (CNI) plugin provides IP addresses for other CNI plugins.

You can use the following IP address assignment types:

  • Static assignment.

  • Dynamic assignment through a DHCP server. The DHCP server you specify must be reachable from the additional network.

  • Dynamic assignment through the Whereabouts IPAM CNI plugin.

Static IP address assignment configuration

The following table describes the configuration for static IP address assignment:

Table 1. ipam static configuration object
Field Type Description

type

string

The IPAM address type. The value static is required.

addresses

array

An array of objects specifying IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported.

routes

array

An array of objects specifying routes to configure inside the pod.

dns

object

Optional: An object specifying the DNS configuration.

The addresses array requires objects with the following fields:

Table 2. ipam.addresses[] array
Field Type Description

address

string

An IP address and network prefix that you specify. For example, if you specify 10.10.21.10/24, then the additional network is assigned an IP address of 10.10.21.10 and the netmask is 255.255.255.0.

gateway

string

The default gateway to route egress network traffic to.
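The relationship between the prefix length and the netmask in the address field can be verified with the Python standard library. This short sketch parses the example value from the table. The gateway address 10.10.21.1 is an assumption that is used only to show a membership check.

```python
import ipaddress

# Parse the example value from the address field description.
iface = ipaddress.ip_interface("10.10.21.10/24")

print(iface.ip)        # address assigned to the interface
print(iface.netmask)   # netmask implied by the /24 prefix
print(iface.network)   # network that contains the address

# Assumed gateway value, for illustration only.
gateway = ipaddress.ip_address("10.10.21.1")
print(gateway in iface.network)
```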

Table 3. ipam.routes[] array
Field Type Description

dst

string

The IP address range in CIDR format, such as 192.168.17.0/24 or 0.0.0.0/0 for the default route.

gw

string

The gateway where network traffic is routed.

Table 4. ipam.dns object
Field Type Description

nameservers

array

An array of one or more IP addresses to send DNS queries to.

domain

string

The default domain to append to a hostname. For example, if the domain is set to example.com, a DNS lookup query for example-host is rewritten as example-host.example.com.

search

array

An array of domain names to append to an unqualified hostname, such as example-host, during a DNS lookup query.
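To illustrate how the domain and search fields affect a lookup, the following Python sketch lists the names that a resolver might try for a hostname. It is a simplified model: candidate_names is a hypothetical helper, example.org is an assumed second search domain, and real resolvers apply further rules, such as the ndots option, that are not modeled here.

```python
def candidate_names(hostname, domain=None, search=()):
    """List names a resolver might try for hostname (simplified model)."""
    if "." in hostname:
        # Treat any dotted name as already qualified in this sketch.
        return [hostname]
    # Assumption: when search is set, it takes precedence over domain.
    suffixes = list(search) if search else ([domain] if domain else [])
    return [f"{hostname}.{suffix}" for suffix in suffixes]


print(candidate_names("example-host", domain="example.com"))
print(candidate_names("example-host", search=["example.com", "example.org"]))
```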

Static IP address assignment configuration example
{
  "ipam": {
    "type": "static",
    "addresses": [
      {
        "address": "191.168.1.7/24"
      }
    ]
  }
}
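Because the ipam field of an SriovNetwork object is a YAML block scalar that contains JSON, it can help to generate the JSON rather than write it by hand. The following Python sketch is one possible approach. render_ipam_block is a hypothetical helper, and the addresses are placeholder values from a documentation range. The sketch renders the inner IPAM object directly under the ipam key, in the same way as the SriovNetwork examples in this document.

```python
import json
import textwrap

# Placeholder values for illustration; substitute addresses from your network.
ipam = {
    "type": "static",
    "addresses": [{"address": "192.0.2.10/24", "gateway": "192.0.2.1"}],
    "routes": [{"dst": "0.0.0.0/0", "gw": "192.0.2.1"}],
}


def render_ipam_block(config, indent=4):
    """Render config as an `ipam: |-` key followed by an indented JSON body."""
    body = json.dumps(config, indent=2)
    return "ipam: |-\n" + textwrap.indent(body, " " * indent)


print(render_ipam_block(ipam))
```

When you paste the result into a spec, indent every line by the same amount so that the JSON body stays more deeply indented than the ipam key.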

Dynamic IP address (DHCP) assignment configuration

The following sections describe the configuration for dynamic IP address assignment with DHCP.

Renewal of DHCP leases

A pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.

The SR-IOV Network Operator does not create a DHCP server deployment. The Cluster Network Operator is responsible for creating the minimal DHCP server deployment.

To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:

Example shim network attachment definition
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks:
  - name: dhcp-shim
    namespace: default
    type: Raw
    rawCNIConfig: |-
      {
        "name": "dhcp-shim",
        "cniVersion": "0.3.1",
        "type": "bridge",
        "ipam": {
          "type": "dhcp"
        }
      }
  # ...
Table 5. ipam DHCP configuration object
Field Type Description

type

string

The IPAM address type. The value dhcp is required.

Dynamic IP address (DHCP) assignment configuration example
{
  "ipam": {
    "type": "dhcp"
  }
}

Dynamic IP address assignment configuration with Whereabouts

The Whereabouts CNI plugin allows the dynamic assignment of an IP address to an additional network without the use of a DHCP server.

The following table describes the configuration for dynamic IP address assignment with Whereabouts:

Table 6. ipam whereabouts configuration object
Field Type Description

type

string

The IPAM address type. The value whereabouts is required.

range

string

An IP address and range in CIDR notation. IP addresses are assigned from within this range of addresses.

exclude

array

Optional: A list of zero or more IP addresses and ranges in CIDR notation. IP addresses within an excluded address range are not assigned.

Dynamic IP address assignment configuration example that uses Whereabouts
{
  "ipam": {
    "type": "whereabouts",
    "range": "192.0.2.192/27",
    "exclude": [
      "192.0.2.192/30",
      "192.0.2.196/32"
    ]
  }
}
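The effect of the exclude list on the example range can be worked out with the Python standard library. The following sketch only computes which addresses in the range fall outside the excluded CIDR blocks. addresses_outside_exclusions is a hypothetical helper, and the sketch does not model any further addresses that the plugin itself might reserve, such as the network or broadcast address.

```python
import ipaddress


def addresses_outside_exclusions(range_cidr, exclude_cidrs):
    """List addresses in range_cidr that are not covered by any excluded CIDR."""
    pool = ipaddress.ip_network(range_cidr)
    excluded = [ipaddress.ip_network(cidr) for cidr in exclude_cidrs]
    return [
        addr for addr in pool
        if not any(addr in net for net in excluded)
    ]


remaining = addresses_outside_exclusions(
    "192.0.2.192/27", ["192.0.2.192/30", "192.0.2.196/32"]
)
print(len(remaining), remaining[0], remaining[-1])  # 27 192.0.2.197 192.0.2.223
```

The /27 range contains 32 addresses. The /30 exclusion removes four of them and the /32 exclusion removes one, which leaves 27 candidates.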

Configuring SR-IOV additional network

You can configure an additional network that uses SR-IOV hardware by creating an SriovNetwork object. When you create an SriovNetwork object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition object.

Do not modify or delete an SriovNetwork object if it is attached to any pods in a running state.

Prerequisites
  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

Procedure
  1. Create an SriovNetwork object, and then save the YAML in the <name>.yaml file, where <name> is a name for this additional network. The object specification might resemble the following example:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: attach1
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: net1
      networkNamespace: project2
      ipam: |-
        {
          "type": "host-local",
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "gateway": "10.56.217.1"
        }
  2. To create the object, enter the following command:

    $ oc create -f <name>.yaml

    where <name> specifies the name of the additional network.

  3. Optional: To confirm that the Operator created the NetworkAttachmentDefinition object for your SriovNetwork object, enter the following command. Replace <namespace> with the networkNamespace value that you specified in the SriovNetwork object.

    $ oc get net-attach-def -n <namespace>
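Before you create the object, you can sanity-check the JSON in the ipam field. The following Python sketch parses the host-local configuration from the example in step 1, confirms that the range and gateway lie inside the subnet, and counts the addresses in the range. check_host_local_range is a hypothetical helper, and the count assumes that both rangeStart and rangeEnd are inclusive.

```python
import ipaddress
import json

# The ipam value from the example SriovNetwork object in step 1.
ipam = json.loads("""
{
  "type": "host-local",
  "subnet": "10.56.217.0/24",
  "rangeStart": "10.56.217.171",
  "rangeEnd": "10.56.217.181",
  "gateway": "10.56.217.1"
}
""")


def check_host_local_range(config):
    """Return the number of addresses from rangeStart to rangeEnd, inclusive.

    Raises ValueError if the range or the gateway falls outside the subnet.
    """
    subnet = ipaddress.ip_network(config["subnet"])
    start = ipaddress.ip_address(config["rangeStart"])
    end = ipaddress.ip_address(config["rangeEnd"])
    gateway = ipaddress.ip_address(config["gateway"])
    for name, addr in (("rangeStart", start), ("rangeEnd", end), ("gateway", gateway)):
        if addr not in subnet:
            raise ValueError(f"{name} {addr} is outside {subnet}")
    if start > end:
        raise ValueError("rangeStart must not be greater than rangeEnd")
    return int(end) - int(start) + 1


print(check_host_local_range(ipam))  # 11
```

Parsing the value with json.loads also catches syntax errors, such as a trailing comma, that would otherwise surface only when a pod attaches to the network.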

Change the MTU value of a virtual function for a running pod

You can change the maximum transmission unit (MTU) of a virtual function (VF) for a running pod by omitting the mtu field from the SriovNetworkNodePolicy custom resource (CR) and configuring the physical function (PF) MTU by using the Kubernetes NMState Operator.

When the mtu field is set in the SriovNetworkNodePolicy CR, the SR-IOV Network Operator continuously enforces that MTU value on the VF. This reverts any application-level MTU changes and can trigger a node drain. To avoid this conflict, use the following approach:

  • Omit the mtu field from the SriovNetworkNodePolicy CR. This allows the SR-IOV Network Operator to provision VFs without managing their MTU.

  • Use the Kubernetes NMState Operator to set the MTU of the PF to the required value. A VF cannot have a higher MTU than its parent PF, so you must set the PF MTU first.

With these configurations in place, a pod that has the NET_ADMIN Linux capability can safely set its own VF MTU without interference from the SR-IOV Network Operator.

If you already configured a value for the mtu field in your SriovNetworkNodePolicy CR, removing it might trigger a node drain. Perform this change during a scheduled maintenance window.

Prerequisites
  • You installed the OpenShift CLI (oc).

  • You logged in as a user with cluster-admin privileges.

  • You installed the SR-IOV Network Operator.

  • You installed the Kubernetes NMState Operator.

Procedure
  1. Verify that the mtu field is not present in your SriovNetworkNodePolicy CR by running the following command:

    $ oc get sriovnetworknodepolicy <policy_name> -n openshift-sriov-network-operator -o jsonpath='{.spec.mtu}'

    where:

    <policy_name>

    Specifies the name of the SriovNetworkNodePolicy CR.

    If the command returns a value, remove the mtu field from the CR by running the following command:

    $ oc patch sriovnetworknodepolicy <policy_name> -n openshift-sriov-network-operator \
      --type=json -p='[{"op": "remove", "path": "/spec/mtu"}]'

    The SR-IOV Network Operator reconciles and creates the VFs with the default MTU of 1500.

  2. Verify that the VFs are created with the default MTU by running the following commands:

    $ oc debug node/<node_name>
    # chroot /host
    # ip link show <vf_interface>

    where:

    <node_name>

    Specifies the name of the node where the PF is located.

    <vf_interface>

    Specifies the VF interface name, for example ens3f0v0.

    Example output
    4: ens3f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether aa:bb:cc:dd:ee:01 brd ff:ff:ff:ff:ff:ff
  3. Create a NodeNetworkConfigurationPolicy CR to set the MTU of the PF:

    1. Create a file named nncp-set-pf-mtu.yaml with the following content:

      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: set-pf-mtu
      spec:
        nodeSelector:
          kubernetes.io/hostname: <node_name>
        desiredState:
          interfaces:
            - name: <pf_interface>
              type: ethernet
              state: up
              mtu: <mtu_value>

      where:

      <node_name>

      Specifies the name of the node where the PF is located.

      <pf_interface>

      Specifies the name of the PF interface, for example ens3f0.

      <mtu_value>

      Specifies the required MTU value for the PF, for example 9000. This value must be greater than or equal to the MTU that the application sets on the VF.

    2. Apply the CR by running the following command:

      $ oc apply -f nncp-set-pf-mtu.yaml
  4. Verify that the NMState policy has been applied successfully by running the following command:

    $ oc get nodenetworkconfigurationpolicy set-pf-mtu
    Example output
    NAME          STATUS      REASON
    set-pf-mtu    Available   SuccessfullyConfigured

    Wait until the STATUS column shows Available before proceeding.

  5. Verify that the PF MTU has been updated on the node by running the following commands:

    $ oc debug node/<node_name>
    # chroot /host
    # ip link show <pf_interface>

    where:

    <node_name>

    Specifies the name of the node where the PF is located.

    <pf_interface>

    Specifies the name of the PF interface, for example ens3f0.

    Example output
    2: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether aa:bb:cc:dd:ee:ff brd ff:ff:ff:ff:ff:ff

    The VFs retain their default MTU of 1500 at this stage.

  6. Deploy or update the application pod to set the VF MTU at container startup:

    1. Create or update the pod spec with a startup command that sets the VF MTU before the application starts:

      apiVersion: v1
      kind: Pod
      metadata:
        name: <pod_name>
        namespace: <namespace>
        annotations:
          k8s.v1.cni.cncf.io/networks: <sriov_network_name>
      spec:
        containers:
          - name: <container_name>
            image: <image>
            command: ["/bin/sh"]
            args:
              - "-c"
              - "ip link set mtu <mtu_value> dev <vf_interface>; <application_command>"
            securityContext:
              capabilities:
                add: ["NET_ADMIN"]
            resources:
              requests:
                <sriov_resource_name>: "1"
              limits:
                <sriov_resource_name>: "1"

      where:

      command and args

      Sets the VF MTU to the specified value before running the application command.

      NET_ADMIN

      The NET_ADMIN Linux capability is required for the container to change network interface settings.

      <pod_name>

      Specifies the name of the pod.

      <namespace>

      Specifies the namespace where the pod runs.

      <sriov_network_name>

      Specifies the name of the SriovNetwork CR that provides the VF to the pod.

      <container_name>

      Specifies the name of the container.

      <image>

      Specifies the container image to use.

      <mtu_value>

      Specifies the required MTU value, for example 9000.

      <vf_interface>

      Specifies the VF interface name as it is displayed inside the pod, typically net1.

      <application_command>

      Specifies the main application command to run after the MTU is set.

      <sriov_resource_name>

      Specifies the SR-IOV resource name defined in the spec.resourceName field of the SriovNetworkNodePolicy CR.

    2. Apply the pod spec by running the following command:

      $ oc apply -f <pod_spec_file>.yaml

      where:

      <pod_spec_file>

      Specifies the name of the file containing the pod specification.

  7. Verify that the VF MTU inside the pod has been set to the expected value by running the following command:

    $ oc exec <pod_name> -n <namespace> -- ip link show <vf_interface>

    where:

    <pod_name>

    Specifies the name of the pod.

    <namespace>

    Specifies the namespace where the pod is running.

    <vf_interface>

    Specifies the VF interface name inside the pod, for example net1.

    Example output
    3: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 00:00:5E:00:53:01 brd ff:ff:ff:ff:ff:ff

    The example output confirms that the VF MTU matches the value set by the pod startup command. The SR-IOV Network Operator does not revert this value because the mtu field is omitted from the SriovNetworkNodePolicy CR.
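If you script the verification steps, the MTU can be read from the ip link show output and compared against the PF value. The following Python sketch is one way to do this. parse_mtu is a hypothetical helper, and the two input strings are copied from the first line of the PF and pod example outputs in this procedure.

```python
import re


def parse_mtu(ip_link_output):
    """Extract the MTU value from `ip link show <interface>` output."""
    match = re.search(r"\bmtu (\d+)\b", ip_link_output)
    if match is None:
        raise ValueError("no mtu field found")
    return int(match.group(1))


# First line of the example output for the PF on the node and the VF in the pod.
pf_output = "2: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000"
vf_output = "3: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000"

pf_mtu = parse_mtu(pf_output)
vf_mtu = parse_mtu(vf_output)

# A VF cannot have a higher MTU than its parent PF.
print(pf_mtu, vf_mtu, vf_mtu <= pf_mtu)  # 9000 9000 True
```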