You can use Single Root I/O Virtualization (SR-IOV) network devices with additional networks on your OKD cluster for high-performance applications.

About SR-IOV hardware on OKD

OKD includes the capability to use SR-IOV hardware on your nodes. You can attach SR-IOV virtual function (VF) interfaces to Pods on nodes with SR-IOV hardware.

You can use the OKD console to install SR-IOV by deploying the SR-IOV Network Operator. The SR-IOV Network Operator creates and manages the components of the SR-IOV stack. The Operator provisions the following components:

  • The SR-IOV Network Operator deployment, on master nodes.

  • The SR-IOV network config daemon, on worker nodes.

  • The Operator webhook, on master nodes.

  • The Network resources injector, on master nodes.

  • The SR-IOV network device plug-in, on worker nodes.

  • The SR-IOV CNI plug-in executable, on worker nodes.

Each of these SR-IOV components has the following function:

  • The SR-IOV Operator is a Kubernetes Deployment that manages all SR-IOV components in a cluster. It watches for the creation, update, and deletion of Operator Custom Resources and takes corresponding actions, such as generating NetworkAttachmentDefinition Custom Resources for the SR-IOV CNI plug-in, creating and updating the configuration of the SR-IOV network device plug-in, creating node-specific SriovNetworkNodeState Custom Resources, and updating the Spec.Interfaces field in each SriovNetworkNodeState Custom Resource.

  • The SR-IOV network config daemon is a Kubernetes DaemonSet deployed on worker nodes when the SR-IOV Operator is launched. It is responsible for discovering and initializing SR-IOV network devices in the cluster.

  • The Operator webhook is a Kubernetes Dynamic Admission Controller Webhook that validates each Operator Custom Resource and sets default values for any fields that the user did not configure.

  • The Network resources injector is a Kubernetes Dynamic Admission Controller Webhook that patches Kubernetes Pod specifications with requests and limits for custom network resources such as SR-IOV VFs.

  • The SR-IOV network device plug-in is a Kubernetes device plug-in for discovering, advertising, and allocating SR-IOV network virtual function (VF) resources. Device plug-ins are used in Kubernetes to enable the use of limited resources, typically in physical devices. Device plug-ins give the Kubernetes scheduler awareness of resource availability, so the scheduler can schedule Pods on nodes with sufficient resources.

  • The SR-IOV CNI plug-in plumbs VF interfaces allocated from the SR-IOV device plug-in directly into a Pod.
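
To illustrate what the Network resources injector does, the following fragment sketches how a container's resources section might look after the webhook has patched a Pod that references an SR-IOV network. The resource name openshift.io/mlnxnics is an assumption made for this sketch; the actual name depends on the resource associated with the attached network.

```yaml
# Hypothetical fragment of a Pod spec after patching.
# The resource name below is illustrative only.
spec:
  containers:
  - name: testpmd
    resources:
      requests:
        openshift.io/mlnxnics: "1"   # one VF requested for the attached network
      limits:
        openshift.io/mlnxnics: "1"   # limit must match the request for extended resources
```

Because the request is added automatically, the Pod examples in this document do not need to list the VF resource themselves while the injector is enabled.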

The Network resources injector and the Operator webhook are enabled by default. You can disable them by editing the default SriovOperatorConfig CR.
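
As a rough sketch, a default SriovOperatorConfig CR with both webhooks disabled might look like the following. The <sriov_operator_namespace> placeholder stands for the namespace where the Operator is installed and is an assumption here; verify the field names against the CRD installed on your cluster before applying.

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default                            # the Operator reads the CR named "default"
  namespace: <sriov_operator_namespace>    # placeholder, replace with your namespace
spec:
  enableInjector: false                    # disables the Network resources injector
  enableOperatorWebhook: false             # disables the Operator webhook
```

With the injector disabled, Pods must declare the VF resource requests and limits explicitly.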

Supported devices

The following Network Interface Card (NIC) models are supported in OKD:

  • Intel XXV710 25GbE SFP28 with vendor ID 0x8086 and device ID 0x158b

  • Mellanox MT27710 Family [ConnectX-4 Lx] 25GbE dual-port SFP28 with vendor ID 0x15b3 and device ID 0x1015

  • Mellanox MT27800 Family [ConnectX-5] 25GbE dual-port SFP28 with vendor ID 0x15b3 and device ID 0x1017

  • Mellanox MT27800 Family [ConnectX-5] 100GbE with vendor ID 0x15b3 and device ID 0x1017
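
The vendor and device IDs above are what a node policy uses to select a NIC. The following is a minimal sketch of an SriovNetworkNodePolicy that selects the ConnectX-4 Lx model from the list. The policy name, resource name, node label, VF count, and the <sriov_operator_namespace> placeholder are assumptions for illustration. Note that the IDs are written without the 0x prefix.

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mlnx-cx4                    # hypothetical name
  namespace: <sriov_operator_namespace>    # placeholder, replace with your namespace
spec:
  resourceName: mlnxnics                   # hypothetical resource name
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"   # assumed node label
  numVfs: 4                                # illustrative VF count
  nicSelector:
    vendor: "15b3"                         # Mellanox vendor ID from the list above
    deviceID: "1015"                       # ConnectX-4 Lx device ID from the list above
  deviceType: netdevice
  isRdma: true                             # expose the VFs in RDMA mode
```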

Example use of a virtual function (VF) in a Pod

You can run a remote direct memory access (RDMA) or a Data Plane Development Kit (DPDK) application in a Pod with an SR-IOV VF attached.

This example shows a Pod using a VF in RDMA mode:

apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-mlnx
spec:
  containers:
  - name: testpmd
    image: <RDMA_image>
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    command: ["sleep", "infinity"]
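
The k8s.v1.cni.cncf.io/networks annotation refers to an additional network by name. Assuming that network is defined through the SR-IOV Network Operator, it could be described by an SriovNetwork custom resource along these lines. The resource name mlnxnics, the target namespace, the subnet, and the <sriov_operator_namespace> placeholder are hypothetical values chosen for this sketch.

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-rdma-mlnx                    # matches the name in the Pod annotation
  namespace: <sriov_operator_namespace>    # placeholder, replace with your namespace
spec:
  networkNamespace: default                # assumed namespace of the rdma-app Pod
  resourceName: mlnxnics                   # hypothetical; must match a node policy's resourceName
  ipam: |
    {
      "type": "host-local",
      "subnet": "10.56.217.0/24"
    }
```

The Operator generates the corresponding NetworkAttachmentDefinition in the namespace given by networkNamespace, which is where the Pod looks up the network name.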

The following example shows a Pod with a VF in DPDK mode:

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-dpdk-net
spec:
  containers:
  - name: testpmd
    image: <DPDK_image>
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
      requests:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
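
The DPDK example assumes that the VFs behind sriov-dpdk-net are available to a user-space driver. One way this might be arranged for the Intel XXV710 is a node policy that binds the VFs to the vfio-pci driver, sketched below. The policy name, resource name, node label, VF count, and the <sriov_operator_namespace> placeholder are assumptions for illustration.

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-intel-dpdk                  # hypothetical name
  namespace: <sriov_operator_namespace>    # placeholder, replace with your namespace
spec:
  resourceName: intelnics                  # hypothetical resource name
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"   # assumed node label
  numVfs: 4                                # illustrative VF count
  nicSelector:
    vendor: "8086"                         # Intel vendor ID
    deviceID: "158b"                       # XXV710 device ID
  deviceType: vfio-pci                     # bind VFs to vfio-pci for user-space access
```

The hugepages-1Gi requests in the Pod also assume that 1Gi huge pages have been preallocated on the node.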

An optional library, 'app-netutil', helps an application running in a container gather network information associated with its Pod. See the library’s source code in the app-netutil GitHub repository.

This library is intended to ease the integration of SR-IOV VFs in DPDK mode into the container. The library provides both a Go API and a C API, as well as examples in both languages.

There is also a sample Docker image, 'dpdk-app-centos', which can run one of the following DPDK sample applications, selected by an environment variable in the Pod spec: l2fwd, l3fwd, or testpmd. This Docker image provides an example of integrating 'app-netutil' into the container image itself. The library can also be integrated into an init container that collects the desired data and passes it to an existing DPDK workload.