×

Remote Direct Memory Access (RDMA) allows direct memory access between two systems without involving the operating system of either system. You can configure an RDMA Container Network Interface (CNI) on Single Root I/O Virtualization (SR-IOV) to enable high-performance, low-latency communication between containers. When you combine RDMA with SR-IOV, you provide a mechanism to expose hardware counters of Mellanox Ethernet devices for use inside Data Plane Development Kit (DPDK) applications.

Configuring SR-IOV RDMA CNI

Configure an RDMA CNI on SR-IOV.

This procedure applies only to Mellanox devices.

Prerequisites
  • You have installed the OpenShift CLI (oc).

  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the SR-IOV Network Operator.

Procedure
  1. Create an SriovNetworkPoolConfig CR and save it as sriov-nw-pool.yaml, as shown in the following example:

    Example SriovNetworkPoolConfig CR
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkPoolConfig
    metadata:
      name: worker
      namespace: openshift-sriov-network-operator
    spec:
      maxUnavailable: 1
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
      rdmaMode: exclusive (1)
    1 Set RDMA network namespace mode to exclusive.
  2. Create the SriovNetworkPoolConfig resource by running the following command:

    $ oc create -f sriov-nw-pool.yaml
  3. Create an SriovNetworkNodePolicy CR and save it as sriov-node-policy.yaml, as shown in the following example:

    Example SriovNetworkNodePolicy CR
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: sriov-nic-pf1
      namespace: openshift-sriov-network-operator
    spec:
      deviceType: netdevice
      isRdma: true (1)
      nicSelector:
        pfNames: ["ens3f0np0"]
      nodeSelector:
        node-role.kubernetes.io/worker: ""
      numVfs: 4
      priority: 99
      resourceName: sriov_nic_pf1
    1 Activate RDMA mode.
  4. Create the SriovNetworkNodePolicy resource by running the following command:

    $ oc create -f sriov-node-policy.yaml
  5. Create an SriovNetwork CR and save it as sriov-network.yaml, as shown in the following example:

    Example SriovNetwork CR
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: sriov-nic-pf1
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: sriov-tests
      resourceName: sriov_nic_pf1
        ipam: |-
      metaPlugins: |
        {
          "type": "rdma" (1)
        }
    1 Create the RDMA plugin.
  6. Create the SriovNetwork resource by running the following command:

    $ oc create -f sriov-network.yaml
Verification
  1. Create a Pod CR and save it as sriov-test-pod.yaml, as shown in the following example:

    Example runtime configuration
    apiVersion: v1
    kind: Pod
    metadata:
      name: sample-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {
              "name": "net1",
              "mac": "20:04:0f:f1:88:01",
              "ips": ["192.168.10.1/24", "2001::1/64"]
            }
          ]
    spec:
      containers:
      - name: sample-container
        image: <image>
        imagePullPolicy: IfNotPresent
        command: ["sleep", "infinity"]
  2. Create the test pod by running the following command:

    $ oc create -f sriov-test-pod.yaml
  3. Log in to the test pod by running the following command:

    $ oc rsh testpod1 -n sriov-tests
  4. Verify that the path to the hw-counters directory exists by running the following command:

    $ ls /sys/bus/pci/devices/${PCIDEVICE_OPENSHIFT_IO_SRIOV_NIC_PF1}/infiniband/*/ports/1/hw_counters/
    Example output
    duplicate_request       out_of_buffer req_cqe_flush_error           resp_cqe_flush_error        roce_adp_retrans        roce_slow_restart_trans
    implied_nak_seq_err     out_of_sequence req_remote_access_errors    resp_local_length_error     roce_adp_retrans_to     rx_atomic_requests
    lifespan                packet_seq_err req_remote_invalid_request   resp_remote_access_errors   roce_slow_restart       rx_read_requests
    local_ack_timeout_err  req_cqe_error resp_cqe_error                 rnr_nak_retry_err           roce_slow_restart_cnps  rx_write_requests