What Device Plug-ins Do

Device plug-ins are in Technology Preview and are not intended for production workloads.

Device plug-ins allow you to use a particular device type (GPU, InfiniBand, or other similar computing resources that require vendor-specific initialization and setup) in your OKD pod without needing to write custom code. A device plug-in provides a consistent and portable way to consume hardware devices across clusters. It supports these devices through an extension mechanism that makes them available to containers, performs health checks on them, and shares them securely.

A device plug-in is a gRPC service that runs on the nodes (external to atomic-openshift-node.service) and is responsible for managing specific hardware resources. Every device plug-in must support the following remote procedure calls (RPCs):

service DevicePlugin {
      // ListAndWatch returns a stream of lists of Devices.
      // Whenever a Device changes state or a Device disappears, ListAndWatch
      // returns the new list.
      rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}

      // Allocate is called during container creation so that the Device
      // Plugin can run device-specific operations and instruct Kubelet
      // of the steps to make the Device available in the container.
      rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
}
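
The behavior of the two RPCs can be pictured with a small in-process Go sketch. It does not use gRPC or the generated device plug-in API types: Device, stubPlugin, the updates channel (standing in for the ListAndWatch stream), and the /dev/ paths returned by Allocate are all hypothetical names chosen for illustration. Refusing unknown or unhealthy devices in Allocate is also an assumption of this sketch, not something the service definition requires.

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a hypothetical stand-in for the device message a plug-in reports.
type Device struct {
	ID      string
	Healthy bool
}

// stubPlugin keeps an in-memory device table. It is not a gRPC server and
// is not safe for concurrent use; it only illustrates the call pattern.
type stubPlugin struct {
	devices map[string]bool // device ID -> healthy
	updates chan []Device   // stands in for the ListAndWatch stream
}

// newStubPlugin registers the given IDs as healthy and queues the initial list.
func newStubPlugin(ids ...string) *stubPlugin {
	p := &stubPlugin{devices: map[string]bool{}, updates: make(chan []Device, 8)}
	for _, id := range ids {
		p.devices[id] = true
	}
	p.updates <- p.snapshot()
	return p
}

// snapshot returns the current device list sorted by ID.
func (p *stubPlugin) snapshot() []Device {
	list := make([]Device, 0, len(p.devices))
	for id, healthy := range p.devices {
		list = append(list, Device{ID: id, Healthy: healthy})
	}
	sort.Slice(list, func(i, j int) bool { return list[i].ID < list[j].ID })
	return list
}

// SetHealth records a state change and queues the full new list, mirroring
// how ListAndWatch returns the new list whenever a device changes state.
func (p *stubPlugin) SetHealth(id string, healthy bool) {
	p.devices[id] = healthy
	p.updates <- p.snapshot()
}

// Allocate returns the (hypothetical) device paths a container would need.
// Rejecting unknown or unhealthy devices is an assumption of this sketch.
func (p *stubPlugin) Allocate(ids []string) ([]string, error) {
	paths := make([]string, 0, len(ids))
	for _, id := range ids {
		if healthy, ok := p.devices[id]; !ok || !healthy {
			return nil, fmt.Errorf("device %q is unknown or unhealthy", id)
		}
		paths = append(paths, "/dev/"+id)
	}
	return paths, nil
}

func main() {
	p := newStubPlugin("gpu0", "gpu1")
	fmt.Println("initial list:", <-p.updates)

	p.SetHealth("gpu1", false)
	fmt.Println("after change:", <-p.updates)

	paths, err := p.Allocate([]string{"gpu0"})
	fmt.Println("allocate gpu0:", paths, err)
}
```

In a real plug-in, the generated gRPC server interface replaces stubPlugin and the stream replaces the channel, but the flow is the same: send the full device list at the start, send it again on every change, and answer Allocate during container creation.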

Example Device Plug-ins

A stub device plug-in in the Device Manager code serves as a simple reference implementation: vendor/k8s.io/kubernetes/pkg/kubelet/cm/deviceplugin/device_plugin_stub.go.

Methods for Deploying a Device Plug-in

  • Daemonsets are the recommended approach for device plug-in deployments.

  • On startup, the device plug-in tries to create a UNIX domain socket in /var/lib/kubelet/device-plugins/ on the node to serve RPCs from Device Manager.

  • Because device plug-ins must manage hardware resources, access the host file system, and create sockets, they must run in a privileged security context.

  • For specific deployment steps, see the documentation for each device plug-in implementation.