Running a custom scheduler

You can run multiple custom schedulers alongside the default scheduler and configure which scheduler to use for each pod.

You can use a custom scheduler with OKD, but Red Hat does not directly support the functionality of the custom scheduler.

For information on how to configure the default scheduler, see Configuring the default scheduler to control pod placement.

To schedule a given pod using a specific scheduler, specify the name of the scheduler in the schedulerName field of that pod's specification.

Deploying a custom scheduler

To add a custom scheduler to your cluster, reference the image for the custom scheduler in a deployment.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.

You have a scheduler binary.

Information on how to create a scheduler binary is outside the scope of this document. For an example, see Configure Multiple Schedulers in the Kubernetes documentation. Note that the actual functionality of your custom scheduler is not supported by Red Hat.

You have created an image containing the scheduler binary and pushed it to a registry.

Procedure

Create a file that contains the deployment resources for the custom scheduler:

Example custom-scheduler.yaml file

apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-scheduler
  namespace: kube-system (1)
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system (1)
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system (1)
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: custom-scheduler
  namespace: kube-system (1)
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      serviceAccountName: custom-scheduler
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --address=0.0.0.0
        - --leader-elect=false
        - --scheduler-name=custom-scheduler (2)
        image: "<namespace>/<image_name>:<tag>" (3)
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10251
          initialDelaySeconds: 15
        name: kube-second-scheduler
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10251
        resources:
          requests:
            cpu: '0.1'
        securityContext:
          privileged: false
        volumeMounts: []
      hostNetwork: false
      hostPID: false
      volumes: []

1	This procedure uses the `kube-system` namespace, but you can use the namespace of your choosing.
2	The command for your custom scheduler might require different arguments. For example, you can pass configuration as a mounted volume using the `--config` argument.
3	Specify the container image that you created for the custom scheduler.
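As a rough illustration of the `--config` option mentioned in callout 2, the following sketch stores a scheduler configuration file in a config map and mounts it into the scheduler container. The config map name `custom-scheduler-config`, the file name `scheduler-config.yaml`, and the mount path `/etc/custom-scheduler` are assumptions made for this example. The `apiVersion` of the `KubeSchedulerConfiguration` object depends on the Kubernetes version of your scheduler binary, so treat `v1` as a placeholder to verify against your version.

```yaml
# Hypothetical config map that holds the scheduler configuration file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-scheduler-config      # assumed name, not from the procedure
  namespace: kube-system
data:
  scheduler-config.yaml: |
    # The apiVersion varies by Kubernetes release; verify it for your binary.
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
    - schedulerName: custom-scheduler
---
# Excerpt of the pod template spec in the custom-scheduler deployment.
# This is not a complete manifest; only the fields that change are shown.
spec:
  containers:
  - name: kube-second-scheduler
    command:
    - /usr/local/bin/kube-scheduler
    - --config=/etc/custom-scheduler/scheduler-config.yaml
    volumeMounts:
    - name: scheduler-config
      mountPath: /etc/custom-scheduler   # assumed mount path
      readOnly: true
  volumes:
  - name: scheduler-config
    configMap:
      name: custom-scheduler-config
```

In this sketch the scheduler name and leader election settings live in the configuration file rather than in command-line flags, because the upstream kube-scheduler generally gives the file precedence over the equivalent flags when `--config` is set.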

Create the deployment resources in the cluster:
```
$ oc create -f custom-scheduler.yaml
```

Verification

Verify that the scheduler pod is running:

$ oc get pods -n kube-system

The custom scheduler pod is listed as Running:

NAME                                                       READY   STATUS    RESTARTS   AGE
custom-scheduler-6cd7c4b8bc-854zb                          1/1     Running   0          2m

Deploying pods using a custom scheduler

After the custom scheduler is deployed in your cluster, you can configure pods to use that scheduler instead of the default scheduler.

Each scheduler has a separate view of resources in a cluster. For that reason, each scheduler should operate over its own set of nodes.

If two or more schedulers operate on the same node, they might interfere with each other and schedule more pods onto that node than it has resources for. In that case, pods might be rejected due to insufficient resources.
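One possible way to give the custom scheduler its own set of nodes is to label and taint those nodes, and then have the pods that use the custom scheduler select the label and tolerate the taint. The following sketch assumes a hypothetical `scheduler-pool=custom` node label and a matching `NoSchedule` taint that you apply to the nodes yourself. It also assumes that your custom scheduler honors node selectors and tolerations in the same way as the stock kube-scheduler.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduler-pool-example   # hypothetical pod name
spec:
  schedulerName: custom-scheduler
  # Restrict the pod to nodes that carry the assumed label.
  nodeSelector:
    scheduler-pool: custom
  # Tolerate the assumed taint that keeps other pods off those nodes.
  tolerations:
  - key: scheduler-pool
    operator: Equal
    value: custom
    effect: NoSchedule
  containers:
  - name: example
    image: docker.io/ocpqe/hello-pod
```

The taint keeps pods without the toleration, including most pods placed by the default scheduler, off the dedicated nodes, while the node selector keeps this pod from being placed elsewhere.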

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
The custom scheduler has been deployed in the cluster.

Procedure

If your cluster uses role-based access control (RBAC), add the custom scheduler name to the system:kube-scheduler cluster role.

Edit the system:kube-scheduler cluster role:

$ oc edit clusterrole system:kube-scheduler

Add the name of the custom scheduler to the resourceNames lists for the leases and endpoints resources:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2021-07-07T10:19:14Z"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-scheduler
  resourceVersion: "125"
  uid: 53896c70-b332-420a-b2a4-f72c822313f2
rules:
...
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
- apiGroups:
  - coordination.k8s.io
  resourceNames:
  - kube-scheduler
  - custom-scheduler (1)
  resources:
  - leases
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - create
- apiGroups:
  - ""
  resourceNames:
  - kube-scheduler
  - custom-scheduler (1)
  resources:
  - endpoints
  verbs:
  - get
  - update
...

1	This example uses `custom-scheduler` as the custom scheduler name.
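If you prefer not to modify a system cluster role, a similar effect can probably be achieved with a separate cluster role that is bound to the custom scheduler service account. The following is only a sketch under that assumption: the names `custom-scheduler-leader-election` and `custom-scheduler-leader-election-binding` are made up for this example, and the rules mirror the `get` and `update` rules shown above for the `custom-scheduler` resource name.

```yaml
# Hypothetical cluster role that grants access to the custom scheduler's
# own lease and endpoints objects, mirroring the rules edited above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-scheduler-leader-election   # assumed name
rules:
- apiGroups:
  - coordination.k8s.io
  resourceNames:
  - custom-scheduler
  resources:
  - leases
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resourceNames:
  - custom-scheduler
  resources:
  - endpoints
  verbs:
  - get
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-leader-election-binding   # assumed name
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: custom-scheduler-leader-election
  apiGroup: rbac.authorization.k8s.io
```

This variant relies on the `create` permissions that the service account already receives through its binding to the system:kube-scheduler cluster role.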

Create a Pod configuration and specify the name of the custom scheduler in the schedulerName parameter:
Example custom-scheduler-example.yaml file
```
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduler-example
  labels:
    name: custom-scheduler-example
spec:
  schedulerName: custom-scheduler (1)
  containers:
  - name: pod-with-second-annotation-container
    image: docker.io/ocpqe/hello-pod
```
1 The name of the custom scheduler to use, which is custom-scheduler in this example. When no scheduler name is supplied, the pod is automatically scheduled using the default scheduler.

Create the pod:

$ oc create -f custom-scheduler-example.yaml

Verification

Enter the following command to check that the pod was created:

$ oc get pod custom-scheduler-example

The custom-scheduler-example pod is listed in the output:

NAME                       READY     STATUS    RESTARTS   AGE
custom-scheduler-example   1/1       Running   0          4m

Enter the following command to check that the custom scheduler has scheduled the pod:

$ oc describe pod custom-scheduler-example

The scheduler, custom-scheduler, is listed as shown in the following truncated output:

Events:
  Type    Reason          Age        From                                               Message
  ----    ------          ----       ----                                               -------
  Normal  Scheduled       <unknown>  custom-scheduler                                   Successfully assigned default/custom-scheduler-example to <node_name>
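The schedulerName parameter belongs to the pod specification, so it can also be set in the pod template of a workload object. As a minimal sketch, the following deployment, whose name and labels are hypothetical, asks the custom scheduler to place every replica:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler-deployment-example   # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: custom-scheduler-deployment-example
  template:
    metadata:
      labels:
        app: custom-scheduler-deployment-example
    spec:
      # Every pod created from this template uses the custom scheduler.
      schedulerName: custom-scheduler
      containers:
      - name: hello
        image: docker.io/ocpqe/hello-pod
```

You can verify each resulting pod with the same oc describe pod check that is shown above.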

Additional resources

Learning container best practices