There might be some scenarios where you want a more controlled rollout of an update to the worker nodes in order to ensure that mission-critical applications stay available during the whole update, even if the update process causes your applications to fail. Depending on your organizational needs, you might want to update a small subset of worker nodes, evaluate cluster and workload health over a period of time, then update the remaining nodes. This is commonly referred to as a canary update. Or, you might also want to fit worker node updates, which often require a host reboot, into smaller defined maintenance windows when it is not possible to take a large maintenance window to update the entire cluster at one time.

In these scenarios, you can create multiple custom machine config pools (MCPs) to prevent certain worker nodes from updating when you update the cluster. After the rest of the cluster is updated, you can update those worker nodes in batches at appropriate times.

For example, if you have a cluster with 100 nodes with 10% excess capacity, maintenance windows that must not exceed 4 hours, and you know that it takes no longer than 8 minutes to drain and reboot a worker node, you can leverage MCPs to meet your goals. For example, you could define four MCPs, named workerpool-canary, workerpool-A, workerpool-B, and workerpool-C, with 10, 30, 30, and 30 nodes respectively.

During your first maintenance window, you would pause the MCP for workerpool-A, workerpool-B, and workerpool-C, then initiate the cluster update. This updates components that run on top of OKD and the 10 nodes which are members of the workerpool-canary MCP, because that pool was not paused. The other three MCPs are not updated, because they were paused. If for some reason, you determine that your cluster or workload health was negatively affected by the workerpool-canary update, you would then cordon and drain all nodes in that pool while still maintaining sufficient capacity until you have diagnosed the problem. When everything is working as expected, you would then evaluate the cluster and workload health before deciding to unpause, and thus update, workerpool-A, workerpool-B, and workerpool-C in succession during each additional maintenance window.

While managing worker node updates using custom MCPs provides flexibility, it can be a time-consuming process that requires you execute multiple commands. This complexity can result in errors that can affect the entire cluster. It is recommended that you carefully consider your organizational needs and carefully plan the implemention of the process before you start.

It is not recommended to update the MCPs to different OKD versions. For example, do not update one MCP from 4.y.10 to 4.y.11 and another to 4.y.12. This scenario has not been tested and might result in an undefined cluster state.

Pausing a machine config pool prevents the Machine Config Operator from applying any configuration changes on the associated nodes. Pausing an MCP also prevents any automatically-rotated certificates from being pushed to the associated nodes, including the automatic CA rotation of the kube-apiserver-to-kubelet-signer CA certificate. If the MCP is paused when the kube-apiserver-to-kubelet-signer CA certificate expires and the MCO attempts to automatially renew the certificate, the new certificate is created but not applied across the nodes in the respective machine config pool. This causes failure in multiple oc commands, including but not limited to oc debug, oc logs, oc exec, and oc attach. Pausing an MCP should be done with careful consideration about the kube-apiserver-to-kubelet-signer CA certificate expiration and for short periods of time only.

About the canary rollout update process and MCPs

In OKD, nodes are not considered individually. Nodes are grouped into machine config pools (MCP). There are two MCPs in a default OKD cluster: one for the control plane nodes and one for the worker nodes. An OKD update affects all MCPs concurrently.

During the update, the Machine Config Operator (MCO) drains and cordons all nodes within a MCP up to the specified maxUnavailable number of nodes (if specified), by default 1. Draining and cordoning a node deschedules all pods on the node and marks the node as unschedulable. After the node is drained, the Machine Config Daemon applies a new machine configuration, which can include updating the operating system (OS). Updating the OS requires the host to reboot.

To prevent specific nodes from being updated, and thus, not drained, cordoned, and updated, you can create custom MCPs. Then, pause those MCPs to ensure that the nodes associated with those MCPs are not updated. The MCO does not update any paused MCPs. You can create one or more custom MCPs, which can give you more control over the sequence in which you update those nodes. After you update the nodes in the first MCP, you can verify the application compatibility, and then update the rest of the nodes gradually to the new version.

To ensure the stability of the control plane, creating a custom MCP from the control plane nodes (also known as the master nodes) is not supported. The Machine Config Operator (MCO) ignores any custom MCP created for the control plane nodes.

You should give careful consideration to the number of MCPs you create and the number of nodes in each MCP, based on your workload deployment topology. For example, If you need to fit updates into specific maintenance windows, you need to know how many nodes that OKD can update within a window. This number is dependent on your unique cluster and workload characteristics.

Also, you need to consider how much extra capacity you have available in your cluster. For example, in the case where your applications fail to work as expected on the updated nodes, you can cordon and drain those nodes in the pool, which moves the application pods to other nodes. You need to consider how much extra capacity you have available in order to determine the number of custom MCPs you need and how many nodes are in each MCP. For example, if you use two custom MCPs and 50% of your nodes are in each pool, you need to determine if running 50% of your nodes would provide sufficient quality-of-service (QoS) for your applications.

You can use this update process with all documented OKD update processes. However, the process does not work with Fedora machines, which are updated using Ansible playbooks.

About performing a canary rollout update

This topic describes the general workflow of this canary rollout update process. The steps to perform each task in the workflow are described in the following sections.

  1. Create MCPs based on the worker pool. The number of nodes in each MCP depends on a few factors, such as your maintenance window duration for each MCP, and the amount of reserve capacity, meaning extra worker nodes, available in your cluster.

    You can change the maxUnavailable setting in an MCP to specify the percentage or the number of machines that can be updating at any given time. The default is 1.

  2. Add a node selector to the custom MCPs. For each node that you do not want to update simultaneously with the rest of the cluster, add a matching label to the nodes. This label associates the node to the MCP.

    Do not remove the default worker label from the nodes. The nodes must have a role label to function properly in the cluster.

  3. Pause the MCPs you do not want to update as part of the update process.

    Pausing the MCP also pauses the kube-apiserver-to-kubelet-signer automatic CA certificates rotation. New CA certificates are generated at 292 days from the installation date and old certificates are removed 365 days from the installation date. See the Understand CA cert auto renewal in Red Hat OpenShift 4 to find out how much time you have before the next automatic CA certificate rotation. Make sure the pools are unpaused when the CA cert rotation happens. If the MCPs are paused, the cert rotation does not happen, which causes the cluster to become degraded and causes failure in multiple oc commands, including but not limited to oc debug, oc logs, oc exec, and oc attach.

  4. Perform the cluster update. The update process updates the MCPs that are not paused, including the control plane nodes (also known as the master nodes).

  5. Test the applications on the updated nodes to ensure they are working as expected.

  6. Unpause the remaining MCPs one-by-one and test the applications on those nodes until all worker nodes are updated. Unpausing an MCP starts the update process for the nodes associated with that MCP. You can check the progress of the update from the web console by clicking AdministrationCluster settings. Or, use the oc get machineconfigpools CLI command.

  7. Optionally, remove the custom label from updated nodes and delete the custom MCPs.

Creating machine config pools to perform a canary rollout update

The first task in performing this canary rollout update is to create one or more machine config pools (MCP).

  1. Create an MCP from a worker node.

    1. List the worker nodes in your cluster.

      $ oc get -l 'node-role.kubernetes.io/master!=' -o 'jsonpath={range .items[*]}{.metadata.name}{"\n"}{end}' nodes
      Example output
      ci-ln-pwnll6b-f76d1-s8t9n-worker-a-s75z4
      ci-ln-pwnll6b-f76d1-s8t9n-worker-b-dglj2
      ci-ln-pwnll6b-f76d1-s8t9n-worker-c-lldbm
    2. For the nodes you want to delay, add a custom label to the node:

      $ oc label node <node name> node-role.kubernetes.io/<custom-label>=

      For example:

      $ oc label node ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz node-role.kubernetes.io/workerpool-canary=
      Example output
      node/ci-ln-gtrwm8t-f76d1-spbl7-worker-a-xk76k labeled
    3. Create the new MCP:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: workerpool-canary (1)
      spec:
        machineConfigSelector:
          matchExpressions: (2)
            - {
               key: machineconfiguration.openshift.io/role,
               operator: In,
               values: [worker,workerpool-canary]
              }
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/workerpool-canary: "" (3)
      1 Specify a name for the MCP.
      2 Specify the worker and custom MCP name.
      3 Specify the custom label you added to the nodes that you want in this pool.
      $ oc create -f <file_name>
      Example output
      machineconfigpool.machineconfiguration.openshift.io/workerpool-canary created
    4. View the list of MCPs in the cluster and their current state:

      $ oc get machineconfigpool
      Example output
      NAME              CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master            rendered-master-b0bb90c4921860f2a5d8a2f8137c1867              True      False      False      3              3                   3                     0                      97m
      workerpool-canary rendered-workerpool-canary-87ba3dec1ad78cb6aecebf7fbb476a36   True      False      False      1              1                   1                     0                      2m42s
      worker            rendered-worker-87ba3dec1ad78cb6aecebf7fbb476a36              True      False      False      2              2                   2                     0                      97m

      The new machine config pool, workerpool-canary, is created and the number of nodes to which you added the custom label are shown in the machine counts. The worker MCP machine counts are reduced by the same number. It can take several minutes to update the machine counts. In this example, one node was moved from the worker MCP to the workerpool-canary MCP.

Pausing the machine config pools

In this canary rollout update process, after you label the nodes that you do not want to update with the rest of your OKD cluster and create the machine config pools (MCPs), you pause those MCPs. Pausing an MCP prevents the Machine Config Operator (MCO) from updating the nodes associated with that MCP.

Pausing the MCP also pauses the kube-apiserver-to-kubelet-signer automatic CA certificates rotation. New CA certificates are generated at 292 days from the installation date and old certificates are removed 365 days from the installation date. See the Understand CA cert auto renewal in Red Hat OpenShift 4 to find out how much time you have before the next automatic CA certificate rotation. Make sure the pools are unpaused when the CA cert rotation happens. If the MCPs are paused, the cert rotation does not happen, which causes the cluster to become degraded and causes failure in multiple oc commands, including but not limited to oc debug, oc logs, oc exec, and oc attach.

To pause an MCP:

  1. Patch the MCP that you want paused:

    $ oc patch mcp/<mcp_name> --patch '{"spec":{"paused":true}}' --type=merge

    For example:

    $  oc patch mcp/workerpool-canary --patch '{"spec":{"paused":true}}' --type=merge
    Example output
    machineconfigpool.machineconfiguration.openshift.io/workerpool-canary patched

Performing the cluster update

When the MCPs enter ready state, you can peform the cluster update. See one of the following update methods, as appropriate for your cluster:

After the update is complete, you can start to unpause the MCPs one-by-one.

Unpausing the machine config pools

In this canary rollout update process, after the OKD update is complete, unpause your custom MCPs one-by-one. Unpausing an MCP allows the Machine Config Operator (MCO) to update the nodes associated with that MCP.

To unpause an MCP:

  1. Patch the MCP that you want to unpause:

    $ oc patch mcp/<mcp_name> --patch '{"spec":{"paused":false}}' --type=merge

    For example:

    $  oc patch mcp/workerpool-canary --patch '{"spec":{"paused":false}}' --type=merge
    Example output
    machineconfigpool.machineconfiguration.openshift.io/workerpool-canary patched

    You can check the progress of the update by using the oc get machineconfigpools command.

  2. Test your applications on the updated nodes to ensure that they are working as expected.

  3. Unpause any other paused MCPs one-by-one and verify that your applications work.

In case of application failure

In case of a failure, such as your applications not working on the updated nodes, you can cordon and drain the nodes in the pool, which moves the application pods to other nodes to help maintain the quality-of-service for the applications. This first MCP should be no larger than the excess capacity.

Moving a node to the original machine config pool

In this canary rollout update process, after you have unpaused a custom machine config pool (MCP) and verified that the applications on the nodes associated with that MCP are working as expected, you should move the node back to its original MCP by removing the custom label you added to the node.

A node must have a role to be properly functioning in the cluster.

To move a node to its original MCP:

  1. Remove the custom label from the node.

    $ oc label node <node_name> node-role.kubernetes.io/<custom-label>-

    For example:

    $ oc label node ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz node-role.kubernetes.io/workerpool-canary-
    Example output
    node/ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz labeled

    The MCO moves the nodes back to the original MCP and reconciles the node to the MCP configuration.

  2. View the list of MCPs in the cluster and their current state:

    $oc get mcp
    NAME                CONFIG                                                   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master              rendered-master-1203f157d053fd987c7cbd91e3fbc0ed         True      False      False      3              3                   3                     0                      61m
    workerpool-canary   rendered-mcp-noupdate-5ad4791166c468f3a35cd16e734c9028   True      False      False      0              0                   0                     0                      21m
    worker              rendered-worker-5ad4791166c468f3a35cd16e734c9028         True      False      False      3              3                   3                     0                      61m

    The node is removed from the custom MCP and moved back to the original MCP. It can take several minutes to update the machine counts. In this example, one node was moved from the removed workerpool-canary MCP to the `worker`MCP.

  3. Optional: Delete the custom MCP:

    $ oc delete mcp <mcp_name>