Prerequisites

If you run cluster monitoring with a PVC attached for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and stays at that level for several hours after the upgrade is complete. To avoid OOM kills, provision worker nodes with double the memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase memory to 16 GB. For more information, see BZ#1925061.

About the OKD update service

The OKD update service is the hosted service that provides over-the-air updates to both OKD and Fedora CoreOS (FCOS). It provides a graph of component Operators, that is, a diagram that contains vertices and the edges that connect them. The edges in the graph show which versions you can safely update to, and the vertices are update payloads that specify the intended state of the managed cluster components.

The Cluster Version Operator (CVO) in your cluster checks with the OKD update service to see the valid updates and update paths based on current component versions and information in the graph. When you request an update, the OKD CVO uses the release image for that update to upgrade your cluster. The release artifacts are hosted in Quay as container images.

To allow the OKD update service to provide only compatible updates, a release verification pipeline exists to drive automation. Each release artifact is verified for compatibility with supported cloud platforms and system architectures as well as other component packages. After the pipeline confirms the suitability of a release, the OKD update service notifies you that it is available.

Because the update service displays all valid updates, you must not force an update to a version that the update service does not display.
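The graph lookup described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the CVO implementation: the shape of the `GRAPH` data (a list of nodes plus index-pair edges), the version numbers, the `registry.example` payload names, and the function names are all assumptions made for the example.

```python
# Hypothetical update graph: "nodes" are release payloads (vertices) and
# "edges" are [from_index, to_index] pairs that mark a safe update.
GRAPH = {
    "nodes": [
        {"version": "4.7.0", "payload": "registry.example/release:4.7.0"},
        {"version": "4.7.1", "payload": "registry.example/release:4.7.1"},
        {"version": "4.8.0", "payload": "registry.example/release:4.8.0"},
    ],
    "edges": [[0, 1], [0, 2], [1, 2]],
}


def version_key(version):
    """Turn '4.7.1' into (4, 7, 1) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def available_updates(graph, current_version):
    """Return the versions reachable by one edge from current_version."""
    versions = [node["version"] for node in graph["nodes"]]
    if current_version not in versions:
        return []
    current = versions.index(current_version)
    targets = {versions[dst] for src, dst in graph["edges"] if src == current}
    return sorted(targets, key=version_key)


def payload_for_update(graph, current_version, target_version):
    """Return the release payload for a target, refusing undisplayed versions."""
    if target_version not in available_updates(graph, current_version):
        raise ValueError(
            f"{target_version} is not a displayed update from {current_version}"
        )
    for node in graph["nodes"]:
        if node["version"] == target_version:
            return node["payload"]


print(available_updates(GRAPH, "4.7.0"))
print(payload_for_update(GRAPH, "4.7.1", "4.8.0"))
```

The `ValueError` in `payload_for_update` mirrors the rule above: a version that the graph does not offer from your current version is rejected rather than forced.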

During continuous update mode, two controllers run. The first continuously updates the payload manifests, applies them to the cluster, and reports the status of the controlled rollout of the Operators: whether they are available, upgrading, or failed. The second controller polls the OKD update service to determine if updates are available.

Reverting your cluster to a previous version, or a rollback, is not supported. Only upgrading to a newer version is supported. If your upgrade fails, contact Red Hat support.

During the upgrade process, the Machine Config Operator (MCO) applies the new configuration to your cluster machines. It cordons the number of nodes that is specified by the maxUnavailable field on the machine configuration pool and marks them as unavailable. By default, this value is set to 1. It then applies the new configuration and reboots the machine. If you use Red Hat Enterprise Linux (RHEL) machines as workers, the MCO does not update the kubelet on these machines because you must update the OpenShift API on them first. Because the specification for the new version is applied to the old kubelet, the RHEL machine cannot return to the Ready state. You cannot complete the update until the machines are available. However, the maximum number of nodes that are unavailable is set to ensure that normal cluster operations are likely to continue with that number of machines out of service.
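To make the effect of maxUnavailable concrete, the following Python sketch models the rollout as fixed-size batches. It is a deliberate simplification and not how the MCO is implemented: the function name, the node names, and the batching strategy are assumptions, and the real Operator also drains nodes and waits for each one to report Ready.

```python
def rollout_batches(nodes, max_unavailable=1):
    """Split nodes into groups of at most max_unavailable machines.

    Each group represents the machines that are cordoned, reconfigured,
    and rebooted together in this simplified model.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return [
        nodes[i:i + max_unavailable]
        for i in range(0, len(nodes), max_unavailable)
    ]


# Hypothetical worker names, used only for illustration.
workers = ["worker-0", "worker-1", "worker-2"]
for batch in rollout_batches(workers):
    in_service = [node for node in workers if node not in batch]
    print(f"updating {batch}; remaining in service: {in_service}")
```

With the default value of 1, only one machine is out of service at any time, which is why a single machine that cannot return to the Ready state blocks the rest of the update in this model.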

OKD upgrade channels and releases

In OKD 4.1, Red Hat introduced the concept of channels for recommending the appropriate release versions for cluster upgrades. By controlling the pace of upgrades, these upgrade channels allow you to choose an upgrade strategy. Upgrade channels are tied to a minor version of OKD. For instance, OKD 4.8 upgrade channels recommend upgrades to 4.8 and upgrades within 4.8. They also recommend upgrades within 4.7 and from 4.7 to 4.8, to allow clusters on 4.7 to eventually upgrade to 4.8. They do not recommend upgrades to 4.9 or later releases. This strategy ensures that administrators explicitly decide to upgrade to the next minor version of OKD.
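The scope rule in the 4.8 example can be written as a small predicate. This Python sketch captures only which minor versions a channel covers; it does not know whether a specific update has been tested. The function names and the three-part version strings are assumptions, and rejecting sources older than the previous minor version is an interpretation of the text above rather than something it states outright.

```python
def parse_version(version):
    """Turn '4.8.2' into (4, 8, 2) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def in_channel_scope(channel, source, target):
    """Check whether an update falls within the scope of a channel.

    channel is a (major, minor) pair such as (4, 8). The channel covers
    updates within that minor version, within the previous minor version,
    and from the previous minor version to this one.
    """
    major, minor = channel
    src, tgt = parse_version(source), parse_version(target)
    covered = {(major, minor - 1), (major, minor)}
    # Only forward moves count, because rollbacks are not supported.
    return src[:2] in covered and tgt[:2] in covered and tgt > src


print(in_channel_scope((4, 8), "4.7.3", "4.8.0"))  # True
print(in_channel_scope((4, 8), "4.8.2", "4.9.0"))  # False
```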

Upgrade channels control only release selection and do not impact the version of the cluster that you install; the openshift-install binary file for a specific version of OKD always installs that version.

OKD 4 offers the following upgrade channel:

  • stable-4

stable-4 channel

Releases are added to the stable-4 channel after passing all tests.

You can use the stable-4 channel to upgrade from a previous minor version of OKD.

Upgrade version paths

OKD maintains an upgrade recommendation service that knows which version of OKD you have installed and which path to take within your chosen channel to reach the next release.

For example, the stable-4 channel might contain the following releases:

  • 4.0

  • 4.1

  • 4.3

  • 4.4

The service recommends only upgrades that have been tested and have no serious issues. It will not suggest updating to a version of OKD that contains known vulnerabilities. For example, if your cluster is on 4.1 and OKD suggests 4.4, then it is safe for you to update from 4.1 to 4.4. Do not rely on consecutive version numbers. In this example, 4.2 is not and never was available in the channel.

The presence of an update recommendation in the stable-4 channel is a declaration that the update is fully supported while it is in the channel. While releases will never be removed from the channel, update recommendations that exhibit serious issues will be removed from the channel. Updates initiated after the update recommendation has been removed might not be supported.
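The difference between a release and an update recommendation can be modeled with two sets. The Python sketch below reuses the 4.0, 4.1, 4.3, 4.4 example from this section; the `Channel` class, its method names, and the specific recommendations between releases are hypothetical and chosen only to illustrate the behavior.

```python
class Channel:
    """Toy model of a channel: releases persist, recommendations can go."""

    def __init__(self, releases, recommendations):
        self.releases = set(releases)
        # Each recommendation is a (from_version, to_version) pair.
        self.recommendations = set(recommendations)

    def recommended_from(self, current):
        """Return the versions currently recommended from current."""
        targets = [dst for src, dst in self.recommendations if src == current]
        return sorted(targets, key=lambda v: tuple(int(p) for p in v.split(".")))

    def withdraw(self, source, target):
        """Remove a recommendation; the target release stays in the channel."""
        self.recommendations.discard((source, target))


stable = Channel(
    releases=["4.0", "4.1", "4.3", "4.4"],
    recommendations=[("4.0", "4.1"), ("4.1", "4.3"), ("4.1", "4.4"), ("4.3", "4.4")],
)

print(stable.recommended_from("4.1"))  # 4.2 never appears
stable.withdraw("4.1", "4.4")
print(stable.recommended_from("4.1"))  # the 4.4 recommendation is gone
print("4.4" in stable.releases)        # the 4.4 release is still listed
```

In this model, an update from 4.1 to 4.4 that starts after `withdraw` is called corresponds to the case above in which the update might not be supported.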

Restricted network clusters

If you manage the container images for your OKD clusters yourself, you must consult the Red Hat errata that is associated with product releases and note any comments that impact upgrades. During upgrade, the user interface might warn you about switching between these versions, so you must ensure that you selected an appropriate version before you bypass those warnings.

Updating a cluster by using the web console

If updates are available, you can update your cluster from the web console.

You can find information about available OKD advisories and updates in the errata section of the Customer Portal.

Prerequisites
  • Have access to the web console as a user with admin privileges.

Procedure
  1. From the web console, click Administration > Cluster Settings and review the contents of the Details tab.

  2. For production clusters, ensure that the Channel is set to the correct channel for the version that you want to update to, such as stable-4.

    • If the Update Status is not Updates Available, you cannot upgrade your cluster.

    • Select Channel indicates the cluster version that your cluster is running or is updating to.

  3. Select a version to update to, and click Update. The Update Status changes to Update to <product-version> in progress, and you can review the progress of the cluster update by watching the progress bars for the Operators and nodes.