...
spec:
clusterID: db93436d-7b05-42cc-b856-43e11ad2d31a
upstream: '<update-server-url>' (1)
...
You can perform minor version and patch updates on an OKD cluster by using the web console.
Use the web console or |
Before updating, consider the following:
You have recently backed up etcd.
In PodDisruptionBudget
, if minAvailable
is set to 1
, the nodes are drained to apply pending machine configs that might block the eviction process. If several nodes are rebooted, all the pods might run on only one node, and the PodDisruptionBudget
field can prevent the node drain.
You might need to update the cloud provider resources for the new release if your cluster uses manually maintained credentials.
You must review administrator acknowledgement requests, take any recommended actions, and provide the acknowledgement when you are ready.
You can perform a partial update by updating the worker or custom pool nodes to accommodate the time it takes to update. You can pause and resume within the progress bar of each pool.
|
Changing the update server is optional.
You have access to the cluster with cluster-admin
privileges.
You have access to the OKD web console.
Navigate to Administration → Cluster Settings, click version.
Click the YAML tab and then edit the upstream
parameter value:
...
spec:
clusterID: db93436d-7b05-42cc-b856-43e11ad2d31a
upstream: '<update-server-url>' (1)
...
1 | The <update-server-url> variable specifies the URL for the update server. |
The default upstream
is https://api.openshift.com/api/upgrades_info/v1/graph
.
Click Save.
During the update process, nodes in the cluster might become temporarily unavailable. In the case of worker nodes, the machine health check might identify such nodes as unhealthy and reboot them. To avoid rebooting such nodes, pause all the MachineHealthCheck
resources before updating the cluster.
You have access to the cluster with cluster-admin
privileges.
You have access to the OKD web console.
Log in to the OKD web console.
Navigate to Compute → MachineHealthChecks.
To pause the machine health checks, add the cluster.x-k8s.io/paused=""
annotation to each MachineHealthCheck
resource. For example, to add the annotation to the machine-api-termination-handler
resource, complete the following steps:
Click the Options menu next to the machine-api-termination-handler
and click Edit annotations.
In the Edit annotations dialog, click Add more.
In the Key and Value fields, add cluster.x-k8s.io/paused
and ""
values, respectively, and click Save.
If updates are available, you can update your cluster from the web console.
You can find information about available OKD advisories and updates in the errata section of the Customer Portal.
Have access to the web console as a user with cluster-admin
privileges.
You have access to the OKD web console.
Pause all MachineHealthCheck
resources.
You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
Your machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing a canary rollout update strategy.
Your Fedora7 workers are replaced with Fedora8 or FCOS workers. Red Hat does not support in-place Fedora7 to Fedora8 updates for Fedora workers; those hosts must be replaced with a clean operating system install.
From the web console, click Administration → Cluster Settings and review the contents of the Details tab.
For production clusters, ensure that the Channel is set to stable-4
.
When you are ready to move to the next minor version, choose the channel that corresponds to that minor version. The sooner the update channel is declared, the more effectively the cluster can recommend update paths to your target version. The cluster might take some time to evaluate all the possible updates that are available and offer the best update recommendations to choose from. Update recommendations can change over time, as they are based on what update options are available at the time. If you cannot see an update path to your target minor version, keep updating your cluster to the latest patch release for your current version until the next minor version is available in the path. |
If the Update status is not Updates available, you cannot update your cluster.
Select channel indicates the cluster version that your cluster is running or is updating to.
Select a version to update to, and click Save.
The Input channel Update status changes to Update to <product-version> in progress, and you can review the progress of the cluster update by watching the progress bars for the Operators and nodes.
If you are updating your cluster to the next minor version, for example from version 4.10 to 4.11, confirm that your nodes are updated before deploying workloads that rely on a new feature. Any pools with worker nodes that are not yet updated are displayed on the Cluster Settings page. |
After the update completes and the Cluster Version Operator refreshes the available updates, check if more updates are available in your current channel.
If updates are available, continue to perform updates in the current channel until you can no longer update.
If no updates are available, change the Channel to the stable-*
channel for the next minor version, and update to the version that you want in that channel.
You might need to perform several intermediate updates until you reach the version that you want.
You can view and assess the risks associated with particular updates with conditional updates.
You have access to the cluster with cluster-admin
privileges.
You have access to the OKD web console.
Pause all MachineHealthCheck
resources.
You have updated all Operators previously installed through Operator Lifecycle Manager (OLM) to a version that is compatible with your target release. Updating the Operators ensures they have a valid update path when the default OperatorHub catalogs switch from the current minor version to the next during a cluster update. See "Updating installed Operators" in the "Additional resources" section for more information on how to check compatibility and, if necessary, update the installed Operators.
Your machine config pools (MCPs) are running and not paused. Nodes associated with a paused MCP are skipped during the update process. You can pause the MCPs if you are performing an advanced update strategy, such as a canary rollout, an EUS update, or a control-plane update.
From the web console, click Administration → Cluster settings page and review the contents of the Details tab.
You can enable the Include versions with known issues
feature in the Select new version dropdown of the Update cluster modal to populate the dropdown list with conditional updates.
If a version with known issues is selected, more information is provided with potential risks that are associated with the version. |
Review the notification detailing the potential risks to updating.
In some specific use cases, you might want a more controlled update process where you do not want specific nodes updated concurrently with the rest of the cluster. These use cases include, but are not limited to:
You have mission-critical applications that you do not want unavailable during the update. You can slowly test the applications on your nodes in small batches after the update.
You have a small maintenance window that does not allow the time for all nodes to be updated, or you have multiple maintenance windows.
The rolling update process is not a typical update workflow. With larger clusters, it can be a time-consuming process that requires you execute multiple commands. This complexity can result in errors that can affect the entire cluster. It is recommended that you carefully consider whether your organization wants to use a rolling update and carefully plan the implementation of the process before you start.
The rolling update process described in this topic involves:
Creating one or more custom machine config pools (MCPs).
Labeling each node that you do not want to update immediately to move those nodes to the custom MCPs.
Pausing those custom MCPs, which prevents updates to those nodes.
Performing the cluster update.
Unpausing one custom MCP, which triggers the update on those nodes.
Testing the applications on those nodes to make sure the applications work as expected on those newly-updated nodes.
Optionally removing the custom labels from the remaining nodes in small batches and testing the applications on those nodes.
Pausing an MCP should be done with careful consideration and for short periods of time only. |
If you want to use the canary rollout update process, see Performing a canary rollout update.
You can update, or upgrade, a single-node OKD cluster by using either the console or CLI.
However, note the following limitations:
The prerequisite to pause the MachineHealthCheck
resources is not required because there is no other node to perform the health check.
Restoring a single-node OKD cluster using an etcd backup is not officially supported. However, it is good practice to perform the etcd backup in case your update fails. If your control plane is healthy, you might be able to restore your cluster to a previous state by using the backup.
Updating a single-node OKD cluster requires downtime and can include an automatic reboot. The amount of downtime depends on the update payload, as described in the following scenarios:
If the update payload contains an operating system update, which requires a reboot, the downtime is significant and impacts cluster management and user workloads.
If the update contains machine configuration changes that do not require a reboot, the downtime is less, and the impact on the cluster management and user workloads is lessened. In this case, the node draining step is skipped with single-node OKD because there is no other node in the cluster to reschedule the workloads to.
If the update payload does not contain an operating system update or machine configuration changes, a short API outage occurs and resolves quickly.
There are conditions, such as bugs in an updated package, that can cause the single node to not restart after a reboot. In this case, the update does not rollback automatically. |