On bare-metal hardware, you often must update host firmware to apply important security fixes, gain new functionality, or maintain compatibility with a new release of OKD.
You are responsible for the firmware versions that you run in your clusters. Updating host firmware is not part of the OKD update process, and updating firmware at the same time as the OKD version is not recommended.
|
Hardware vendors advise that it is best to apply the latest certified firmware version for the specific hardware that you are running. For each use case, always verify firmware updates in test environments before applying them in production. For example, workloads with high throughput requirements can be negatively affected by outdated host firmware. Thoroughly test new firmware updates to ensure that they work as expected with the current version of OKD. For best results, test the latest firmware version with the target OKD update version. |
Verify that all layered products run on the version of OKD that you are updating to before you begin the update. This generally includes all Operators.
+ . Verify the currently installed Operators in the cluster. For example, run the following command:
+
$ oc get csv -A
+ .Example output
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE
gitlab-operator-kubernetes.v0.17.2 GitLab 0.17.2 gitlab-operator-kubernetes.v0.17.1 Succeeded
openshift-operator-lifecycle-manager packageserver Package Server 0.19.0 Succeeded
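To spot Operators that did not install or update cleanly, you can filter the output for CSVs that are not in the `Succeeded` phase. This is a sketch that assumes the `PHASE` value is always the last column, as in the output above:

```shell
# List ClusterServiceVersions that are not in the Succeeded phase.
# PHASE is printed as the last column, so $NF holds it even when
# the optional REPLACES column is empty for a given row.
oc get csv -A --no-headers | awk '$NF != "Succeeded"'
```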
Check that Operators that you install with OLM are compatible with the update version.
Operators that are installed with the Operator Lifecycle Manager (OLM) are not part of the standard cluster Operators set.
Use the Operator Update Information Checker to understand if you must update an Operator after each y-stream update or if you can wait until you have fully updated to the next EUS release.
|
You can also use the Operator Update Information Checker to see what versions of OKD are compatible with specific releases of an Operator. |
Check that Operators that you install outside of OLM are compatible with the update version.
For Operators that you install outside of OLM, and for any OLM-installed Operators that are not directly supported by Red Hat, contact the Operator vendor to ensure release compatibility.
Some Operators are compatible with several releases of OKD.
See "Updating the worker nodes" for more information.
See "Updating all the OLM Operators" for information about updating an Operator after performing the first y-stream control plane update.
Prepare MachineConfigPool (MCP) node labels to group nodes into sets of roughly 8 to 10 nodes.
With MCP groups, you can reboot groups of nodes independently from the rest of the cluster.
You use the MCP node labels to pause and unpause the set of nodes during the update process so that you can do the update and reboot at a time of your choosing.
Sometimes problems occur during the update, often related to hardware failure or nodes that need to be reset. With MCP node labels, you can update nodes in stages by pausing the update at critical moments, tracking paused and unpaused nodes as you proceed. When a problem occurs, you use the nodes that are in an unpaused state to ensure that enough nodes are running to keep all application pods running.
How you divide worker nodes into MCPs can vary depending on how many nodes are in the cluster or how many nodes you assign to a node role. By default, the two roles in a cluster are control plane and worker roles.
You can also move nodes between MCP groups if both groups have the same machine config, which is important if you have too many nodes in one large machine config pool. For more information about MCP groups, see Additional resources.
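Pausing and unpausing a pool is done through the `spec.paused` field of the MachineConfigPool resource. A minimal sketch, assuming a pool named `mcp-1` such as the one created later in this procedure:

```shell
# Pause the mcp-1 pool so that its nodes do not pick up new machine configs
oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":true}}'

# When you are ready to update and reboot those nodes, unpause the pool
oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'
```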
|
Larger clusters can have as many as 100 worker nodes.
No matter how many nodes there are in the cluster, keep each |
Consider a cluster with 15 worker nodes:
10 worker nodes host control plane type workloads.
5 worker nodes host data plane type workloads.
Split the control plane and data plane worker node roles into at least 2 MCP groups each. Having 2 MCP groups per role means that you can have one set of nodes that are not affected by the update.
Consider a cluster with 6 worker nodes:
Split the worker nodes into 3 MCP groups of 2 nodes each.
Update one of the MCP groups. Allow the updated nodes to run for a day so that you can verify application compatibility before you update the remaining 4 nodes.
|
The process and pace at which you unpause the MCP groups is determined by your applications and configuration. If your pod can handle being scheduled across nodes in a cluster, you can unpause several MCP groups at a time and set the |
Review the currently configured MachineConfigPool roles in the cluster.
+
. Get the currently configured mcp groups in the cluster:
+
$ oc get mcp
+ .Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-bere83 True False False 3 3 3 0 25d
worker rendered-worker-245c4f True False False 2 2 2 0 25d
Compare the list of MCP roles to the list of nodes in the cluster:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 39d v1.27.15+6147456
ctrl-plane-1 Ready control-plane,master 39d v1.27.15+6147456
ctrl-plane-2 Ready control-plane,master 39d v1.27.15+6147456
worker-0 Ready worker 39d v1.27.15+6147456
worker-1 Ready worker 39d v1.27.15+6147456
|
When you apply an |
Determine how you want to separate the worker nodes into mcp groups.
Creating MCP groups is a two-step process:
Add an MCP label to the nodes in the cluster
Apply an MCP CR to the cluster that organizes the nodes based on their labels
Label the nodes so that they can be put into mcp groups.
Run the following commands:
$ oc label node worker-0 node-role.kubernetes.io/mcp-1=
$ oc label node worker-1 node-role.kubernetes.io/mcp-2=
The mcp-1 and mcp-2 labels are applied to the nodes.
For example:
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 39d v1.27.15+6147456
ctrl-plane-1 Ready control-plane,master 39d v1.27.15+6147456
ctrl-plane-2 Ready control-plane,master 39d v1.27.15+6147456
worker-0 Ready mcp-1,worker 39d v1.27.15+6147456
worker-1 Ready mcp-2,worker 39d v1.27.15+6147456
Create MachineConfigPool (MCP) custom resources (CRs) that use the node labels to group the nodes into pools.
Save the following YAML in the mcps.yaml file:
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: mcp-2
spec:
machineConfigSelector:
matchExpressions:
- {
key: machineconfiguration.openshift.io/role,
operator: In,
values: [worker,mcp-2]
}
nodeSelector:
matchLabels:
node-role.kubernetes.io/mcp-2: ""
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: mcp-1
spec:
machineConfigSelector:
matchExpressions:
- {
key: machineconfiguration.openshift.io/role,
operator: In,
values: [worker,mcp-1]
}
nodeSelector:
matchLabels:
node-role.kubernetes.io/mcp-1: ""
Create the MachineConfigPool resources:
$ oc apply -f mcps.yaml
machineconfigpool.machineconfiguration.openshift.io/mcp-2 created
machineconfigpool.machineconfiguration.openshift.io/mcp-1 created
Monitor the MachineConfigPool resources as they are applied in the cluster.
After you apply the mcp resources, the nodes are added into the new machine config pools.
This takes a few minutes.
|
The nodes do not reboot while being added into the machine config pools. |
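Rather than polling `oc get mcp` manually, you can block until a pool reports that it has finished applying its configuration. A sketch, assuming the `mcp-1` pool from this procedure and a timeout suited to your environment:

```shell
# Wait until the mcp-1 pool reports the Updated condition as True
oc wait mcp/mcp-1 --for=condition=Updated --timeout=20m
```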
Check the status of the new mcp resources:
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-be3e83 True False False 3 3 3 0 25d
mcp-1 rendered-mcp-1-2f4c4f False True True 1 0 0 0 10s
mcp-2 rendered-mcp-2-2r4s1f False True True 1 0 0 0 10s
worker rendered-worker-23fc4f False True True 0 0 0 2 25d
Eventually, the resources are fully applied:
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-be3e83 True False False 3 3 3 0 25d
mcp-1 rendered-mcp-1-2f4c4f True False False 1 1 1 0 7m33s
mcp-2 rendered-mcp-2-2r4s1f True False False 1 1 1 0 51s
worker rendered-worker-23fc4f True False False 0 0 0 0 25d
To update clusters in disconnected environments, you must update your offline image repository.
Before you update the cluster, perform basic checks and verifications to ensure that the cluster is ready for the update.
+ . Verify that there are no failed or in progress pods in the cluster by running the following command:
+
$ oc get pods -A | grep -E -vi 'complete|running'
+
|
You might have to run this command more than once if there are pods that are in a pending state. |
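If you prefer to wait for transient pods to settle automatically, you can loop until the check returns clean. A sketch using the same filter as the command above:

```shell
# Re-run the pod check every 30 seconds until no pod is outside
# the Completed/Running states (grep exits non-zero when nothing matches)
while oc get pods -A --no-headers | grep -E -vi 'complete|running'; do
  sleep 30
done
```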
Verify that all nodes in the cluster are available:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 32d v1.27.15+6147456
ctrl-plane-1 Ready control-plane,master 32d v1.27.15+6147456
ctrl-plane-2 Ready control-plane,master 32d v1.27.15+6147456
worker-0 Ready mcp-1,worker 32d v1.27.15+6147456
worker-1 Ready mcp-2,worker 32d v1.27.15+6147456
Verify that all bare-metal nodes are provisioned and ready:
$ oc get bmh -n openshift-machine-api
NAME STATE CONSUMER ONLINE ERROR AGE
ctrl-plane-0 unmanaged cnf-58879-master-0 true 33d
ctrl-plane-1 unmanaged cnf-58879-master-1 true 33d
ctrl-plane-2 unmanaged cnf-58879-master-2 true 33d
worker-0 unmanaged cnf-58879-worker-0-45879 true 33d
worker-1 progressing cnf-58879-worker-0-dszsh false 1d
An error occurred while provisioning the worker-1 node.
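To investigate a host in this state, you can read the error fields on the BareMetalHost resource. A sketch using the worker-1 host from the example output; the `errorType` and `errorMessage` fields are part of the BareMetalHost status:

```shell
# Print the error type and message reported for the failed host
oc get bmh -n openshift-machine-api worker-1 \
  -o jsonpath='{.status.errorType}{"\n"}{.status.errorMessage}{"\n"}'
```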
+ * Verify that all cluster Operators are ready:
+
$ oc get co
+ .Example output
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.14.34 True False False 17h
baremetal 4.14.34 True False False 32d
...
service-ca 4.14.34 True False False 32d
storage 4.14.34 True False False 32d
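In a healthy cluster, every Operator shows AVAILABLE=True, PROGRESSING=False, and DEGRADED=False. To surface only the Operators that need attention, you can filter on those columns; a sketch assuming the column order shown above:

```shell
# List cluster Operators that are unavailable, progressing, or degraded
# ($3=AVAILABLE, $4=PROGRESSING, $5=DEGRADED in the oc get co output)
oc get co --no-headers | awk '$3 != "True" || $4 != "False" || $5 != "False"'
```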