$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-<update_version_from>-kube-<kube_api_version>-api-removals-in-<update_version_to>":"true"}}' --type=merge
Follow these steps to perform the y-stream cluster update and monitor the update through to completion. Completing a y-stream update is more straightforward than a Control Plane Only update.
When you update to all versions from 4.11 and later, you must manually acknowledge that the update can continue.
Before you acknowledge the update, verify that there are no Kubernetes APIs in use that are removed in the version you are updating to. For example, in OKD 4.17, there are no API removals. See "Kubernetes API removals" for more information. |
Run the following command:
$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-<update_version_from>-kube-<kube_api_version>-api-removals-in-<update_version_to>":"true"}}' --type=merge
where:
Is the cluster version you are moving from, for example, 4.14
.
Is kube API version, for example, 1.28
.
Is the cluster version you are moving to, for example, 4.15
.
Verify the update. Run the following command:
$ oc get configmap admin-acks -n openshift-config -o json | jq .data
{
"ack-4.14-kube-1.28-api-removals-in-4.15": "true",
"ack-4.15-kube-1.29-api-removals-in-4.16": "true"
}
In this example, the cluster is updated from version 4.14 to 4.15, and then from 4.15 to 4.16 in a Control Plane Only update. |
When updating from one y-stream release to the next, you must ensure that the intermediate z-stream releases are also compatible.
You can verify that you are updating to a viable release by running the |
Start the update:
$ oc adm upgrade --to=4.15.33
|
Requested update to 4.15.33 (1)
1 | The Requested update value changes depending on your particular update. |
You should check the cluster health often during the update. Check for the node status, cluster Operators status and failed pods.
Monitor the cluster update. For example, to monitor the cluster update from version 4.14 to 4.15, run the following command:
$ watch "oc get clusterversion; echo; oc get co | head -1; oc get co | grep 4.14; oc get co | grep 4.15; echo; oc get no; echo; oc get po -A | grep -E -iv 'running|complete'"
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.34 True True 4m6s Working towards 4.15.33: 111 of 873 done (12% complete), waiting on kube-apiserver
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.14.34 True False False 4d22h
baremetal 4.14.34 True False False 4d23h
cloud-controller-manager 4.14.34 True False False 4d23h
cloud-credential 4.14.34 True False False 4d23h
cluster-autoscaler 4.14.34 True False False 4d23h
console 4.14.34 True False False 4d22h
...
storage 4.14.34 True False False 4d23h
config-operator 4.15.33 True False False 4d23h
etcd 4.15.33 True False False 4d23h
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 4d23h v1.27.15+6147456
ctrl-plane-1 Ready control-plane,master 4d23h v1.27.15+6147456
ctrl-plane-2 Ready control-plane,master 4d23h v1.27.15+6147456
worker-0 Ready mcp-1,worker 4d23h v1.27.15+6147456
worker-1 Ready mcp-2,worker 4d23h v1.27.15+6147456
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-marketplace redhat-marketplace-rf86t 0/1 ContainerCreating 0 0s
During the update the watch
command cycles through one or several of the cluster Operators at a time, providing a status of the Operator update in the MESSAGE
column.
When the cluster Operators update process is complete, each control plane nodes is rebooted, one at a time.
During this part of the update, messages are reported that state cluster Operators are being updated again or are in a degraded state. This is because the control plane node is offline while it reboots nodes. |
As soon as the last control plane node reboot is complete, the cluster version is displayed as updated.
When the control plane update is complete a message such as the following is displayed. This example shows an update completed to the intermediate y-stream release.
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.15.33 True False 28m Cluster version is 4.15.33
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.15.33 True False False 5d
baremetal 4.15.33 True False False 5d
cloud-controller-manager 4.15.33 True False False 5d1h
cloud-credential 4.15.33 True False False 5d1h
cluster-autoscaler 4.15.33 True False False 5d
config-operator 4.15.33 True False False 5d
console 4.15.33 True False False 5d
...
service-ca 4.15.33 True False False 5d
storage 4.15.33 True False False 5d
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 5d v1.28.13+2ca1a23
ctrl-plane-1 Ready control-plane,master 5d v1.28.13+2ca1a23
ctrl-plane-2 Ready control-plane,master 5d v1.28.13+2ca1a23
worker-0 Ready mcp-1,worker 5d v1.28.13+2ca1a23
worker-1 Ready mcp-2,worker 5d v1.28.13+2ca1a23
In telco environments, software needs to vetted before it is loaded onto a production cluster. Production clusters are also configured in a disconnected network, which means that they are not always directly connected to the internet. Because the clusters are in a disconnected network, the OpenShift Operators are configured for manual update during installation so that new versions can be managed on a cluster-by-cluster basis. Perform the following procedure to move the Operators to the newer versions.
Check to see which Operators need to be updated:
$ oc get installplan -A | grep -E 'APPROVED|false'
NAMESPACE NAME CSV APPROVAL APPROVED
metallb-system install-nwjnh metallb-operator.v4.16.0-202409202304 Manual false
openshift-nmstate install-5r7wr kubernetes-nmstate-operator.4.16.0-202409251605 Manual false
Patch the InstallPlan
resources for those Operators:
$ oc patch installplan -n metallb-system install-nwjnh --type merge --patch \
'{"spec":{"approved":true}}'
installplan.operators.coreos.com/install-nwjnh patched
Monitor the namespace by running the following command:
$ oc get all -n metallb-system
NAME READY STATUS RESTARTS AGE
pod/metallb-operator-controller-manager-69b5f884c-8bp22 0/1 ContainerCreating 0 4s
pod/metallb-operator-controller-manager-77895bdb46-bqjdx 1/1 Running 0 4m1s
pod/metallb-operator-webhook-server-5d9b968896-vnbhk 0/1 ContainerCreating 0 4s
pod/metallb-operator-webhook-server-d76f9c6c8-57r4w 1/1 Running 0 4m1s
...
NAME DESIRED CURRENT READY AGE
replicaset.apps/metallb-operator-controller-manager-69b5f884c 1 1 0 4s
replicaset.apps/metallb-operator-controller-manager-77895bdb46 1 1 1 4m1s
replicaset.apps/metallb-operator-controller-manager-99b76f88 0 0 0 4m40s
replicaset.apps/metallb-operator-webhook-server-5d9b968896 1 1 0 4s
replicaset.apps/metallb-operator-webhook-server-6f7dbfdb88 0 0 0 4m40s
replicaset.apps/metallb-operator-webhook-server-d76f9c6c8 1 1 1 4m1s
When the update is complete, the required pods should be in a Running
state, and the required ReplicaSet
resources should be ready:
NAME READY STATUS RESTARTS AGE
pod/metallb-operator-controller-manager-69b5f884c-8bp22 1/1 Running 0 25s
pod/metallb-operator-webhook-server-5d9b968896-vnbhk 1/1 Running 0 25s
...
NAME DESIRED CURRENT READY AGE
replicaset.apps/metallb-operator-controller-manager-69b5f884c 1 1 1 25s
replicaset.apps/metallb-operator-controller-manager-77895bdb46 0 0 0 4m22s
replicaset.apps/metallb-operator-webhook-server-5d9b968896 1 1 1 25s
replicaset.apps/metallb-operator-webhook-server-d76f9c6c8 0 0 0 4m22s
Verify that the Operators do not need to be updated for a second time:
$ oc get installplan -A | grep -E 'APPROVED|false'
There should be no output returned.
Sometimes you have to approve an update twice because some Operators have interim z-stream release versions that need to be installed before the final version. |
You upgrade the worker nodes after you have updated the control plane by unpausing the relevant mcp
groups you created.
Unpausing the mcp
group starts the upgrade process for the worker nodes in that group.
Each of the worker nodes in the cluster reboot to upgrade to the new EUS, y-stream or z-stream version as required.
In the case of Control Plane Only upgrades note that when a worker node is updated it will only require one reboot and will jump <y+2>-release versions. This is a feature that was added to decrease the amount of time that it takes to upgrade large bare-metal clusters.
This is a potential holding point. You can have a cluster version that is fully supported to run in production with the control plane that is updated to a new EUS release while the worker nodes are at a <y-2>-release. This allows large clusters to upgrade in steps across several maintenance windows. |
You can check how many nodes are managed in an mcp
group.
Run the following command to get the list of mcp
groups:
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-c9a52144456dbff9c9af9c5a37d1b614 True False False 3 3 3 0 36d
mcp-1 rendered-mcp-1-07fe50b9ad51fae43ed212e84e1dcc8e False False False 1 0 0 0 47h
mcp-2 rendered-mcp-2-07fe50b9ad51fae43ed212e84e1dcc8e False False False 1 0 0 0 47h
worker rendered-worker-f1ab7b9a768e1b0ac9290a18817f60f0 True False False 0 0 0 0 36d
You decide how many |
Get the list of nodes in the cluster:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 5d8h v1.29.8+f10c92d
ctrl-plane-1 Ready control-plane,master 5d8h v1.29.8+f10c92d
ctrl-plane-2 Ready control-plane,master 5d8h v1.29.8+f10c92d
worker-0 Ready mcp-1,worker 5d8h v1.27.15+6147456
worker-1 Ready mcp-2,worker 5d8h v1.27.15+6147456
Confirm the MachineConfigPool
groups that are paused:
$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP Paused
--- ------
master false
mcp-1 true
mcp-2 true
Each |
Unpause the required mcp
group to begin the upgrade:
$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'
machineconfigpool.machineconfiguration.openshift.io/mcp-1 patched
Confirm that the required mcp
group is unpaused:
$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP Paused
--- ------
master false
mcp-1 false
mcp-2 true
As each mcp
group is upgraded, continue to unpause and upgrade the remaining nodes.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 5d8h v1.29.8+f10c92d
ctrl-plane-1 Ready control-plane,master 5d8h v1.29.8+f10c92d
ctrl-plane-2 Ready control-plane,master 5d8h v1.29.8+f10c92d
worker-0 Ready mcp-1,worker 5d8h v1.29.8+f10c92d
worker-1 NotReady,SchedulingDisabled mcp-2,worker 5d8h v1.27.15+6147456
Run the following commands after updating the cluster to verify that the cluster is back up and running.
Check the cluster version by running the following command:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.16.14 True False 4h38m Cluster version is 4.16.14
This should return the new cluster version and the PROGRESSING
column should return False
.
Check that all nodes are ready:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 5d9h v1.29.8+f10c92d
ctrl-plane-1 Ready control-plane,master 5d9h v1.29.8+f10c92d
ctrl-plane-2 Ready control-plane,master 5d9h v1.29.8+f10c92d
worker-0 Ready mcp-1,worker 5d9h v1.29.8+f10c92d
worker-1 Ready mcp-2,worker 5d9h v1.29.8+f10c92d
All nodes in the cluster should be in a Ready
status and running the same version.
Check that there are no paused mcp
resources in the cluster:
$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP Paused
--- ------
master false
mcp-1 false
mcp-2 false
Check that all cluster Operators are available:
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.16.14 True False False 5d9h
baremetal 4.16.14 True False False 5d9h
cloud-controller-manager 4.16.14 True False False 5d10h
cloud-credential 4.16.14 True False False 5d10h
cluster-autoscaler 4.16.14 True False False 5d9h
config-operator 4.16.14 True False False 5d9h
console 4.16.14 True False False 5d9h
control-plane-machine-set 4.16.14 True False False 5d9h
csi-snapshot-controller 4.16.14 True False False 5d9h
dns 4.16.14 True False False 5d9h
etcd 4.16.14 True False False 5d9h
image-registry 4.16.14 True False False 85m
ingress 4.16.14 True False False 5d9h
insights 4.16.14 True False False 5d9h
kube-apiserver 4.16.14 True False False 5d9h
kube-controller-manager 4.16.14 True False False 5d9h
kube-scheduler 4.16.14 True False False 5d9h
kube-storage-version-migrator 4.16.14 True False False 4h48m
machine-api 4.16.14 True False False 5d9h
machine-approver 4.16.14 True False False 5d9h
machine-config 4.16.14 True False False 5d9h
marketplace 4.16.14 True False False 5d9h
monitoring 4.16.14 True False False 5d9h
network 4.16.14 True False False 5d9h
node-tuning 4.16.14 True False False 5d7h
openshift-apiserver 4.16.14 True False False 5d9h
openshift-controller-manager 4.16.14 True False False 5d9h
openshift-samples 4.16.14 True False False 5h24m
operator-lifecycle-manager 4.16.14 True False False 5d9h
operator-lifecycle-manager-catalog 4.16.14 True False False 5d9h
operator-lifecycle-manager-packageserver 4.16.14 True False False 5d9h
service-ca 4.16.14 True False False 5d9h
storage 4.16.14 True False False 5d9h
All cluster Operators should report True
in the AVAILABLE
column.
Check that all pods are healthy:
$ oc get po -A | grep -E -iv 'complete|running'
This should not return any pods.
You might see a few pods still moving after the update. Watch this for a while to make sure all pods are cleared. |