$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":true}}'
Before you start the cluster update, you must pause worker nodes, back up the etcd database, and do a final cluster health check before proceeding.
You must pause the worker nodes before you proceed with the update.
In the following example, there are 2 mcp
groups, mcp-1
and mcp-2
.
You patch the spec.paused
field to true
for each of these MachineConfigPool
groups.
Patch the mcp
CRs to pause the nodes and drain and remove the pods from those nodes by running the following command:
$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":true}}'
$ oc patch mcp/mcp-2 --type merge --patch '{"spec":{"paused":true}}'
Get the status of the paused mcp
groups:
$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP Paused
--- ------
master false
mcp-1 true
mcp-2 true
The default control plane and worker |
You must backup the etcd database before you proceed with the update.
Follow these steps to back up etcd data by creating an etcd snapshot and backing up the resources for the static pods. This backup can be saved and used at a later time if you need to restore etcd.
Only save a backup from a single control plane host. Do not take a backup from each control plane host in the cluster. |
You have access to the cluster as a user with the cluster-admin
role.
You have checked whether the cluster-wide proxy is enabled.
You can check whether the proxy is enabled by reviewing the output of |
Start a debug session as root for a control plane node:
$ oc debug --as-root node/<node_name>
Change your root directory to /host
in the debug shell:
sh-4.4# chroot /host
If the cluster-wide proxy is enabled, export the NO_PROXY
, HTTP_PROXY
, and HTTPS_PROXY
environment variables by running the following commands:
$ export HTTP_PROXY=http://<your_proxy.example.com>:8080
$ export HTTPS_PROXY=https://<your_proxy.example.com>:8080
$ export NO_PROXY=<example.com>
Run the cluster-backup.sh
script in the debug shell and pass in the location to save the backup to.
The |
sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup
found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-6
found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-7
found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6
found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-3
ede95fe6b88b87ba86a03c15e669fb4aa5bf0991c180d3c6895ce72eaade54a1
etcdctl version: 3.4.14
API version: 3.4
{"level":"info","ts":1624647639.0188997,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db.part"}
{"level":"info","ts":"2021-06-25T19:00:39.030Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1624647639.0301006,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.0.5:2379"}
{"level":"info","ts":"2021-06-25T19:00:40.215Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1624647640.6032252,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.0.5:2379","size":"114 MB","took":1.584090459}
{"level":"info","ts":1624647640.6047094,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/home/core/assets/backup/snapshot_2021-06-25_190035.db"}
Snapshot saved at /home/core/assets/backup/snapshot_2021-06-25_190035.db
{"hash":3866667823,"revision":31407,"totalKey":12828,"totalSize":114446336}
snapshot db and kube resources are successfully saved to /home/core/assets/backup
In this example, two files are created in the /home/core/assets/backup/
directory on the control plane host:
snapshot_<datetimestamp>.db
: This file is the etcd snapshot. The cluster-backup.sh
script confirms its validity.
static_kuberesources_<datetimestamp>.tar.gz
: This file contains the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.
If etcd encryption is enabled, it is recommended to store this second file separately from the etcd snapshot for security reasons. However, this file is required to restore from the etcd snapshot. Keep in mind that etcd encryption only encrypts values, not keys. This means that resource types, namespaces, and object names are unencrypted. |
Follow these steps to create a single etcd backup by creating and applying a custom resource (CR).
You have access to the cluster as a user with the cluster-admin
role.
You have access to the OpenShift CLI (oc
).
If dynamically-provisioned storage is available, complete the following steps to create a single automated etcd backup:
Create a persistent volume claim (PVC) named etcd-backup-pvc.yaml
with contents such as the following example:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: etcd-backup-pvc
namespace: openshift-etcd
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 200Gi (1)
volumeMode: Filesystem
1 | The amount of storage available to the PVC. Adjust this value for your requirements. |
Apply the PVC by running the following command:
$ oc apply -f etcd-backup-pvc.yaml
Verify the creation of the PVC by running the following command:
$ oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
etcd-backup-pvc Bound 51s
Dynamic PVCs stay in the |
Create a CR file named etcd-single-backup.yaml
with contents such as the following example:
apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
name: etcd-single-backup
namespace: openshift-etcd
spec:
pvcName: etcd-backup-pvc (1)
1 | The name of the PVC to save the backup to. Adjust this value according to your environment. |
Apply the CR to start a single backup:
$ oc apply -f etcd-single-backup.yaml
If dynamically-provisioned storage is not available, complete the following steps to create a single automated etcd backup:
Create a StorageClass
CR file named etcd-backup-local-storage.yaml
with the following contents:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: etcd-backup-local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
Apply the StorageClass
CR by running the following command:
$ oc apply -f etcd-backup-local-storage.yaml
Create a PV named etcd-backup-pv-fs.yaml
with contents such as the following example:
apiVersion: v1
kind: PersistentVolume
metadata:
name: etcd-backup-pv-fs
spec:
capacity:
storage: 100Gi (1)
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: etcd-backup-local-storage
local:
path: /mnt
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- <example_master_node> (2)
1 | The amount of storage available to the PV. Adjust this value for your requirements. |
2 | Replace this value with the node to attach this PV to. |
Verify the creation of the PV by running the following command:
$ oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
etcd-backup-pv-fs 100Gi RWO Retain Available etcd-backup-local-storage 10s
Create a PVC named etcd-backup-pvc.yaml
with contents such as the following example:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: etcd-backup-pvc
namespace: openshift-etcd
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 10Gi (1)
1 | The amount of storage available to the PVC. Adjust this value for your requirements. |
Apply the PVC by running the following command:
$ oc apply -f etcd-backup-pvc.yaml
Create a CR file named etcd-single-backup.yaml
with contents such as the following example:
apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
name: etcd-single-backup
namespace: openshift-etcd
spec:
pvcName: etcd-backup-pvc (1)
1 | The name of the persistent volume claim (PVC) to save the backup to. Adjust this value according to your environment. |
Apply the CR to start a single backup:
$ oc apply -f etcd-single-backup.yaml
You should check the cluster health often during the update. Check for the node status, cluster Operators status and failed pods.
Check the status of the cluster Operators by running the following command:
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.14.34 True False False 4d22h
baremetal 4.14.34 True False False 4d22h
cloud-controller-manager 4.14.34 True False False 4d23h
cloud-credential 4.14.34 True False False 4d23h
cluster-autoscaler 4.14.34 True False False 4d22h
config-operator 4.14.34 True False False 4d22h
console 4.14.34 True False False 4d22h
...
service-ca 4.14.34 True False False 4d22h
storage 4.14.34 True False False 4d22h
Check the status of the cluster nodes:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ctrl-plane-0 Ready control-plane,master 4d22h v1.27.15+6147456
ctrl-plane-1 Ready control-plane,master 4d22h v1.27.15+6147456
ctrl-plane-2 Ready control-plane,master 4d22h v1.27.15+6147456
worker-0 Ready mcp-1,worker 4d22h v1.27.15+6147456
worker-1 Ready mcp-2,worker 4d22h v1.27.15+6147456
Check that there are no in-progress or failed pods. There should be no pods returned when you run the following command.
$ oc get po -A | grep -E -iv 'running|complete'