Automated In-place Upgrades | Upgrading Clusters

Overview
Running Upgrade Playbooks
- Upgrading to OpenShift Origin 1.1
- Upgrading to OpenShift Origin 1.1.z Releases
Updating Master and Node Certificates
- Node Certificates
- Master Certificates
Upgrading the Service Catalog
Upgrading the EFK Logging Stack
Upgrading Cluster Metrics
Special Considerations for Large-scale Upgrades
Special Considerations for Mixed Environments
Special Considerations When Using Containerized GlusterFS
Special Considerations When Using gcePD
Verifying the Upgrade

Overview

If you installed using the advanced installation and the inventory file that was used is available, you can use the upgrade playbook to automate the OpenShift cluster upgrade process.

The automated upgrade performs the following steps for you:

Applies the latest configuration.
Upgrades master and etcd components and restarts services.
Upgrades node components and restarts services.
Applies the latest cluster policies.
Updates the default router if one exists.
Updates the default registry if one exists.
Updates default image streams and InstantApp templates.

Ensure that you have met all prerequisites before proceeding with an upgrade. Failure to do so can result in a failed upgrade.
If you are using GlusterFS, see Special Considerations When Using Containerized GlusterFS before proceeding.
If you are using GCE Persistent Disk (gcePD), see Special Considerations When Using gcePD before proceeding.
The day before the upgrade, validate OKD storage migration to ensure potential issues are resolved prior to the outage window:
```
$ oc adm migrate storage --include=* --loglevel=2 --confirm --config /etc/origin/master/admin.kubeconfig
```

Running Ansible playbooks with the --tags or --check options is not supported by Red Hat.

Running Upgrade Playbooks

Ensure that you have the latest openshift-ansible code checked out:

# cd ~/openshift-ansible
# git pull https://github.com/openshift/openshift-ansible master

Then run one of the following upgrade playbooks, using the inventory file you used during the advanced installation. If your inventory file is located somewhere other than the default /etc/ansible/hosts, add the -i flag to specify the location.

Upgrading to OpenShift Origin 1.1

To upgrade from OpenShift Origin 1.0 to 1.1, run the following playbook:

# ansible-playbook \
    -i </path/to/inventory/file> \
    playbooks/byo/openshift-cluster/upgrades/v3_0_to_v3_1/upgrade.yml

The v3_0_to_v3_1 in the above path refers to the related OpenShift Enterprise versions; however, it is also the correct playbook to use when upgrading from OpenShift Origin 1.0 to 1.1.

When the upgrade finishes, a recommendation will be printed to reboot all hosts. After rebooting, continue to Updating Master and Node Certificates.

Upgrading to OpenShift Origin 1.1.z Releases

To upgrade an existing OpenShift Origin 1.1 cluster to the latest 1.1.z release, run the following playbook:

# ansible-playbook \
    -i </path/to/inventory/file> \
    playbooks/byo/openshift-cluster/upgrades/v3_1_minor/upgrade.yml

The v3_1_minor in the above path refers to the related OpenShift Enterprise versions; however, it is also the correct playbook to use when upgrading from OpenShift Origin 1.1 to the latest 1.1.z release.

When the upgrade finishes, a recommendation will be printed to reboot all hosts. After rebooting, continue to Verifying the Upgrade.
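The choice between the two playbooks above can be captured in a small helper. This is only a sketch: the function name upgrade_playbook_for and the INVENTORY variable are hypothetical and not part of openshift-ansible. The two paths are the ones given in the sections above.

```shell
# Hypothetical helper: map the currently installed Origin minor version
# to the upgrade playbook path named in the sections above.
upgrade_playbook_for() {
  case "$1" in
    1.0) echo "playbooks/byo/openshift-cluster/upgrades/v3_0_to_v3_1/upgrade.yml" ;;
    1.1) echo "playbooks/byo/openshift-cluster/upgrades/v3_1_minor/upgrade.yml" ;;
    *)   echo "no upgrade playbook known for Origin $1" >&2; return 1 ;;
  esac
}

# Example usage, run from the openshift-ansible checkout
# (INVENTORY is an assumed variable, defaulting to /etc/ansible/hosts):
#   ansible-playbook -i "${INVENTORY:-/etc/ansible/hosts}" "$(upgrade_playbook_for 1.0)"
```

Failing on an unrecognized version keeps the wrapper from silently running the wrong playbook.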

Updating Master and Node Certificates

The following steps may be required for any OpenShift cluster that was originally installed prior to the OpenShift Origin 1.0.8 release, including clusters that have since been updated from such a version.

Node Certificates

With the 1.0.8 release, certificates for each of the kubelet nodes were updated to include the IP address of the node. Any node certificates generated before the 1.0.8 release may not contain the IP address of the node.

If a node is missing the IP address as part of its certificate, clients may refuse to connect to the kubelet endpoint. Usually this will result in errors regarding the certificate not containing an IP SAN.

In order to remedy this situation, you may need to manually update the certificates for your node.

Checking the Node’s Certificate

The following command can be used to determine which Subject Alternative Names (SANs) are present in the node’s serving certificate. In this example, the Subject Alternative Names are mynode, mynode.mydomain.com, and 1.2.3.4:

# openssl x509 -in /etc/origin/node/server.crt -text -noout | grep -A 1 "Subject Alternative Name"
X509v3 Subject Alternative Name:
DNS:mynode, DNS:mynode.mydomain.com, IP Address:1.2.3.4

Ensure that the nodeIP value set in the /etc/origin/node/node-config.yaml file is present in the IP values from the Subject Alternative Names listed in the node’s serving certificate. If the nodeIP is not present, then it will need to be added to the node’s certificate.

If the nodeIP value is already contained within the Subject Alternative Names, then no further steps are required.

You will need to know the Subject Alternative Names and nodeIP value for the following steps.
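The comparison between nodeIP and the certificate's IP entries can be scripted. The following is a minimal sketch, not an official tool: san_has_ip is a hypothetical helper that takes the SAN line printed by the openssl command above and an IP address, and succeeds only if that address appears as an IP entry. It accepts both the IP: and IP Address: prefixes.

```shell
# san_has_ip SAN_LINE IP
# Hypothetical helper: exit 0 if IP is listed as an IP entry in SAN_LINE.
san_has_ip() {
  # Put each SAN entry on its own line, strip spaces, then look for an
  # exact fixed-string match so that 1.2.3.4 does not match 1.2.3.40.
  printf '%s\n' "$1" | tr ',' '\n' | tr -d ' ' \
    | grep -Fqx -e "IP:$2" -e "IPAddress:$2"
}

# Example usage on a node (assumes nodeIP is a top-level key in node-config.yaml):
#   san_line=$(openssl x509 -in /etc/origin/node/server.crt -text -noout \
#                | grep -A 1 "Subject Alternative Name" | tail -n 1)
#   node_ip=$(awk '/^nodeIP:/ {print $2}' /etc/origin/node/node-config.yaml)
#   san_has_ip "$san_line" "$node_ip" || echo "nodeIP missing from certificate"
```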

Generating a New Node Certificate

If your current node certificate does not contain the proper IP address, then you must generate a new certificate for your node.

Node certificates are generated on the master (or the first master) and then copied into place on the node systems.

On the first master listed in the Ansible host inventory file (by default, /etc/ansible/hosts), create a temporary directory in which to perform the following steps:
```
# mkdir /tmp/node_certificate_update
# cd /tmp/node_certificate_update
```

Export the signing options:

# export signing_opts="--signer-cert=/etc/origin/master/ca.crt \
    --signer-key=/etc/origin/master/ca.key \
    --signer-serial=/etc/origin/master/ca.serial.txt"

Generate the new certificate:

# oc adm ca create-server-cert --cert=server.crt \
  --key=server.key $signing_opts \
  --hostnames=<existing_SANs>,<nodeIP>

For example, if the Subject Alternative Names from before were mynode, mynode.mydomain.com, and 1.2.3.4, and the nodeIP was 10.10.10.1, then you would need to run the following command:

# oc adm ca create-server-cert --cert=server.crt \
  --key=server.key $signing_opts \
  --hostnames=mynode,mynode.mydomain.com,1.2.3.4,10.10.10.1
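Assembling the --hostnames value by hand is error-prone, so it can help to derive it from the SAN line recorded earlier. The sketch below assumes a hypothetical helper named build_hostnames. It strips the DNS:, IP:, and IP Address: prefixes, appends the nodeIP, and drops duplicates while preserving order.

```shell
# build_hostnames SAN_LINE NODE_IP
# Hypothetical helper: print a comma-separated value suitable for --hostnames.
build_hostnames() {
  local names
  # One SAN entry per line, with leading spaces and type prefixes removed.
  names=$(printf '%s\n' "$1" | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/^DNS://' \
          -e 's/^IP Address://' -e 's/^IP:[[:space:]]*//')
  # Append the nodeIP, skip blank lines and repeats, and join with commas.
  printf '%s\n%s\n' "$names" "$2" | awk 'NF && !seen[$0]++' | paste -sd, -
}

# Example usage:
#   oc adm ca create-server-cert --cert=server.crt --key=server.key $signing_opts \
#     --hostnames="$(build_hostnames "$san_line" "$node_ip")"
```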

Replace Node Serving Certificates

Back up the existing /etc/origin/node/server.crt and /etc/origin/node/server.key files for your node:

# mv /etc/origin/node/server.crt /etc/origin/node/server.crt.bak
# mv /etc/origin/node/server.key /etc/origin/node/server.key.bak

You must now copy the new server.crt and server.key created in the temporary directory during the previous step:

# mv /tmp/node_certificate_update/server.crt /etc/origin/node/server.crt
# mv /tmp/node_certificate_update/server.key /etc/origin/node/server.key

After you have replaced the node’s certificate, restart the node service:

# systemctl restart origin-node

Master Certificates

With the 1.0.8 release, certificates for each of the masters were updated to include all names that pods may use to communicate with masters. Any master certificates generated before the 1.0.8 release may not contain these additional service names.

Checking the Master’s Certificate

The following command can be used to determine which Subject Alternative Names (SANs) are present in the master’s serving certificate. In this example, the Subject Alternative Names are mymaster, mymaster.mydomain.com, and 1.2.3.4:

# openssl x509 -in /etc/origin/master/master.server.crt -text -noout | grep -A 1 "Subject Alternative Name"
X509v3 Subject Alternative Name:
DNS:mymaster, DNS:mymaster.mydomain.com, IP Address:1.2.3.4

Ensure that the following entries are present in the Subject Alternative Names for the master’s serving certificate:

Entry	Example
Kubernetes service IP address	172.30.0.1
All master host names	master1.example.com
All master IP addresses	192.168.122.1
Public master host name in clustered environments	public-master.example.com
kubernetes
kubernetes.default
kubernetes.default.svc
kubernetes.default.svc.cluster.local
openshift
openshift.default
openshift.default.svc
openshift.default.svc.cluster.local

If these names are already contained within the Subject Alternative Names, then no further steps are required.
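Checking a long SAN line against the table above by eye is tedious. As a rough sketch, the hypothetical helper missing_master_sans below prints every required name that is absent from a given SAN line. The names and addresses to check are passed as arguments, so nothing about your environment is assumed.

```shell
# missing_master_sans SAN_LINE NAME...
# Hypothetical helper: print each NAME that is not present in SAN_LINE.
missing_master_sans() {
  local san_line="$1" name present
  shift
  # One SAN entry per line, with leading spaces and type prefixes removed.
  present=$(printf '%s\n' "$san_line" | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/^DNS://' \
          -e 's/^IP Address://' -e 's/^IP:[[:space:]]*//')
  for name in "$@"; do
    # Exact fixed-string match on a whole entry.
    printf '%s\n' "$present" | grep -Fqx -- "$name" || echo "$name"
  done
}

# Example usage (the host names and addresses are the example values from the table):
#   missing_master_sans "$san_line" 172.30.0.1 master1.example.com 192.168.122.1 \
#     kubernetes kubernetes.default kubernetes.default.svc \
#     kubernetes.default.svc.cluster.local openshift openshift.default \
#     openshift.default.svc openshift.default.svc.cluster.local
```

Empty output means every listed name is already in the certificate.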

Generating a New Master Certificate

If your current master certificate does not contain all names from the list above, then you must generate a new certificate for your master. Perform the following steps on the first master listed in the Ansible host inventory file (by default, /etc/ansible/hosts):

Back up the existing /etc/origin/master/master.server.crt and /etc/origin/master/master.server.key files for your master:

# mv /etc/origin/master/master.server.crt /etc/origin/master/master.server.crt.bak
# mv /etc/origin/master/master.server.key /etc/origin/master/master.server.key.bak

Export the service names. These names will be used when generating the new certificate:

# export service_names="kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster.local,openshift,openshift.default,openshift.default.svc,openshift.default.svc.cluster.local"

You will need the first IP in the services subnet (the kubernetes service IP) as well as the values of masterIP, masterURL, and masterPublicURL contained in the /etc/origin/master/master-config.yaml file for the following steps.

The kubernetes service IP can be obtained with:
```
# oc get svc/kubernetes --template='{{.spec.clusterIP}}'
```

Generate the new certificate:

# oc adm ca create-master-certs \
      --hostnames=<master_hostnames>,<master_IP_addresses>,<kubernetes_service_IP>,$service_names \ (1) (2) (3)
      --master=<internal_master_address> \ (4)
      --public-master=<public_master_address> \ (5)
      --cert-dir=/etc/origin/master/ \
      --overwrite=false

1	Adjust `<master_hostnames>` to match your master host name. In a clustered environment, add all master host names.
2	Adjust `<master_IP_addresses>` to match the value of `masterIP`. In a clustered environment, add all master IP addresses.
3	Adjust `<kubernetes_service_IP>` to the first IP in the kubernetes services subnet.
4	Adjust `<internal_master_address>` to match the value of `masterURL`.
5	Adjust `<public_master_address>` to match the value of `masterPublicURL`.

Restart master services. For single master deployments:
```
# systemctl restart origin-master-api origin-master-controllers
```
After the service restarts, the certificate update is complete.

Upgrading the Service Catalog

Starting with OKD 3.7, the service catalog, OpenShift Ansible broker, and template service broker are enabled and deployed by default for new cluster installations. However, they are not deployed by default during the upgrade from OKD 3.6 to 3.7, so you must run an individual component playbook separately after the upgrade.

Upgrading from the OKD 3.6 Technology Preview version of the service catalog and service brokers is not supported.

To upgrade to these features:

See the following three sections in the Advanced Installation topic and update your inventory file accordingly:

Run the following playbook:

# ansible-playbook -i </path/to/inventory/file> \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/service-catalog.yml

Upgrading the EFK Logging Stack

To upgrade an existing EFK logging stack deployment, you must use the provided /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml Ansible playbook. This is the same playbook you would use to deploy logging for the first time on an existing cluster; it is also used to upgrade existing logging deployments.

If you have not already done so, see Specifying Logging Ansible Variables in the Aggregating Container Logs topic and update your Ansible inventory file to at least set the following required variables within the [OSEv3:vars] section:
```
[OSEv3:vars]

openshift_logging_install_logging=true (1)
openshift_logging_image_version=<tag> (2)
```
1	Enables the ability to upgrade the logging stack.
2	Replace `<tag>` with `v3.7.119` for the latest version.

Add any other openshift_logging_* variables that you want to specify to override the defaults, as described in Specifying Logging Ansible Variables.
When you have finished updating your inventory file, follow the instructions in Deploying the EFK Stack to run the openshift-logging.yml playbook and complete the logging deployment upgrade.

If your Fluentd DeploymentConfig and DaemonSet for the EFK components are already set with:

        image: <image_name>:<vX.Y>
        imagePullPolicy: IfNotPresent

then the latest version of <image_name> might not be pulled if an image with the same <image_name>:<vX.Y> tag is already stored locally on the node where the pod is being redeployed. If so, manually change imagePullPolicy to Always in the DeploymentConfig and DaemonSet so that the image is pulled again.

Upgrading Cluster Metrics

To upgrade an existing cluster metrics deployment, you must use the provided /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml Ansible playbook. This is the same playbook you would use to deploy metrics for the first time on an existing cluster; it is also used to upgrade existing metrics deployments.

If you have not already done so, see Specifying Metrics Ansible Variables in the Enabling Cluster Metrics topic and update your Ansible inventory file to at least set the following required variables within the [OSEv3:vars] section:

[OSEv3:vars]

openshift_metrics_install_metrics=true (1)
openshift_metrics_image_version=<tag> (2)
openshift_metrics_hawkular_hostname=<fqdn> (3)
openshift_metrics_cassandra_storage_type=(emptydir|pv|dynamic) (4)

1	Enables the ability to upgrade the metrics deployment.
2	Replace `<tag>` with `v3.7.119` for the latest version.
3	Used for the Hawkular Metrics route. Should correspond to a fully qualified domain name.
4	Choose a type that is consistent with the previous deployment.

Add any other openshift_metrics_* variables that you want to specify to override the defaults, as described in Specifying Metrics Ansible Variables.
When you have finished updating your inventory file, follow the instructions in Deploying the Metrics Deployment to run the openshift-metrics.yml playbook and complete the metrics deployment upgrade.

Special Considerations for Large-scale Upgrades

For large-scale cluster upgrades, which involve at least 10 worker nodes and thousands of projects and pods, the API object storage migration should be performed prior to running the upgrade playbooks, and then again after the upgrade has successfully completed. Otherwise, the upgrade process will fail.

Refer to the "Running the pre- and post- API server model object migration outside of the upgrade window" section of "Recommendations for large-scale OpenShift upgrades" for further guidance.

Special Considerations for Mixed Environments

Mixed environment upgrades (for example, those with Red Hat Enterprise Linux and Red Hat Enterprise Linux Atomic Host) require setting both openshift_pkg_version and openshift_image_tag. In mixed environments, if you only specify openshift_pkg_version, then that number is used for the packages for Red Hat Enterprise Linux and the image for Red Hat Enterprise Linux Atomic Host.

Special Considerations When Using Containerized GlusterFS

When upgrading OKD, you must upgrade the set of nodes where GlusterFS pods are running.

Special consideration must be taken when upgrading these nodes, as drain and unschedule will not terminate and evacuate the GlusterFS pods because they are running as part of a daemonset.

There is also the potential for someone to run an upgrade on multiple nodes at the same time, which would lead to data availability issues if more than one was hosting GlusterFS pods.

Even if a serial upgrade is running, there is no guarantee sufficient time will be given for GlusterFS to complete all of its healing operations before GlusterFS on the next node is terminated. This could leave the cluster in a bad or unknown state. Therefore, the following procedure is recommended.

1. Upgrade the control plane (the master nodes and etcd nodes).

2. Upgrade standard infra nodes (router, registry, logging, and metrics).

   If any of the nodes in those groups are running GlusterFS, perform step 4 of this procedure at the same time. GlusterFS nodes must be upgraded along with other nodes in their class (app versus infra), one at a time.

3. Upgrade standard nodes running application containers.

4. Upgrade the OKD nodes running GlusterFS one at a time:
   1. Run oc get daemonset to verify the label found under NODE-SELECTOR. The default value is storagenode=glusterfs.
   2. Remove the daemonset label from the node:
      $ oc label node <node_name> <daemonset_label>-
      This will cause the GlusterFS pod to terminate on that node.
   3. Add an additional label (for example, type=upgrade) to the node you want to upgrade.
   4. To run the upgrade playbook on the single node where you terminated GlusterFS, use -e openshift_upgrade_nodes_label="type=upgrade".
   5. When the upgrade completes, relabel the node with the daemonset selector:
      $ oc label node <node_name> <daemonset_label>
   6. Wait for the GlusterFS pod to respawn and appear.
   7. oc rsh into the pod and verify all volumes are healed:
      $ oc rsh <GlusterFS_pod_name>
      $ for vol in `gluster volume list`; do gluster volume heal $vol info; done
      Ensure all of the volumes are healed and there are no outstanding tasks. The heal info command lists all pending entries for a given volume's heal process. A volume is considered healed when Number of entries for that volume is 0.
   8. Remove the upgrade label (for example, type=upgrade) and go to the next GlusterFS node.
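The heal verification described above can be turned into a pass/fail check. The following is a sketch only: all_volumes_healed is a hypothetical helper that reads the output of the heal info loop on standard input. It succeeds when at least one Number of entries line was seen and every such line reports 0.

```shell
# all_volumes_healed
# Hypothetical helper: read "gluster volume heal <vol> info" output on stdin and
# exit 0 only if every "Number of entries:" line reports 0.
all_volumes_healed() {
  awk -F': *' '
    /^Number of entries:/ { seen = 1; if ($2 != 0) pending = 1 }
    END { if (seen && !pending) exit 0; exit 1 }
  '
}

# Example usage (assumes oc rsh can run the loop non-interactively in the pod):
#   oc rsh <GlusterFS_pod_name> sh -c \
#     'for vol in $(gluster volume list); do gluster volume heal "$vol" info; done' \
#     | all_volumes_healed && echo "all volumes healed"
```

Treating empty input as a failure avoids reporting success when the command produced no heal information at all.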

Special Considerations When Using gcePD

Because the default gcePD storage provider uses the RWO (ReadWriteOnce) access mode, you cannot perform a rolling upgrade on the registry or scale the registry to multiple pods. Therefore, when upgrading OKD, you must specify the following variables in your Ansible inventory file:

[OSEv3:vars]

openshift_hosted_registry_storage_provider=gcs
openshift_hosted_registry_storage_gcs_bucket=bucket01
openshift_hosted_registry_storage_gcs_keyfile=test.key
openshift_hosted_registry_storage_gcs_rootdirectory=/registry

Verifying the Upgrade

To verify the upgrade:

Check that all nodes are marked as Ready:

# oc get nodes
NAME                        STATUS                     AGE
master.example.com          Ready,SchedulingDisabled   165d
node1.example.com           Ready                      165d
node2.example.com           Ready                      165d

Verify that you are running the expected versions of the docker-registry and router images, if deployed.

# oc get -n default dc/docker-registry -o json | grep \"image\"
    "image": "openshift/origin-docker-registry:v1.0.6",
# oc get -n default dc/router -o json | grep \"image\"
    "image": "openshift/origin-haproxy-router:v1.0.6",

If you upgraded from Origin 1.0 to Origin 1.1, verify in your old /etc/sysconfig/openshift-master and /etc/sysconfig/openshift-node files that any custom configuration is added to your new /etc/sysconfig/origin-master and /etc/sysconfig/origin-node files.

Use the diagnostics tool on the master to look for common issues:

# oc adm diagnostics
...
[Note] Summary of diagnostics execution:
[Note] Completed with no errors or warnings seen.
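The node readiness check at the start of this procedure can be automated on larger clusters. As a minimal sketch, the hypothetical helper not_ready_nodes reads oc get nodes output on standard input and prints the name of every node whose STATUS column does not include Ready. It assumes that STATUS is the second column, as in the sample output above.

```shell
# not_ready_nodes
# Hypothetical helper: read "oc get nodes" output on stdin and print the
# names of nodes whose STATUS (column 2) does not include "Ready".
not_ready_nodes() {
  awk 'NR > 1 && NF >= 2 {
         # STATUS may be a comma-separated list such as Ready,SchedulingDisabled.
         n = split($2, parts, ","); ready = 0
         for (i = 1; i <= n; i++) if (parts[i] == "Ready") ready = 1
         if (!ready) print $1
       }'
}

# Example usage:
#   oc get nodes | not_ready_nodes
```

Empty output means every listed node is Ready.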
