Understanding the image-based upgrade for single-node OpenShift clusters - Image-based upgrade for single-node OpenShift clusters | Edge computing

Stages of the image-based upgrade
Guidelines for the image-based upgrade

From OKD 4.14.13, the Lifecycle Agent provides you with an alternative way to upgrade the platform version of a single-node OpenShift cluster. The image-based upgrade is faster than the standard upgrade method and allows you to directly upgrade from OKD <4.y> to <4.y+2>, and <4.y.z> to <4.y.z+n>.

This upgrade method utilizes a generated OCI image from a dedicated seed cluster that is installed on the target single-node OpenShift cluster as a new ostree stateroot. A seed cluster is a single-node OpenShift cluster deployed with the target OKD version, Day 2 Operators, and configurations that are common to all target clusters.

You can use the seed image, which is generated from the seed cluster, to upgrade the platform version on any single-node OpenShift cluster that has the same combination of hardware, Day 2 Operators, and cluster configuration as the seed cluster.

The image-based upgrade uses custom images that are specific to the hardware platform that the clusters are running on. Each different hardware platform requires a separate seed image.

The Lifecycle Agent uses two custom resources (CRs) on the participating clusters to orchestrate the upgrade:

On the seed cluster, the SeedGenerator CR allows for the seed image generation. This CR specifies the repository to push the seed image to.
On the target cluster, the ImageBasedUpgrade CR specifies the seed image for the upgrade of the target cluster and the backup configurations for your workloads.

Example SeedGenerator CR

apiVersion: lca.openshift.io/v1
kind: SeedGenerator
metadata:
  name: seedimage
spec:
  seedImage: <seed_image>

Example ImageBasedUpgrade CR

apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  name: upgrade
spec:
  stage: Idle
  seedImageRef:
    version: <target_version>
    image: <seed_container_image>
    pullSecretRef:
      name: <seed_pull_secret>
  autoRollbackOnFailure: {}
  extraManifests:
  - name: example-extra-manifests
    namespace: openshift-lifecycle-agent
  # List of ConfigMap resources that contain the OADP Backup and Restore CRs.
  oadpContent:
  - name: oadp-cm-example
    namespace: openshift-adp

where

spec.stage: Defines the stage of the ImageBasedUpgrade CR. The value can be Idle, Prep, Upgrade, or Rollback.
spec.seedImageRef: Defines the seed image to be used, the target platform version, and the secret required to access the image.
initMonitorTimeoutSeconds: Optionally defines the time frame in seconds to roll back when the upgrade does not complete within that time frame after the first reboot. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.
spec.extraManifests: Optionally defines the list of ConfigMap resources that contain your custom catalog sources to retain after the upgrade, and your extra manifests to apply to the target cluster that are not part of the seed image.
spec.oadpContent: Defines the list of ConfigMap resources that contain the OADP Backup and Restore CRs.

Stages of the image-based upgrade

The image-based upgrade consists of four stages that you control by setting the spec.stage field in the ImageBasedUpgrade CR: Idle, Prep, Upgrade, and Rollback.

After generating the seed image on the seed cluster, you can move through the stages on the target cluster by setting the spec.stage field to one of the following values in the ImageBasedUpgrade CR:

Idle
Prep
Upgrade
Rollback (Optional)

Figure 1. Stages of the image-based upgrade

Idle stage

The Lifecycle Agent creates an ImageBasedUpgrade CR set to stage: Idle when the Operator is first deployed. This is the default stage. There is no ongoing upgrade and the cluster is ready to move to the Prep stage.

Figure 2. Transition from Idle stage

You also move to the Idle stage to do one of the following steps:

Finalize a successful upgrade
Finalize a rollback
Cancel an ongoing upgrade until the pre-pivot phase in the Upgrade stage

Moving to the Idle stage ensures that the Lifecycle Agent cleans up resources, so that the cluster is ready for upgrades again.

Figure 3. Transitions to Idle stage

If using RHACM when you cancel an upgrade, you must remove the import.open-cluster-management.io/disable-auto-import annotation from the target managed cluster to re-enable the automatic import of the cluster.

Prep stage

You can complete this stage before a scheduled maintenance window.

For the Prep stage, you specify the following upgrade details in the ImageBasedUpgrade CR:

seed image to use
resources to back up
extra manifests to apply and custom catalog sources to retain after the upgrade, if any

Then, based on what you specify, the Lifecycle Agent prepares for the upgrade without impacting the current running version. During this stage, the Lifecycle Agent ensures that the target cluster is ready to proceed to the Upgrade stage by checking if it meets certain conditions. The Operator pulls the seed image to the target cluster with additional container images specified in the seed image. The Lifecycle Agent checks if there is enough space on the container storage disk and if necessary, the Operator deletes unpinned images until the disk usage is below the specified threshold. For more information about how to configure or disable the cleaning up of the container storage disk, see "Configuring the automatic image cleanup of the container storage disk".

You also prepare backup resources with the OADP Operator’s Backup and Restore CRs. These CRs are used in the Upgrade stage to reconfigure the cluster, register the cluster with RHACM, and restore application artifacts.

In addition to the OADP Operator, the Lifecycle Agent uses the ostree versioning system to create a backup, which allows complete cluster reconfiguration after both upgrade and rollback.

After the Prep stage finishes, you can cancel the upgrade process by moving to the Idle stage or you can start the upgrade by moving to the Upgrade stage in the ImageBasedUpgrade CR. If you cancel the upgrade, the Operator performs cleanup operations.

Figure 4. Transition from Prep stage

Upgrade stage

The Upgrade stage consists of two phases:

pre-pivot: Just before pivoting to the new stateroot, the Lifecycle Agent collects the required cluster specific artifacts and stores them in the new stateroot. The backup of your cluster resources specified in the Prep stage are created on a compatible Object storage solution. The Lifecycle Agent exports CRs specified in the extraManifests field in the ImageBasedUpgrade CR or the CRs described in the ZTP policies that are bound to the target cluster. After pre-pivot phase has completed, the Lifecycle Agent sets the new stateroot deployment as the default boot entry and reboots the node.
post-pivot: After booting from the new stateroot, the Lifecycle Agent also regenerates the seed image’s cluster cryptography. This ensures that each single-node OpenShift cluster upgraded with the same seed image has unique and valid cryptographic objects. The Operator then reconfigures the cluster by applying cluster-specific artifacts that were collected in the pre-pivot phase. The Operator applies all saved CRs, and restores the backups.

After the upgrade has completed and you are satisfied with the changes, you can finalize the upgrade by moving to the Idle stage.

When you finalize the upgrade, you cannot roll back to the original release.

Figure 5. Transitions from Upgrade stage

If you want to cancel the upgrade, you can do so until the pre-pivot phase of the Upgrade stage. If you encounter issues after the upgrade, you can move to the Rollback stage for a manual rollback.

Rollback stage

The Rollback stage can be initiated manually or automatically upon failure. During the Rollback stage, the Lifecycle Agent sets the original ostree stateroot deployment as default. Then, the node reboots with the previous release of OKD and application configurations.

If you move to the Idle stage after a rollback, the Lifecycle Agent cleans up resources that can be used to troubleshoot a failed upgrade.

The Lifecycle Agent initiates an automatic rollback if the upgrade does not complete within a specified time limit. For more information about the automatic rollback, see the "Moving to the Rollback stage with Lifecycle Agent" or "Moving to the Rollback stage with Lifecycle Agent and GitOps ZTP" sections.

Figure 6. Transition from Rollback stage

Additional resources

Guidelines for the image-based upgrade

Your deployments must meet specific requirements for a successful image-based upgrade, which can be performed using either GitOps ZTP or non-GitOps deployment methods.

For a successful image-based upgrade, your deployments must meet certain requirements.

There are different deployment methods in which you can perform the image-based upgrade:

GitOps ZTP: You use the GitOps Zero Touch Provisioning (ZTP) to deploy and configure your clusters.
Non-GitOps: You manually deploy and configure your clusters.

You can perform an image-based upgrade in disconnected environments. For more information about how to mirror images for a disconnected environment, see "Mirroring images for a disconnected installation".

Additional resources

Mirroring images for a disconnected installation

Minimum software version of components

The image-based upgrade requires specific minimum software versions for various components depending on your deployment method.

Depending on your deployment method, the image-based upgrade requires the following minimum software versions.

Table 1. Minimum software version of components
Component	Software version	Required
Lifecycle Agent	4.16	Yes
OADP Operator	1.4.1	Yes
Managed cluster version	4.14.13	Yes
Hub cluster version	4.16	No
RHACM	2.10.2	No
GitOps ZTP plugin	4.16	Only for GitOps ZTP deployment method
Red Hat OpenShift GitOps	1.12	Only for GitOps ZTP deployment method
Topology Aware Lifecycle Manager (TALM)	4.16	Only for GitOps ZTP deployment method
Local Storage Operator ^[1]	4.14	Yes
Logical Volume Manager (LVM) Storage ^[1]	4.14.2	Yes

The persistent storage must be provided by either the LVM Storage or the Local Storage Operator, not both.

Hub cluster guidelines

When using RHACM, the hub cluster must meet specific conditions including disabling optional add-ons and upgrading to at least the target version.

If you are using Red Hat Advanced Cluster Management (RHACM), your hub cluster needs to meet the following conditions:

To avoid including any RHACM resources in your seed image, you need to disable all optional RHACM add-ons before generating the seed image.
Your hub cluster must be upgraded to at least the target version before performing an image-based upgrade on a target single-node OpenShift cluster.

Seed image guidelines

The seed image targets single-node OpenShift clusters with matching hardware and configuration, requiring the seed cluster to match specific aspects of the target clusters.

The seed image targets a set of single-node OpenShift clusters with the same hardware and similar configuration. This means that the seed cluster must match the configuration of the target clusters for the following items:

CPU topology
- Number of CPU cores
- Tuned performance configuration, such as number of reserved CPUs
MachineConfig resources for the target cluster
IP version

Dual-stack networking is not supported in this release.
Set of Day 2 Operators, including the Lifecycle Agent and the OADP Operator
Disconnected registry
FIPS configuration

The following configurations only have to partially match on the participating clusters:

If the target cluster has a proxy configuration, the seed cluster must have a proxy configuration too but the configuration does not have to be the same.
A dedicated partition on the primary disk for container storage is required on all participating clusters. However, the size and start of the partition does not have to be the same. Only the spec.config.storage.disks.partitions.label: varlibcontainers label in the MachineConfig CR must match on both the seed and target clusters. For more information about how to create the disk partition, see "Configuring a shared container partition between ostree stateroots" or "Configuring a shared container partition between ostree stateroots when using GitOps ZTP".

For more information about what to include in the seed image, see "Seed image configuration" and "Seed image configuration using the RAN DU profile".

Additional resources

OADP backup and restore guidelines

Use the OADP Operator to back up and restore applications during the image-based upgrade by creating Backup and Restore CRs wrapped in ConfigMap objects.

With the OADP Operator, you can back up and restore your applications on your target clusters by using Backup and Restore CRs wrapped in ConfigMap objects. The application must work on the current and the target OKD versions so that they can be restored after the upgrade. The backups must include resources that were initially created.

The following resources must be excluded from the backup:

pods
endpoints
controllerrevision
podmetrics
packagemanifest
replicaset
localvolume, if using Local Storage Operator (LSO)

There are two local storage implementations for single-node OpenShift:

Local Storage Operator (LSO): The Lifecycle Agent automatically backs up and restores the required artifacts, including localvolume resources and their associated StorageClass resources. You must exclude the persistentvolumes resource in the application Backup CR.
LVM Storage: You must create the Backup and Restore CRs for LVM Storage artifacts. You must include the persistentVolumes resource in the application Backup CR.

For the image-based upgrade, only one Operator is supported on a given target cluster.

For both Operators, you must not apply the Operator CRs as extra manifests through the ImageBasedUpgrade CR.

The persistent volume contents are preserved and used after the pivot. When you are configuring the DataProtectionApplication CR, you must ensure that the .spec.configuration.restic.enable is set to false for an image-based upgrade. This disables Container Storage Interface integration.

lca.openshift.io/apply-wave guidelines

The lca.openshift.io/apply-wave annotation determines the apply order of Backup or Restore CRs. The value of the annotation must be a string number. If you define the lca.openshift.io/apply-wave annotation in the Backup or Restore CRs, they are applied in increasing order based on the annotation value. If you do not define the annotation, they are applied together.

The lca.openshift.io/apply-wave annotation must be numerically lower in your platform Restore CRs, for example RHACM and LVM Storage artifacts, than that of the application. This way, the platform artifacts are restored before your applications.

If your application includes cluster-scoped resources, you must create separate Backup and Restore CRs to scope the backup to the specific cluster-scoped resources created by the application. The Restore CR for the cluster-scoped resources must be restored before the remaining application Restore CR(s).

lca.openshift.io/apply-label guidelines

You can back up specific resources exclusively with the lca.openshift.io/apply-label annotation. Based on which resources you define in the annotation, the Lifecycle Agent applies the lca.openshift.io/backup: <backup_name> label and adds the labelSelector.matchLabels.lca.openshift.io/backup: <backup_name> label selector to the specified resources when creating the Backup CRs.

To use the lca.openshift.io/apply-label annotation for backing up specific resources, the resources listed in the annotation must also be included in the spec section. If the lca.openshift.io/apply-label annotation is used in the Backup CR, only the resources listed in the annotation are backed up, even if other resource types are specified in the spec section or not.

Example CR

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: acm-klusterlet
  namespace: openshift-adp
  annotations:
    lca.openshift.io/apply-label: rbac.authorization.k8s.io/v1/clusterroles/klusterlet,apps/v1/deployments/open-cluster-management-agent/klusterlet
  labels:
    velero.io/storage-location: default
spec:
  includedNamespaces:
   - open-cluster-management-agent
  includedClusterScopedResources:
   - clusterroles
  includedNamespaceScopedResources:
   - deployments

The metadata.annotations.lca.openshift.io/apply-label value must be a list of comma-separated objects in group/version/resource/name format for cluster-scoped resources or group/version/resource/namespace/name format for namespace-scoped resources, and it must be attached to the related Backup CR.

Extra manifest guidelines

Extra manifests enable the Lifecycle Agent to restore cluster-specific configurations after the upgrade pivot but before restoring application artifacts.

The Lifecycle Agent uses extra manifests to restore your target clusters after rebooting with the new stateroot deployment and before restoring application artifacts.

Different deployment methods require a different way to apply the extra manifests:

GitOps ZTP

You use the lca.openshift.io/target-ocp-version: <target_ocp_version> label to mark the extra manifests that the Lifecycle Agent must extract and apply after the pivot. You can specify the number of manifests labeled with lca.openshift.io/target-ocp-version by using the lca.openshift.io/target-ocp-version-manifest-count annotation in the ImageBasedUpgrade CR. If specified, the Lifecycle Agent verifies that the number of manifests extracted from policies matches the number provided in the annotation during the prep and upgrade stages.

Example for the lca.openshift.io/target-ocp-version-manifest-count annotation

apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
  annotations:
    lca.openshift.io/target-ocp-version-manifest-count: "5"
  name: upgrade

Non-Gitops

You mark your extra manifests with the lca.openshift.io/apply-wave annotation to determine the apply order. The labeled extra manifests are wrapped in ConfigMap objects and referenced in the ImageBasedUpgrade CR that the Lifecycle Agent uses after the pivot.

If the target cluster uses custom catalog sources, you must include them as extra manifests that point to the correct release version.

You cannot apply the following items as extra manifests:

MachineConfig objects
OLM Operator subscriptions

Additional resources