Control plane backup and restore operations

As a cluster administrator, you might need to stop an OKD cluster for a period and restart it later. Some reasons for restarting a cluster are that you need to perform maintenance on a cluster or want to reduce resource costs. In OKD, you can perform a graceful shutdown of a cluster so that you can easily restart the cluster later.

You must back up etcd data before shutting down a cluster; etcd is the key-value store for OKD, which persists the state of all resource objects. An etcd backup plays a crucial role in disaster recovery. In OKD, you can also replace an unhealthy etcd member.

When you want to get your cluster running again, restart the cluster gracefully.

A cluster’s certificates expire one year after the installation date. You can shut down a cluster and expect it to restart gracefully while the certificates are still valid. Although the cluster automatically retrieves the expired control plane certificates, you must still approve the certificate signing requests (CSRs).

You might run into several situations where OKD does not work as expected, such as:

  • You have a cluster that is not functional after the restart because of unexpected conditions, such as node failure, or network connectivity issues.

  • You have deleted something critical in the cluster by mistake.

  • You have lost the majority of your control plane hosts, leading to etcd quorum loss.

You can always recover from a disaster situation by restoring your cluster to its previous state using the saved etcd snapshots.

Application backup and restore operations

As a cluster administrator, you can back up and restore applications running on OKD by using the OpenShift API for Data Protection (OADP).

OADP backs up and restores Kubernetes resources and internal images, at the granularity of a namespace, by using the version of Velero that is appropriate for the version of OADP you install, according to the table in Downloading the Velero CLI tool. OADP backs up and restores persistent volumes (PVs) by using snapshots or Restic. For details, see OADP features.

OADP requirements

OADP has the following requirements:

  • You must be logged in as a user with a cluster-admin role.

  • You must have object storage for storing backups, such as one of the following storage types:

    • OpenShift Data Foundation

    • Amazon Web Services

    • Microsoft Azure

    • Google Cloud Platform

    • S3-compatible object storage

The CloudStorage API for S3 storage is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

  • To back up PVs with snapshots, you must have cloud storage that has a native snapshot API or supports Container Storage Interface (CSI) snapshots, such as the following providers:

    • Amazon Web Services

    • Microsoft Azure

    • Google Cloud Platform

    • CSI snapshot-enabled cloud storage, such as Ceph RBD or Ceph FS

If you do not want to back up PVs by using snapshots, you can use Restic, which is installed by the OADP Operator by default.

Backing up and restoring applications

You back up applications by creating a Backup custom resource (CR). You can configure the following backup options:

You restore applications by creating a Restore CR. You can configure restore hooks to run commands in init containers or in the application container during the restore operation.