Backing up and restoring etcd on AWS

You can back up and restore etcd on a hosted cluster on Amazon Web Services (AWS) to fix failures.

Taking a snapshot of etcd for a hosted cluster

To back up etcd for a hosted cluster, you must take a snapshot of etcd. Later, you can restore etcd by using the snapshot.

This procedure requires API downtime.

Procedure

Pause reconciliation of the hosted cluster by entering the following command:

$ oc patch -n clusters hostedclusters/<hosted_cluster_name> \
  -p '{"spec":{"pausedUntil":"true"}}' --type=merge

Stop all etcd-writer deployments by entering the following command:

$ oc scale deployment -n <hosted_cluster_namespace> --replicas=0 \
  kube-apiserver openshift-apiserver openshift-oauth-apiserver

To take an etcd snapshot, use the exec command in each etcd container by entering the following command:

$ oc exec -it <etcd_pod_name> -n <hosted_cluster_namespace> -- \
  env ETCDCTL_API=3 /usr/bin/etcdctl \
  --cacert /etc/etcd/tls/etcd-ca/ca.crt \
  --cert /etc/etcd/tls/client/etcd-client.crt \
  --key /etc/etcd/tls/client/etcd-client.key \
  --endpoints=localhost:2379 \
  snapshot save /var/lib/data/snapshot.db

To check the snapshot status, use the exec command in each etcd container by running the following command:

$ oc exec -it <etcd_pod_name> -n <hosted_cluster_namespace> -- \
  env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status \
  /var/lib/data/snapshot.db
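
Because the save and status commands must run in each etcd pod, a small loop can reduce repetition. The following is a minimal sketch and not part of the documented procedure: the function name `snapshot_etcd_pods` is hypothetical, and the pod names that you pass to it (for example, etcd-0, etcd-1, and etcd-2) are assumptions that you should confirm in your own namespace first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the snapshot save and status commands from the
# preceding steps in every etcd pod that is passed as an argument.
snapshot_etcd_pods() {
  local namespace="$1"
  shift
  local pod
  for pod in "$@"; do
    # Save the snapshot inside the pod (same flags as the manual step).
    oc exec "${pod}" -n "${namespace}" -- \
      env ETCDCTL_API=3 /usr/bin/etcdctl \
      --cacert /etc/etcd/tls/etcd-ca/ca.crt \
      --cert /etc/etcd/tls/client/etcd-client.crt \
      --key /etc/etcd/tls/client/etcd-client.key \
      --endpoints=localhost:2379 \
      snapshot save /var/lib/data/snapshot.db || return 1
    # Print the snapshot status table so that it can be inspected.
    oc exec "${pod}" -n "${namespace}" -- \
      env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status \
      /var/lib/data/snapshot.db || return 1
  done
}
```

A call might look like `snapshot_etcd_pods <hosted_cluster_namespace> etcd-0 etcd-1 etcd-2`. The function stops at the first pod where either command fails.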

Copy the snapshot data to a location where you can retrieve it later, such as an S3 bucket. See the following example.

The following example uses signature version 2. If you are in a region that supports signature version 4, such as the us-east-2 region, use signature version 4 instead. If you do not, the upload to the S3 bucket fails.

Example

BUCKET_NAME=somebucket
CLUSTER_NAME=cluster_name
FILEPATH="/${BUCKET_NAME}/${CLUSTER_NAME}-snapshot.db"
CONTENT_TYPE="application/x-compressed-tar"
DATE_VALUE=$(date -R)
SIGNATURE_STRING="PUT\n\n${CONTENT_TYPE}\n${DATE_VALUE}\n${FILEPATH}"
ACCESS_KEY=accesskey
SECRET_KEY=secret
SIGNATURE_HASH=$(echo -en "${SIGNATURE_STRING}" | openssl sha1 -hmac "${SECRET_KEY}" -binary | base64)
HOSTED_CLUSTER_NAMESPACE=hosted_cluster_namespace

oc exec -it etcd-0 -n ${HOSTED_CLUSTER_NAMESPACE} -- curl -X PUT -T "/var/lib/data/snapshot.db" \
  -H "Host: ${BUCKET_NAME}.s3.amazonaws.com" \
  -H "Date: ${DATE_VALUE}" \
  -H "Content-Type: ${CONTENT_TYPE}" \
  -H "Authorization: AWS ${ACCESS_KEY}:${SIGNATURE_HASH}" \
  https://${BUCKET_NAME}.s3.amazonaws.com/${CLUSTER_NAME}-snapshot.db
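
For regions where the signature version 2 request above is rejected, one possible alternative is to stream the snapshot out of the pod and let the AWS CLI sign the upload. This is a sketch under assumptions, not the documented method: the function name `upload_snapshot` is hypothetical, it assumes that `cat` is available in the etcd container, and it assumes that the `aws` CLI on your workstation is already configured with credentials that can write to the bucket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the snapshot from an etcd pod to the local
# working directory, then upload it to S3 with the AWS CLI.
upload_snapshot() {
  local namespace="$1" pod="$2" bucket="$3" cluster="$4"
  local local_file="${cluster}-snapshot.db"
  # No -t flag here: a TTY could alter the binary stream.
  oc exec "${pod}" -n "${namespace}" -- \
    cat /var/lib/data/snapshot.db > "${local_file}" || return 1
  # The object key matches the one used in the curl example.
  aws s3 cp "${local_file}" "s3://${bucket}/${cluster}-snapshot.db"
}
```

For example, `upload_snapshot "${HOSTED_CLUSTER_NAMESPACE}" etcd-0 "${BUCKET_NAME}" "${CLUSTER_NAME}"` reuses the variables from the example above.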

To restore the snapshot on a new cluster later, save the encryption secret that the hosted cluster references.

1. Get the name of the secret that holds the encryption key by entering the following command:

$ oc get hostedcluster <hosted_cluster_name> \
  -o=jsonpath='{.spec.secretEncryption.aescbc}'

Example output

{"activeKey":{"name":"<hosted_cluster_name>-etcd-encryption-key"}}

2. Save the secret encryption key by entering the following command:

$ oc get secret <hosted_cluster_name>-etcd-encryption-key \
  -o=jsonpath='{.data.key}'

The value is base64-encoded. You can decode this key when restoring a snapshot on a new cluster.
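
The value that the jsonpath query returns comes from the data field of a Kubernetes secret, so it is base64-encoded. As a quick sanity check before you store it, you might decode it and confirm that its length is what you expect. The helper name `decoded_key_length` is hypothetical, and the sketch assumes a `base64` command that accepts the `-d` option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a base64-encoded key and print its length in
# bytes without printing the key itself.
decoded_key_length() {
  printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}
```

You could pass the saved value directly, for example `decoded_key_length "$(oc get secret <hosted_cluster_name>-etcd-encryption-key -o=jsonpath='{.data.key}')"`. Treat the saved key as sensitive and store it accordingly.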

Restart all etcd-writer deployments by entering the following command:

$ oc scale deployment -n <hosted_cluster_namespace> --replicas=3 \
  kube-apiserver openshift-apiserver openshift-oauth-apiserver
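
Because this procedure involves API downtime, it can be useful to confirm that the three deployments are available again before you continue. The following sketch is an optional addition, not a documented step: the function name `wait_for_api_servers` and the five-minute timeout are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until each etcd-writer deployment finishes
# rolling out, and stop at the first one that does not.
wait_for_api_servers() {
  local namespace="$1"
  local deployment
  for deployment in kube-apiserver openshift-apiserver openshift-oauth-apiserver; do
    oc rollout status "deployment/${deployment}" -n "${namespace}" \
      --timeout=5m || return 1
  done
}
```

Call it with the same namespace that you used for the scale command.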

Resume the reconciliation of the hosted cluster by entering the following command:

$ oc patch -n clusters hostedclusters/<hosted_cluster_name> \
  -p '[{"op": "remove", "path": "/spec/pausedUntil"}]' --type=json

Next steps

Restore the etcd snapshot.

Restoring an etcd snapshot on a hosted cluster

If you have a snapshot of etcd from your hosted cluster, you can restore it. Currently, you can restore an etcd snapshot only during cluster creation.

To restore an etcd snapshot, modify the output from the hcp create cluster --render command and define a restoreSnapshotURL value in the etcd section of the HostedCluster specification.

The --render flag in the hcp create command does not render the secrets. To render the secrets, you must use both the --render and the --render-sensitive flags in the hcp create command.

Prerequisites

You took an etcd snapshot on a hosted cluster.

Procedure

By using the AWS command-line interface (CLI), create a pre-signed URL so that you can download your etcd snapshot from S3 without passing credentials to the etcd deployment:
```
ETCD_SNAPSHOT=${ETCD_SNAPSHOT:-"s3://${BUCKET_NAME}/${CLUSTER_NAME}-snapshot.db"}
ETCD_SNAPSHOT_URL=$(aws s3 presign ${ETCD_SNAPSHOT})
```
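
A pre-signed URL expires, so the restore must download the snapshot before that happens. The `aws s3 presign` command accepts an `--expires-in` option in seconds and defaults to 3600. The following sketch wraps the command with an explicit lifetime. The function name `presign_snapshot` is hypothetical, and the object key assumes that the snapshot was uploaded as `<cluster_name>-snapshot.db`, as in the backup example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a pre-signed URL for the snapshot object with
# an explicit lifetime in seconds (3600 if the third argument is omitted).
presign_snapshot() {
  local bucket="$1" cluster="$2" expires_in="${3:-3600}"
  aws s3 presign "s3://${bucket}/${cluster}-snapshot.db" \
    --expires-in "${expires_in}"
}
```

For example, `ETCD_SNAPSHOT_URL=$(presign_snapshot "${BUCKET_NAME}" "${CLUSTER_NAME}" 7200)` requests a URL that is valid for two hours.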

Modify the HostedCluster specification to refer to the URL:

spec:
  etcd:
    managed:
      storage:
        persistentVolume:
          size: 4Gi
        type: PersistentVolume
        restoreSnapshotURL:
        - "${ETCD_SNAPSHOT_URL}"
    managementType: Managed

Ensure that the secret that you referenced from the spec.secretEncryption.aescbc value contains the same AES key that you saved in the previous steps.