
Red Hat strongly recommends deploying OKD clusters within a single data center, but acknowledges that there are scenarios where a provider might need a cluster that spans multiple data centers. This document outlines considerations for cluster deployments that span multiple data centers and describes important metrics that affect the supportability of such deployments. The design of such deployments should adhere to these guidelines so that the product functions optimally and you receive the highest quality of support with the appropriate product support subscriptions.

A cluster deployment that spans multiple data centers extends the cluster as a single failure domain across locations and should not be considered a replacement for a disaster recovery plan.

Cluster deployments that span multiple data centers are bound by standard Red Hat OKD support guidance. See the Red Hat OKD Lifecycle and Red Hat Production Support Scope of Coverage for more information.

Deploying an OKD cluster that spans multiple sites is not recommended. If you need a presence in multiple data centers or regions, deploy one cluster per region or site and use tools such as Red Hat Advanced Cluster Management for Kubernetes (ACM) to manage these clusters and their deployments.

Some OKD platforms have specific support for deployments that span multiple data centers. Check the platform-specific product documentation and release notes for details. Other platforms can span data centers, depending on the quality of the network connectivity between nodes. For more information, see Understanding etcd and the tunables/conditions affecting performance.

When implementing a cluster deployment that spans multiple data centers, you should strive to implement the practices detailed in Red Hat OKD High Availability and Recommended Practices. An alternative to multisite deployments is to deploy one OKD cluster per site, managed by ACM.

Deployment caveats for spanned clusters

The guidance provided in this documentation focuses on general aspects of a cluster deployment that spans data centers. Some caveats to remember:

  • Although designs for deployments that span data centers are not bound by any special support requirements, these clusters do have inherent additional complexities that can require extra consideration or support involvement (time to identify, remediate, and resolve issues) when compared to a standard single-site cluster.

  • Applications might work poorly, or not at all, in clusters with high Kube API latency or low transaction rates.

  • Layered products, such as storage providers, have stricter latency requirements. In those cases, the latency limits are dictated by the architectures that are supported by the layered product.

  • Failure scenarios are amplified with stretched control planes, and the way the cluster is affected is specific to the deployment. Because of this, before using a deployment that spans data centers in a production environment, the organization should test and document the behavior of the cluster during disruptions such as the following (see the latency-injection sketch after this list):

    • When there is a network partition leaving one, two, or all control plane nodes isolated

    • When there are MTU mismatches on the transport network among the control plane nodes

    • When there is a sustained spike in latency as a Day 2 event towards one or more of the control plane nodes

    • When there is a considerable change in jitter due to network congestion, misconfiguration or lack of QoS, an intermediate network device causing packet errors, or other causes

  • Clusters deployed across multiple sites, network infrastructures, storage infrastructures, or other components inherently have a higher number of points of failure. Network disruptions or splits are an especially large threat to such clusters, putting the nodes at risk of losing contact with each other. These multisite clusters must be designed with the potential for such failures in mind. Organizations deploying multisite clusters should extensively test failure scenarios and should consider whether the cluster has protection from all points of failure. Consult with Red Hat Support for assistance in considering the important aspects of a resilient high availability cluster design.

  • In some cases, geo-awareness is a requirement that must be addressed to minimize latency, so a proper implementation of a Global Service Load Balancing (GSLB) method must be available.
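
The disruption scenarios above can be rehearsed in a test environment with standard Linux traffic control before a production rollout. The following is a minimal sketch, assuming the cluster-facing interface on the node under test is eth0 (the interface name and all values are placeholders):

# Inject 100 ms of extra latency with 20 ms of jitter on the interface
# carrying control plane traffic (run on the node under test).
tc qdisc add dev eth0 root netem delay 100ms 20ms

# Alternatively, introduce packet loss to approximate a flapping link.
tc qdisc change dev eth0 root netem loss 1%

# Remove the impairment when the test is complete.
tc qdisc del dev eth0 root

While the impairment is active, observe etcd leader elections, API server responsiveness, and alert activity so that the degradation and recovery behavior can be documented.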

Infrastructure as a Service (IaaS) and cloud provider considerations

This guidance applies to any infrastructure provider for which OKD control plane nodes are supported by the user-provisioned infrastructure installer (platform=none) or the agent-based installer (platform=metal) using the "User Managed Network" option. Installer-provisioned infrastructure deployments are not covered by these guidelines; however, where possible, installer-provisioned deployments span zones or availability zones on cloud or IaaS providers by following these or similar guidelines. This means infrastructure provider-specific integrations are not available (for example, integration with cloud provider services such as storage services and load balancers). Provider-specific services might still be used as external services.

Using different infrastructure platform providers for control plane nodes is discouraged (for example, mixing IaaS, cloud, and bare-metal nodes as control plane nodes). Consider the following guidelines when such combinations are needed:

  • The maximum MTU used for the deployment must not exceed the minimum effective MTU across the infrastructure; using a lower MTU is acceptable. See Understanding and Validating MTU setting with OKD 4.x for more information.

  • The combined disk and network latency and jitter must maintain an etcd peer round trip time of less than 100 ms. Note that this is not the same as the network round trip time (a metric check is sketched after this list).

  • Layered products might have stricter latency requirements. In those cases, the latency limits are dictated by the requirements of the architecture supported by the layered product. For example, OKD cluster deployments that span data centers with Red Hat OpenShift Data Foundation must meet a latency requirement of less than 10 ms RTT. For those cases, follow the specific product guidance.

  • For guidance on cluster deployments that span data centers using OpenShift Data Foundation as the storage provider, see Configuring OpenShift Data Foundation Disaster Recovery for OpenShift Workloads.
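
One way to verify the etcd peer round trip time requirement on a running cluster is to query the etcd_network_peer_round_trip_time_seconds histogram through the cluster monitoring stack. The following is a minimal sketch, assuming an oc session with permission to query cluster monitoring; the route and metric names match current OKD releases, but verify them against your version:

# Locate the monitoring query endpoint and obtain a bearer token.
HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
TOKEN=$(oc whoami -t)

# 99th percentile etcd peer round trip time over the last 5 minutes;
# results should stay well below the 0.1 s (100 ms) limit.
QUERY='histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))'

curl -sk -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode "query=${QUERY}" \
  "https://${HOST}/api/v1/query"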

Site recommendations

Assuming each site hosts one control plane member, you would define three sites, which is what Red Hat recommends. Because a three-member etcd cluster maintains quorum with two members, one data center can become inactive while the cluster still maintains quorum and operational consistency.

When this assumption is not met, attention should be given to the desired and actual fault tolerance state of the cluster, because it often dictates the operational capabilities (uptime and stability) of the deployment.

Requirements for etcd, networking, and storage

Consider the following requirements for clusters that span data centers.

etcd requirements

Many factors and considerations go into planning an etcd cluster deployment. When planning an OKD cluster that spans data centers, you need to plan for situations that are likely to stress or push etcd to the edge of its operational limits.

See Understanding etcd and the tunables/conditions affecting performance for more details on how to maintain operational capabilities and reduce service-affecting events and instability of the cluster.
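
As a quick operational check, etcd member health and status can be inspected from one of the etcd pods on the control plane. The following is a minimal sketch, assuming a cluster-admin oc session; the namespace, label, and container name follow the layout of current OKD releases:

# Pick one of the etcd pods on the control plane.
ETCD_POD=$(oc get pods -n openshift-etcd -l app=etcd -o jsonpath='{.items[0].metadata.name}')

# Report per-member health, including how long each health check took.
oc exec -n openshift-etcd "${ETCD_POD}" -c etcdctl -- etcdctl endpoint health --cluster -w table

# Report database size, leadership, and raft state for every member.
oc exec -n openshift-etcd "${ETCD_POD}" -c etcdctl -- etcdctl endpoint status --cluster -w table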

Network requirements

The chosen network topology must yield direct IP connectivity between nodes. The maximum MTU used for the deployment must not exceed the minimum effective MTU across the infrastructure; using a lower MTU is acceptable.

For more information, see Understanding and Validating MTU setting with OKD 4.x. The latency needs are ultimately defined by the services that use the network; see the sections related to etcd and storage for more details on requirements. A path MTU check is sketched below.
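
Path MTU between nodes can be validated with a don't-fragment ping. The following is a minimal sketch, assuming a 1500-byte cluster MTU and a peer node at 10.0.1.10 (both placeholder values):

# ICMP and IP headers add 28 bytes, so a 1500-byte MTU carries a
# 1472-byte payload. The -M do flag forbids fragmentation, so the ping
# fails if any hop on the path has a smaller effective MTU.
ping -c 4 -M do -s 1472 10.0.1.10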

In addition to the base networking requirements, you need to consider how applications will be accessed. A top-level Global Service Load Balancing (GSLB) method is needed outside of OKD to enable external traffic to connect to the OKD control plane services and ingress controllers.
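
A GSLB implementation typically steers traffic based on health probes against each site's endpoints. The following is a minimal sketch of such a probe, assuming the placeholder API hostname api.cluster.example.com; the readyz endpoint is served by the Kubernetes API server:

# Probe the API server readiness endpoint; a 200 response marks the
# site healthy, and anything else removes it from the GSLB rotation.
curl -sk -o /dev/null -w '%{http_code}\n' \
  https://api.cluster.example.com:6443/readyz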

Storage requirements

When considering a cluster deployment that spans data centers, special consideration needs to be given to the selected storage integration to ensure that it also meets multisite requirements for accessibility from all sites, fault tolerance, high availability, and so on.

An object storage solution should be used for the registry, and this storage must be in addition to any persistent volume (PV) storage integration used for application volumes or workloads. The object storage solution should be given the same special consideration regarding accessibility from all sites, fault tolerance, high availability, and so on.

Because disk I/O is a critical factor in the health of the etcd database, etcd must be deployed on high-speed, low-latency storage media. See the etcd guidance on peer round trip time and database size for more details on the exact requirements to meet. A disk benchmark sketch follows.
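
The disk that backs etcd can be characterized with fio using small sequential writes that sync after every write, which approximates etcd's write-ahead-log I/O pattern. The following is a minimal sketch, assuming fio is installed and using a placeholder scratch directory on the same device that backs /var/lib/etcd:

# Create a scratch directory on the device that backs etcd.
mkdir -p /var/lib/etcd-disk-test

# 22 MiB of 2300-byte sequential writes, syncing after each write.
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-disk-test --size=22m --bs=2300 \
    --name=etcd-disk-check

In the fdatasync percentiles that fio reports, the 99th percentile should be below 10 ms for etcd to perform reliably.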

Workload placement considerations

With multisite clusters, administrators and developers must take special care to ensure that critical workloads are scheduled on appropriate hardware or hosts within the topology of the cluster. This ensures that applications and services remain highly available and fault tolerant given the topology of the cluster's deployment.

Without this consideration, OKD can schedule workloads on hosts within the cluster in a way that creates a single point of failure (SPoF) for OKD infrastructure services and other application services in the event of a data center outage. A sketch of a topology-aware scheduling constraint follows.
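
One way to express such placement rules is a topology spread constraint keyed on the standard zone label, which spreads replicas evenly across sites. The following is a minimal sketch, assuming nodes are labeled with topology.kubernetes.io/zone per site; the deployment name, app label, and image are placeholders:

# With maxSkew of 1 and DoNotSchedule, no single site accumulates the
# extra replicas that would make it a single point of failure.
oc apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: example-app
      containers:
      - name: app
        image: registry.example.com/example-app:latest
EOF

Pod anti-affinity on a per-host or per-zone basis is an alternative when strict one-replica-per-site placement is required.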