As an administrator, you can specify multiple regions and zones for your OKD cluster that runs on a VMware vSphere instance. This configuration reduces the risk of a hardware failure or network outage causing your cluster to fail.

A failure domain configuration lists parameters that create a topology. The following list shows some of these parameters:

  • computeCluster

  • datacenter

  • datastore

  • networks

  • resourcePool

After you define multiple regions and zones for your OKD cluster, you can create or migrate nodes to another failure domain.

If you want to migrate pre-existing OKD cluster compute nodes to a failure domain, you must define a new compute machine set for the compute node. This new machine set can scale up a compute node according to the topology of the failure domain, and scale down the pre-existing compute node.
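
For example, a compute machine set that targets a failure domain sets the vSphere workspace and network of that failure domain in its provider specification. The following abridged sketch is not the complete resource; it assumes a hypothetical <cluster_id> value and reuses placeholders such as <region_a_dc> and <zone_b> from the configuration example later in this document, so adjust all values to match your environment:

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      name: <cluster_id>-worker-<zone_b>
      namespace: openshift-machine-api
    spec:
      replicas: 1
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: <cluster_id>
          machine.openshift.io/cluster-api-machineset: <cluster_id>-worker-<zone_b>
      template:
        metadata:
          labels:
            machine.openshift.io/cluster-api-cluster: <cluster_id>
            machine.openshift.io/cluster-api-machine-role: worker
            machine.openshift.io/cluster-api-machine-type: worker
            machine.openshift.io/cluster-api-machineset: <cluster_id>-worker-<zone_b>
        spec:
          providerSpec:
            value:
              apiVersion: machine.openshift.io/v1beta1
              kind: VSphereMachineProviderSpec
              # Other fields, such as template, credentialsSecret, userDataSecret,
              # and VM sizing, are omitted from this sketch.
              network:
                devices:
                  - networkName: port-group
              workspace:
                datacenter: <region_a_dc>
                datastore: </region_a_dc/datastore/datastore_a>
                resourcePool: </region_a_dc/host/zone_b_cluster/Resources>
                server: <your_vcenter_server>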

The cloud provider adds topology.kubernetes.io/zone and topology.kubernetes.io/region labels to any compute node provisioned by a machine set resource.
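
To check that these labels exist on your nodes, you can display them as extra columns, for example:

    $ oc get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone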

For more information, see Creating a compute machine set.

Specifying multiple regions and zones for your cluster on vSphere

You can configure the infrastructures.config.openshift.io configuration resource to specify multiple regions and zones for your OKD cluster that runs on a VMware vSphere instance.

Topology-aware features for the cloud controller manager and the vSphere Container Storage Interface (CSI) Driver Operator require information about the vSphere topology where you host your OKD cluster. This topology information exists in the infrastructures.config.openshift.io configuration resource.

Before you specify regions and zones for your cluster, you must ensure that all data centers and compute clusters contain tags, so that the cloud provider can add labels to your nodes. For example, if data-center-1 represents region-a and compute-cluster-1 represents zone-1, you attach an openshift-region category tag with a value of region-a to data-center-1 and an openshift-zone category tag with a value of zone-1 to compute-cluster-1.
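
For example, you can create and attach these tags by using the govc CLI. The following sketch assumes the data-center-1, compute-cluster-1, region-a, and zone-1 names from the previous example; substitute the tag values and inventory paths for your environment:

    $ govc tags.category.create -d "OpenShift region" openshift-region
    $ govc tags.category.create -d "OpenShift zone" openshift-zone
    $ govc tags.create -c openshift-region region-a
    $ govc tags.create -c openshift-zone zone-1
    $ govc tags.attach -c openshift-region region-a /data-center-1
    $ govc tags.attach -c openshift-zone zone-1 /data-center-1/host/compute-cluster-1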

You can migrate control plane nodes with vMotion capabilities to a failure domain. After you add these nodes to a failure domain, the cloud provider adds topology.kubernetes.io/zone and topology.kubernetes.io/region labels to these nodes.

Prerequisites
  • You created the openshift-region and openshift-zone tag categories on the vCenter server.

  • You ensured that each data center and compute cluster contains tags that represent the name of their associated region or zone, or both.

  • Optional: If you provided static API and Ingress IP addresses to the installation program, you must ensure that all regions and zones share a common layer 2 network. This configuration ensures that API and Ingress Virtual IP (VIP) addresses can interact with your cluster.

If you do not supply tags to all data centers and compute clusters before you create a node or migrate a node, the cloud provider cannot add the topology.kubernetes.io/zone and topology.kubernetes.io/region labels to the node. This means that services cannot route traffic to your node.

Procedure
  1. Edit the infrastructures.config.openshift.io custom resource definition (CRD) of your cluster to specify multiple regions and zones in the failureDomains section of the resource by running the following command:

    $ oc edit infrastructures.config.openshift.io cluster
    Example infrastructures.config.openshift.io CRD for an instance named cluster with multiple regions and zones defined in its configuration
    spec:
      cloudConfig:
        key: config
        name: cloud-provider-config
      platformSpec:
        type: vSphere
        vsphere:
          vcenters:
            - datacenters:
                - <region_a_data_center>
                - <region_b_data_center>
              port: 443
              server: <your_vcenter_server>
          failureDomains:
            - name: <failure_domain_1>
              region: <region_a>
              zone: <zone_a>
              server: <your_vcenter_server>
              topology:
                datacenter: <region_a_dc>
                computeCluster: "</region_a_dc/host/zone_a_cluster>"
                resourcePool: "</region_a_dc/host/zone_a_cluster/Resources/resource_pool>"
                datastore: "</region_a_dc/datastore/datastore_a>"
                networks:
                - port-group
            - name: <failure_domain_2>
              region: <region_a>
              zone: <zone_b>
              server: <your_vcenter_server>
              topology:
                computeCluster: </region_a_dc/host/zone_b_cluster>
                datacenter: <region_a_dc>
                datastore: </region_a_dc/datastore/datastore_a>
                networks:
                - port-group
            - name: <failure_domain_3>
              region: <region_b>
              zone: <zone_a>
              server: <your_vcenter_server>
              topology:
                computeCluster: </region_b_dc/host/zone_a_cluster>
                datacenter: <region_b_dc>
                datastore: </region_b_dc/datastore/datastore_b>
                networks:
                - port-group
          nodeNetworking:
            external: {}
            internal: {}

    After you create a failure domain and define it in the CRD for a VMware vSphere cluster, you must not modify or delete the failure domain. Modifying or deleting a failure domain can impact the availability and fault tolerance of a control plane machine.

  2. Save the resource file to apply the changes.
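
To check that the failure domains are defined in the resource, you can query the failureDomains section, for example:

    $ oc get infrastructures.config.openshift.io cluster -o jsonpath='{.spec.platformSpec.vsphere.failureDomains[*].name}'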

Enabling a multiple layer 2 network for your cluster

You can configure your cluster to use a multiple layer 2 network configuration so that data transfer among nodes can span multiple networks.

Prerequisites
  • You configured network connectivity among machines so that cluster components can communicate with each other.

Procedure
  • If you installed your cluster with installer-provisioned infrastructure, you must ensure that all control plane nodes share a common layer 2 network. Additionally, ensure compute nodes that are configured for Ingress pod scheduling share a common layer 2 network.

    • If you need compute nodes to span multiple layer 2 networks, you can create infrastructure nodes that can host Ingress pods, as shown in the example after this procedure.

    • If you need to provision workloads across additional layer 2 networks, you can create compute machine sets on vSphere and then move these workloads to your target layer 2 networks.

  • If you installed your cluster on infrastructure that you provided, which is known as user-provisioned infrastructure, complete the following actions to meet your needs:

    • Configure your API load balancer and network so that the load balancer can reach the API and Machine Config Server on the control plane nodes.

    • Configure your Ingress load balancer and network so that the load balancer can reach the Ingress pods on the compute or infrastructure nodes.
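
For example, to keep Ingress pods on dedicated infrastructure nodes, as mentioned in the first step, you can set a node placement on the default IngressController. This is a minimal sketch that assumes your infrastructure nodes carry the node-role.kubernetes.io/infra label:

    $ oc patch ingresscontroller default -n openshift-ingress-operator \
        --type merge \
        -p '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra":""}}}}}'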

Parameters for the cluster-wide infrastructure CRD

You must set values for specific parameters in the cluster-wide infrastructure custom resource definition (CRD), infrastructures.config.openshift.io, to define multiple regions and zones for your OKD cluster that runs on a VMware vSphere instance.

The following list describes the mandatory parameters for defining multiple regions and zones for your OKD cluster:

vcenters

The vCenter server for your OKD cluster. You can specify only one vCenter for your cluster.

datacenters

vCenter data centers where VMs associated with the OKD cluster will be created or presently exist.

port

The TCP port of the vCenter server.

server

The fully qualified domain name (FQDN) of the vCenter server.

failureDomains

The list of failure domains.

name

The name of the failure domain.

region

The value of the openshift-region tag assigned to the topology for the failure domain.

zone

The value of the openshift-zone tag assigned to the topology for the failure domain.

topology

The vCenter resources associated with the failure domain.

datacenter

The data center associated with the failure domain.

computeCluster

The full path of the compute cluster associated with the failure domain.

resourcePool

The full path of the resource pool associated with the failure domain.

datastore

The full path of the datastore associated with the failure domain.

networks

A list of port groups associated with the failure domain. You can define only one port group.
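
The computeCluster, resourcePool, and datastore parameters expect full vCenter inventory paths. If you are unsure of a path, you can list the inventory by using the govc CLI, for example, assuming a data center named <region_a_dc>:

    $ govc ls /<region_a_dc>/host                              # compute clusters
    $ govc ls /<region_a_dc>/host/<zone_a_cluster>/Resources   # resource pools
    $ govc ls /<region_a_dc>/datastore                         # datastores
    $ govc ls /<region_a_dc>/network                           # port groups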