×

In OKD version 4, you can install a cluster on your VMware vSphere instance by using installer-provisioned infrastructure.

OKD supports deploying a cluster to a single VMware vCenter only. Deploying a cluster with machines/machine sets on multiple vCenters is not supported.

Prerequisites

Deploying the cluster

You can install OKD on a compatible cloud platform.

You can run the create cluster command of the installation program only once, during initial installation.

Prerequisites
  • You have the OKD installation program and the pull secret for your cluster.

  • You have verified that the cloud provider account on your host has the correct permissions to deploy the cluster. An account with incorrect permissions causes the installation process to fail with an error message that displays the missing permissions.

  • Optional: Before you create the cluster, configure an external load balancer in place of the default load balancer.

    You do not need to specify API and Ingress static addresses for your installation program. If you choose this configuration, you must take additional actions to define network targets that accept an IP address from each referenced vSphere subnet. See the section "Configuring an external load balancer".

Procedure
  1. Change to the directory that contains the installation program and initialize the cluster deployment:

    $ ./openshift-install create cluster --dir <installation_directory> \ (1)
        --log-level=info (2)
    
    1 For <installation_directory>, specify the directory name to store the files that the installation program creates.
    2 To view different installation details, specify warn, debug, or error instead of info.

    When specifying the directory:

    • Verify that the directory has the execute permission. This permission is required to run Terraform binaries under the installation directory.

    • Use an empty directory. Some installation assets, such as bootstrap X.509 certificates, have short expiration intervals, therefore you must not reuse an installation directory. If you want to reuse individual files from another cluster installation, you can copy them into your directory. However, the file names for the installation assets might change between releases. Use caution when copying installation files from an earlier OKD version.

  2. Provide values at the prompts:

    1. Optional: Select an SSH key to use to access your cluster machines.

      For production OKD clusters on which you want to perform installation debugging or disaster recovery, specify an SSH key that your ssh-agent process uses.

    2. Select vsphere as the platform to target.

    3. Specify the name of your vCenter instance.

    4. Specify the user name and password for the vCenter account that has the required permissions to create the cluster.

      The installation program connects to your vCenter instance.

      Some VMware vCenter Single Sign-On (SSO) environments with Active Directory (AD) integration might primarily require you to use the traditional login method, which requires the <domain>\ construct.

      To ensure that vCenter account permission checks complete properly, consider using the User Principal Name (UPN) login method, such as <username>@<fully_qualified_domainname>.

    5. Select the data center in your vCenter instance to connect to.

    6. Select the default vCenter datastore to use.

      Datastore and cluster names cannot exceed 60 characters; therefore, ensure the combined string length does not exceed the 60 character limit.

    7. Select the vCenter cluster to install the OKD cluster in. The installation program uses the root resource pool of the vSphere cluster as the default resource pool.

    8. Select the network in the vCenter instance that contains the virtual IP addresses and DNS records that you configured.

    9. Enter the virtual IP address that you configured for control plane API access.

    10. Enter the virtual IP address that you configured for cluster ingress.

    11. Enter the base domain. This base domain must be the same one that you used in the DNS records that you configured.

    12. Enter a descriptive name for your cluster. The cluster name must be the same one that you used in the DNS records that you configured.

      Datastore and cluster names cannot exceed 60 characters; therefore, ensure the combined string length does not exceed the 60 character limit.

    13. Paste the pull secret from Red Hat OpenShift Cluster Manager.

      • If you do not have a pull secret from Red Hat OpenShift Cluster Manager, you can paste the pull secret another private registry.

      • If you do not need the cluster to pull images from a private registry, you can paste {"auths":{"fake":{"auth":"aWQ6cGFzcwo="}}} as the pull secret.

Verification

When the cluster deployment completes successfully:

  • The terminal displays directions for accessing your cluster, including a link to the web console and credentials for the kubeadmin user.

  • Credential information also outputs to <installation_directory>/.openshift_install.log.

Do not delete the installation program or the files that the installation program creates. Both are required to delete the cluster.

Example output
...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/myuser/install_dir/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.mycluster.example.com
INFO Login to the console with user: "kubeadmin", and password: "password"
INFO Time elapsed: 36m22s
  • The Ignition config files that the installation program generates contain certificates that expire after 24 hours, which are then renewed at that time. If the cluster is shut down before renewing the certificates and the cluster is later restarted after the 24 hours have elapsed, the cluster automatically recovers the expired certificates. The exception is that you must manually approve the pending node-bootstrapper certificate signing requests (CSRs) to recover kubelet certificates. See the documentation for Recovering from expired control plane certificates for more information.

  • It is recommended that you use Ignition config files within 12 hours after they are generated because the 24-hour certificate rotates from 16 to 22 hours after the cluster is installed. By using the Ignition config files within 12 hours, you can avoid installation failure if the certificate update runs during installation.

Logging in to the cluster by using the CLI

You can log in to your cluster as a default system user by exporting the cluster kubeconfig file. The kubeconfig file contains information about the cluster that is used by the CLI to connect a client to the correct cluster and API server. The file is specific to a cluster and is created during OKD installation.

Prerequisites
  • You deployed an OKD cluster.

  • You installed the oc CLI.

Procedure
  1. Export the kubeadmin credentials:

    $ export KUBECONFIG=<installation_directory>/auth/kubeconfig (1)
    1 For <installation_directory>, specify the path to the directory that you stored the installation files in.
  2. Verify you can run oc commands successfully using the exported configuration:

    $ oc whoami
    Example output
    system:admin

Creating registry storage

After you install the cluster, you must create storage for the registry Operator.

Image registry removed during installation

On platforms that do not provide shareable object storage, the OpenShift Image Registry Operator bootstraps itself as Removed. This allows openshift-installer to complete installations on these platform types.

After installation, you must edit the Image Registry Operator configuration to switch the managementState from Removed to Managed.

Image registry storage configuration

The Image Registry Operator is not initially available for platforms that do not provide default storage. After installation, you must configure your registry to use storage so that the Registry Operator is made available.

Instructions are shown for configuring a persistent volume, which is required for production clusters. Where applicable, instructions are shown for configuring an empty directory as the storage location, which is available for only non-production clusters.

Additional instructions are provided for allowing the image registry to use block storage types by using the Recreate rollout strategy during upgrades.

Configuring registry storage for VMware vSphere

As a cluster administrator, following installation you must configure your registry to use storage.

Prerequisites
  • Cluster administrator permissions.

  • A cluster on VMware vSphere.

  • Persistent storage provisioned for your cluster, such as Red Hat OpenShift Data Foundation.

    OKD supports ReadWriteOnce access for image registry storage when you have only one replica. ReadWriteOnce access also requires that the registry uses the Recreate rollout strategy. To deploy an image registry that supports high availability with two or more replicas, ReadWriteMany access is required.

  • Must have "100Gi" capacity.

Testing shows issues with using the NFS server on RHEL as storage backend for core services. This includes the OpenShift Container Registry and Quay, Prometheus for monitoring storage, and Elasticsearch for logging storage. Therefore, using RHEL NFS to back PVs used by core services is not recommended.

Other NFS implementations on the marketplace might not have these issues. Contact the individual NFS implementation vendor for more information on any testing that was possibly completed against these OKD core components.

Procedure
  1. To configure your registry to use storage, change the spec.storage.pvc in the configs.imageregistry/cluster resource.

    When you use shared storage, review your security settings to prevent outside access.

  2. Verify that you do not have a registry pod:

    $ oc get pod -n openshift-image-registry -l docker-registry=default
    Example output
    No resourses found in openshift-image-registry namespace

    If you do have a registry pod in your output, you do not need to continue with this procedure.

  3. Check the registry configuration:

    $ oc edit configs.imageregistry.operator.openshift.io
    Example output
    storage:
      pvc:
        claim: (1)
    1 Leave the claim field blank to allow the automatic creation of an image-registry-storage persistent volume claim (PVC). The PVC is generated based on the default storage class. However, be aware that the default storage class might provide ReadWriteOnce (RWO) volumes, such as a RADOS Block Device (RBD), which can cause issues when you replicate to more than one replica.
  4. Check the clusteroperator status:

    $ oc get clusteroperator image-registry
    Example output
    NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    image-registry   4.7                                  True        False         False      6h50m

Configuring block registry storage for VMware vSphere

To allow the image registry to use block storage types such as vSphere Virtual Machine Disk (VMDK) during upgrades as a cluster administrator, you can use the Recreate rollout strategy.

Block storage volumes are supported but not recommended for use with image registry on production clusters. An installation where the registry is configured on block storage is not highly available because the registry cannot have more than one replica.

Procedure
  1. Enter the following command to set the image registry storage as a block storage type, patch the registry so that it uses the Recreate rollout strategy, and runs with only 1 replica:

    $ oc patch config.imageregistry.operator.openshift.io/cluster --type=merge -p '{"spec":{"rolloutStrategy":"Recreate","replicas":1}}'
  2. Provision the PV for the block storage device, and create a PVC for that volume. The requested block volume uses the ReadWriteOnce (RWO) access mode.

    1. Create a pvc.yaml file with the following contents to define a VMware vSphere PersistentVolumeClaim object:

      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: image-registry-storage (1)
        namespace: openshift-image-registry (2)
      spec:
        accessModes:
        - ReadWriteOnce (3)
        resources:
          requests:
            storage: 100Gi (4)
      1 A unique name that represents the PersistentVolumeClaim object.
      2 The namespace for the PersistentVolumeClaim object, which is openshift-image-registry.
      3 The access mode of the persistent volume claim. With ReadWriteOnce, the volume can be mounted with read and write permissions by a single node.
      4 The size of the persistent volume claim.
    2. Enter the following command to create the PersistentVolumeClaim object from the file:

      $ oc create -f pvc.yaml -n openshift-image-registry
  3. Enter the following command to edit the registry configuration so that it references the correct PVC:

    $ oc edit config.imageregistry.operator.openshift.io -o yaml
    Example output
    storage:
      pvc:
        claim: (1)
    1 By creating a custom PVC, you can leave the claim field blank for the default automatic creation of an image-registry-storage PVC.

For instructions about configuring registry storage so that it references the correct PVC, see Configuring the registry for vSphere.

Additional resources

Configuring an external load balancer

You can configure an OKD cluster to use an external load balancer in place of the default load balancer.

MetalLB, that runs on a cluster, functions as an external load balancer.

Configuring an external load balancer depends on your vendor’s load balancer.

The information and examples in this section are for guideline purposes only. Consult the vendor documentation for more specific information about the vendor’s load balancer.

Red Hat supports the following services for an external load balancer:

  • Ingress Controller

  • OpenShift API

  • OpenShift MachineConfig API

You can choose whether you want to configure one or all of these services for an external load balancer. Configuring only the Ingress Controller service is a common configuration option. To better understand each service, view the following diagrams:

An image that shows an example network workflow of an Ingress Controller operating in an OKD environment.
Figure 1. Example network workflow that shows an Ingress Controller operating in an OKD environment
An image that shows an example network workflow of an OpenShift API operating in an OKD environment.
Figure 2. Example network workflow that shows an OpenShift API operating in an OKD environment
An image that shows an example network workflow of an OpenShift MachineConfig API operating in an OKD environment.
Figure 3. Example network workflow that shows an OpenShift MachineConfig API operating in an OKD environment

The following configuration options are supported for external load balancers:

  • Use a node selector to map the Ingress Controller to a specific set of nodes. You must assign a static IP address to each node in this set, or configure each node to receive the same IP address from the Dynamic Host Configuration Protocol (DHCP). Infrastructure nodes commonly receive this type of configuration.

  • Target all IP addresses on a subnet. This configuration can reduce maintenance overhead, because you can create and destroy nodes within those networks without reconfiguring the load balancer targets. If you deploy your ingress pods by using a machine set on a smaller network, such as a /27 or /28, you can simplify your load balancer targets.

    You can list all IP addresses that exist in a network by checking the machine config pool’s resources.

Considerations
  • For a front-end IP address, you can use the same IP address for the front-end IP address, the Ingress Controller’s load balancer, and API load balancer. Check the vendor’s documentation for this capability.

  • For a back-end IP address, ensure that an IP address for an OKD control plane node does not change during the lifetime of the external load balancer. You can achieve this by completing one of the following actions:

    • Assign a static IP address to each control plane node.

    • Configure each node to receive the same IP address from the DHCP every time the node requests a DHCP lease. Depending on the vendor, the DHCP lease might be in the form of an IP reservation or a static DHCP assignment.

  • Manually define each node that runs the Ingress Controller in the external load balancer for the Ingress Controller back-end service. For example, if the Ingress Controller moves to an undefined node, a connection outage can occur.

OpenShift API prerequisites
  • You defined a front-end IP address.

  • TCP ports 6443 and 22623 are exposed on the front-end IP address of your load balancer. Check the following items:

    • Port 6443 provides access to the OpenShift API service.

    • Port 22623 can provide ignition startup configurations to nodes.

  • The front-end IP address and port 6443 are reachable by all users of your system with a location external to your OKD cluster.

  • The front-end IP address and port 22623 are reachable only by OKD nodes.

  • The load balancer backend can communicate with OKD control plane nodes on port 6443 and 22623.

Ingress Controller prerequisites
  • You defined a front-end IP address.

  • TCP ports 443 and 80 are exposed on the front-end IP address of your load balancer.

  • The front-end IP address, port 80 and port 443 are be reachable by all users of your system with a location external to your OKD cluster.

  • The front-end IP address, port 80 and port 443 are reachable to all nodes that operate in your OKD cluster.

  • The load balancer backend can communicate with OKD nodes that run the Ingress Controller on ports 80, 443, and 1936.

Prerequisite for health check URL specifications

You can configure most load balancers by setting health check URLs that determine if a service is available or unavailable. OKD provides these health checks for the OpenShift API, Machine Configuration API, and Ingress Controller backend services.

The following examples demonstrate health check specifications for the previously listed backend services:

Example of a Kubernetes API health check specification
Path: HTTPS:6443/readyz
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 10
Interval: 10
Example of a Machine Config API health check specification
Path: HTTPS:22623/healthz
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 10
Interval: 10
Example of an Ingress Controller health check specification
Path: HTTP:1936/healthz/ready
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 5
Interval: 10
Procedure
  1. Configure the HAProxy Ingress Controller, so that you can enable access to the cluster from your load balancer on ports 6443, 443, and 80:

    Example HAProxy configuration
    #...
    listen my-cluster-api-6443
        bind 192.168.1.100:6443
        mode tcp
        balance roundrobin
      option httpchk
      http-check connect
      http-check send meth GET uri /readyz
      http-check expect status 200
        server my-cluster-master-2 192.168.1.101:6443 check inter 10s rise 2 fall 2
        server my-cluster-master-0 192.168.1.102:6443 check inter 10s rise 2 fall 2
        server my-cluster-master-1 192.168.1.103:6443 check inter 10s rise 2 fall 2
    
    listen my-cluster-machine-config-api-22623
        bind 192.168.1.1000.0.0.0:22623
        mode tcp
        balance roundrobin
      option httpchk
      http-check connect
      http-check send meth GET uri /healthz
      http-check expect status 200
        server my-cluster-master-2 192.0168.21.2101:22623 check inter 10s rise 2 fall 2
        server my-cluster-master-0 192.168.1.1020.2.3:22623 check inter 10s rise 2 fall 2
        server my-cluster-master-1 192.168.1.1030.2.1:22623 check inter 10s rise 2 fall 2
    
    listen my-cluster-apps-443
            bind 192.168.1.100:443
            mode tcp
            balance roundrobin
        option httpchk
        http-check connect
        http-check send meth GET uri /healthz/ready
        http-check expect status 200
            server my-cluster-worker-0 192.168.1.111:443 check port 1936 inter 10s rise 2 fall 2
            server my-cluster-worker-1 192.168.1.112:443 check port 1936 inter 10s rise 2 fall 2
            server my-cluster-worker-2 192.168.1.113:443 check port 1936 inter 10s rise 2 fall 2
    
    listen my-cluster-apps-80
            bind 192.168.1.100:80
            mode tcp
            balance roundrobin
        option httpchk
        http-check connect
        http-check send meth GET uri /healthz/ready
        http-check expect status 200
            server my-cluster-worker-0 192.168.1.111:80 check port 1936 inter 10s rise 2 fall 2
            server my-cluster-worker-1 192.168.1.112:80 check port 1936 inter 10s rise 2 fall 2
            server my-cluster-worker-2 192.168.1.113:80 check port 1936 inter 10s rise 2 fall 2
    # ...
  2. Use the curl CLI command to verify that the external load balancer and its resources are operational:

    1. Verify that the cluster machine configuration API is accessible to the Kubernetes API server resource, by running the following command and observing the response:

      $ curl https://<loadbalancer_ip_address>:6443/version --insecure

      If the configuration is correct, you receive a JSON object in response:

      {
        "major": "1",
        "minor": "11+",
        "gitVersion": "v1.11.0+ad103ed",
        "gitCommit": "ad103ed",
        "gitTreeState": "clean",
        "buildDate": "2019-01-09T06:44:10Z",
        "goVersion": "go1.10.3",
        "compiler": "gc",
        "platform": "linux/amd64"
      }
    2. Verify that the cluster machine configuration API is accessible to the Machine config server resource, by running the following command and observing the output:

      $ curl -v https://<loadbalancer_ip_address>:22623/healthz --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      Content-Length: 0
    3. Verify that the controller is accessible to the Ingress Controller resource on port 80, by running the following command and observing the output:

      $ curl -I -L -H "Host: console-openshift-console.apps.<cluster_name>.<base_domain>" http://<load_balancer_front_end_IP_address>

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 302 Found
      content-length: 0
      location: https://console-openshift-console.apps.ocp4.private.opequon.net/
      cache-control: no-cache
    4. Verify that the controller is accessible to the Ingress Controller resource on port 443, by running the following command and observing the output:

      $ curl -I -L --insecure --resolve console-openshift-console.apps.<cluster_name>.<base_domain>:443:<Load Balancer Front End IP Address> https://console-openshift-console.apps.<cluster_name>.<base_domain>

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=UlYWOyQ62LWjw2h003xtYSKlh1a0Py2hhctw0WmV2YEdhJjFyQwWcGBsja261dGLgaYO0nxzVErhiXt6QepA7g==; Path=/; Secure; SameSite=Lax
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Wed, 04 Oct 2023 16:29:38 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=1bf5e9573c9a2760c964ed1659cc1673; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private
  3. Configure the DNS records for your cluster to target the front-end IP addresses of the external load balancer. You must update records to your DNS server for the cluster API and applications over the load balancer.

    Examples of modified DNS records
    <load_balancer_ip_address>  A  api.<cluster_name>.<base_domain>
    A record pointing to Load Balancer Front End
    <load_balancer_ip_address>   A apps.<cluster_name>.<base_domain>
    A record pointing to Load Balancer Front End

    DNS propagation might take some time for each DNS record to become available. Ensure that each DNS record propagates before validating each record.

  4. Use the curl CLI command to verify that the external load balancer and DNS record configuration are operational:

    1. Verify that you can access the cluster API, by running the following command and observing the output:

      $ curl https://api.<cluster_name>.<base_domain>:6443/version --insecure

      If the configuration is correct, you receive a JSON object in response:

      {
        "major": "1",
        "minor": "11+",
        "gitVersion": "v1.11.0+ad103ed",
        "gitCommit": "ad103ed",
        "gitTreeState": "clean",
        "buildDate": "2019-01-09T06:44:10Z",
        "goVersion": "go1.10.3",
        "compiler": "gc",
        "platform": "linux/amd64"
        }
    2. Verify that you can access the cluster machine configuration, by running the following command and observing the output:

      $ curl -v https://api.<cluster_name>.<base_domain>:22623/healthz --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      Content-Length: 0
    3. Verify that you can access each cluster application on port, by running the following command and observing the output:

      $ curl http://console-openshift-console.apps.<cluster_name>.<base_domain> -I -L --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 302 Found
      content-length: 0
      location: https://console-openshift-console.apps.<cluster-name>.<base domain>/
      cache-control: no-cacheHTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=39HoZgztDnzjJkq/JuLJMeoKNXlfiVv2YgZc09c3TBOBU4NI6kDXaJH1LdicNhN1UsQWzon4Dor9GWGfopaTEQ==; Path=/; Secure
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Tue, 17 Nov 2020 08:42:10 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=9b714eb87e93cf34853e87a92d6894be; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private
    4. Verify that you can access each cluster application on port 443, by running the following command and observing the output:

      $ curl https://console-openshift-console.apps.<cluster_name>.<base_domain> -I -L --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=UlYWOyQ62LWjw2h003xtYSKlh1a0Py2hhctw0WmV2YEdhJjFyQwWcGBsja261dGLgaYO0nxzVErhiXt6QepA7g==; Path=/; Secure; SameSite=Lax
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Wed, 04 Oct 2023 16:29:38 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=1bf5e9573c9a2760c964ed1659cc1673; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private

Next steps