When installing a cluster on bare-metal infrastructure, you can manually scale up to 4 or 5 control plane nodes for your cluster. Consider this use case in situations where you need to recover your cluster from a degraded state, perform deep-level debugging, or ensure stability and security of the control planes in complex scenarios.

Red Hat supports a cluster that has 4 or 5 control plane nodes only on bare-metal infrastructure.

Adding a control plane node to your cluster

When installing a cluster on bare-metal infrastructure, you can manually scale up to 4 or 5 control plane nodes for your cluster. The example in the procedure uses node-5 as the new control plane node.

Prerequisites
  • You have installed a healthy cluster with at least three control plane nodes.

  • You have created a single control plane node that you intend to add to your cluster as a postinstallation task.

Procedure
  1. Retrieve pending Certificate Signing Requests (CSRs) for the new control plane node by entering the following command:

    $ oc get csr | grep Pending
  2. Approve all pending CSRs for the control plane node by entering the following command:

    $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

    You must approve the CSRs to complete the installation.
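The go-template filter in the approve command selects CSRs whose .status field is empty. For illustration only, the same selection can be exercised with jq on hypothetical sample data; the CSR names and file path here are made up:

```shell
# Hypothetical sample in the shape of `oc get csr -o json`:
# one approved CSR (populated .status) and one pending CSR (empty .status).
cat > /tmp/sample-csrs.json <<'EOF'
{
  "items": [
    {"metadata": {"name": "csr-approved"}, "status": {"conditions": [{"type": "Approved"}]}},
    {"metadata": {"name": "csr-pending"}, "status": {}}
  ]
}
EOF

# Keep only items with a null or empty .status, mirroring the go-template logic.
jq -r '.items[] | select(.status == null or .status == {}) | .metadata.name' /tmp/sample-csrs.json
```

Only csr-pending is printed, which matches the set of CSRs that the approve command acts on.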

  3. Confirm that the control plane node is in the Ready status by entering the following command:

    $ oc get nodes

    On installer-provisioned infrastructure, the etcd Operator relies on the Machine API to manage the control plane and ensure etcd quorum. The Machine API then uses Machine CRs to represent and manage the underlying control plane nodes.

  4. Create the BareMetalHost and Machine CRs and link them to the Node CR of the control plane node.

    1. Create the BareMetalHost CR with a unique .metadata.name value as demonstrated in the following example:

      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        name: node-5
        namespace: openshift-machine-api
      spec:
        automatedCleaningMode: metadata
        bootMACAddress: 00:00:00:00:00:02
        bootMode: UEFI
        customDeploy:
          method: install_coreos
        externallyProvisioned: true
        online: true
        userData:
          name: master-user-data-managed
          namespace: openshift-machine-api
      # ...
    2. Apply the BareMetalHost CR by entering the following command:

      $ oc apply -f <filename> (1)
      1 Replace <filename> with the name of the BareMetalHost CR.
    3. Create the Machine CR by using the unique .metadata.name value as demonstrated in the following example:

      apiVersion: machine.openshift.io/v1beta1
      kind: Machine
      metadata:
        annotations:
          machine.openshift.io/instance-state: externally provisioned
          metal3.io/BareMetalHost: openshift-machine-api/node-5
        finalizers:
        - machine.machine.openshift.io
        labels:
          machine.openshift.io/cluster-api-cluster: <cluster_name> (1)
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
        name: node-5
        namespace: openshift-machine-api
      spec:
        metadata: {}
        providerSpec:
          value:
            apiVersion: baremetal.cluster.k8s.io/v1alpha1
            customDeploy:
              method: install_coreos
            hostSelector: {}
            image:
              checksum: ""
              url: ""
            kind: BareMetalMachineProviderSpec
            metadata:
              creationTimestamp: null
            userData:
              name: master-user-data-managed
      # ...
      1 Replace <cluster_name> with the name of the specific cluster, for example, test-day2-1-6qv96.
    4. Get the cluster name by running the following command:

      $ oc get infrastructure cluster -o=jsonpath='{.status.infrastructureName}{"\n"}'
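The retrieved cluster name can be substituted into the Machine CR before you apply it. The following sketch assumes the manifest is saved with the <cluster_name> placeholder; the file path and hard-coded cluster name are illustrative only:

```shell
# Illustrative manifest fragment containing the placeholder from the example above.
cat > /tmp/machine.yaml <<'EOF'
labels:
  machine.openshift.io/cluster-api-cluster: <cluster_name>
EOF

# On a live cluster, this value comes from:
#   oc get infrastructure cluster -o=jsonpath='{.status.infrastructureName}'
# It is hard-coded here so the example is self-contained.
cluster_name="test-day2-1-6qv96"

# Replace the placeholder in place.
sed -i "s/<cluster_name>/${cluster_name}/" /tmp/machine.yaml
cat /tmp/machine.yaml
```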
    5. Apply the Machine CR by entering the following command:

      $ oc apply -f <filename> (1)
      1 Replace <filename> with the name of the Machine CR.
    6. Link BareMetalHost, Machine, and Node objects by running the link-machine-and-node.sh script:

      1. Copy the following link-machine-and-node.sh script to a local machine:

        #!/bin/bash
        
        # Credit goes to
        # https://bugzilla.redhat.com/show_bug.cgi?id=1801238.
        # This script will link Machine object
        # and Node object. This is needed
        # in order to have IP address of
        # the Node present in the status of the Machine.
        
        set -e
        
        machine="$1"
        node="$2"
        
        if [ -z "$machine" ] || [ -z "$node" ]; then
            echo "Usage: $0 MACHINE NODE"
            exit 1
        fi
        
        node_name=$(echo "${node}" | cut -f2 -d':')
        
        oc proxy &
        proxy_pid=$!
        function kill_proxy {
            kill $proxy_pid
        }
        trap kill_proxy EXIT SIGINT
        
        HOST_PROXY_API_PATH="http://localhost:8001/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts"
        
        function print_nics() {
            local ips
            local eob
            declare -a ips
        
            readarray -t ips < <(echo "${1}" \
                                 | jq '.[] | select(. | .type == "InternalIP") | .address' \
                                 | sed 's/"//g')
        
            eob=','
            for (( i=0; i<${#ips[@]}; i++ )); do
                if [ $((i+1)) -eq ${#ips[@]} ]; then
                    eob=""
                fi
                cat <<- EOF
                  {
                    "ip": "${ips[$i]}",
                    "mac": "00:00:00:00:00:00",
                    "model": "unknown",
                    "speedGbps": 10,
                    "vlanId": 0,
                    "pxe": true,
                    "name": "eth1"
                  }${eob}
        EOF
            done
        }
        
        function wait_for_json() {
            local name
            local url
            local curl_opts
            local timeout
        
            local start_time
            local curr_time
            local time_diff
        
            name="$1"
            url="$2"
            timeout="$3"
            shift 3
            curl_opts="$@"
            echo -n "Waiting for $name to respond"
            start_time=$(date +%s)
            until curl -g -X GET "$url" "${curl_opts[@]}" 2> /dev/null | jq '.' 2> /dev/null > /dev/null; do
                echo -n "."
                curr_time=$(date +%s)
                time_diff=$((curr_time - start_time))
                if [[ $time_diff -gt $timeout ]]; then
                    printf '\nTimed out waiting for %s' "${name}"
                    return 1
                fi
                sleep 5
            done
            echo " Success!"
            return 0
        }
        wait_for_json oc_proxy "${HOST_PROXY_API_PATH}" 10 -H "Accept: application/json" -H "Content-Type: application/json"
        
        addresses=$(oc get node -n openshift-machine-api "${node_name}" -o json | jq -c '.status.addresses')
        
        machine_data=$(oc get machines.machine.openshift.io -n openshift-machine-api -o json "${machine}")
        host=$(echo "$machine_data" | jq '.metadata.annotations["metal3.io/BareMetalHost"]' | cut -f2 -d/ | sed 's/"//g')
        
        if [ -z "$host" ]; then
            echo "Machine $machine is not linked to a host yet." 1>&2
            exit 1
        fi
        
        # The address structure on the host doesn't match the node, so extract
        # the values we want into separate variables so we can build the patch
        # we need.
        hostname=$(echo "${addresses}" | jq '.[] | select(. | .type == "Hostname") | .address' | sed 's/"//g')
        
        set +e
        read -r -d '' host_patch << EOF
        {
          "status": {
            "hardware": {
              "hostname": "${hostname}",
              "nics": [
        $(print_nics "${addresses}")
              ],
              "systemVendor": {
                "manufacturer": "Red Hat",
                "productName": "product name",
                "serialNumber": ""
              },
              "firmware": {
                "bios": {
                  "date": "04/01/2014",
                  "vendor": "SeaBIOS",
                  "version": "1.11.0-2.el7"
                }
              },
              "ramMebibytes": 0,
              "storage": [],
              "cpu": {
                "arch": "x86_64",
                "model": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
                "clockMegahertz": 2199.998,
                "count": 4,
                "flags": []
              }
            }
          }
        }
        EOF
        set -e
        
        echo "PATCHING HOST"
        echo "${host_patch}" | jq .
        
        curl -s \
             -X PATCH \
             "${HOST_PROXY_API_PATH}/${host}/status" \
             -H "Content-type: application/merge-patch+json" \
             -d "${host_patch}"
        
        oc get baremetalhost -n openshift-machine-api -o yaml "${host}"
      2. Make the script executable by entering the following command:

        $ chmod +x link-machine-and-node.sh
      3. Run the script by entering the following command:

        $ bash link-machine-and-node.sh node-5 node-5

        The first node-5 instance represents the machine, and the second instance represents the node.
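The script's print_nics function builds its NIC entries from the InternalIP addresses in the node's .status.addresses field. The selection can be tried standalone on made-up sample data:

```shell
# Made-up addresses array in the shape of a Node's .status.addresses field.
cat > /tmp/addresses.json <<'EOF'
[
  {"type": "InternalIP", "address": "192.0.2.10"},
  {"type": "Hostname", "address": "node-5"}
]
EOF

# The same jq selection that print_nics applies to pick out node IP addresses.
jq -r '.[] | select(.type == "InternalIP") | .address' /tmp/addresses.json
```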

Verification
  1. Confirm the members of etcd by executing into one of the pre-existing control plane nodes:

    1. Open a remote shell session to the control plane node by entering the following command:

      $ oc rsh -n openshift-etcd etcd-node-0
    2. List etcd members:

      # etcdctl member list -w table
  2. Monitor the etcd Operator configuration process until it completes by entering the following command. The expected output shows False under the PROGRESSING column.

    $ oc get clusteroperator etcd
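When scripting this check, the PROGRESSING value can be extracted from the tabular output. The sample data below is illustrative and hard-coded so the example is self-contained:

```shell
# Illustrative output in the shape that `oc get clusteroperator etcd` prints.
cat > /tmp/etcd-co.txt <<'EOF'
NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
etcd   4.14.0    True        False         False      6h
EOF

# PROGRESSING is the fourth column of the data row (line 2).
awk 'NR==2 {print $4}' /tmp/etcd-co.txt
```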
  3. Confirm etcd health by running the following commands:

    1. Open a remote shell session to the control plane node:

      $ oc rsh -n openshift-etcd etcd-node-0
    2. Check the endpoint health. The expected output shows the is healthy status for the endpoint.

      # etcdctl endpoint health
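For automated verification, healthy endpoints can be counted in the command output. The endpoint addresses and timings below are made up:

```shell
# Made-up lines in the shape that `etcdctl endpoint health` prints.
cat > /tmp/endpoint-health.txt <<'EOF'
https://192.0.2.10:2379 is healthy: successfully committed proposal: took = 9.2ms
https://192.0.2.11:2379 is healthy: successfully committed proposal: took = 10.1ms
EOF

# Count endpoints that report healthy.
grep -c "is healthy" /tmp/endpoint-health.txt
```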
  4. Verify that all nodes are ready by entering the following command. The expected output shows the Ready status beside each node entry.

    $ oc get nodes
  5. Verify that the cluster Operators are all available by entering the following command. The expected output lists each Operator and shows an AVAILABLE status of True beside each listed Operator.

    $ oc get ClusterOperators
  6. Verify that the cluster version is correct by entering the following command:

    $ oc get ClusterVersion
    Example output
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   OKD.5    True        False         5h57m   Cluster version is OKD.5