When installing a cluster on bare-metal infrastructure, you can manually scale your cluster up to 4 or 5 control plane nodes. Consider this use case when you need to recover your cluster from a degraded state, perform deep-level debugging, or ensure the stability and security of the control plane in complex scenarios.
Red Hat supports a cluster that has 4 or 5 control plane nodes only on bare-metal infrastructure.
The example in this procedure uses node-5 as the new control plane node.
You have installed a healthy cluster with at least three control plane nodes.
You have created a single control plane node that you intend to add to your cluster as a postinstallation task.
Retrieve pending Certificate Signing Requests (CSRs) for the new control plane node by entering the following command:
$ oc get csr | grep Pending
Approve all pending CSRs for the control plane node by entering the following command:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
You must approve the CSRs to complete the installation.
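Because the kubelet client and serving CSRs for a new node are issued in stages, you might need to repeat the check and approval several times. The following is a minimal polling sketch, assuming the oc CLI is already logged in with cluster-admin privileges; the iteration count and sleep interval are arbitrary example values:
#!/bin/bash
# Sketch: poll for roughly 10 minutes and approve pending CSRs as they appear.
for i in $(seq 1 20); do
  pending=$(oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}')
  if [ -n "$pending" ]; then
    echo "$pending" | xargs oc adm certificate approve
  fi
  sleep 30
done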
Confirm that the control plane node is in the Ready status by entering the following command:
$ oc get nodes
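If you prefer to wait on the node condition rather than re-run oc get nodes, a command such as the following can be used; the 30-minute timeout is only an example value:
$ oc wait node/node-5 --for=condition=Ready --timeout=30m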
On installer-provisioned infrastructure, the etcd Operator relies on the Machine API to manage the control plane and ensure etcd quorum. The Machine API in turn requires BareMetalHost and Machine CRs that are linked to the Node CR of the new control plane node.
Create the BareMetalHost and Machine CRs and link them to the Node CR of the control plane node.
Create the BareMetalHost CR with a unique .metadata.name value as demonstrated in the following example:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-5
  namespace: openshift-machine-api
spec:
  automatedCleaningMode: metadata
  bootMACAddress: 00:00:00:00:00:02
  bootMode: UEFI
  customDeploy:
    method: install_coreos
  externallyProvisioned: true
  online: true
  userData:
    name: master-user-data-managed
    namespace: openshift-machine-api
# ...
Apply the BareMetalHost CR by entering the following command:
$ oc apply -f <filename> (1)
1 Replace <filename> with the name of the BareMetalHost CR.
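Optionally, verify that the BareMetalHost CR was created; this check is a suggestion rather than part of the documented procedure. With the example spec above, the host should be reported as online and externally provisioned:
$ oc get baremetalhost -n openshift-machine-api node-5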
Create the Machine CR by using the unique .metadata.name value as demonstrated in the following example:
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: externally provisioned
    metal3.io/BareMetalHost: openshift-machine-api/node-5
  finalizers:
  - machine.machine.openshift.io
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_name> (1)
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: node-5
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: baremetal.cluster.k8s.io/v1alpha1
      customDeploy:
        method: install_coreos
      hostSelector: {}
      image:
        checksum: ""
        url: ""
      kind: BareMetalMachineProviderSpec
      metadata:
        creationTimestamp: null
      userData:
        name: master-user-data-managed
# ...
1 Replace <cluster_name> with the name of the specific cluster, for example, test-day2-1-6qv96.
Get the cluster name by running the following command:
$ oc get infrastructure cluster -o=jsonpath='{.status.infrastructureName}{"\n"}'
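If you prefer to substitute the cluster name into the manifest directly, a sketch such as the following can work; <filename> is the same placeholder used elsewhere in this procedure:
$ CLUSTER_NAME=$(oc get infrastructure cluster -o=jsonpath='{.status.infrastructureName}')
$ sed -i "s/<cluster_name>/${CLUSTER_NAME}/" <filename>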
Apply the Machine CR by entering the following command:
$ oc apply -f <filename> (1)
1 Replace <filename> with the name of the Machine CR.
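Optionally, confirm that the Machine CR exists before linking it to the host and node; this verification is a suggestion rather than part of the documented procedure:
$ oc get machines.machine.openshift.io -n openshift-machine-api node-5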
Link the BareMetalHost, Machine, and Node objects by running the link-machine-and-node.sh script:
Copy the following link-machine-and-node.sh script to a local machine:
#!/bin/bash
# Credit goes to
# https://bugzilla.redhat.com/show_bug.cgi?id=1801238.
# This script will link Machine object
# and Node object. This is needed
# in order to have IP address of
# the Node present in the status of the Machine.
set -e

machine="$1"
node="$2"

if [ -z "$machine" ] || [ -z "$node" ]; then
    echo "Usage: $0 MACHINE NODE"
    exit 1
fi

node_name=$(echo "${node}" | cut -f2 -d':')

oc proxy &
proxy_pid=$!
function kill_proxy {
    kill $proxy_pid
}
trap kill_proxy EXIT SIGINT

HOST_PROXY_API_PATH="http://localhost:8001/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts"

function print_nics() {
    local ips
    local eob
    declare -a ips
    readarray -t ips < <(echo "${1}" \
        | jq '.[] | select(. | .type == "InternalIP") | .address' \
        | sed 's/"//g')
    eob=','
    for (( i=0; i<${#ips[@]}; i++ )); do
        if [ $((i+1)) -eq ${#ips[@]} ]; then
            eob=""
        fi
        cat <<- EOF
{
  "ip": "${ips[$i]}",
  "mac": "00:00:00:00:00:00",
  "model": "unknown",
  "speedGbps": 10,
  "vlanId": 0,
  "pxe": true,
  "name": "eth1"
}${eob}
EOF
    done
}

function wait_for_json() {
    local name
    local url
    local curl_opts
    local timeout
    local start_time
    local curr_time
    local time_diff
    name="$1"
    url="$2"
    timeout="$3"
    shift 3
    curl_opts="$@"
    echo -n "Waiting for $name to respond"
    start_time=$(date +%s)
    until curl -g -X GET "$url" "${curl_opts[@]}" 2> /dev/null | jq '.' 2> /dev/null > /dev/null; do
        echo -n "."
        curr_time=$(date +%s)
        time_diff=$((curr_time - start_time))
        if [[ $time_diff -gt $timeout ]]; then
            printf '\nTimed out waiting for %s' "${name}"
            return 1
        fi
        sleep 5
    done
    echo " Success!"
    return 0
}

wait_for_json oc_proxy "${HOST_PROXY_API_PATH}" 10 -H "Accept: application/json" -H "Content-Type: application/json"

addresses=$(oc get node -n openshift-machine-api "${node_name}" -o json | jq -c '.status.addresses')

machine_data=$(oc get machines.machine.openshift.io -n openshift-machine-api -o json "${machine}")
host=$(echo "$machine_data" | jq '.metadata.annotations["metal3.io/BareMetalHost"]' | cut -f2 -d/ | sed 's/"//g')

if [ -z "$host" ]; then
    echo "Machine $machine is not linked to a host yet." 1>&2
    exit 1
fi

# The address structure on the host doesn't match the node, so extract
# the values we want into separate variables so we can build the patch
# we need.
hostname=$(echo "${addresses}" | jq '.[] | select(. | .type == "Hostname") | .address' | sed 's/"//g')

set +e
read -r -d '' host_patch << EOF
{
  "status": {
    "hardware": {
      "hostname": "${hostname}",
      "nics": [
        $(print_nics "${addresses}")
      ],
      "systemVendor": {
        "manufacturer": "Red Hat",
        "productName": "product name",
        "serialNumber": ""
      },
      "firmware": {
        "bios": {
          "date": "04/01/2014",
          "vendor": "SeaBIOS",
          "version": "1.11.0-2.el7"
        }
      },
      "ramMebibytes": 0,
      "storage": [],
      "cpu": {
        "arch": "x86_64",
        "model": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "clockMegahertz": 2199.998,
        "count": 4,
        "flags": []
      }
    }
  }
}
EOF
set -e

echo "PATCHING HOST"
echo "${host_patch}" | jq .

curl -s \
    -X PATCH \
    "${HOST_PROXY_API_PATH}/${host}/status" \
    -H "Content-type: application/merge-patch+json" \
    -d "${host_patch}"

oc get baremetalhost -n openshift-machine-api -o yaml "${host}"
Make the script executable by entering the following command:
$ chmod +x link-machine-and-node.sh
Run the script by entering the following command:
$ bash link-machine-and-node.sh node-5 node-5
The first node-5 argument is the name of the Machine object, and the second is the name of the Node object.
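To confirm that the link succeeded, you can check that the Machine status now reports the node addresses, which is what the script is intended to populate. This check is a suggestion, not part of the documented procedure:
$ oc get machines.machine.openshift.io -n openshift-machine-api node-5 -o jsonpath='{.status.addresses}'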
Confirm the etcd members by connecting to one of the pre-existing control plane nodes:
Open a remote shell session to the control plane node by entering the following command:
$ oc rsh -n openshift-etcd etcd-node-0
List etcd members:
# etcdctl member list -w table
Monitor the etcd Operator configuration process until it completes by entering the following command. The expected output shows False under the PROGRESSING column.
$ oc get clusteroperator etcd
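If you prefer to block until the etcd Operator stops progressing instead of polling, a command such as the following can be used; the timeout value is only an example:
$ oc wait clusteroperator/etcd --for=condition=Progressing=False --timeout=30m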
Confirm etcd health by running the following commands:
Open a remote shell session to the control plane node:
$ oc rsh -n openshift-etcd etcd-node-0
Check the endpoint health by entering the following command. The expected output shows is healthy for the endpoint.
# etcdctl endpoint health
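To check every etcd member in one pass rather than only the local endpoint, the --cluster flag can be added, for example:
# etcdctl endpoint health --cluster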
Verify that all nodes are ready by entering the following command. The expected output shows the Ready status beside each node entry.
$ oc get nodes
Verify that the cluster Operators are all available by entering the following command. The expected output lists each Operator and shows True under the AVAILABLE column beside each listed Operator.
$ oc get ClusterOperators
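As an optional shortcut, the following jq sketch lists any Operator whose Available condition is not True; jq is the same tool that the linking script depends on:
$ oc get clusteroperators -o json | jq -r '.items[] | select(.status.conditions[] | select(.type=="Available" and .status!="True")) | .metadata.name'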
Verify that the cluster version is correct by entering the following command:
$ oc get ClusterVersion
NAME      VERSION             AVAILABLE   PROGRESSING   SINCE   STATUS
version   <cluster_version>   True        False         5h57m   Cluster version is <cluster_version>