After you deploy hosted control planes on bare metal, you can manage a hosted cluster by completing the following tasks.
You can access the hosted cluster either by getting the kubeconfig file and kubeadmin credential directly from resources, or by using the hcp command line interface to generate a kubeconfig file.
To access the hosted cluster by getting the kubeconfig file and credentials directly from resources, you must be familiar with the access secrets for hosted clusters. The hosted cluster (hosting) namespace contains hosted cluster resources and the access secrets. The hosted control plane namespace is where the hosted control plane runs.
The secret name formats are as follows:
kubeconfig secret: <hosted_cluster_namespace>-<name>-admin-kubeconfig. For example, clusters-hypershift-demo-admin-kubeconfig.
kubeadmin password secret: <hosted_cluster_namespace>-<name>-kubeadmin-password. For example, clusters-hypershift-demo-kubeadmin-password.
The kubeconfig secret contains a Base64-encoded kubeconfig field, which you can decode and save into a file to use with the following command:
$ oc --kubeconfig <hosted_cluster_name>.kubeconfig get nodes
The kubeadmin password secret is also Base64-encoded. You can decode it and use the password to log in to the API server or console of the hosted cluster.
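As a minimal sketch, the secret name formats and the decoding step can be expressed as POSIX shell functions. The function names are hypothetical. The commented `oc get secret ... -o jsonpath` usage assumes that the kubeadmin password is stored under a `password` data key, which is an assumption to verify against your secret before you rely on it.

```shell
#!/bin/sh
# Hypothetical helpers: build the access-secret names from the documented
# formats and decode a Base64 value read from a secret's data field.

# Format: <hosted_cluster_namespace>-<name>-admin-kubeconfig
kubeconfig_secret_name() {
  printf '%s-%s-admin-kubeconfig\n' "$1" "$2"
}

# Format: <hosted_cluster_namespace>-<name>-kubeadmin-password
kubeadmin_secret_name() {
  printf '%s-%s-kubeadmin-password\n' "$1" "$2"
}

# Decode a Base64-encoded string and write the result to stdout.
decode_b64() {
  printf '%s' "$1" | base64 -d
}

# Example usage (assumes the data keys are "kubeconfig" and "password"):
#   ns=<hosted_cluster_namespace>; name=<hosted_cluster_name>
#   data=$(oc get secret -n "$ns" "$(kubeconfig_secret_name "$ns" "$name")" \
#     -o jsonpath='{.data.kubeconfig}')
#   decode_b64 "$data" > "$name.kubeconfig"
#   oc get secret -n "$ns" "$(kubeadmin_secret_name "$ns" "$name")" \
#     -o jsonpath='{.data.password}' | base64 -d
```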
To access the hosted cluster by using the hcp CLI to generate the kubeconfig file, take the following steps:
Generate the kubeconfig file by entering the following command:
$ hcp create kubeconfig --namespace <hosted_cluster_namespace> --name <hosted_cluster_name> > <hosted_cluster_name>.kubeconfig
After you save the kubeconfig file, you can access the hosted cluster by entering the following example command:
$ oc --kubeconfig <hosted_cluster_name>.kubeconfig get nodes
You can scale up the NodePool object by adding nodes to your hosted cluster. When you scale a node pool, consider the following information:
When you scale up a node pool, a machine is created for each new replica. For every machine, the Cluster API provider finds and installs an Agent that meets the requirements that are specified in the node pool specification. You can monitor the installation of an Agent by checking its status and conditions.
When you scale down a node pool, Agents are unbound from the corresponding cluster. Before you can reuse the Agents, you must restart them by using the Discovery image.
Scale the NodePool object to two nodes:
$ oc -n <hosted_cluster_namespace> scale nodepool <nodepool_name> --replicas 2
The Cluster API agent provider randomly picks two agents that are then assigned to the hosted cluster. Those agents go through different states and finally join the hosted cluster as OKD nodes. The agents pass through states in the following order:
binding
discovering
insufficient
installing
installing-in-progress
added-to-existing-cluster
To list the agents, enter the following command:
$ oc -n <hosted_control_plane_namespace> get agent
NAME                                   CLUSTER         APPROVED   ROLE          STAGE
4dac1ab2-7dd5-4894-a220-6a3473b67ee6   hypercluster1   true       auto-assign
d9198891-39f4-4930-a679-65fb142b108b                   true       auto-assign
da503cf1-a347-44f2-875c-4960ddb04091   hypercluster1   true       auto-assign
To check the state of each agent, enter the following command:
$ oc -n <hosted_control_plane_namespace> get agent -o jsonpath='{range .items[*]}BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} Agent: {@.metadata.name} State: {@.status.debugInfo.state}{"\n"}{end}'
BMH: ocp-worker-2 Agent: 4dac1ab2-7dd5-4894-a220-6a3473b67ee6 State: binding
BMH: ocp-worker-0 Agent: d9198891-39f4-4930-a679-65fb142b108b State: known-unbound
BMH: ocp-worker-1 Agent: da503cf1-a347-44f2-875c-4960ddb04091 State: insufficient
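To follow the progress of several agents, you can summarize the output of that jsonpath query. This sketch reads lines in the `BMH: <bmh> Agent: <agent> State: <state>` format that the query prints; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that summarize lines in the format printed by the
# jsonpath query:  BMH: <bmh> Agent: <agent> State: <state>

# Print how many agents report the given state (reads stdin).
count_agents_in_state() {
  awk -v s="$1" '$5 == "State:" && $6 == s { n++ } END { print n + 0 }'
}

# Print the names of agents that have not reached added-to-existing-cluster.
agents_not_added() {
  awk '$5 == "State:" && $6 != "added-to-existing-cluster" { print $4 }'
}

# Example usage: pipe the output of the jsonpath query into either function,
# for example `<jsonpath_query> | count_agents_in_state binding`.
```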
Obtain the kubeconfig file for your new hosted cluster by entering the following command:
$ oc extract -n <hosted_cluster_namespace> secret/<hosted_cluster_name>-admin-kubeconfig --to=- > kubeconfig-<hosted_cluster_name>
After the agents reach the added-to-existing-cluster state, verify that you can see the OKD nodes in the hosted cluster by entering the following command:
$ oc --kubeconfig kubeconfig-<hosted_cluster_name> get nodes
NAME STATUS ROLES AGE VERSION
ocp-worker-1 Ready worker 5m41s v1.24.0+3882f8f
ocp-worker-2 Ready worker 6m3s v1.24.0+3882f8f
Cluster Operators start to reconcile by adding workloads to the nodes.
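The agents can take several minutes to reach the added-to-existing-cluster state, so checks such as the `get nodes` command are often repeated. The following generic retry helper is a sketch; its name and the commented readiness check, which assumes two expected worker nodes, are hypothetical.

```shell
#!/bin/sh
# Re-run a command until it succeeds or the attempts run out.
# Arguments: max_attempts, seconds_between_attempts, command [args...]
wait_until() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage (hypothetical check for two Ready nodes):
#   nodes_ready() {
#     [ "$(oc --kubeconfig kubeconfig-<hosted_cluster_name> get nodes --no-headers \
#       | awk '$2 == "Ready"' | wc -l)" -ge 2 ]
#   }
#   wait_until 60 30 nodes_ready
```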
Enter the following command to verify that two machines were created when you scaled up the NodePool object:
$ oc -n <hosted_control_plane_namespace> get machines
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
hypercluster1-c96b6f675-m5vch hypercluster1-b2qhl ocp-worker-1 agent://da503cf1-a347-44f2-875c-4960ddb04091 Running 15m 4.x.z
hypercluster1-c96b6f675-tl42p hypercluster1-b2qhl ocp-worker-2 agent://4dac1ab2-7dd5-4894-a220-6a3473b67ee6 Running 15m 4.x.z
The clusterversion reconcile process eventually reaches a point where only the Ingress and Console cluster Operators are missing.
To check the cluster version and cluster Operators, enter the following command:
$ oc --kubeconfig kubeconfig-<hosted_cluster_name> get clusterversion,co
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
clusterversion.config.openshift.io/version False True 40m Unable to apply 4.x.z: the cluster operator console has not yet successfully rolled out
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
clusteroperator.config.openshift.io/console 4.x.z False False False 11m RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hypercluster1.domain.com): Get "https://console-openshift-console.apps.hypercluster1.domain.com": dial tcp 10.19.3.29:443: connect: connection refused
clusteroperator.config.openshift.io/csi-snapshot-controller 4.x.z True False False 10m
clusteroperator.config.openshift.io/dns 4.x.z True False False 9m16s
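When the list of cluster Operators is long, a small filter can pick out the ones that are not yet available. This is a sketch with a hypothetical function name. It assumes the column layout shown in the output (NAME, VERSION, AVAILABLE, and so on) with the VERSION column populated for Operator rows; if an Operator row has an empty VERSION column, the fields shift and the filter misses that row.

```shell
#!/bin/sh
# Print the names of cluster Operators whose AVAILABLE column is False in
# `oc get clusterversion,co` or `oc get co` output read from stdin.
# Header rows and clusterversion rows are skipped.
unavailable_operators() {
  awk '$1 != "NAME" && $1 !~ /^clusterversion/ && $3 == "False" { print $1 }'
}

# Example usage:
#   oc --kubeconfig kubeconfig-<hosted_cluster_name> get clusterversion,co \
#     | unavailable_operators
```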
You can create node pools for a hosted cluster by specifying a name, number of replicas, and any additional information, such as an agent label selector.
To create a node pool, enter the following command:
$ hcp create nodepool agent \
  --cluster-name <hosted_cluster_name> \
  --name <nodepool_name> \
  --node-count <worker_node_count> \
  --agentLabelSelector '{"matchLabels": {"size": "medium"}}'
--cluster-name | Replace <hosted_cluster_name> with your hosted cluster name. |
--name | Replace <nodepool_name> with the name of your node pool, for example, <hosted_cluster_name>-extra-cpu. |
--node-count | Replace <worker_node_count> with the worker node count, for example, 2. |
--agentLabelSelector | This flag is optional. In this example, the node pool uses only agents that have the size: medium label. |
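If you script node pool creation, you can build the `--agentLabelSelector` value from a label key and value instead of typing the JSON by hand. This is a minimal sketch with a hypothetical function name; it assumes that neither the key nor the value contains characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: build a matchLabels selector for --agentLabelSelector
# from one label key and value. Assumes neither needs JSON escaping.
agent_label_selector() {
  printf '{"matchLabels": {"%s": "%s"}}\n' "$1" "$2"
}

# Example usage:
#   hcp create nodepool agent \
#     --cluster-name <hosted_cluster_name> \
#     --name <nodepool_name> \
#     --node-count 2 \
#     --agentLabelSelector "$(agent_label_selector size medium)"
```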
Check the status of the node pool by listing nodepool resources in the clusters namespace:
$ oc get nodepools --namespace clusters
Extract the admin-kubeconfig secret by entering the following command:
$ oc extract -n <hosted_control_plane_namespace> secret/admin-kubeconfig --to=./hostedcluster-secrets --confirm
hostedcluster-secrets/kubeconfig
After some time, you can check the status of the nodes in the hosted cluster by entering the following command:
$ oc --kubeconfig ./hostedcluster-secrets/kubeconfig get nodes
Verify that the number of available node pools matches the number of expected node pools by entering this command:
$ oc get nodepools --namespace clusters
When you need more capacity in your hosted cluster and spare agents are available, you can enable auto-scaling to install new worker nodes.
To enable auto-scaling, enter the following command:
$ oc -n <hosted_cluster_namespace> patch nodepool <hosted_cluster_name> --type=json -p '[{"op": "remove", "path": "/spec/replicas"},{"op":"add", "path": "/spec/autoScaling", "value": { "max": 5, "min": 2 }}]'
In the example, the minimum number of nodes is 2, and the maximum is 5. The maximum number of nodes that you can add might be limited by your platform. For example, if you use the Agent platform, the maximum number of nodes is limited by the number of available agents.
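The JSON patch in the auto-scaling command can also be generated from the minimum and maximum values, which avoids hand-editing the JSON string. This is a sketch with hypothetical function names; it builds the same patch text and only checks that both values are non-negative integers and that the minimum is not greater than the maximum.

```shell
#!/bin/sh
# Succeed if $1 is a non-negative integer without leading zeros.
is_uint() {
  case $1 in
    '' | *[!0-9]* | 0[0-9]*) return 1 ;;
  esac
  return 0
}

# Hypothetical helper: print the JSON patch that removes spec.replicas and
# adds spec.autoScaling with the given minimum and maximum node counts.
autoscaling_patch() {
  local min max
  min=$1
  max=$2
  if ! is_uint "$min" || ! is_uint "$max"; then
    echo "min and max must be non-negative integers" >&2
    return 1
  fi
  if [ "$min" -gt "$max" ]; then
    echo "min ($min) must not be greater than max ($max)" >&2
    return 1
  fi
  printf '[{"op": "remove", "path": "/spec/replicas"},{"op": "add", "path": "/spec/autoScaling", "value": {"max": %s, "min": %s}}]\n' "$max" "$min"
}

# Example usage:
#   oc -n <hosted_cluster_namespace> patch nodepool <hosted_cluster_name> \
#     --type=json -p "$(autoscaling_patch 2 5)"
```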
Create a workload that requires a new node.
Create a YAML file that contains the workload configuration, by using the following example:
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: reversewords
  name: reversewords
  namespace: default
spec:
  replicas: 40
  selector:
    matchLabels:
      app: reversewords
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: reversewords
    spec:
      containers:
      - image: quay.io/mavazque/reversewords:latest
        name: reversewords
        resources:
          requests:
            memory: 2Gi
status: {}
Save the file as workload-config.yaml.
Apply the YAML in the hosted cluster by entering the following command:
$ oc --kubeconfig <hosted_cluster_name>.kubeconfig apply -f workload-config.yaml
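The example workload is sized to need more capacity than the existing nodes are likely to offer: 40 replicas that each request 2Gi of memory add up to 80Gi of requested memory. The arithmetic can be sketched as follows. The function names are hypothetical, and the per-node figure is an estimate that you supply, because the memory a node can actually offer depends on your hardware and system reservations.

```shell
#!/bin/sh
# Total memory that a Deployment requests, in GiB:
# replicas multiplied by the per-pod request.
total_request_gib() {
  echo $(( $1 * $2 ))
}

# Rough node count for a total request, rounded up, given the memory in GiB
# that one node can offer to the workload (an estimate you must provide).
nodes_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: 40 replicas x 2Gi each, on nodes with about 32Gi to spare.
#   total=$(total_request_gib 40 2)   # 80
#   nodes_needed "$total" 32          # 3
```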
Extract the admin-kubeconfig secret by entering the following command:
$ oc extract -n <hosted_cluster_namespace> secret/<hosted_cluster_name>-admin-kubeconfig --to=./hostedcluster-secrets --confirm
hostedcluster-secrets/kubeconfig
You can check whether the new nodes are in the Ready status by entering the following command:
$ oc --kubeconfig ./hostedcluster-secrets/kubeconfig get nodes
To remove the node, delete the workload by entering the following command:
$ oc --kubeconfig ./hostedcluster-secrets/kubeconfig -n <namespace> delete deployment <deployment_name>
Wait for several minutes. When the additional capacity is no longer required, the node is removed. On the Agent platform, the agent is decommissioned and can be reused. You can confirm that the node was removed by entering the following command:
$ oc --kubeconfig ./hostedcluster-secrets/kubeconfig get nodes
For IBM Z agents, compute nodes are detached from the cluster automatically only for IBM Z with KVM agents. For z/VM and LPAR, you must delete the compute nodes manually. Agents can be reused only for IBM Z with KVM. For z/VM and LPAR, re-create the agents to use them as compute nodes.
To disable node auto-scaling, complete the following procedure.
Enter the following command to disable node auto-scaling for the hosted cluster:
$ oc -n <hosted_cluster_namespace> patch nodepool <hosted_cluster_name> --type=json -p '[{"op":"remove", "path": "/spec/autoScaling"}, {"op": "add", "path": "/spec/replicas", "value": <specify_value_to_scale_replicas>}]'
The command removes spec.autoScaling from the NodePool resource, adds spec.replicas, and sets spec.replicas to the integer value that you specify.
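As with enabling auto-scaling, the patch that disables it can be generated from the replica count so that the JSON stays well-formed. This is a sketch with hypothetical function names; it checks that the count is a non-negative integer and prints a well-formed version of that patch with the value filled in.

```shell
#!/bin/sh
# Succeed if $1 is a non-negative integer without leading zeros.
is_uint() {
  case $1 in
    '' | *[!0-9]* | 0[0-9]*) return 1 ;;
  esac
  return 0
}

# Hypothetical helper: print the JSON patch that removes spec.autoScaling
# and sets spec.replicas to a fixed node count.
disable_autoscaling_patch() {
  if ! is_uint "$1"; then
    echo "replicas must be a non-negative integer" >&2
    return 1
  fi
  printf '[{"op": "remove", "path": "/spec/autoScaling"}, {"op": "add", "path": "/spec/replicas", "value": %s}]\n' "$1"
}

# Example usage:
#   oc -n <hosted_cluster_namespace> patch nodepool <hosted_cluster_name> \
#     --type=json -p "$(disable_autoscaling_patch 2)"
```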
Every OKD cluster has a default application Ingress Controller that typically has an external DNS record associated with it. For example, if you create a hosted cluster named example with the base domain krnl.es, you can expect the wildcard domain *.apps.example.krnl.es to be routable.
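The host names under the wildcard domain follow a fixed pattern, so you can build them from the cluster name and base domain, for example to test DNS resolution. This is a sketch with hypothetical function names; the commented `dig` check assumes that the wildcard DNS record already exists.

```shell
#!/bin/sh
# Hypothetical helpers for names under the default *.apps wildcard domain.

# Print the wildcard domain: *.apps.<cluster_name>.<base_domain>
wildcard_apps_domain() {
  printf '*.apps.%s.%s\n' "$1" "$2"
}

# Print a host name under that domain: <prefix>.apps.<cluster_name>.<base_domain>
apps_host() {
  printf '%s.apps.%s.%s\n' "$1" "$2" "$3"
}

# Example usage (assumes the wildcard DNS record already exists):
#   dig +short "$(apps_host console-openshift-console example krnl.es)"
```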
To set up a load balancer and wildcard DNS record for the *.apps domain, perform the following actions on your guest cluster:
Deploy MetalLB by creating a YAML file that contains the configuration for the MetalLB Operator:
apiVersion: v1
kind: Namespace
metadata:
  name: metallb
  labels:
    openshift.io/cluster-monitoring: "true"
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: metallb-operator-operatorgroup
  namespace: metallb
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator
  namespace: metallb
spec:
  channel: "stable"
  name: metallb-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Save the file as metallb-operator-config.yaml.
Enter the following command to apply the configuration:
$ oc apply -f metallb-operator-config.yaml
After the Operator is running, create the MetalLB instance:
Create a YAML file that contains the configuration for the MetalLB instance:
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb
Save the file as metallb-instance-config.yaml.
Create the MetalLB instance by entering this command:
$ oc apply -f metallb-instance-config.yaml
Configure the MetalLB Operator by creating two resources:
An IPAddressPool resource with a single IP address. This IP address must be on the same subnet as the network that the cluster nodes use.
A BGPAdvertisement resource to advertise the load balancer IP addresses that the IPAddressPool resource provides through the BGP protocol.
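Because the IPAddressPool address must be on the same subnet as the cluster nodes, you can sanity-check a candidate address before you create the resource. This is a minimal POSIX shell sketch. The function names, the node address, and the /24 prefix in the usage example are assumptions, and the helpers accept only dotted IPv4 addresses without leading zeros in the octets.

```shell
#!/bin/sh
# Convert a dotted IPv4 address (no leading zeros in octets) to an integer.
# The body runs in a subshell so that the IFS change does not leak out.
ipv4_to_int() (
  IFS=.
  # Split the address on dots into positional parameters.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if two IPv4 addresses fall in the same network for a prefix length.
same_ipv4_subnet() {
  local a b mask
  a=$(ipv4_to_int "$1")
  b=$(ipv4_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( (a & mask) == (b & mask) )) -eq 1 ]
}

# Print the single-address range form used in the IPAddressPool addresses list.
single_address_range() {
  printf '%s-%s\n' "$1" "$1"
}

# Example usage with a hypothetical node address and a /24 node network:
#   same_ipv4_subnet 192.168.122.23 192.168.122.10 24 && echo "same subnet"
#   single_address_range 192.168.122.23   # 192.168.122.23-192.168.122.23
```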
Create a YAML file to contain the configuration:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: <ip_address_pool_name> (1)
  namespace: metallb
spec:
  protocol: layer2
  autoAssign: false
  addresses:
    - <ingress_ip>-<ingress_ip> (2)
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: <bgp_advertisement_name> (3)
  namespace: metallb
spec:
  ipAddressPools:
    - <ip_address_pool_name> (1)
1 | Specify the IPAddressPool resource name. |
2 | Specify the IP address for your environment, for example, 192.168.122.23. |
3 | Specify the BGPAdvertisement resource name. |
Save the file as ipaddresspool-bgpadvertisement-config.yaml.
Create the resources by entering the following command:
$ oc apply -f ipaddresspool-bgpadvertisement-config.yaml
After you create a service of the LoadBalancer type, MetalLB assigns an external IP address to the service.
Configure a new load balancer service that routes ingress traffic to the ingress deployment by creating a YAML file named metallb-loadbalancer-service.yaml:
kind: Service
apiVersion: v1
metadata:
  annotations:
    metallb.universe.tf/address-pool: ingress-public-ip
  name: metallb-ingress
  namespace: openshift-ingress
spec:
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: https
      protocol: TCP
      port: 443
      targetPort: 443
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  type: LoadBalancer
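Save the metallb-loadbalancer-service.yaml file. The value of the metallb.universe.tf/address-pool annotation, ingress-public-ip in this example, must match the name of the IPAddressPool resource that you created, because that pool sets autoAssign: false.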
Enter the following command to apply the YAML configuration:
$ oc apply -f metallb-loadbalancer-service.yaml
Enter the following command to reach the OKD console:
$ curl -kI https://console-openshift-console.apps.example.krnl.es
HTTP/1.1 200 OK
Check the clusterversion and clusteroperator values to verify that everything is running. Enter the following command:
$ oc --kubeconfig <hosted_cluster_name>.kubeconfig get clusterversion,co
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
clusterversion.config.openshift.io/version 4.x.y True False 3m32s Cluster version is 4.x.y
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
clusteroperator.config.openshift.io/console 4.x.y True False False 3m50s
clusteroperator.config.openshift.io/ingress 4.x.y True False False 53m
In the example output, 4.x.y represents the supported OKD version that you use, for example, 4.17.0-multi.
You can enable machine health checks on bare metal to repair and replace unhealthy managed cluster nodes automatically. You must have additional agent machines that are ready to install in the managed cluster.
Consider the following limitations before enabling machine health checks:
You cannot modify the MachineHealthCheck object.
Machine health checks replace nodes only when at least two nodes stay in the False or Unknown status for more than 8 minutes.
After you enable machine health checks for the managed cluster nodes, the MachineHealthCheck object is created in your hosted cluster.
To enable machine health checks in your hosted cluster, modify the NodePool resource. Complete the following steps:
Verify that the spec.nodeDrainTimeout value in your NodePool resource is greater than 0s. Replace <hosted_cluster_namespace> with the name of your hosted cluster namespace and <nodepool_name> with the node pool name. Run the following command:
$ oc get nodepool -n <hosted_cluster_namespace> <nodepool_name> -o yaml | grep nodeDrainTimeout
nodeDrainTimeout: 30s
If the spec.nodeDrainTimeout value is not greater than 0s, modify the value by running the following command:
$ oc patch nodepool -n <hosted_cluster_namespace> <nodepool_name> -p '{"spec":{"nodeDrainTimeout": "30m"}}' --type=merge
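The check for a spec.nodeDrainTimeout value greater than 0s can be scripted by converting the duration to seconds. This sketch assumes that the value is written as whole numbers of hours, minutes, and seconds without leading zeros, such as 30s, 30m, or 30m0s; other duration forms, such as 1.5h or 500ms, are rejected rather than parsed. The function names are hypothetical.

```shell
#!/bin/sh
# Convert a duration such as 30s, 30m, or 30m0s to seconds. Only whole
# numbers with h, m, or s units are handled; anything else is rejected.
duration_to_seconds() {
  local rest total num unit
  rest=$1
  total=0
  [ -n "$rest" ] || return 1
  while [ -n "$rest" ]; do
    num=${rest%%[!0-9]*}        # leading digits
    [ -n "$num" ] || return 1
    rest=${rest#"$num"}
    unit=${rest%"${rest#?}"}    # first remaining character
    rest=${rest#?}
    case $unit in
      h) total=$((total + num * 3600)) ;;
      m) total=$((total + num * 60)) ;;
      s) total=$((total + num)) ;;
      *) return 1 ;;
    esac
  done
  echo "$total"
}

# Succeed if the value is a supported duration greater than 0s.
drain_timeout_ok() {
  local secs
  secs=$(duration_to_seconds "$1") || return 1
  [ "$secs" -gt 0 ]
}

# Example usage:
#   value=$(oc get nodepool -n <hosted_cluster_namespace> <nodepool_name> \
#     -o jsonpath='{.spec.nodeDrainTimeout}')
#   drain_timeout_ok "$value" || echo "nodeDrainTimeout needs to be patched"
```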
Enable machine health checks by setting the spec.management.autoRepair field to true in the NodePool resource. Run the following command:
$ oc patch nodepool -n <hosted_cluster_namespace> <nodepool_name> -p '{"spec": {"management": {"autoRepair":true}}}' --type=merge
Verify that the NodePool resource is updated with the autoRepair: true value by running the following command:
$ oc get nodepool -n <hosted_cluster_namespace> <nodepool_name> -o yaml | grep autoRepair
To disable machine health checks for the managed cluster nodes, modify the NodePool resource.
Disable machine health checks by setting the spec.management.autoRepair field to false in the NodePool resource. Run the following command:
$ oc patch nodepool -n <hosted_cluster_namespace> <nodepool_name> -p '{"spec": {"management": {"autoRepair":false}}}' --type=merge
Verify that the NodePool resource is updated with the autoRepair: false value by running the following command:
$ oc get nodepool -n <hosted_cluster_namespace> <nodepool_name> -o yaml | grep autoRepair