Before you get started with hosted control planes for OKD, you must properly label nodes so that the pods of hosted clusters can be scheduled into infrastructure nodes. Node labeling is also important for the following reasons:
To ensure high availability and proper workload deployment. For example, to avoid having the control plane workload count toward your OKD subscription, you can set the node-role.kubernetes.io/infra label.
To ensure that control plane workloads are separate from other workloads in the management cluster.
To ensure that control plane workloads are configured at the correct multi-tenancy distribution level for your deployment. The distribution levels are as follows:
Everything shared: Control planes for hosted clusters can run on any node that is designated for control planes.
Request serving isolation: Request-serving pods run on their own dedicated nodes.
Nothing shared: Every control plane has its own dedicated nodes.
For more information about dedicating a node to a single hosted cluster, see "Labeling management cluster nodes".
Do not use the management cluster for your workload. Workloads must not run on nodes where control planes run.
Proper node labeling is a prerequisite to deploying hosted control planes.
As a management cluster administrator, you use the following labels and taints in management cluster nodes to schedule a control plane workload:
hypershift.openshift.io/control-plane: true: Use this label and taint to dedicate a node to running hosted control plane workloads. By setting a value of true, you avoid sharing the control plane nodes with other components, for example, the infrastructure components of the management cluster or any other mistakenly deployed workload.
hypershift.openshift.io/cluster: ${HostedControlPlane Namespace}: Use this label and taint when you want to dedicate a node to a single hosted cluster.
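For example, to dedicate a node to hosted control plane workloads, you might apply the label and a matching taint as in the following sketch. The node name worker-1a is a placeholder, and the NoSchedule effect is an assumption; substitute your own node names and adjust the effect to your scheduling policy:
$ oc label node/worker-1a hypershift.openshift.io/control-plane=true
$ oc adm taint nodes worker-1a hypershift.openshift.io/control-plane=true:NoSchedule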
Apply the following labels on the nodes that host control-plane pods:
node-role.kubernetes.io/infra: Use this label to avoid having the control-plane workload count toward your subscription.
topology.kubernetes.io/zone: Use this label on the management cluster nodes to deploy highly available clusters across failure domains. The zone might be a location, rack name, or the hostname of the node where the zone is set. For example, a management cluster has the following nodes: worker-1a, worker-1b, worker-2a, and worker-2b. The worker-1a and worker-1b nodes are in rack1, and the worker-2a and worker-2b nodes are in rack2. To use each rack as an availability zone, enter the following commands:
$ oc label node/worker-1a node/worker-1b topology.kubernetes.io/zone=rack1
$ oc label node/worker-2a node/worker-2b topology.kubernetes.io/zone=rack2
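To confirm that the zone labels are applied, you might list the nodes with the label shown as an extra column. The -L flag adds a column for the specified label:
$ oc get nodes -L topology.kubernetes.io/zone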
Pods for a hosted cluster have tolerations, and the scheduler uses affinity rules to schedule them. Pods tolerate the control-plane and cluster taints. The scheduler prioritizes the scheduling of pods into nodes that are labeled with hypershift.openshift.io/control-plane and hypershift.openshift.io/cluster: ${HostedControlPlane Namespace}.
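As an illustration only, and not the exact manifest that the HyperShift Operator generates, the tolerations and node affinity on a hosted control plane pod take roughly the following shape. The clusters-example value is a placeholder for the hosted control plane namespace:
tolerations:
- key: hypershift.openshift.io/control-plane
  operator: Equal
  value: "true"
  effect: NoSchedule
- key: hypershift.openshift.io/cluster
  operator: Equal
  value: clusters-example   # placeholder hosted control plane namespace
  effect: NoSchedule
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 50
      preference:
        matchExpressions:
        - key: hypershift.openshift.io/control-plane
          operator: In
          values:
          - "true"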
For the ControllerAvailabilityPolicy option, use HighlyAvailable, which is the default value that the hosted control planes command-line interface, hcp, deploys. When you use that option, you can schedule pods for each deployment within a hosted cluster across different failure domains by setting topology.kubernetes.io/zone as the topology key. Scheduling pods for a deployment within a hosted cluster across different failure domains is available only for highly available control planes.
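If you set the policy in the HostedCluster resource rather than through the hcp CLI, the setting looks like the following sketch, which assumes the spec.controllerAvailabilityPolicy field name:
spec:
  controllerAvailabilityPolicy: HighlyAvailable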
To enable a hosted cluster to require its pods to be scheduled into infrastructure nodes, set HostedCluster.spec.nodeSelector, as shown in the following example:
spec:
nodeSelector:
node-role.kubernetes.io/infra: ""
This way, hosted control planes for each hosted cluster qualify as infrastructure workloads, and you do not need to entitle the underlying OKD nodes.
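If the target nodes do not yet carry the infrastructure role, you might label them first. The node names are placeholders:
$ oc label node/worker-1a node/worker-1b node-role.kubernetes.io/infra=""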
Four built-in priority classes influence the priority and preemption of the hosted cluster pods. The pods are created in the management cluster with the following priority classes, listed from highest to lowest:
hypershift-operator: HyperShift Operator pods.
hypershift-etcd: Pods for etcd.
hypershift-api-critical: Pods that are required for API calls and resource admission to succeed. These pods include pods such as kube-apiserver, aggregated API servers, and webhooks.
hypershift-control-plane: Pods in the control plane that are not API-critical but still need elevated priority, such as the cluster version Operator.
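To review these priority classes and their values on the management cluster, you might run:
$ oc get priorityclass hypershift-operator hypershift-etcd hypershift-api-critical hypershift-control-plane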
By default, pods for a hosted cluster tolerate the control-plane and cluster taints. However, you can also use custom taints on nodes so that hosted clusters can tolerate those taints on a per-hosted-cluster basis by setting HostedCluster.spec.tolerations.
Passing tolerations for a hosted cluster is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The following example sets tolerations in the HostedCluster specification:
spec:
tolerations:
- effect: NoSchedule
key: kubernetes.io/custom
operator: Exists
You can also set tolerations on the hosted cluster when you create the cluster by using the --tolerations hcp CLI argument, as in the following example:
--tolerations="key=kubernetes.io/custom,operator=Exists,effect=NoSchedule"
For fine-grained control of hosted cluster pod placement on a per-hosted-cluster basis, use custom tolerations with nodeSelector. You can co-locate groups of hosted clusters and isolate them from other hosted clusters. You can also place hosted clusters on infra and control plane nodes.
Tolerations on the hosted cluster apply only to the control plane pods. To configure other pods that run on the management cluster, and infrastructure-related pods such as the pods that run virtual machines, you must use a different process.
You can configure hosted control planes to isolate network traffic and control plane pods.
Each hosted control plane is assigned to run in a dedicated Kubernetes namespace. By default, the Kubernetes namespace denies all network traffic.
The following network traffic is allowed through the network policy that is enforced by the Kubernetes Container Network Interface (CNI):
Ingress pod-to-pod communication in the same namespace (intra-tenant)
Ingress on port 6443 to the hosted kube-apiserver pod for the tenant
Metric scraping for monitoring from the management cluster Kubernetes namespace that has the network.openshift.io/policy-group: monitoring label
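As an illustration of the first rule, a same-namespace ingress policy looks roughly like the following sketch. This is not the exact policy that is enforced, and the clusters-example namespace is a placeholder:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace
  namespace: clusters-example   # placeholder hosted control plane namespace
spec:
  podSelector: {}               # applies to all pods in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}           # allow ingress only from pods in the same namespace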
In addition to network policies, each hosted control plane pod runs with the restricted security context constraint (SCC). This policy denies access to all host features and requires pods to run with a UID and an SELinux context that are allocated uniquely to each namespace that hosts a customer control plane.
The policy ensures the following constraints:
Pods cannot run as privileged.
Pods cannot mount host directory volumes.
Pods must run as a user in a pre-allocated range of UIDs.
Pods must run with a pre-allocated MCS label.
Pods cannot access the host network namespace.
Pods cannot expose host network ports.
Pods cannot access the host PID namespace.
By default, pods drop the following Linux capabilities: KILL, MKNOD, SETUID, and SETGID.
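To review the rules that the restricted SCC enforces on your management cluster, you might inspect it directly. On newer clusters, the equivalent constraint might be named restricted-v2:
$ oc describe scc restricted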
The management components, such as kubelet and crio, on each management cluster worker node are protected by an SELinux label that is not accessible to the SELinux context for pods that support hosted control planes.
The following SELinux labels are used for key processes and sockets:
kubelet: system_u:system_r:unconfined_service_t:s0
crio: system_u:system_r:container_runtime_t:s0
crio.sock: system_u:object_r:container_var_run_t:s0
<example user container processes>: system_u:system_r:container_t:s0:c14,c24
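To verify these contexts on a management cluster node, you might open a debug session and list the process labels. The node name worker-1a is a placeholder:
$ oc debug node/worker-1a -- chroot /host ps -eZ | grep -E 'kubelet|crio'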