After you configure your environment for hosted control planes and create a hosted cluster, you can further manage your clusters and nodes.
Hosted control planes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Updates for hosted control planes involve updating the hosted cluster and the node pools. For a cluster to remain fully operational during an update process, you must meet the requirements of the Kubernetes version skew policy while completing the control plane and node updates.
The spec.release value dictates the version of the control plane. The HostedCluster object transmits the intended spec.release value to the HostedControlPlane.spec.release value and runs the appropriate Control Plane Operator version.
The hosted control plane manages the rollout of the new version of the control plane components along with any OKD components through the new version of the Cluster Version Operator (CVO).
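For reference, the following minimal excerpt of a HostedCluster resource shows where the release image is set. The namespace, name, and image values are placeholders, and the API version shown assumes a current HyperShift release:
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: <hosted-cluster-name>
  namespace: <hosted-cluster-namespace>
spec:
  release:
    image: <release-image>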
With node pools, you can configure the software that is running in the nodes by exposing the spec.release and spec.config values. You can start a rolling node pool update in the following ways, as shown in the example command after this list:
Changing the spec.release or spec.config values.
Changing any platform-specific field, such as the AWS instance type. The result is a set of new instances with the new type.
Changing the cluster configuration, if the change propagates to the node.
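As an example of a platform-specific change, the following command updates the AWS instance type of a node pool, which results in a set of replacement instances with the new type. The namespace, node pool name, and instance type are placeholders, and the field path assumes an AWS node pool as defined by the HyperShift NodePool API:
$ oc -n <hosted-cluster-namespace> patch nodepool/<nodepool-name> --patch '{"spec":{"platform":{"aws":{"instanceType":"m5.xlarge"}}}}' --type=merge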
Node pools support replace updates and in-place updates. The nodepool.spec.release value dictates the version of any particular node pool. A NodePool object completes a replace or an in-place rolling update according to the .spec.management.upgradeType value.
After you create a node pool, you cannot change the update type. If you want to change the update type, you must create a new node pool and delete the original one.
A replace update creates instances in the new version while it removes old instances from the previous version. This update type is effective in cloud environments where this level of immutability is cost effective.
Replace updates do not preserve any manual changes because the node is entirely re-provisioned.
An in-place update directly updates the operating systems of the instances. This type is suitable for environments where the infrastructure constraints are higher, such as bare metal.
In-place updates can preserve manual changes, but will report errors if you make manual changes to any file system or operating system configuration that the cluster directly manages, such as kubelet certificates.
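The following minimal NodePool excerpt shows where the update type is set. The name, namespace, and image values are placeholders, the upgradeType value is either Replace or InPlace, and the API version shown assumes a current HyperShift release:
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: <nodepool-name>
  namespace: <hosted-cluster-namespace>
spec:
  management:
    upgradeType: Replace
  release:
    image: <release-image>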
On hosted control planes, you update your version of OKD by updating the node pools. The node pool version must not surpass the hosted control plane version.
To start the process to update to a new version of OKD, change the spec.release.image value of the node pool by entering the following command:
$ oc -n <hosted-cluster-namespace> patch nodepool/<nodepool-name> --patch '{"spec":{"release":{"image":"<new-release-image>"}}}' --type=merge
To verify that the new version was rolled out, check the .status.version value and the status conditions.
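For example, you can inspect the rolled-out version and the status conditions of a node pool with a command like the following; the namespace and node pool name are placeholders:
$ oc -n <hosted-cluster-namespace> get nodepool/<nodepool-name> -o yaml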
If you are a cluster instance administrator, you can pause the reconciliation of a hosted cluster and hosted control plane. You might want to pause reconciliation when you back up and restore an etcd database or when you need to debug problems with a hosted cluster or hosted control plane.
To pause reconciliation for a hosted cluster and hosted control plane, populate the pausedUntil field of the HostedCluster resource, as shown in the following examples. In the examples, the value for pausedUntil is defined in an environment variable prior to the command.
To pause the reconciliation until a specific time, specify an RFC 3339 timestamp:
PAUSED_UNTIL="2022-03-03T03:28:48Z"
oc patch -n <hosted-cluster-namespace> hostedclusters/<hosted-cluster-name> -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge
The reconciliation is paused until the specified time has passed.
To pause the reconciliation indefinitely, pass a Boolean value of true:
PAUSED_UNTIL="true"
oc patch -n <hosted-cluster-namespace> hostedclusters/<hosted-cluster-name> -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge
The reconciliation is paused until you remove the field from the HostedCluster resource.
When the pause reconciliation field is populated for the HostedCluster resource, the field is automatically added to the associated HostedControlPlane resource.
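You can confirm that the field was propagated by inspecting the HostedControlPlane resource. In this sketch, the control plane namespace is assumed to follow the common <hosted-cluster-namespace>-<hosted-cluster-name> pattern; adjust the names for your environment:
$ oc get hostedcontrolplane/<hosted-cluster-name> -n <hosted-cluster-namespace>-<hosted-cluster-name> -o jsonpath='{.spec.pausedUntil}'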
To remove the pausedUntil field, enter the following patch command:
oc patch -n <hosted-cluster-namespace> hostedclusters/<hosted-cluster-name> -p '{"spec":{"pausedUntil":null}}' --type=merge
The hosted control planes feature for OKD creates ServiceMonitor resources in each control plane namespace that allow a Prometheus stack to gather metrics from the control planes. The ServiceMonitor resources use metrics relabelings to define which metrics are included or excluded from a particular component, such as etcd or the Kubernetes API server. The number of metrics that are produced by control planes directly impacts the resource requirements of the monitoring stack that gathers them.
Instead of producing a fixed number of metrics that apply to all situations, you can configure a metrics set that identifies a set of metrics to produce for each control plane. The following metrics sets are supported:
Telemetry: These metrics are needed for telemetry. This set is the default set and is the smallest set of metrics.
SRE: This set includes the necessary metrics to produce alerts and allow the troubleshooting of control plane components.
All: This set includes all of the metrics that are produced by standalone OKD control plane components.
To configure a metrics set, set the METRICS_SET environment variable in the HyperShift Operator deployment by entering the following command:
$ oc set env -n hypershift deployment/operator METRICS_SET=All
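One way to confirm the setting is to list the environment variables on the deployment; the hypershift namespace matches the previous command:
$ oc set env -n hypershift deployment/operator --list | grep METRICS_SET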
When you specify the SRE metrics set, the HyperShift Operator looks for a config map named sre-metric-set with a single key: config. The value of the config key must contain a set of RelabelConfigs that are organized by control plane component.
You can specify the following components:
etcd
kubeAPIServer
kubeControllerManager
openshiftAPIServer
openshiftControllerManager
openshiftRouteControllerManager
cvo
olm
catalogOperator
registryOperator
nodeTuningOperator
controlPlaneOperator
hostedClusterConfigOperator
A configuration of the SRE metrics set is illustrated in the following example:
kubeAPIServer:
- action: "drop"
regex: "etcd_(debugging|disk|server).*"
sourceLabels: ["__name__"]
- action: "drop"
regex: "apiserver_admission_controller_admission_latencies_seconds_.*"
sourceLabels: ["__name__"]
- action: "drop"
regex: "apiserver_admission_step_admission_latencies_seconds_.*"
sourceLabels: ["__name__"]
- action: "drop"
regex: "scheduler_(e2e_scheduling_latency_microseconds|scheduling_algorithm_predicate_evaluation|scheduling_algorithm_priority_evaluation|scheduling_algorithm_preemption_evaluation|scheduling_algorithm_latency_microseconds|binding_latency_microseconds|scheduling_latency_seconds)"
sourceLabels: ["__name__"]
- action: "drop"
regex: "apiserver_(request_count|request_latencies|request_latencies_summary|dropped_requests|storage_data_key_generation_latencies_microseconds|storage_transformation_failures_total|storage_transformation_latencies_microseconds|proxy_tunnel_sync_latency_secs)"
sourceLabels: ["__name__"]
- action: "drop"
regex: "docker_(operations|operations_latency_microseconds|operations_errors|operations_timeout)"
sourceLabels: ["__name__"]
- action: "drop"
regex: "reflector_(items_per_list|items_per_watch|list_duration_seconds|lists_total|short_watches_total|watch_duration_seconds|watches_total)"
sourceLabels: ["__name__"]
- action: "drop"
regex: "etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)"
sourceLabels: ["__name__"]
- action: "drop"
regex: "transformation_(transformation_latencies_microseconds|failures_total)"
sourceLabels: ["__name__"]
- action: "drop"
regex: "network_plugin_operations_latency_microseconds|sync_proxy_rules_latency_microseconds|rest_client_request_latency_seconds"
sourceLabels: ["__name__"]
- action: "drop"
regex: "apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)"
sourceLabels: ["__name__", "le"]
kubeControllerManager:
- action: "drop"
regex: "etcd_(debugging|disk|request|server).*"
sourceLabels: ["__name__"]
- action: "drop"
regex: "rest_client_request_latency_seconds_(bucket|count|sum)"
sourceLabels: ["__name__"]
- action: "drop"
regex: "root_ca_cert_publisher_sync_duration_seconds_(bucket|count|sum)"
sourceLabels: ["__name__"]
openshiftAPIServer:
- action: "drop"
regex: "etcd_(debugging|disk|server).*"
sourceLabels: ["__name__"]
- action: "drop"
regex: "apiserver_admission_controller_admission_latencies_seconds_.*"
sourceLabels: ["__name__"]
- action: "drop"
regex: "apiserver_admission_step_admission_latencies_seconds_.*"
sourceLabels: ["__name__"]
- action: "drop"
regex: "apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)"
sourceLabels: ["__name__", "le"]
openshiftControllerManager:
- action: "drop"
regex: "etcd_(debugging|disk|request|server).*"
sourceLabels: ["__name__"]
openshiftRouteControllerManager:
- action: "drop"
regex: "etcd_(debugging|disk|request|server).*"
sourceLabels: ["__name__"]
olm:
- action: "drop"
regex: "etcd_(debugging|disk|server).*"
sourceLabels: ["__name__"]
catalogOperator:
- action: "drop"
regex: "etcd_(debugging|disk|server).*"
sourceLabels: ["__name__"]
cvo:
- action: "drop"
regex: "etcd_(debugging|disk|server).*"
sourceLabels: ["__name__"]
When you need to troubleshoot an issue with hosted control plane clusters, you can gather information by running the hypershift dump cluster command. The command generates output for the management cluster and the hosted cluster.
The output for the management cluster contains the following content:
Cluster-scoped resources: These resources are node definitions of the management cluster.
The hypershift-dump compressed file: This file is useful if you need to share the content with other people.
Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
Network logs: These logs include the OVN northbound and southbound databases and the status for each one.
Hosted clusters: This level of output involves all of the resources inside of the hosted cluster.
The output for the hosted cluster contains the following content:
Cluster-scoped resources: These resources include all of the cluster-wide objects, such as nodes and CRDs.
Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
Although the output does not contain any secret objects from the cluster, it can contain references to the names of secrets.
You must have cluster-admin access to the management cluster.
You need the name value for the HostedCluster resource and the namespace where the CR is deployed.
You must have the hcp command line interface installed. For more information, see Installing the hosted control planes command line interface.
You must have the OpenShift CLI (oc) installed.
You must ensure that the kubeconfig file is loaded and is pointing to the management cluster.
To gather output for troubleshooting, enter the following commands:
$ CLUSTERNAME="samplecluster"
$ CLUSTERNS="clusters"
$ mkdir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
$ hypershift dump cluster \
--name ${CLUSTERNAME} \
--namespace ${CLUSTERNS} \
--dump-guest-cluster \
--artifact-dir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
2023-06-06T12:18:20+02:00 INFO Archiving dump {"command": "tar", "args": ["-cvzf", "hypershift-dump.tar.gz", "cluster-scoped-resources", "event-filter.html", "namespaces", "network_logs", "timestamp"]}
2023-06-06T12:18:21+02:00 INFO Successfully archived dump {"duration": "1.519376292s"}
To configure the command-line interface so that it impersonates all of the queries against the management cluster by using a username or service account, enter the hypershift dump cluster command with the --as flag.
The service account must have enough permissions to query all of the objects from the namespaces, so the cluster-admin role is recommended to make sure that you have enough permissions. The service account must be located in or have permissions to query the namespace of the HostedControlPlane resource.
If your username or service account does not have enough permissions, the output contains only the objects that you have permissions to access. During that process, you might see forbidden errors.
To use impersonation by using a service account, enter the following commands. Replace values as necessary:
$ CLUSTERNAME="samplecluster"
$ CLUSTERNS="clusters"
$ SA="samplesa"
$ SA_NAMESPACE="default"
$ mkdir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
$ hypershift dump cluster \
--name ${CLUSTERNAME} \
--namespace ${CLUSTERNS} \
--dump-guest-cluster \
--as "system:serviceaccount:${SA_NAMESPACE}:${SA}" \
--artifact-dir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
To use impersonation by using a username, enter the following commands. Replace values as necessary:
$ CLUSTERNAME="samplecluster"
$ CLUSTERNS="clusters"
$ CLUSTERUSER="cloud-admin"
$ mkdir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
$ hypershift dump cluster \
--name ${CLUSTERNAME} \
--namespace ${CLUSTERNS} \
--dump-guest-cluster \
--as "${CLUSTERUSER}" \
--artifact-dir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
The steps to delete a hosted cluster differ depending on which provider you use.
If the cluster is on AWS, follow the instructions in Destroying a hosted cluster on AWS.
If the cluster is on bare metal, follow the instructions in Destroying a hosted cluster on bare metal.
If the cluster is on OpenShift Virtualization, follow the instructions in Destroying a hosted cluster on OpenShift Virtualization.
If you want to disable the hosted control plane feature, see Disabling the hosted control plane feature.