Managing nodes - Working with nodes | Nodes

Modifying nodes
Updating boot images
- Disabling updated boot images
Configuring control plane nodes as schedulable
Setting SELinux booleans
Adding kernel arguments to nodes
Enabling swap memory use on nodes
About configuring parallel container image pulls
- Configuring parallel container image pulls
Migrating control plane nodes from one RHOSP host to another manually

OKD uses a KubeletConfig custom resource (CR) to manage the configuration of nodes. By creating an instance of a KubeletConfig object, a managed machine config is created to override setting on the node.

Logging in to remote machines for the purpose of changing their configuration is not supported.

Modifying nodes

To make configuration changes to a cluster, or machine pool, you must create a custom resource definition (CRD), or kubeletConfig object. OKD uses the Machine Config Controller to watch for changes introduced through the CRD to apply the changes to the cluster.

Because the fields in a kubeletConfig object are passed directly to the kubelet from upstream Kubernetes, the validation of those fields is handled directly by the kubelet itself. Please refer to the relevant Kubernetes documentation for the valid values for these fields. Invalid values in the kubeletConfig object can render cluster nodes unusable.

Procedure

Obtain the label associated with the static CRD, Machine Config Pool, for the type of node you want to configure. Perform one of the following steps:

Check current labels of the desired machine config pool.

For example:

$  oc get machineconfigpool  --show-labels

Example output

NAME      CONFIG                                             UPDATED   UPDATING   DEGRADED   LABELS
master    rendered-master-e05b81f5ca4db1d249a1bf32f9ec24fd   True      False      False      operator.machineconfiguration.openshift.io/required-for-upgrade=
worker    rendered-worker-f50e78e1bc06d8e82327763145bfcf62   True      False      False

Add a custom label to the desired machine config pool.

For example:
```
$ oc label machineconfigpool worker custom-kubelet=enabled
```

Create a kubeletconfig custom resource (CR) for your configuration change.

For example:

Sample configuration for a custom-config CR

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-config (1)
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled (2)
  kubeletConfig: (3)
    podsPerCore: 10
    maxPods: 250
    systemReserved:
      cpu: 2000m
      memory: 1Gi
#...

1	Assign a name to CR.
2	Specify the label to apply the configuration change, this is the label you added to the machine config pool.
3	Specify the new value(s) you want to change.

Create the CR object.

$ oc create -f <file-name>

For example:

$ oc create -f master-kube-config.yaml

Most Kubelet Configuration options can be set by the user. The following options are not allowed to be overwritten:

CgroupDriver
ClusterDNS
ClusterDomain
StaticPodPath

If a single node contains more than 50 images, pod scheduling might be imbalanced across nodes. This is because the list of images on a node is shortened to 50 by default. You can disable the image limit by editing the KubeletConfig object and setting the value of nodeStatusMaxImages to -1.

Updating boot images

The Machine Config Operator (MCO) uses a boot image to bring up a Fedora CoreOS (FCOS) node. By default, OKD does not manage the boot image.

This means that the boot image in your cluster is not updated along with your cluster. For example, if your cluster was originally created with OKD 4.12, the boot image that the cluster uses to create nodes is the same 4.12 version, even if your cluster is at a later version. If the cluster is later upgraded to 4.13 or later, new nodes continue to scale with the same 4.12 image.

This process could cause the following issues:

Extra time to start up nodes
Certificate expiration issues
Version skew issues

To avoid these issues, you can configure your cluster to update the boot image whenever you update your cluster. By modifying the MachineConfiguration object, you can enable this feature. Currently, the ability to update the boot image is available for only Google Cloud Platform (GCP) clusters as a Technology Preview feature and is not supported for Cluster CAPI Operator managed clusters.

The updating boot image feature on Google Cloud Platform (GCP) clusters is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

To view the current boot image used in your cluster, examine a machine set:

Example machine set with the boot image reference

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: ci-ln-hmy310k-72292-5f87z-worker-a
  namespace: openshift-machine-api
spec:
# ...
  template:
# ...
    spec:
# ...
      providerSpec:
# ...
        value:
          disks:
          - autoDelete: true
            boot: true
            image: projects/rhcos-cloud/global/images/rhcos-412-85-202203181601-0-gcp-x86-64 (1)
# ...

1 This boot image is the same as the originally-installed OKD version, in this example OKD 4.12, regardless of the current version of the cluster. The way that the boot image is represented in the machine set depends on the platform, as the structure of the providerSpec field differs from platform to platform.

If you configure your cluster to update your boot images, the boot image referenced in your machine sets matches the current version of the cluster.

Prerequisites

You have enabled the TechPreviewNoUpgrade feature set by using the feature gates. For more information, see "Enabling features using feature gates" in the "Additional resources" section.

Procedure

Edit the MachineConfiguration object, named cluster, to enable the updating of boot images by running the following command:

$ oc edit MachineConfiguration cluster

Optional: Configure the boot image update feature for all the machine sets:

apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
spec:
# ...
  managedBootImages: (1)
    machineManagers:
    - resource: machinesets
      apiGroup: machine.openshift.io
      selection:
        mode: All (2)

1	Activates the boot image update feature.
2	Specifies that all the machine sets in the cluster are to be updated.

Optional: Configure the boot image update feature for specific machine sets:

apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
spec:
# ...
  managedBootImages: (1)
    machineManagers:
    - resource: machinesets
      apiGroup: machine.openshift.io
      selection:
        mode: Partial
          partial:
            machineResourceSelector:
              matchLabels:
                update-boot-image: "true" (2)

1	Activates the boot image update feature.
2	Specifies that any machine set with this label is to be updated.

If an appropriate label is not present on the machine set, add a key/value pair by running a command similar to following:

$ oc label machineset.machine ci-ln-hmy310k-72292-5f87z-worker-a update-boot-image=true -n openshift-machine-api

Verification

Get the boot image version by running the following command:

$ oc get machinesets <machineset_name> -n openshift-machine-api -o yaml

Example machine set with the boot image reference

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: ci-ln-77hmkpt-72292-d4pxp
    update-boot-image: "true"
  name: ci-ln-77hmkpt-72292-d4pxp-worker-a
  namespace: openshift-machine-api
spec:
# ...
  template:
# ...
    spec:
# ...
      providerSpec:
# ...
        value:
          disks:
          - autoDelete: true
            boot: true
            image: projects/rhcos-cloud/global/images/rhcos-416-92-202402201450-0-gcp-x86-64 (1)
# ...

1	This boot image is the same as the current OKD version.

Additional resources

Enabling features using feature gates

Disabling updated boot images

To disable the updated boot image feature, edit the MachineConfiguration object so that the machineManagers field is an empty array.

If you disable this feature after some nodes have been created with the new boot image version, any existing nodes retain their current boot image. Turning off this feature does not rollback the nodes or machine sets to the originally-installed boot image. The machine sets retain the boot image version that was present when the feature was enabled and is not updated again when the cluster is upgraded to a new OKD version in the future.

Procedure

Disable updated boot images by editing the MachineConfiguration object:
```
$ oc edit MachineConfiguration cluster
```

Make the machineManagers parameter an empty array:

apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
spec:
# ...
  managedBootImages: (1)
    machineManagers: []

1	Remove the parameters listed under `machineManagers` and add the `[]` characters to disable boot image updates.

Configuring control plane nodes as schedulable

You can configure control plane nodes to be schedulable, meaning that new pods are allowed for placement on the master nodes. By default, control plane nodes are not schedulable.

You can set the masters to be schedulable, but must retain the worker nodes.

You can deploy OKD with no worker nodes on a bare metal cluster. In this case, the control plane nodes are marked schedulable by default.

You can allow or disallow control plane nodes to be schedulable by configuring the mastersSchedulable field.

When you configure control plane nodes from the default unschedulable to schedulable, additional subscriptions are required. This is because control plane nodes then become worker nodes.

Procedure

Edit the schedulers.config.openshift.io resource.

$ oc edit schedulers.config.openshift.io cluster

Configure the mastersSchedulable field.

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: "2019-09-10T03:04:05Z"
  generation: 1
  name: cluster
  resourceVersion: "433"
  selfLink: /apis/config.openshift.io/v1/schedulers/cluster
  uid: a636d30a-d377-11e9-88d4-0a60097bee62
spec:
  mastersSchedulable: false (1)
status: {}
#...

1	Set to `true` to allow control plane nodes to be schedulable, or `false` to disallow control plane nodes to be schedulable.

Save the file to apply the changes.

Setting SELinux booleans

OKD allows you to enable and disable an SELinux boolean on a Fedora CoreOS (FCOS) node. The following procedure explains how to modify SELinux booleans on nodes using the Machine Config Operator (MCO). This procedure uses container_manage_cgroup as the example boolean. You can modify this value to whichever boolean you need.

Prerequisites

You have installed the OpenShift CLI (oc).

Procedure

Create a new YAML file with a MachineConfig object, displayed in the following example:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-setsebool
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Set SELinux booleans
          Before=kubelet.service

          [Service]
          Type=oneshot
          ExecStart=/sbin/setsebool container_manage_cgroup=on
          RemainAfterExit=true

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: setsebool.service
#...

Create the new MachineConfig object by running the following command:
```
$ oc create -f 99-worker-setsebool.yaml
```

Applying any changes to the MachineConfig object causes all affected nodes to gracefully reboot after the change is applied.

Adding kernel arguments to nodes

In some special cases, you might want to add kernel arguments to a set of nodes in your cluster. This should only be done with caution and clear understanding of the implications of the arguments you set.

Improper use of kernel arguments can result in your systems becoming unbootable.

Examples of kernel arguments you could set include:

nosmt: Disables symmetric multithreading (SMT) in the kernel. Multithreading allows multiple logical threads for each CPU. You could consider nosmt in multi-tenant environments to reduce risks from potential cross-thread attacks. By disabling SMT, you essentially choose security over performance.

systemd.unified_cgroup_hierarchy: Configures the version of Linux control group that is installed on your nodes: cgroup v1 or cgroup v2. cgroup v2 is the next version of the kernel control group and offers multiple improvements. However, it can have some unwanted effects on your nodes.

cgroup v2 is enabled by default. To disable cgroup v2, use the systemd.unified_cgroup_hierarchy=0 kernel argument, as shown in the following procedure.

cgroup v1 is a deprecated feature. Deprecated functionality is still included in OKD and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.

For the most recent list of major functionality that has been deprecated or removed within OKD, refer to the Deprecated and removed features section of the OKD release notes.

enforcing=0: Configures Security Enhanced Linux (SELinux) to run in permissive mode. In permissive mode, the system acts as if SELinux is enforcing the loaded security policy, including labeling objects and emitting access denial entries in the logs, but it does not actually deny any operations. While not supported for production systems, permissive mode can be helpful for debugging.

Disabling SELinux on FCOS in production is not supported. Once SELinux has been disabled on a node, it must be re-provisioned before re-inclusion in a production cluster.

See Kernel.org kernel parameters for a list and descriptions of kernel arguments.

In the following procedure, you create a MachineConfig object that identifies:

A set of machines to which you want to add the kernel argument. In this case, machines with a worker role.
Kernel arguments that are appended to the end of the existing kernel arguments.
A label that indicates where in the list of machine configs the change is applied.

Prerequisites

Have administrative privilege to a working OKD cluster.

Procedure

List existing MachineConfig objects for your OKD cluster to determine how to label your machine config:

$ oc get MachineConfig

Example output

NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
99-master-ssh                                                                                 3.2.0             40m
99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
99-worker-ssh                                                                                 3.2.0             40m
rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m

Create a MachineConfig object file that identifies the kernel argument (for example, 05-worker-kernelarg-selinuxpermissive.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker (1)
  name: 05-worker-kernelarg-selinuxpermissive (2)
spec:
  config:
    ignition:
      version: 3.4.0
  kernelArguments:
    - enforcing=0 (3)
      systemd.unified_cgroup_hierarchy=0 (4)
#...

1	Applies the new kernel argument only to worker nodes.
2	Named to identify where it fits among the machine configs (05) and what it does (adds a kernel argument to configure SELinux permissive mode).
3	Identifies the exact kernel argument as `enforcing=0`.
4	Configures cgroup v1 on the associated nodes. cgroup v2 is the default.

Create the new machine config:

$ oc create -f 05-worker-kernelarg-selinuxpermissive.yaml

Check the machine configs to see that the new one was added:

$ oc get MachineConfig

Example output

NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
05-worker-kernelarg-selinuxpermissive                                                         3.4.0             105s
99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
99-master-ssh                                                                                 3.2.0             40m
99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
99-worker-ssh                                                                                 3.2.0             40m
rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m
rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.4.0             33m

Check the nodes:

$ oc get nodes

Example output

NAME                           STATUS                     ROLES    AGE   VERSION
ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.29.4
ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.29.4
ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.29.4
ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.29.4
ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.29.4
ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.29.4

You can see that scheduling on each worker node is disabled as the change is being applied.

Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command-line arguments (in /proc/cmdline on the host):

$ oc debug node/ip-10-0-141-105.ec2.internal

Example output

Starting pod/ip-10-0-141-105ec2internal-debug ...
To use host binaries, run `chroot /host`

sh-4.2# cat /host/proc/cmdline
BOOT_IMAGE=/ostree/rhcos-... console=tty0 console=ttyS0,115200n8
rootflags=defaults,prjquota rw root=UUID=fd0... ostree=/ostree/boot.0/rhcos/16...
coreos.oem.id=qemu coreos.oem.id=ec2 ignition.platform.id=ec2 enforcing=0

sh-4.2# exit

You should see the enforcing=0 argument added to the other kernel arguments.

Enabling swap memory use on nodes

Enabling swap memory use on nodes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Enabling swap memory is only available for container-native virtualization (CNV) users or use cases.

Enabling swap memory can negatively impact workload performance and out-of-resource handling. Do not enable swap memory on control plane nodes.

To enable swap memory, create a kubeletconfig custom resource (CR) to set the swapbehavior parameter. You can set limited or unlimited swap memory:

Limited: Use the LimitedSwap value to limit how much swap memory workloads can use. Any workloads on the node that are not managed by OKD can still use swap memory. The LimitedSwap behavior depends on whether the node is running with Linux control groups version 1 (cgroups v1) or version 2 (cgroup v2):
- cgroup v2: OKD workloads can use any combination of memory and swap, up to the pod’s memory limit, if set.
- cgroup v1: OKD workloads cannot use swap memory.
Unlimited: Use the UnlimitedSwap value to allow workloads to use as much swap memory as they request, up to the system limit.

Because the kubelet will not start in the presence of swap memory without this configuration, you must enable swap memory in OKD before enabling swap memory on the nodes. If there is no swap memory present on a node, enabling swap memory in OKD has no effect.

Prerequisites

You have a running OKD cluster that uses version 4.10 or later.
You are logged in to the cluster as a user with administrative privileges.
You have enabled the TechPreviewNoUpgrade feature set on the cluster (see Nodes → Working with clusters → Enabling features using feature gates).

Enabling the TechPreviewNoUpgrade feature set cannot be undone and prevents minor version updates. These feature sets are not recommended on production clusters.
If cgroup v2 is enabled on a node, you must enable swap accounting on the node, by setting the swapaccount=1 kernel argument.

Procedure

Apply a custom label to the machine config pool where you want to allow swap memory.
```
$ oc label machineconfigpool worker kubelet-swap=enabled
```

Create a custom resource (CR) to enable and configure swap settings.

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: swap-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      kubelet-swap: enabled
  kubeletConfig:
    failSwapOn: false (1)
    memorySwap:
      swapBehavior: LimitedSwap (2)
#...

1	Set to `false` to enable swap memory use on the associated nodes. Set to `true` to disable swap memory use.
2	Specify the swap memory behavior. If unspecified, the default is `LimitedSwap`.

Enable swap memory on the machines.

About configuring parallel container image pulls

To help control bandwidth issues, you can configure the number of workload images that can be pulled at the same time.

By default, the cluster pulls images in parallel, which allows multiple workloads to pull images at the same time. Pulling multiple images in parallel can improve workload start-up time because workloads can pull needed images without waiting for each other. However, pulling too many images at the same time can use excessive network bandwidth and cause latency issues throughout your cluster.

The default setting allows unlimited simultaneous image pulls. But, you can configure the maximum number of images that can be pulled in parallel. You can also force serial image pulling, which means that only one image can be pulled at a time.

To control the number of images that can be pulled simultaneously, use a kubelet configuration to set the maxParallelImagePulls to specify a limit. Additional image pulls above this limit are held until one of the current pulls is complete.

To force serial image pulls, use a kubelet configuration to set serializeImagePulls field to true.

Configuring parallel container image pulls

You can control the number of images that can be pulled by your workload simultaneously by using a kubelet configuration.

You can set a maximum number of images that can be pulled or force workloads to pull images one at a time.

Prerequisites

You have a running OKD cluster.
You are logged in to the cluster as a user with administrative privileges.

Procedure

Apply a custom label to the machine config pool where you want to configure parallel pulls by running a command similar to the following.
```
$ oc label machineconfigpool <mcp_name> parallel-pulls=set
```

Create a custom resource (CR) to configure parallel image pulling.

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: parallel-image-pulls
# ...
spec:
  machineConfigPoolSelector:
    matchLabels:
      parallel-pulls: set
  kubeletConfig:
    serializeImagePulls: false (1)
    maxParallelImagePulls: 3 (2)
# ...

1	Set to `false` to enable parallel image pulls. Set to `true` to force serial image pulling. The default is `false`.
2	Specify the maximum number of images that can be pulled in parallel. Enter a number or set to `nil` to specify no limit. This field cannot be set if `SerializeImagePulls` is `true`. The default is `nil`.

Create the new machine config by running a command similar to the following:
```
$ oc create -f <file_name>.yaml
```

Verification

Check the machine configs to see that a new one was added by running the following command:

$ oc get MachineConfig

Example output

NAME                                                GENERATEDBYCONTROLLER                        IGNITIONVERSION   AGE
00-master                                           70025364a114fc3067b2e82ce47fdb0149630e4b     3.5.0             133m
00-worker                                           70025364a114fc3067b2e82ce47fdb0149630e4b     3.5.0             133m
# ...
99-parallel-generated-kubelet                       70025364a114fc3067b2e82ce47fdb0149630e4b     3.5.0             15s (1)
# ...
rendered-parallel-c634a80f644740974ceb40c054c79e50  70025364a114fc3067b2e82ce47fdb0149630e4b     3.5.0             10s (2)

1	The new machine config. In this example, the machine config is for the `parallel` custom machine config pool.
2	The new rendered machine config. In this example, the machine config is for the `parallel` custom machine config pool.

Check to see that the nodes in the parallel machine config pool are being updated by running the following command:

$ oc get machineconfigpool

Example output

NAME       CONFIG                                               UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
parallel   rendered-parallel-3904f0e69130d125b3b5ef0e981b1ce1   False     True       False      1              0                   0                     0                      65m
master     rendered-master-7536834c197384f3734c348c1d957c18     True      False      False      3              3                   3                     0                      140m
worker     rendered-worker-c634a80f644740974ceb40c054c79e50     True      False      False      2              2                   2                     0                      140m

When the nodes are updated, verify that the parallel pull maximum was configured:
1. Open an oc debug session to a node by running a command similar to the following:
  $ oc debug node/<node_name>
2. Set /host as the root directory within the debug shell by running the following command:
  sh-5.1# chroot /host
3. Examine the kubelet.conf file by running the following command:
  sh-5.1# cat /etc/kubernetes/kubelet.conf | grep -i maxParallelImagePulls
  Example output
  maxParallelImagePulls: 3

Migrating control plane nodes from one RHOSP host to another manually

If control plane machine sets are not enabled on your cluster, you can run a script that moves a control plane node from one OpenStack node to another.

Control plane machine sets are not enabled on clusters that run on user-provisioned infrastructure.

For information about control plane machine sets, see "Managing control plane machines with control plane machine sets".

Prerequisites

The environment variable OS_CLOUD refers to a clouds entry that has administrative credentials in a clouds.yaml file.
The environment variable KUBECONFIG refers to a configuration that contains administrative OKD credentials.

Procedure

From a command line, run the following script:

#!/usr/bin/env bash

set -Eeuo pipefail

if [ $# -lt 1 ]; then
	echo "Usage: '$0 node_name'"
	exit 64
fi

# Check for admin OpenStack credentials
openstack server list --all-projects >/dev/null || { >&2 echo "The script needs OpenStack admin credentials. Exiting"; exit 77; }

# Check for admin OpenShift credentials
oc adm top node >/dev/null || { >&2 echo "The script needs OpenShift admin credentials. Exiting"; exit 77; }

set -x

declare -r node_name="$1"
declare server_id
server_id="$(openstack server list --all-projects -f value -c ID -c Name | grep "$node_name" | cut -d' ' -f1)"
readonly server_id

# Drain the node
oc adm cordon "$node_name"
oc adm drain "$node_name" --delete-emptydir-data --ignore-daemonsets --force

# Power off the server
oc debug "node/${node_name}" -- chroot /host shutdown -h 1

# Verify the server is shut off
until openstack server show "$server_id" -f value -c status | grep -q 'SHUTOFF'; do sleep 5; done

# Migrate the node
openstack server migrate --wait "$server_id"

# Resize the VM
openstack server resize confirm "$server_id"

# Wait for the resize confirm to finish
until openstack server show "$server_id" -f value -c status | grep -q 'SHUTOFF'; do sleep 5; done

# Restart the VM
openstack server start "$server_id"

# Wait for the node to show up as Ready:
until oc get node "$node_name" | grep -q "^${node_name}[[:space:]]\+Ready"; do sleep 5; done

# Uncordon the node
oc adm uncordon "$node_name"

# Wait for cluster operators to stabilize
until oc get co -o go-template='statuses: {{ range .items }}{{ range .status.conditions }}{{ if eq .type "Degraded" }}{{ if ne .status "False" }}DEGRADED{{ end }}{{ else if eq .type "Progressing"}}{{ if ne .status "False" }}PROGRESSING{{ end }}{{ else if eq .type "Available"}}{{ if ne .status "True" }}NOTAVAILABLE{{ end }}{{ end }}{{ end }}{{ end }}' | grep -qv '\(DEGRADED\|PROGRESSING\|NOTAVAILABLE\)'; do sleep 5; done

If the script completes, the control plane machine is migrated to a new OpenStack node.

Additional resources

Managing control plane machines with control plane machine sets