There are times when you need to make changes to the operating systems running on OKD nodes. This can include changing settings for network time service, adding kernel arguments, or configuring journaling in a specific way.

Aside from a few specialized features, most changes to operating systems on OKD nodes can be done by creating what are referred to as MachineConfig objects that are managed by the Machine Config Operator.

Tasks in this section describe how to use features of the Machine Config Operator to configure operating system features on OKD nodes.

Understanding the Machine Config Operator

Machine Config Operator

Purpose

The Machine Config Operator manages and applies configuration and updates of the base operating system and container runtime, including everything between the kernel and kubelet.

There are four components:

  • machine-config-server: Provides Ignition configuration to new machines joining the cluster.

  • machine-config-controller: Coordinates the upgrade of machines to the desired configurations defined by a MachineConfig object. Options are provided to control the upgrade for sets of machines individually.

  • machine-config-daemon: Applies new machine configuration during update. Validates and verifies the machine’s state against the requested machine configuration.

  • machine-config: Provides a complete source of machine configuration at installation, first start up, and updates for a machine.

MachineConfig overview

The Machine Config Operator (MCO) manages updates to systemd, CRI-O and kubelet, the kernel, NetworkManager, and other system features. It also offers a MachineConfig CRD that can write configuration files onto the host (see the machine-config-operator project). Understanding what MCO does and how it interacts with other components is critical to making advanced, system-level changes to an OKD cluster. Here are some things you should know about MCO, MachineConfigs, and how they are used:

  • A MachineConfig can make a specific change to a file or service on the operating system of each machine in a pool of OKD nodes.

  • MCO applies changes to operating systems in pools of machines. All OKD clusters start with worker and master node pools. By adding more role labels, you can configure custom pools of nodes. For example, you can set up a custom pool of worker nodes that includes particular hardware features needed by an application. However, examples in this section focus on changes to the default pool types.

  • Some machine configuration must be in place before OKD is installed to disk. In most cases, this can be accomplished by creating a MachineConfig that is injected directly into the OKD installer process, instead of running as a post-installation MachineConfig. In other cases, you might need to do a bare metal installation where you pass kernel arguments at OKD installer start-up to do such things as set per-node IP addresses or configure advanced disk partitioning.

  • MCO manages items that are set in MachineConfigs. Manual changes you make to your systems are not overwritten by MCO, unless MCO is explicitly told to manage a conflicting file. In other words, MCO makes only the specific updates you request; it does not claim control over the whole node.

  • Manual changes to nodes are strongly discouraged. If you need to decommission a node and start a new one, those direct changes would be lost.

  • MCO is supported for writing to files in the /etc and /var directories only, although some other directories are writable because they are symbolic links into one of those areas. The /opt directory is one example.

  • Ignition is the configuration format used in MachineConfigs. See the Ignition Configuration Specification v3.1.0 for details.

  • Although Ignition config settings can be delivered directly at OKD installation time, and are formatted in the same way that MCO delivers Ignition configs, MCO has no way of seeing what those original Ignition configs are. So you should wrap Ignition config settings into a MachineConfig before deploying them.

  • When a file managed by MCO changes outside of MCO, the Machine Config Daemon (MCD) sets the node as degraded. It will not overwrite the offending file, however, and should continue to operate in a degraded state.

  • A key reason for using a MachineConfig is that it will be applied when you spin up new nodes for a pool in your OKD cluster. The machine-api-operator provisions a new machine and MCO configures it.

MCO uses CoreOS Ignition as the configuration format. OKD 4.6 moved from the Ignition version 2 format to the Ignition version 3 format.

What can you change with MachineConfigs?

The kinds of components that MCO can change include the following (a schematic example appears after this list):

  • config: Create Ignition config objects (see the Ignition configuration specification) to do things like modify files, systemd services, and other features on OKD machines, including:

    • Configuration files: Create or overwrite files in the /var or /etc directory.

    • systemd units: Create and set the status of a systemd service or add to an existing systemd service by dropping in additional settings.

    • users and groups: Change SSH keys in the passwd section post-installation.

  • kernelArguments: Add arguments to the kernel command line when OKD nodes boot.

  • kernelType: Optionally identify a non-standard kernel to use instead of the standard kernel. Use realtime to use the RT kernel (for RAN). This is only supported on select platforms.

  • fips: Enable FIPS mode. FIPS mode should be set when you install the cluster, not as a post-installation procedure.

  • extensions: Extend FCOS features by adding selected pre-packaged software. For this feature (new in OKD 4.6), available extensions include usbguard and kernel modules.

  • Custom resources (for ContainerRuntime and Kubelet): Outside of MachineConfigs, MCO manages two special custom resources for modifying CRI-O container runtime settings (the ContainerRuntimeConfig CR) and the kubelet service (the KubeletConfig CR).
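
To see where each of these fields lives, here is a schematic MachineConfig that names all of them at once. It is an illustrative sketch only: the name, role, and values are placeholders, and a real MachineConfig would normally set just the fields it needs.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-example             # hypothetical name
spec:
  config:                             # Ignition section: files, systemd units, passwd
    ignition:
      version: 3.1.0
  kernelArguments:                    # appended to the kernel command line at boot
    - example_arg=1                   # placeholder argument
  kernelType: realtime                # optional; omit to keep the standard kernel
  fips: false                         # set at installation time only
  extensions:                         # pre-packaged software, such as usbguard
    - usbguard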

The MCO is not the only Operator that can change operating system components on OKD nodes. Other Operators can modify operating system-level features as well. One example is the Node Tuning Operator, which allows you to do node-level tuning through Tuned daemon profiles.

Tasks for the MCO configuration that can be done post-installation are included in the following procedures. See descriptions of FCOS bare metal installation for system configuration tasks that must be done during or before OKD installation.

Project

See the openshift-machine-config-operator GitHub site for details.

Checking Machine Config Pool status

To see the status of the Machine Config Operator, its sub-components, and the resources it manages, use the following oc commands:

Procedure
  1. To see the number of MCO-managed nodes available on your cluster for each pool, type:

    $ oc get machineconfigpool
    NAME     CONFIG                  UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-dd…     True      False      False      3              3                   3                     0                      4h42m
    worker   rendered-worker-fde…    True      False      False      3              3                   3                     0                      4h42m

    In the previous output, there are three master and three worker nodes. All machines are updated and none are currently updating. Because all nodes are Updated and Ready and none are Degraded, you can tell that there are no issues.

  2. To see each existing machineconfig, type:

    $ oc get machineconfigs
    NAME                             GENERATEDBYCONTROLLER          IGNITIONVERSION  AGE
    00-master                        2c9371fbb673b97a6fe8b1c52...   3.1.0            5h18m
    00-worker                        2c9371fbb673b97a6fe8b1c52...   3.1.0            5h18m
    01-master-container-runtime      2c9371fbb673b97a6fe8b1c52...   3.1.0            5h18m
    01-master-kubelet                2c9371fbb673b97a6fe8b1c52…     3.1.0            5h18m
    ...
    rendered-master-dde...           2c9371fbb673b97a6fe8b1c52...   3.1.0            5h18m
    rendered-worker-fde...           2c9371fbb673b97a6fe8b1c52...   3.1.0            5h18m

    Note that the machineconfigs listed as rendered are not meant to be changed or deleted. Expect them to be hidden at some point in the future.

  3. Check the status of worker (or change to master) to see the status of that pool of nodes:

    $ oc describe mcp worker
    ...
      Degraded Machine Count:     0
      Machine Count:              3
      Observed Generation:        2
      Ready Machine Count:        3
      Unavailable Machine Count:  0
      Updated Machine Count:      3
    Events:                       <none>
  4. You can view the contents of a particular machineconfig (in this case, 01-master-kubelet). The trimmed output from the following oc describe command shows that this machineconfig contains both configuration files (cloud.conf and kubelet.conf) and a systemd service (Kubernetes Kubelet):

    $ oc describe machineconfigs 01-master-kubelet
    Name:         01-master-kubelet
    ...
    Spec:
      Config:
        Ignition:
          Version:  3.1.0
        Storage:
          Files:
            Contents:
              Source:   data:,
            Mode:       420
            Overwrite:  true
            Path:       /etc/kubernetes/cloud.conf
            Contents:
              Source:   data:,kind%3A%20KubeletConfiguration%0AapiVersion%3A%20kubelet.config.k8s.io%2Fv1beta1%0Aauthentication%3A%0A%20%20x509%3A%0A%20%20%20%20clientCAFile%3A%20%2Fetc%2Fkubernetes%2Fkubelet-ca.crt%0A%20%20anonymous...
            Mode:       420
            Overwrite:  true
            Path:       /etc/kubernetes/kubelet.conf
        Systemd:
          Units:
            Contents:  [Unit]
    Description=Kubernetes Kubelet
    Wants=rpc-statd.service network-online.target crio.service
    After=network-online.target crio.service
    
    ExecStart=/usr/bin/hyperkube \
        kubelet \
          --config=/etc/kubernetes/kubelet.conf \ ...

If something goes wrong with a machineconfig that you apply, you can always back out that change. For example, if you had run oc create -f ./myconfig.yaml to apply a machineconfig, you could remove that machineconfig by typing:

$ oc delete -f ./myconfig.yaml

If that was the only problem, the nodes in the affected pool should return to a non-degraded state. This causes the rendered configuration to roll back to its previously rendered state.

If you add your own MachineConfigs to your cluster, you can use the commands shown in the previous example to check their status and the related status of the pool to which they are applied.
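
For example, assuming you applied a custom config named 99-worker-mycustom (a hypothetical name), you could inspect it and watch the worker pool roll it out:

$ oc get machineconfig 99-worker-mycustom
$ oc get machineconfigpool worker --watch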

Using MachineConfigs to configure nodes

Tasks in this section let you create MachineConfig objects to modify files, systemd unit files, and other operating system features running on OKD nodes. For more ideas on working with MachineConfigs, see content related to changing MTU network settings, adding or updating SSH authorized keys, replacing DNS nameservers, verifying image signatures, enabling SCTP, and configuring iSCSI initiator names for OKD.

MachineConfigs

OKD version 4.6 supports Ignition specification version 3.1. All new MachineConfigs you create going forward should be based on Ignition specification version 3.1. If you are upgrading your OKD cluster, any existing Ignition specification version 2.x MachineConfigs will be translated automatically to specification version 3.1.

Configuring chrony time service

You can set the time server and related settings used by the chrony time service (chronyd) by modifying the contents of the chrony.conf file and passing those contents to your nodes as a MachineConfig.

Procedure
  1. Create the contents of the chrony.conf file and encode it as base64. For example:

    $ cat << EOF | base64
        server clock.redhat.com iburst
        driftfile /var/lib/chrony/drift
        makestep 1.0 3
        rtcsync
        logdir /var/log/chrony
    EOF
    Example output
    ICAgIHNlcnZlciBjbG9jay5yZWRoYXQuY29tIGlidXJzdAogICAgZHJpZnRmaWxlIC92YXIvbGli
    L2Nocm9ueS9kcmlmdAogICAgbWFrZXN0ZXAgMS4wIDMKICAgIHJ0Y3N5bmMKICAgIGxvZ2RpciAv
    dmFyL2xvZy9jaHJvbnkK
  2. Create the MachineConfig file, replacing the base64 string with the one you just created. This example adds the file to master nodes. You can change it to worker or make an additional MachineConfig for the worker role:

    $ cat << EOF > ./99-masters-chrony-configuration.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: masters-chrony-configuration
    spec:
      config:
        ignition:
          config: {}
          security:
            tls: {}
          timeouts: {}
          version: 3.1.0
        networkd: {}
        passwd: {}
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,c2VydmVyIGNsb2NrLnJlZGhhdC5jb20gaWJ1cnN0CmRyaWZ0ZmlsZSAvdmFyL2xpYi9jaHJvbnkvZHJpZnQKbWFrZXN0ZXAgMS4wIDMKcnRjc3luYwpsb2dkaXIgL3Zhci9sb2cvY2hyb255Cg==
            mode: 420
            overwrite: true
            path: /etc/chrony.conf
      osImageURL: ""
    EOF
  3. Make a backup copy of the configuration file.

  4. Apply the configuration in one of two ways:

    • If the cluster is not up yet, generate manifest files, add this file to the openshift directory, and then continue to create the cluster.

    • If the cluster is already running, apply the file as follows:

       $ oc apply -f ./99-masters-chrony-configuration.yaml
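
Once the pool finishes updating, you can confirm that the file landed on a node by opening a debug shell. A quick sketch, with <master_node> standing in for one of your master node names:

$ oc debug node/<master_node>
sh-4.4# chroot /host
sh-4.4# cat /etc/chrony.conf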

Adding kernel arguments to nodes

In some special cases, you might want to add kernel arguments to a set of nodes in your cluster. This should only be done with caution and clear understanding of the implications of the arguments you set.

Improper use of kernel arguments can result in your systems becoming unbootable.

Examples of kernel arguments you could set include:

  • selinux=0: Disables Security-Enhanced Linux (SELinux). While not recommended for production, disabling SELinux can improve performance by 2% to 3%.

  • nosmt: Disables symmetric multithreading (SMT) in the kernel. Multithreading allows multiple logical threads for each CPU. You could consider nosmt in multi-tenant environments to reduce risks from potential cross-thread attacks. By disabling SMT, you essentially choose security over performance.

See Kernel.org kernel parameters for a list and descriptions of kernel arguments.

In the following procedure, you create a MachineConfig that identifies:

  • A set of machines to which you want to add the kernel argument. In this case, machines with a worker role.

  • Kernel arguments that are appended to the end of the existing kernel arguments.

  • A label that indicates where in the list of MachineConfigs the change is applied.

Prerequisites
  • Have administrative privilege to a working OKD cluster.

Procedure
  1. List existing MachineConfigs for your OKD cluster to determine how to label your MachineConfig:

    $ oc get MachineConfig
    Example output
    NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
    00-master                                                   577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    00-worker                                                   577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    01-master-container-runtime                                 577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    01-master-kubelet                                           577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    01-worker-container-runtime                                 577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    01-worker-kubelet                                           577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    99-master-1131169f-dae9-11e9-b5dd-12a845e8ffd8-registries   577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    99-master-ssh                                                                                          3.1.0             30m
    99-worker-114e8ac7-dae9-11e9-b5dd-12a845e8ffd8-registries   577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    99-worker-ssh                                                                                          3.1.0             30m
    rendered-master-b3729e5f6124ca3678188071343115d0            577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
    rendered-worker-18ff9506c718be1e8bd0a066850065b7            577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             30m
  2. Create a MachineConfig file that identifies the kernel argument (for example, 05-worker-kernelarg-selinuxoff.yaml):

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker (1)
      name: 05-worker-kernelarg-selinuxoff (2)
    spec:
      config:
        ignition:
          version: 3.1.0
      kernelArguments:
        - selinux=0 (3)
    1 Applies the new kernel argument only to worker nodes.
    2 Named to identify where it fits among the MachineConfigs (05) and what it does (adds a kernel argument to turn off SELinux).
    3 Identifies the exact kernel argument as selinux=0.
  3. Create the new MachineConfig:

    $ oc create -f 05-worker-kernelarg-selinuxoff.yaml
  4. Check the MachineConfigs to see that the new one was added:

    $ oc get MachineConfig
    Example output
    NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
    00-master                                                   577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    00-worker                                                   577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    01-master-container-runtime                                 577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    01-master-kubelet                                           577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    01-worker-container-runtime                                 577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    01-worker-kubelet                                           577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    
    05-worker-kernelarg-selinuxoff                                                                         3.1.0             105s
    
    99-master-1131169f-dae9-11e9-b5dd-12a845e8ffd8-registries   577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    99-master-ssh                                                                                          3.1.0             30m
    99-worker-114e8ac7-dae9-11e9-b5dd-12a845e8ffd8-registries   577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    99-worker-ssh                                                                                          3.1.0             31m
    rendered-master-b3729e5f6124ca3678188071343115d0            577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
    rendered-worker-18ff9506c718be1e8bd0a066850065b7            577c2d527b09cd7a481a162c50592139caa15e20   3.1.0             31m
  5. Check the nodes:

    $ oc get nodes
    Example output
    NAME                           STATUS                     ROLES    AGE   VERSION
    ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.19.0
    ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.19.0
    ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.19.0
    ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.19.0
    ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.19.0
    ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.19.0

    You can see that scheduling is disabled on each worker node in turn as the change is applied.

  6. Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in /proc/cmdline on the host):

    $ oc debug node/ip-10-0-141-105.ec2.internal
    Example output
    Starting pod/ip-10-0-141-105ec2internal-debug ...
    To use host binaries, run `chroot /host`
    
    sh-4.2# cat /host/proc/cmdline
    BOOT_IMAGE=/ostree/rhcos-... console=tty0 console=ttyS0,115200n8
    rootflags=defaults,prjquota rw root=UUID=fd0... ostree=/ostree/boot.0/rhcos/16...
    coreos.oem.id=qemu coreos.oem.id=ec2 ignition.platform.id=ec2 selinux=0
    
    sh-4.2# exit

    You should see the selinux=0 argument added to the other kernel arguments.
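
If you would rather check every worker at once than open one debug session at a time, you can run the same check non-interactively. A sketch, assuming the default worker role label:

$ for node in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
      oc debug $node -- chroot /host cat /proc/cmdline
  done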

Adding a real-time kernel to nodes

Some OKD workloads require a high degree of determinism. While Linux is not a real-time operating system, the Linux real-time kernel includes a preemptive scheduler that provides the operating system with real-time characteristics.

If your OKD workloads require these real-time characteristics, you can switch your machines to the Linux real-time kernel. You can make this switch using a MachineConfig object. Although making the change is as simple as changing a MachineConfig kernelType setting to realtime, there are a few other considerations before making the change:

  • Currently, real-time kernel is supported only on worker nodes, and only for radio access network (RAN) use.

  • The following procedure is fully supported with bare metal installations that use systems that are certified for Red Hat Enterprise Linux for Real Time 8.

  • Real-time support in OKD is limited to specific subscriptions.

  • The following procedure is also supported for use with Google Cloud Platform.

Prerequisites
  • Have a running OKD cluster (version 4.4 or later).

  • Log in to the cluster as a user with administrative privileges.

Procedure
  1. Create a MachineConfig for the real-time kernel: Create a YAML file (for example, 99-worker-realtime.yaml) that contains a MachineConfig object for the realtime kernelType. This example tells the cluster to use a real-time kernel for all worker nodes:

    $ cat << EOF > 99-worker-realtime.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "worker"
      name: 99-worker-realtime
    spec:
      kernelType: realtime
    EOF
  2. Add the MachineConfig to the cluster. Type the following to add the MachineConfig to the cluster:

    $ oc create -f 99-worker-realtime.yaml
  3. Check the real-time kernel: Once each impacted node reboots, log in to the cluster and run the following commands to make sure that the real-time kernel has replaced the regular kernel for the set of nodes you configured:

    $ oc get nodes
    Example output
    NAME                                        STATUS  ROLES    AGE   VERSION
    ip-10-0-143-147.us-east-2.compute.internal  Ready   worker   103m  v1.19.0
    ip-10-0-146-92.us-east-2.compute.internal   Ready   worker   101m  v1.19.0
    ip-10-0-169-2.us-east-2.compute.internal    Ready   worker   102m  v1.19.0
    $ oc debug node/ip-10-0-143-147.us-east-2.compute.internal
    Example output
    Starting pod/ip-10-0-143-147us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    
    sh-4.4# uname -a
    Linux <worker_node> 4.18.0-147.3.1.rt24.96.el8_1.x86_64 #1 SMP PREEMPT RT
            Wed Nov 27 18:29:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

    The kernel name contains rt and the text “PREEMPT RT” indicates that this is a real-time kernel.

  4. To go back to the regular kernel, delete the MachineConfig object:

    $ oc delete -f 99-worker-realtime.yaml
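
After the nodes reboot, you can confirm that the regular kernel is back without opening a debug shell, because the wide node listing includes a KERNEL-VERSION column:

$ oc get nodes -l node-role.kubernetes.io/worker -o wide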

Configuring journald settings

If you need to configure settings for the journald service on OKD nodes, you can do that by modifying the appropriate configuration file and passing the file to the appropriate pool of nodes as a MachineConfig.

This procedure describes how to modify journald rate limiting settings in the /etc/systemd/journald.conf file and apply them to worker nodes. See the journald.conf man page for information on how to use that file.

Prerequisites
  • Have a running OKD cluster (version 4.4 or later).

  • Log in to the cluster as a user with administrative privileges.

Procedure
  1. Create a temporary file with the contents you want in the /etc/systemd/journald.conf file. For example:

    $ cat > /tmp/jrnl.conf <<EOF
    # Disable rate limiting
    RateLimitInterval=1s
    RateLimitBurst=10000
    Storage=volatile
    Compress=no
    MaxRetentionSec=30s
    EOF
  2. Convert the temporary jrnl.conf file to base64 and save it into a variable (jrnl_cnf):

    $ export jrnl_cnf=$( cat /tmp/jrnl.conf | base64 -w0 )
    $ echo $jrnl_cnf
    IyBEaXNhYmxlIHJhdGUgbGltaXRpbmcKUmF0ZUxpbWl0SW50ZXJ2YWw9MXMKUmF0ZUxpbWl0QnVyc3Q9MTAwMDAKU3RvcmFnZT12b2xhdGlsZQpDb21wcmVzcz1ubwpNYXhSZXRlbnRpb25TZWM9MzBzCg==
  3. Create the MachineConfig, including the encoded contents of journald.conf (jrnl_cnf variable):

    $ cat > /tmp/40-worker-custom-journald.yaml <<EOF
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 40-worker-custom-journald
    spec:
      config:
        ignition:
          config: {}
          security:
            tls: {}
          timeouts: {}
          version: 3.1.0
        networkd: {}
        passwd: {}
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,${jrnl_cnf}
              verification: {}
            mode: 420
            path: /etc/systemd/journald.conf
        systemd: {}
      osImageURL: ""
    EOF
  4. Apply the MachineConfig to the pool:

    $ oc apply -f /tmp/40-worker-custom-journald.yaml
  5. Check that the new MachineConfig has been applied and that the nodes are not in a degraded state. It might take a few minutes. The worker pool will show the updates in progress, as each node successfully has the new MachineConfig applied:

    $ oc get machineconfigpool
    NAME   CONFIG             UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
    master rendered-master-35 True    False    False    3            3                 3                   0                    34m
    worker rendered-worker-d8 False   True     False    3            1                 1                   0                    34m
  6. To check that the change was applied, you can log in to a worker node:

    $ oc get node | grep worker
    ip-10-0-0-1.us-east-2.compute.internal   Ready    worker   39m   v0.0.0-master+$Format:%h$
    $ oc debug node/ip-10-0-0-1.us-east-2.compute.internal
    Starting pod/ip-10-0-141-142us-east-2computeinternal-debug ...
    ...
    sh-4.2# chroot /host
    sh-4.4# cat /etc/systemd/journald.conf
    # Disable rate limiting
    RateLimitInterval=1s
    RateLimitBurst=10000
    Storage=volatile
    Compress=no
    MaxRetentionSec=30s
    sh-4.4# exit

Configuring container image registry settings

Settings that define the registries that OKD uses to get container images are held in the /etc/containers/registries.conf file by default. In that file, you can set registries to not require authentication (insecure), point to mirrored registries, or set which registries are searched for unqualified container image requests.

Rather than change registries.conf directly, you can drop configuration files into the /etc/containers/registries.d directory that are then automatically appended to the system’s existing registries.conf settings.

This procedure describes how to create a registries.d file (/etc/containers/registries.d/99-worker-unqualified-search-registries.conf) that adds quay.io as an unqualified search registry (one that OKD can search when it tries to pull an image name that does not include the registry name). It includes base64-encoded content that you can examine as follows:

$ echo dW5xdWFsaWZpZWQtc2VhcmNoLXJlZ2lzdHJpZXMgPSBbJ3JlZ2lzdHJ5LmFjY2Vzcy5yZWRoYXQuY29tJywgJ2RvY2tlci5pbycsICdxdWF5LmlvJ10K | base64 -d
unqualified-search-registries = ['registry.access.redhat.com', 'docker.io', 'quay.io']

See the containers-registries.conf man page for the format for the registries.conf and registries.d directory files.
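
If you want a different set of search registries, you can generate the base64 string for your own file contents the same way as in the chrony example. A sketch:

$ echo "unqualified-search-registries = ['registry.access.redhat.com', 'docker.io', 'quay.io']" | base64 -w0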

Prerequisites
  • Have a running OKD cluster (version 4.4 or later).

  • Log in to the cluster as a user with administrative privileges.

Procedure
  1. Create a YAML file (myregistry.yaml) to hold the contents of the /etc/containers/registries.d/99-worker-unqualified-search-registries.conf file, including the encoded base64 contents for that file. For example:

    $ cat > /tmp/myregistry.yaml <<EOF
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-unqualified-search-registries
    spec:
      config:
        ignition:
          version: 3.1.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,dW5xdWFsaWZpZWQtc2VhcmNoLXJlZ2lzdHJpZXMgPSBbJ3JlZ2lzdHJ5LmFjY2Vzcy5yZWRoYXQuY29tJywgJ2RvY2tlci5pbycsICdxdWF5LmlvJ10K
            mode: 420
            path: /etc/containers/registries.d/99-worker-unqualified-search-registries.conf
    EOF
  2. Apply the MachineConfig to the pool:

    $ oc apply -f /tmp/myregistry.yaml
  3. Check that the new MachineConfig has been applied and that the nodes are not in a degraded state. It might take a few minutes. The worker pool will show the updates in progress, as each machine successfully has the new MachineConfig applied:

    $ oc get machineconfigpool
    NAME   CONFIG             UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
    master rendered-master-35 True    False    False    3            3                 3                   0                    34m
    worker rendered-worker-d8 False   True     False    3            1                 1                   0                    34m
  4. To check that the change was applied, you can log in to a worker node:

    $ oc get node | grep worker
    ip-10-0-0-1.us-east-2.compute.internal   Ready    worker   39m   v0.0.0-master+$Format:%h$
    $ oc debug node/ip-10-0-0-1.us-east-2.compute.internal
    Starting pod/ip-10-0-141-142us-east-2computeinternal-debug ...
    ...
    sh-4.2# chroot /host
    sh-4.4# cat /etc/containers/registries.d/99-worker-unqualified-search-registries.conf
    unqualified-search-registries = ['registry.access.redhat.com', 'docker.io', 'quay.io']
    sh-4.4# exit

Adding extensions to FCOS

FCOS is a minimal container-oriented operating system, designed to provide a common set of capabilities to OKD clusters across all platforms. While adding software packages to FCOS systems is generally discouraged, the MCO provides an extensions feature you can use to add a minimal set of features to FCOS nodes.

Currently, the following extension is available:

  • usbguard: Adding the usbguard extension protects FCOS systems from attacks from intrusive USB devices. See USBGuard for details.

The following procedure describes how to use a MachineConfig to add one or more extensions to your FCOS nodes.

Prerequisites
  • Have a running OKD cluster (version 4.6 or later).

  • Log in to the cluster as a user with administrative privileges.

Procedure
  1. Create a MachineConfig for extensions: Create a YAML file (for example, 80-extensions.yaml) that contains a MachineConfig extensions object. This example tells the cluster to add the usbguard extension.

    $ cat << EOF > 80-extensions.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 80-worker-extensions
    spec:
      config:
        ignition:
          version: 3.1.0
      extensions:
        - usbguard
    EOF
  2. Add the MachineConfig to the cluster. Type the following to add the MachineConfig to the cluster:

    $ oc create -f 80-extensions.yaml

    This causes the usbguard RPM package to be installed on all worker nodes.

  3. Check that the extensions were applied:

    $ oc get machineconfig 80-worker-extensions
    NAME                 GENERATEDBYCONTROLLER IGNITIONVERSION AGE
    80-worker-extensions                       3.1.0           57s
  4. Check that the new MachineConfig has been applied and that the nodes are not in a degraded state. It may take a few minutes. The worker pool will show the updates in progress, as each machine successfully has the new MachineConfig applied:

    $ oc get machineconfigpool
    NAME   CONFIG             UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
    master rendered-master-35 True    False    False    3            3                 3                   0                    34m
    worker rendered-worker-d8 False   True     False    3            1                 1                   0                    34m
  5. To check that the extension was applied on a node, run:

    $ oc get node | grep worker
    NAME                                        STATUS  ROLES    AGE   VERSION
    ip-10-0-169-2.us-east-2.compute.internal    Ready   worker   102m  v1.18.3
    $ oc debug node/ip-10-0-169-2.us-east-2.compute.internal
    ...
    To use host binaries, run `chroot /host`
    sh-4.4# chroot /host
    sh-4.4# rpm -q usbguard
    usbguard-0.7.4-4.el8.x86_64

Use the "Configuring crony time service" section as a model for how to go about adding other configuration files to OKD nodes.

Configuring MCO-related custom resources

Besides managing MachineConfigs, the MCO manages two custom resources (CRs): KubeletConfig and ContainerRuntimeConfig. Those CRs let you change node-level settings impacting how the Kubelet and CRI-O container runtime services behave.
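
To see which of these custom resources already exist in your cluster, you can list them directly:

$ oc get kubeletconfig
$ oc get containerruntimeconfig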

Creating a KubeletConfig CRD to edit kubelet parameters

The kubelet configuration is currently serialized as an Ignition configuration, so it can be directly edited. However, there is also a new kubelet-config-controller added to the Machine Config Controller (MCC). This allows you to create a KubeletConfig custom resource (CR) to edit the kubelet parameters.

Procedure
  1. Run:

    $ oc get machineconfig

    This provides a list of the available machine configuration objects you can select. By default, the two kubelet-related configs are 01-master-kubelet and 01-worker-kubelet.

  2. To check the current value of max Pods per node, run:

    # oc describe node <node-ip> | grep Allocatable -A6

    Look for the pods: <value> line in the Allocatable output.

    For example:

    # oc describe node ip-172-31-128-158.us-east-2.compute.internal | grep Allocatable -A6
    Example output
    Allocatable:
     attachable-volumes-aws-ebs:  25
     cpu:                         3500m
     hugepages-1Gi:               0
     hugepages-2Mi:               0
     memory:                      15341844Ki
     pods:                        250
  3. To set the maximum pods per node on the worker nodes, create a custom resource file that contains the kubelet configuration. For example, change-maxPods-cr.yaml. The machineConfigPoolSelector shown here matches pools labeled custom-kubelet: large-pods, so the worker machine config pool must carry that label for the CR to take effect (see the note after this procedure):

    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: set-max-pods
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: large-pods
      kubeletConfig:
        maxPods: 500

    The rate at which the kubelet talks to the API server depends on queries per second (QPS) and burst values. The default values, 50 for kubeAPIQPS and 100 for kubeAPIBurst, are good enough if there are limited pods running on each node. Updating the kubelet QPS and burst rates is recommended if there are enough CPU and memory resources on the node:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: set-max-pods
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: large-pods
      kubeletConfig:
        maxPods: <pod_count>
        kubeAPIBurst: <burst_rate>
        kubeAPIQPS: <QPS>
    1. Run:

      $ oc create -f change-maxPods-cr.yaml
    2. Run:

      $ oc get kubeletconfig

      This should return set-max-pods.

      Depending on the number of worker nodes in the cluster, wait for the worker nodes to be rebooted one by one. For a cluster with 3 worker nodes, this could take about 10 to 15 minutes.

  4. Check that the maxPods value changed for the worker nodes:

    $ oc describe node
    1. Verify the change by running:

      $ oc get kubeletconfigs set-max-pods -o yaml

      This should show a status of True and type: Success.
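
Note: the machineConfigPoolSelector in the examples above matches only machine config pools that carry the custom-kubelet: large-pods label. If the KubeletConfig appears to have no effect, check that the worker pool carries that label; a sketch of adding it:

$ oc label machineconfigpool worker custom-kubelet=large-pods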

Creating a ContainerRuntimeConfig CR to edit CRI-O parameters

The ContainerRuntimeConfig custom resource definition (CRD) provides a structured way of changing settings associated with the OKD CRI-O runtime. Using a ContainerRuntimeConfig custom resource (CR), you select the configuration values you want and the MCO handles rebuilding the crio.conf and storage.conf configuration files.

Parameters you can set in a ContainerRuntimeConfig CR include:

  • PIDs limit: Sets the maximum number of processes allowed in a container. By default, the limit is set to 1024 (pids_limit = 1024).

  • Log level: Sets the level of verbosity for log messages. The default is info (log_level = info). Other options include fatal, panic, error, warn, debug, and trace.

  • Overlay size: Sets the maximum size of a container image. The default is 10 GB.

  • Maximum log size: Sets the maximum size allowed for the container log file. The default maximum log size is unlimited (log_size_max = -1). If it is set to a positive number, it must be at least 8192 so that it is not smaller than conmon’s read buffer. Conmon is a program that monitors communications between a container manager (such as Podman or CRI-O) and the OCI runtime (such as runc or crun) for a single container.

The following procedure describes how to change CRI-O settings using the ContainerRuntimeConfig CR.

Procedure
  1. To raise pidsLimit to 2048, set logLevel to debug, and set overlaySize to 8 GB, create a CR file (for example, overlay-size.yaml) that contains those settings:

    $ cat << EOF > /tmp/overlay-size.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: ContainerRuntimeConfig
    metadata:
     name: overlay-size
    spec:
     machineConfigPoolSelector:
       matchLabels:
         custom-crio: overlay-size
     containerRuntimeConfig:
       pidsLimit: 2048
       logLevel: debug
       overlaySize: 8G
    EOF
  2. To apply the ContainerRuntimeConfig settings, run:

    $ oc create -f /tmp/overlay-size.yaml
  3. To verify that the settings were applied, run:

    $ oc get ContainerRuntimeConfig
    NAME           AGE
    overlay-size   3m19s
    
  4. To apply the settings to a pool of machines, such as worker, the pool must carry the custom-crio: overlay-size label that the machineConfigPoolSelector matches. Run the following command to open the worker MachineConfigPool so that you can add that label under metadata.labels (alternatively, run oc label machineconfigpool worker custom-crio=overlay-size):

    $ oc edit machineconfigpool worker
  5. Check that a new containerruntime object has appeared under the machineconfigs:

    $ oc get machineconfigs | grep containerrun
    99-worker-generated-containerruntime   2c9371fbb673b97a6fe8b1c52691999ed3a1bfc2  3.1.0  31s
  6. Monitor the Machine Config Pool as the changes are rolled into the machines until all are shown as ready:

    $ oc get mcp worker
    NAME    CONFIG               UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
    worker  rendered-worker-169  False    True      False     3             1                  1                    0                     9h
  7. Open an oc debug session to a worker node and run chroot /host.

  8. Verify the changes by running:

    $ crio config | egrep 'log_level|pids_limit'
    pids_limit = 2048
    log_level = "debug"
    $ head -n 7 /etc/containers/storage.conf
    [storage]
      driver = "overlay"
      runroot = "/var/run/containers/storage"
      graphroot = "/var/lib/containers/storage"
      [storage.options]
        additionalimagestores = []
        size = "8G"