×

By default, when you make certain changes to the fields in a MachineConfig object, the Machine Config Operator (MCO) drains and reboots the nodes associated with that machine config. However, you can create a node disruption policy that defines a set of changes to some Ignition config objects that would require little or no disruption to your workloads.

A node disruption policy allows you to define the configuration changes that cause a disruption to your cluster, and which changes do not. This allows you to reduce node downtime when making small machine configuration changes in your cluster. To configure the policy, you modify the MachineConfiguration object, which is in the openshift-machine-config-operator namespace. See the example node disruption policies in the MachineConfiguration objects that follow.

There are machine configuration changes that always require a reboot, regardless of any node disruption policies. For more information, see About the Machine Config Operator.

After you create the node disruption policy, the MCO validates the policy to search for potential issues in the file, such as problems with formatting. The MCO then merges the policy with the cluster defaults and populates the status.nodeDisruptionPolicyStatus fields in the machine config with the actions to be performed upon future changes to the machine config. The configurations in your policy always overwrite the cluster defaults.

The MCO does not validate whether a change can be successfully applied by your node disruption policy. Therefore, you are responsible to ensure the accuracy of your node disruption policies.

For example, you can configure a node disruption policy so that sudo configurations do not require a node drain and reboot. Or, you can configure your cluster so that updates to sshd are applied with only a reload of that one service.

You can control the behavior of the MCO when making the changes to the following Ignition configuration objects:

  • configuration files: You add to or update the files in the /var or /etc directory. You can configure a policy for a specific file anywhere in the directory or for a path to a specific directory. For a path, a change or addition to any file in that directory triggers the policy.

    If a file is included in more than one policy, only the policy with the best match to that file is applied.

    For example, if you have a policy for the /etc/ directory and a policy for the /etc/pki/ directory, a change to the /etc/pki/tls/certs/ca-bundle.crt file would apply the etc/pki policy.

  • systemd units: You create and set the status of a systemd service or modify a systemd service.

  • users and groups: You change SSH keys in the passwd section postinstallation.

  • ICSP, ITMS, IDMS objects: You can remove mirroring rules from an ImageContentSourcePolicy (ICSP), ImageTagMirrorSet (ITMS), and ImageDigestMirrorSet (IDMS) object.

When you make any of these changes, the node disruption policy determines which of the following actions are required when the MCO implements the changes:

  • Reboot: The MCO drains and reboots the nodes. This is the default behavior.

  • None: The MCO does not drain or reboot the nodes. The MCO applies the changes with no further action.

  • Drain: The MCO cordons and drains the nodes of their workloads. The workloads restart with the new configurations.

  • Reload: For services, the MCO reloads the specified services without restarting the service.

  • Restart: For services, the MCO fully restarts the specified services.

  • DaemonReload: The MCO reloads the systemd manager configuration.

  • Special: This is an internal MCO-only action and cannot be set by the user.

  • The Reboot and None actions cannot be used with any other actions, as the Reboot and None actions override the others.

  • Actions are applied in the order that they are set in the node disruption policy list.

  • If you make other machine config changes that do require a reboot or other disruption to the nodes, that reboot supercedes the node disruption policy actions.

Example node disruption policies

The following example MachineConfiguration objects contain a node disruption policy.

A MachineConfiguration object and a MachineConfig object are different objects. A MachineConfiguration object is a singleton object in the MCO namespace that contains configuration parameters for the MCO operator. A MachineConfig object defines changes that are applied to a machine config pool.

The following example MachineConfiguration object shows no user defined policies. The default node disruption policy values are shown in the status stanza.

Default node disruption policy
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
  name: cluster
spec:
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
status:
  nodeDisruptionPolicyStatus:
    clusterPolicies:
      files:
      - actions:
        - type: None
        path: /etc/mco/internal-registry-pull-secret.json
      - actions:
        - type: None
        path: /var/lib/kubelet/config.json
      - actions:
        - reload:
            serviceName: crio.service
          type: Reload
        path: /etc/machine-config-daemon/no-reboot/containers-gpg.pub
      - actions:
        - reload:
            serviceName: crio.service
          type: Reload
        path: /etc/containers/policy.json
      - actions:
        - type: Special
        path: /etc/containers/registries.conf
      - actions:
        - reload:
            serviceName: crio.service
          type: Reload
        path: /etc/containers/registries.d
      - actions:
        - type: None
        path: /etc/nmstate/openshift
      - actions:
        - restart:
            serviceName: coreos-update-ca-trust.service
          type: Restart
        - restart:
            serviceName: crio.service
          type: Restart
        path: /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
      sshkey:
        actions:
        - type: None
  observedGeneration: 9

In the following example, when changes are made to the SSH keys, the MCO drains the cluster nodes, reloads the crio.service, reloads the systemd configuration, and restarts the crio-service.

Example node disruption policy for an SSH key change
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
# ...
spec:
  nodeDisruptionPolicy:
    sshkey:
      actions:
      - type: Drain
      - reload:
          serviceName: crio.service
        type: Reload
      - type: DaemonReload
      - restart:
          serviceName: crio.service
        type: Restart
# ...

In the following example, when changes are made to the /etc/chrony.conf file, the MCO reloads the chronyd.service on the cluster nodes. If files are added to or modified in the /var/run directory, the MCO applies the changes with no further action.

Example node disruption policy for a configuration file change
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
# ...
spec:
  nodeDisruptionPolicy:
    files:
    - actions:
      - reload:
          serviceName: chronyd.service
        type: Reload
        path: /etc/chrony.conf
    - actions:
      - type: None
      path: /var/run

In the following example, when changes are made to the auditd.service systemd unit, the MCO drains the cluster nodes, reloads the crio.service, reloads the systemd manager configuration, and restarts the crio.service.

Example node disruption policy for a systemd unit change
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
# ...
spec:
  nodeDisruptionPolicy:
    units:
      - name: auditd.service
        actions:
          - type: Drain
          - type: Reload
            reload:
              serviceName: crio.service
          - type: DaemonReload
          - type: Restart
            restart:
              serviceName: crio.service

In the following example, when changes are made to the registries.conf file, such as by editing an ImageContentSourcePolicy (ICSP) object, the MCO does not drain or reboot the nodes and applies the changes with no further action.

Example node disruption policy for a registries.conf file change
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
# ...
spec:
  nodeDisruptionPolicy:
    files:
      - actions:
        - type: None
        path: /etc/containers/registries.conf

Configuring node restart behaviors upon machine config changes

You can create a node disruption policy to define the machine configuration changes that cause a disruption to your cluster, and which changes do not.

You can control how your nodes respond to changes in the files in the /var or /etc directory, the systemd units, the SSH keys, and the registries.conf file.

When you make any of these changes, the node disruption policy determines which of the following actions are required when the MCO implements the changes:

  • Reboot: The MCO drains and reboots the nodes. This is the default behavior.

  • None: The MCO does not drain or reboot the nodes. The MCO applies the changes with no further action.

  • Drain: The MCO cordons and drains the nodes of their workloads. The workloads restart with the new configurations.

  • Reload: For services, the MCO reloads the specified services without restarting the service.

  • Restart: For services, the MCO fully restarts the specified services.

  • DaemonReload: The MCO reloads the systemd manager configuration.

  • Special: This is an internal MCO-only action and cannot be set by the user.

  • The Reboot and None actions cannot be used with any other actions, as the Reboot and None actions override the others.

  • Actions are applied in the order that they are set in the node disruption policy list.

  • If you make other machine config changes that do require a reboot or other disruption to the nodes, that reboot supercedes the node disruption policy actions.

Prerequisites
  • You have enabled the TechPreviewNoUpgrade feature set by using the feature gates. For more information, see "Enabling features using feature gates".

    Enabling the TechPreviewNoUpgrade feature set on your cluster prevents minor version updates. The TechPreviewNoUpgrade feature set cannot be disabled. Do not enable this feature set on production clusters.

Procedure
  1. Edit the machineconfigurations.operator.openshift.io object to define the node disruption policy:

    $ oc edit MachineConfiguration cluster -n openshift-machine-config-operator
  2. Add a node disruption policy similar to the following:

    apiVersion: operator.openshift.io/v1
    kind: MachineConfiguration
    metadata:
      name: cluster
    # ...
    spec:
      nodeDisruptionPolicy: (1)
        files: (2)
        - actions: (3)
          - reload: (4)
              serviceName: chronyd.service (5)
            type: Reload
          path: /etc/chrony.conf (6)
        sshkey: (7)
          actions:
          - type: Drain
          - reload:
              serviceName: crio.service
            type: Reload
          - type: DaemonReload
          - restart:
              serviceName: crio.service
            type: Restart
        units: (8)
        - actions:
          - type: Drain
          - reload:
              serviceName: crio.service
            type: Reload
          - type: DaemonReload
          - restart:
              serviceName: crio.service
            type: Restart
          name: test.service
    1 Specifies the node disruption policy.
    2 Specifies a list of machine config file definitions and actions to take to changes on those paths. This list supports a maximum of 50 entries.
    3 Specifies the series of actions to be executed upon changes to the specified files. Actions are applied in the order that they are set in this list. This list supports a maximum of 10 entries.
    4 Specifies that the listed service is to be reloaded upon changes to the specified files.
    5 Specifies the full name of the service to be acted upon.
    6 Specifies the location of a file that is managed by a machine config. The actions in the policy apply when changes are made to the file in path.
    7 Specifies a list of service names and actions to take upon changes to the SSH keys in the cluster.
    8 Specifies a list of systemd unit names and actions to take upon changes to those units.
Verification
  • View the MachineConfiguration object file that you created:

    $ oc get MachineConfiguration/cluster -o yaml
    Example output
    apiVersion: operator.openshift.io/v1
    kind: MachineConfiguration
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: cluster
    # ...
    status:
      nodeDisruptionPolicyStatus: (1)
        clusterPolicies:
          files:
    # ...
          - actions:
            - reload:
                serviceName: chronyd.service
              type: Reload
            path: /etc/chrony.conf
          sshkey:
            actions:
            - type: Drain
            - reload:
                serviceName: crio.service
              type: Reload
            - type: DaemonReload
            - restart:
                serviceName: crio.service
              type: Restart
          units:
          - actions:
            - type: Drain
            - reload:
                serviceName: crio.service
              type: Reload
            - type: DaemonReload
            - restart:
                serviceName: crio.service
              type: Restart
            name: test.se
    # ...
    1 Specifies the current cluster-validated policies.