OKD runs on Fedora CoreOS (FCOS). Use the following procedures to troubleshoot problems related to the operating system.

Investigating kernel crashes

Enabling kdump

The kdump service, included in kexec-tools, provides a crash-dumping mechanism. You can use this service to save the contents of the system’s memory for later analysis.

The kdump service is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

FCOS ships with kexec-tools, but manual configuration is required to enable kdump.

Procedure

Perform the following steps to enable kdump on FCOS.

  1. To reserve memory for the crash kernel when the primary kernel boots, append the crashkernel kernel argument by entering the following command:

    # rpm-ostree kargs --append='crashkernel=256M'

  2. By default, the vmcore is saved to /var/crash. You can also write the dump over the network or to another location on the local system by editing /etc/kdump.conf. For example, assuming that /var/usrlocal/cores exists, enter the following command to set the path in /etc/kdump.conf to /var/usrlocal/cores:

    # sed -i 's|^path.*|path /var/usrlocal/cores|' /etc/kdump.conf

    For more information, see the kdump.conf(5) manual page, which documents all available options for /etc/kdump.conf, and the comments in /etc/kdump.conf and /etc/sysconfig/kdump.

  3. Enable the kdump systemd service.

    # systemctl enable kdump.service

  4. Reboot your system.

    # systemctl reboot

  5. Verify that kdump has loaded a crash kernel: confirm that kdump.service has started and exited successfully, and that cat /sys/kernel/kexec_crash_loaded prints 1.
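
The final verification step can be sketched as a small shell check. This is a minimal sketch, not part of the documented procedure: the function name kdump_crash_kernel_loaded is hypothetical, and the optional path argument exists only so that the check can be tried against an ordinary file.

```shell
# Hypothetical helper: succeeds only if the flag file contains "1".
# The default path is the one named in the verification step above.
kdump_crash_kernel_loaded() {
    flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

# Report the result without aborting the shell.
if kdump_crash_kernel_loaded; then
    echo "crash kernel loaded"
else
    echo "crash kernel NOT loaded"
fi
```

To check the other half of the step, systemctl status kdump.service shows whether the service started and exited successfully.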

Enabling kdump on day-1

The kdump service is intended to be enabled per node to debug kernel problems. Enabling kdump on every node in the cluster is not recommended. Machine-specific machine configs are not yet supported, but on day-1 you can perform the previous steps through a systemd unit in a MachineConfig object, which enables kdump on all nodes in the cluster. To do so, create a MachineConfig object and inject it into the set of manifest files that Ignition uses during cluster setup. See "Customizing nodes" in the Installing → Installation configuration section for more information and examples on how to use Ignition configs.
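
As an illustration, the following sketch writes a MachineConfig manifest that could be added to the installation manifests. It makes several assumptions that are not stated above: the object name 99-worker-kdump, the worker role label, the Ignition spec version 3.2.0, and the output directory are all illustrative. It also sets the kernel argument through the kernelArguments field of the MachineConfig rather than through a systemd unit that runs rpm-ostree, so treat it as one possible variant, not the documented method.

```shell
# Hypothetical output directory; in a real installation this would be the
# openshift/ subdirectory of the directory that holds the generated manifests.
manifest_dir="${MANIFEST_DIR:-./kdump-manifest-example}"
mkdir -p "$manifest_dir"

# Write the manifest; the quoted delimiter prevents shell expansion inside it.
cat > "$manifest_dir/99-worker-kdump.yaml" <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-kdump        # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - crashkernel=256M         # same reservation as in the manual procedure
  config:
    ignition:
      version: 3.2.0           # assumption: adjust to your cluster's Ignition spec version
    systemd:
      units:
        - name: kdump.service
          enabled: true
EOF
```

Because the manifest targets a role and not a single machine, it enables kdump on every node with that role, which is the trade-off described above.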

Testing the kdump configuration

See the Capturing the Dump section in the Fedora documentation for kdump.
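
A common way to test a kdump setup is to force a kernel panic through the SysRq interface. The following is a minimal sketch under that assumption. The function name trigger_test_crash is hypothetical, and the two optional path arguments exist only so that the function can be dry-run against ordinary files. The block only defines the function and does not call it.

```shell
# Hypothetical helper: enables SysRq and triggers a kernel panic.
# WARNING: with the default paths, calling this as root crashes the node
# immediately. Use it only on a node that you are prepared to lose.
trigger_test_crash() {
    sysrq_enable="${1:-/proc/sys/kernel/sysrq}"
    sysrq_trigger="${2:-/proc/sysrq-trigger}"

    # Allow all SysRq functions, then send 'c' to crash the kernel.
    echo 1 > "$sysrq_enable" || return 1
    echo c > "$sysrq_trigger"
}
```

If kdump is configured correctly, the node boots into the crash kernel after the panic, writes a vmcore under the configured path (/var/crash by default), and then reboots.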

Analyzing a core dump

See the Dump Analysis section in the Fedora documentation for kdump.

Additional resources