Strategies for Source-to-Image troubleshooting

Use Source-to-Image (S2I) to build reproducible, Docker-formatted container images. You can create ready-to-run images by injecting application source code into a container image and assembling a new image. The new image incorporates the base image (the builder) and built source.

To determine where in the S2I process a failure occurs, you can observe the state of the pods relating to each of the following S2I stages:

  1. During the build configuration stage, a build pod is used to create an application container image from a base image and application source code.

  2. During the deployment configuration stage, a deployment pod is used to deploy application pods from the application container image that was built in the build configuration stage. The deployment pod also deploys other resources such as services and routes. The deployment configuration begins after the build configuration succeeds.

  3. After the deployment pod has started the application pods, application failures can occur within the running application pods. For instance, an application might not behave as expected even though the application pods are in a Running state. In this scenario, you can access running application pods to investigate application failures within a pod.

When troubleshooting S2I issues, follow this strategy:

  1. Monitor build, deployment, and application pod status

  2. Determine the stage of the S2I process where the problem occurred

  3. Review logs corresponding to the failed stage

Gathering Source-to-Image diagnostic data

The S2I tool runs a build pod and a deployment pod in sequence. The deployment pod is responsible for deploying the application pods based on the application container image created in the build stage. Watch build, deployment and application pod status to determine where in the S2I process a failure occurs. Then, focus diagnostic data collection accordingly.

  • You have access to the cluster as a user with the cluster-admin role.

  • Your API service is still functional.

  • You have installed the OpenShift CLI (oc).

  1. Watch the pod status throughout the S2I process to determine at which stage a failure occurs:

    $ oc get pods -w  (1)
    1 Use -w to monitor pods for changes until you quit the command using Ctrl+C.
  2. Review a failed pod’s logs for errors.

    • If the build pod fails, review the build pod’s logs:

      $ oc logs -f pod/<application_name>-<build_number>-build

      Alternatively, you can review the build configuration’s logs using oc logs -f bc/<application_name>. The build configuration’s logs include the logs from the build pod.

    • If the deployment pod fails, review the deployment pod’s logs:

      $ oc logs -f pod/<application_name>-<build_number>-deploy

      Alternatively, you can review the deployment configuration’s logs using oc logs -f dc/<application_name>. This outputs logs from the deployment pod until the deployment pod completes successfully. The command outputs logs from the application pods if you run it after the deployment pod has completed. After a deployment pod completes, its logs can still be accessed by running oc logs -f pod/<application_name>-<build_number>-deploy.

    • If an application pod fails, or if an application is not behaving as expected within a running application pod, review the application pod’s logs:

      $ oc logs -f pod/<application_name>-<build_number>-<random_string>

Gathering application diagnostic data to investigate application failures

Application failures can occur within running application pods. In these situations, you can retrieve diagnostic information with these strategies:

  • Review events relating to the application pods.

  • Review the logs from the application pods, including application-specific log files that are not collected by the OpenShift Logging framework.

  • Test application functionality interactively and run diagnostic tools in an application container.

  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

  1. List events relating to a specific application pod. The following example retrieves events for an application pod named my-app-1-akdlg:

    $ oc describe pod/my-app-1-akdlg
  2. Review logs from an application pod:

    $ oc logs -f pod/my-app-1-akdlg
  3. Query specific logs within a running application pod. Logs that are sent to stdout are collected by the OpenShift Logging framework and are included in the output of the preceding command. The following query is only required for logs that are not sent to stdout.

    1. If an application log can be accessed without root privileges within a pod, concatenate the log file as follows:

      $ oc exec my-app-1-akdlg -- cat /var/log/my-application.log
    2. If root access is required to view an application log, you can start a debug container with root privileges and then view the log file from within the container. Start the debug container from the project’s DeploymentConfig object. Pod users typically run with non-root privileges, but running troubleshooting pods with temporary root privileges can be useful during issue investigation:

      $ oc debug dc/my-deployment-configuration --as-root -- cat /var/log/my-application.log

      You can access an interactive shell with root access within the debug pod if you run oc debug dc/<deployment_configuration> --as-root without appending -- <command>.

  4. Test application functionality interactively and run diagnostic tools, in an application container with an interactive shell.

    1. Start an interactive shell on the application container:

      $ oc exec -it my-app-1-akdlg /bin/bash
    2. Test application functionality interactively from within the shell. For example, you can run the container’s entry point command and observe the results. Then, test changes from the command line directly, before updating the source code and rebuilding the application container through the S2I process.

    3. Run diagnostic binaries available within the container.

      Root privileges are required to run some diagnostic binaries. In these situations you can start a debug pod with root access, based on a problematic pod’s DeploymentConfig object, by running oc debug dc/<deployment_configuration> --as-root. Then, you can run diagnostic binaries as root from within the debug pod.

  5. If diagnostic binaries are not available within a container, you can run a host’s diagnostic binaries within a container’s namespace by using nsenter. The following example runs ip ad within a container’s namespace, using the host`s ip binary.

    1. Enter into a debug session on the target node. This step instantiates a debug pod called <node_name>-debug:

      $ oc debug node/my-cluster-node
    2. Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:

      # chroot /host

      OKD 4.14 cluster nodes running Fedora CoreOS (FCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes by using SSH is not recommended. However, if the OKD API is not available, or the kubelet is not properly functioning on the target node, oc operations will be impacted. In such situations, it is possible to access nodes using ssh core@<node>.<cluster_name>.<base_domain> instead.

    3. Determine the target container ID:

      # crictl ps
    4. Determine the container’s process ID. In this example, the target container ID is a7fe32346b120:

      # crictl inspect a7fe32346b120 --output yaml | grep 'pid:' | awk '{print $2}'
    5. Run ip ad within the container’s namespace, using the host’s ip binary. This example uses 31150 as the container’s process ID. The nsenter command enters the namespace of a target process and runs a command in its namespace. Because the target process in this example is a container’s process ID, the ip ad command is run in the container’s namespace from the host:

      # nsenter -n -t 31150 -- ip ad

      Running a host’s diagnostic binaries within a container’s namespace is only possible if you are using a privileged container such as a debug node.

Additional resources