
If your control plane API endpoint is available, but worker nodes did not join the hosted cluster on AWS, you can debug worker node issues.

Checking why worker nodes did not join the hosted cluster

To troubleshoot why worker nodes did not join the hosted cluster on AWS, you can check the following information.

Prerequisites
Procedure
  1. Address any error messages in the status of the HostedCluster and NodePool resources:

    1. Check the status of the HostedCluster resource by running the following command:

      $ oc get hc -n <hosted_cluster_namespace> <hosted_cluster_name> -o jsonpath='{.status}'
    2. Check the status of the NodePool resource by running the following command:

      $ oc get nodepools -n <hosted_cluster_namespace> <node_pool_name> -o jsonpath='{.status}'
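
      If the full status output is hard to read, you can narrow it to the reported conditions. The following command is a sketch that assumes the resource exposes standard conditions under .status.conditions; run the equivalent command for the NodePool resource:

      $ oc get hc -n <hosted_cluster_namespace> <hosted_cluster_name> -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'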

      If you did not find any error messages in the status of the HostedCluster and NodePool resources, proceed to the next step.

  2. Check whether your worker machines were created by running the following commands, replacing values as necessary:

    $ HC_NAMESPACE="clusters"
    $ HC_NAME="<hosted_cluster_name>"
    $ CONTROL_PLANE_NAMESPACE="${HC_NAMESPACE}-${HC_NAME}"
    $ oc get machines.cluster.x-k8s.io -n $CONTROL_PLANE_NAMESPACE
    $ oc get awsmachines -n $CONTROL_PLANE_NAMESPACE
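
    To see at a glance whether the machines progressed past provisioning, you can print the phase of each machine. The following command is a sketch that assumes the Cluster API Machine resources report a .status.phase field:

    $ oc get machines.cluster.x-k8s.io -n $CONTROL_PLANE_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'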
  3. If worker machines do not exist, check whether the MachineDeployment and MachineSet resources were created by running the following commands:

    $ oc get machinedeployment -n $CONTROL_PLANE_NAMESPACE
    $ oc get machineset -n $CONTROL_PLANE_NAMESPACE
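
    If the resources exist but no machines were created, comparing the desired and ready replica counts can help to narrow down the problem. The following command is a sketch that assumes the MachineDeployment resources report standard replica fields:

    $ oc get machinedeployment -n $CONTROL_PLANE_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.replicas}{"\t"}{.status.readyReplicas}{"\n"}{end}'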
  4. If the MachineDeployment and MachineSet resources do not exist, check the logs of the HyperShift Operator by running the following command:

    $ oc logs deployment/operator -n hypershift
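
    Because the HyperShift Operator reconciles every hosted cluster, the log can be long. Optionally, filter the output on the hosted cluster name to find the relevant messages, for example:

    $ oc logs deployment/operator -n hypershift | grep -i "$HC_NAME"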
  5. If worker machines exist but are not provisioned in the hosted cluster, check the logs of the Cluster API provider by running the following command:

    $ oc logs deployment/capi-provider -c manager -n $CONTROL_PLANE_NAMESPACE
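
    If the provider log does not make the failure obvious, the AWSMachine resources themselves might surface a failure reason. The following command is a sketch that assumes the provider populates the failureReason and failureMessage status fields:

    $ oc get awsmachines -n $CONTROL_PLANE_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.failureReason}{"\t"}{.status.failureMessage}{"\n"}{end}'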
  6. If worker machines exist and are provisioned in the cluster, verify that the machines were initialized through Ignition successfully by checking the system console logs. Retrieve the system console logs of every machine by running the console-logs utility:

    $ ./bin/hypershift console-logs aws --name $HC_NAME --aws-creds ~/.aws/credentials --output-dir /tmp/console-logs

    You can access the system console logs in the /tmp/console-logs directory. The control plane exposes the Ignition endpoint. If you see an error related to the Ignition endpoint, the endpoint is not accessible from the worker nodes over HTTPS.
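
    To confirm which Ignition endpoint the control plane exposes, you can list the routes in the control plane namespace. This is a sketch that assumes the Ignition server is published as a route:

    $ oc get routes -n $CONTROL_PLANE_NAMESPACE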

  7. If worker machines are provisioned and initialized through Ignition successfully, you can extract and access the journal logs of every worker machine by creating a bastion machine. A bastion machine allows you to access worker machines by using SSH.

    1. Create a bastion machine by running the following command:

      $ ./bin/hypershift create bastion aws --aws-creds ~/.aws/credentials --name $HC_NAME --ssh-key-file /tmp/ssh/id_rsa.pub
    2. Optional: If you used the --generate-ssh flag when creating the cluster, you can extract the public and private keys for the cluster by running the following commands:

      $ mkdir /tmp/ssh
      $ oc get secret -n clusters ${HC_NAME}-ssh-key -o jsonpath='{ .data.id_rsa }' | base64 -d > /tmp/ssh/id_rsa
      $ oc get secret -n clusters ${HC_NAME}-ssh-key -o jsonpath='{ .data.id_rsa\.pub }' | base64 -d > /tmp/ssh/id_rsa.pub
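
      SSH refuses private keys with permissive file modes, so you might need to restrict the permissions on the extracted key:

      $ chmod 400 /tmp/ssh/id_rsa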
    3. Extract journal logs from every worker machine by running the following commands:

      $ mkdir /tmp/journals
      $ INFRAID="$(oc get hc -n clusters $HC_NAME -o jsonpath='{ .spec.infraID }')"
      $ SSH_PRIVATE_KEY=/tmp/ssh/id_rsa
      $ ./test/e2e/util/dump/copy-machine-journals.sh /tmp/journals

      The journal logs are placed in the /tmp/journals directory in a compressed format. Check the logs for the error that indicates why the kubelet did not join the cluster.
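
      If you need to inspect a worker machine interactively rather than collect journals, you can connect through the bastion by using SSH. The following command is a sketch; <bastion_ip> and <worker_private_ip> are placeholders, and the user names assume an Amazon Linux bastion (ec2-user) and RHCOS worker nodes (core):

      $ ssh -o ProxyCommand="ssh -W %h:%p -i /tmp/ssh/id_rsa ec2-user@<bastion_ip>" -i /tmp/ssh/id_rsa core@<worker_private_ip>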