Troubleshooting GKE Autopilot nodes


To let Cloud Customer Care perform advanced troubleshooting on your Autopilot nodes, complete the following tasks to grant Customer Care access to the nodes in your Autopilot cluster.

Before you begin

  1. Contact Customer Care and obtain the member name that is required to access your cluster. You use this name in the instructions below.
  2. Ensure that you have a user account with permission to change the IAM policies and firewall rules on the resources mentioned in the instructions below.

Gather required information

The following information is needed for troubleshooting:

  • NODE_NAME: The name of the node.
  • NODE_ZONE: The compute zone the node is in.
  • PROJECT_ID: The project ID.
  • COMPUTE_SERVICE_ACCT: The Compute Engine service account used in the cluster.
  • ORGANIZATION_ID: The organization ID. Required only if the project is in an organization.
  • CLUSTER_NETWORK: The cluster's network.

# Define the following variables in your shell, replacing each placeholder
# (including the angle brackets) with your own value
NODE_NAME="<node name>"
NODE_ZONE="<node zone>"
PROJECT_ID="<project id>"
ORGANIZATION_ID="<org id, if any>"
COMPUTE_SERVICE_ACCT="<compute service account used in cluster>"
CLUSTER_NETWORK="<cluster network>"
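
Every later command depends on these variables, so it can help to confirm that they are all set before you continue. The following is a minimal sketch and not part of the official procedure: the function name check_required_vars is hypothetical, and ORGANIZATION_ID is skipped because it is optional.

```shell
# Hypothetical helper: report every required variable that is unset or empty.
# Returns 0 when all required variables are set, 1 otherwise.
check_required_vars() {
    missing=0
    for name in NODE_NAME NODE_ZONE PROJECT_ID COMPUTE_SERVICE_ACCT CLUSTER_NETWORK; do
        # Read the value of the variable whose name is stored in "name".
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "Missing required variable: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Run check_required_vars after you define the variables. A non-zero exit status means that at least one variable still needs a value. The check only tests for empty values, so it does not detect a placeholder that was left unreplaced.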

Grant access to your Autopilot cluster

Customer Care accesses your cluster nodes by using OS Login. Grant the following Identity and Access Management (IAM) roles to the member name that you obtained from Customer Care:

gcloud compute instances add-iam-policy-binding ${NODE_NAME} \
    --zone=${NODE_ZONE} \
    --role=roles/compute.osAdminLogin \
    --member='group:SUPPORT_MEMBER'

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --role=roles/compute.viewer \
    --member='group:SUPPORT_MEMBER'

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --role=roles/iap.tunnelResourceAccessor \
    --member='group:SUPPORT_MEMBER'

# If the project is in an organization:
gcloud organizations add-iam-policy-binding ${ORGANIZATION_ID} \
    --role=roles/compute.osLoginExternalUser \
    --member='group:SUPPORT_MEMBER'

gcloud iam service-accounts add-iam-policy-binding ${COMPUTE_SERVICE_ACCT} \
    --role=roles/iam.serviceAccountUser \
    --member='group:SUPPORT_MEMBER'

Replace SUPPORT_MEMBER with the member name provided by Customer Care.
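
After you run the commands above, you might want to confirm that the project-level bindings took effect. The following sketch lists the roles that the project's IAM policy binds to a member. The function name list_member_project_roles is hypothetical, and it assumes that PROJECT_ID is set as described earlier. It covers only the project-level policy, not the instance, organization, or service account policies.

```shell
# Hypothetical helper: list the project-level roles bound to a member.
# $1 is the member email provided by Customer Care, without the "group:" prefix.
list_member_project_roles() {
    gcloud projects get-iam-policy "${PROJECT_ID}" \
        --flatten="bindings[].members" \
        --filter="bindings.members:$1" \
        --format="value(bindings.role)"
}
```

For example, list_member_project_roles SUPPORT_MEMBER should include roles/compute.viewer and roles/iap.tunnelResourceAccessor once the grants above have been applied.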

Grant networking access through IAP

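In addition to the IAM roles, Customer Care needs network access to your nodes through Identity-Aware Proxy (IAP). The following firewall rule allows SSH ingress (TCP port 22) from the IP range that IAP uses for TCP forwarding. For details, see Using IAP for TCP forwarding.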

# Allow access through IAP
gcloud compute firewall-rules create allow-ssh-ingress-from-iap \
    --direction=INGRESS \
    --action=allow \
    --rules=tcp:22 \
    --source-ranges=35.235.240.0/20 \
    --network=${CLUSTER_NETWORK}
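
To check that the rule was created on the intended network, you can inspect it after creation. This is a minimal sketch: the function name describe_iap_rule is hypothetical, and it assumes that PROJECT_ID is set as described earlier.

```shell
# Hypothetical helper: show the key fields of the IAP firewall rule.
describe_iap_rule() {
    gcloud compute firewall-rules describe allow-ssh-ingress-from-iap \
        --project="${PROJECT_ID}" \
        --format="yaml(name,network,direction,sourceRanges,allowed)"
}
```

The output should show the cluster network, the INGRESS direction, the 35.235.240.0/20 source range, and TCP port 22.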

Clean up and revoke access

After Customer Care has completed their debugging, perform the following tasks to clean up and revoke access to your Autopilot nodes.

  1. Remove the IAM role bindings. Replace SUPPORT_MEMBER with the member name provided by Customer Care.

    gcloud iam service-accounts remove-iam-policy-binding ${COMPUTE_SERVICE_ACCT} \
        --role=roles/iam.serviceAccountUser \
        --member='group:SUPPORT_MEMBER'
    
    # If the project is in an organization:
    gcloud organizations remove-iam-policy-binding ${ORGANIZATION_ID} \
        --role=roles/compute.osLoginExternalUser \
        --member='group:SUPPORT_MEMBER'
    
    gcloud projects remove-iam-policy-binding ${PROJECT_ID} \
        --role=roles/iap.tunnelResourceAccessor \
        --member='group:SUPPORT_MEMBER'
    
    gcloud projects remove-iam-policy-binding ${PROJECT_ID} \
        --role=roles/compute.viewer \
        --member='group:SUPPORT_MEMBER'
    
    gcloud compute instances remove-iam-policy-binding ${NODE_NAME} \
        --zone=${NODE_ZONE} \
        --role=roles/compute.osAdminLogin \
        --member='group:SUPPORT_MEMBER'
    
  2. Remove the firewall rule for IAP ranges:

    gcloud compute firewall-rules delete allow-ssh-ingress-from-iap
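
After cleanup, you might want to confirm that the firewall rule no longer exists. The following sketch succeeds only when no rule named allow-ssh-ingress-from-iap remains in the project. The function name iap_rule_removed is hypothetical, and it assumes that PROJECT_ID is set as described earlier.

```shell
# Hypothetical helper: return 0 if the IAP firewall rule is gone, 1 if it still exists.
iap_rule_removed() {
    remaining="$(gcloud compute firewall-rules list \
        --project="${PROJECT_ID}" \
        --filter="name=allow-ssh-ingress-from-iap" \
        --format="value(name)")"
    # An empty result means no rule with that name was found.
    [ -z "${remaining}" ]
}
```

For example, iap_rule_removed && echo "IAP rule removed" prints the message only when the rule has been deleted.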