Troubleshoot cluster creation or update issues

This page shows you how to resolve issues related to installing or upgrading GKE on Azure.

If you need additional assistance, reach out to Cloud Customer Care.

Cluster creation failures

When you make a request to create a cluster, GKE on Azure first runs a set of pre-flight tests to verify the request. If cluster creation fails, either one of these pre-flight tests failed or a step in the cluster creation process itself didn't complete.

If a pre-flight test fails, GKE on Azure doesn't create any resources and returns information about the error to you directly. For example, if you try to create a cluster named invalid%%%name, the pre-flight test for a valid cluster name fails and the request returns the following error:

ERROR: (gcloud.container.azure.clusters.create) INVALID_ARGUMENT: must be
between 1-63 characters, valid characters are /[a-z][0-9]-/, should start with a
letter, and end with a letter or a number: "invalid%%%name",
field: azure_cluster_id
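You can check the naming rule quoted in this error locally before sending the request. The following is a minimal sketch, not part of gcloud: the function name `is_valid_cluster_name` is hypothetical, and it only mirrors the constraints stated in the error message (1-63 characters; lowercase letters, digits, and hyphens; starts with a letter; ends with a letter or a digit). The server-side pre-flight test remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical local pre-check that mirrors the cluster-name rule quoted in the
# INVALID_ARGUMENT error. It does not replace the server-side pre-flight test.

is_valid_cluster_name() {
  # Use the C locale so that [a-z] matches only ASCII lowercase letters.
  local LC_ALL=C
  # One leading letter, then optionally up to 61 middle characters and a final
  # letter or digit: 1 to 63 characters in total.
  local re='^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$'
  [[ $1 =~ $re ]]
}
```

For example, `is_valid_cluster_name my-cluster` succeeds, while `is_valid_cluster_name 'invalid%%%name'` fails, so a script can stop before calling `gcloud container azure clusters create`.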

Cluster creation can also fail after the pre-flight tests have passed. This can happen several minutes after cluster creation has begun, once GKE on Azure has already created resources in Google Cloud and Azure. In this case, an AzureCluster resource exists in your Google Cloud project with its state set to ERROR.

To get details about the failure, run the following command:

gcloud container azure clusters describe CLUSTER_NAME \
    --location GOOGLE_CLOUD_LOCATION \
    --format "value(state, errors)"

Replace the following:

  • CLUSTER_NAME with the name of the cluster whose state you're querying
  • GOOGLE_CLOUD_LOCATION with the name of the Google Cloud region that manages this Azure cluster
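If you script this check, you can branch on the state before printing the errors. The following is a minimal sketch under these assumptions: `gcloud` is installed and authenticated, the helper name `check_azure_cluster` and the `GCLOUD` override variable are hypothetical, and the only states treated as failures are ERROR and DEGRADED, the two failure states described on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper around `gcloud container azure clusters describe`.
# GCLOUD can point at another command (for example, a stub in tests).
GCLOUD="${GCLOUD:-gcloud}"

# Usage: check_azure_cluster CLUSTER_NAME GOOGLE_CLOUD_LOCATION
# Prints the cluster state. For ERROR or DEGRADED, also prints the cluster's
# errors field and returns 1. Returns 2 if the describe call itself fails.
check_azure_cluster() {
  local cluster="$1" location="$2" state
  state="$("$GCLOUD" container azure clusters describe "$cluster" \
    --location "$location" --format "value(state)")" || return 2
  echo "state: ${state}"
  case "$state" in
    ERROR | DEGRADED)
      "$GCLOUD" container azure clusters describe "$cluster" \
        --location "$location" --format "value(errors)"
      return 1
      ;;
  esac
}
```

Splitting the query into two calls keeps the state on its own line, so the exit code alone tells a calling script whether to collect diagnostics.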

Alternatively, you can get details about the creation failure by describing the Operation resource associated with the cluster creation request:

gcloud container azure operations describe OPERATION_ID \
    --location GOOGLE_CLOUD_LOCATION

Replace OPERATION_ID with the ID of the operation that created the cluster. If you don't have the operation ID of your cluster creation request, you can fetch it with the following command:

gcloud container azure operations list \
    --location GOOGLE_CLOUD_LOCATION

Use the timestamp or other details in the output to identify the operation that created your cluster.
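Instead of scanning the list by hand, you can let gcloud sort it and pick the newest operation for a given cluster. This is one possible approach, not a documented procedure. It assumes that each operation's metadata exposes `createTime` and `target` fields, that `target` is the cluster's full resource name ending in `/azureClusters/CLUSTER_NAME`, and that the `value()` format separates columns with tabs. The helper name `latest_cluster_operation` and the `GCLOUD` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ID of the newest operation whose target is
# the given cluster. GCLOUD can point at another command (for example, a stub).
GCLOUD="${GCLOUD:-gcloud}"

# Usage: latest_cluster_operation CLUSTER_NAME GOOGLE_CLOUD_LOCATION
latest_cluster_operation() {
  local cluster="$1" location="$2"
  # "~" sorts descending, so the newest operation comes first.
  # ASSUMPTION: metadata.createTime and metadata.target exist on each operation.
  "$GCLOUD" container azure operations list \
    --location "$location" \
    --sort-by "~metadata.createTime" \
    --format "value(name, metadata.target)" |
    awk -F '\t' -v suffix="/azureClusters/${cluster}" '
      # Keep the first (newest) row whose target ends with the cluster suffix,
      # then print the last path segment of the operation name as its ID.
      length($2) >= length(suffix) &&
      substr($2, length($2) - length(suffix) + 1) == suffix {
        n = split($1, parts, "/"); print parts[n]; exit
      }'
}
```

You could then pass the result to the describe command, for example `gcloud container azure operations describe "$(latest_cluster_operation my-cluster us-west1)" --location us-west1`, where `my-cluster` and `us-west1` are placeholder values. Matching on the `/azureClusters/` suffix in awk, rather than on a bare substring, keeps a cluster named `demo` from matching `ademo`.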

Cluster update failures

When you update a cluster, just as when you create a new cluster, GKE on Azure first runs a set of pre-flight tests to verify the request. If a cluster update fails, either one of these pre-flight tests failed or a step in the cluster update process itself didn't complete.

If a pre-flight test fails, GKE on Azure doesn't update any resources and returns information about the error to you directly. For example, if you try to update a cluster to use an SSH key pair named test_ec2_keypair, the pre-flight test tries to fetch the EC2 key pair, fails, and the request returns the following error:

ERROR: (gcloud.container.azure.clusters.update) INVALID_ARGUMENT: key pair
"test_ec2_keypair" not found,
field: azure_cluster.control_plane.ssh_config.ec2_key_pair

Cluster updates can also fail after the pre-flight tests have passed. This can happen several minutes after the cluster update has begun. In this case, the AzureCluster resource in your Google Cloud project has its state set to DEGRADED.

To get details about the failure and the related operation, follow the steps described in Cluster creation failures.
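Because an update failure can surface several minutes after the request, a script that starts an update may need to wait before it checks the state. The following is a minimal polling sketch, assuming that the cluster reports a transitional state such as RECONCILING while the update is in progress and RUNNING once it completes. The helper name `wait_for_azure_cluster`, the `GCLOUD`, `POLL_INTERVAL`, and `MAX_POLLS` variables, and their defaults are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper for use after `gcloud container azure clusters update`.
GCLOUD="${GCLOUD:-gcloud}"
POLL_INTERVAL="${POLL_INTERVAL:-30}" # seconds between checks (assumed default)
MAX_POLLS="${MAX_POLLS:-40}"         # give up after this many checks (assumed default)

# Usage: wait_for_azure_cluster CLUSTER_NAME GOOGLE_CLOUD_LOCATION
# Prints the final state. Returns 0 for RUNNING, 1 for DEGRADED or ERROR,
# and 2 on timeout or if the describe call fails.
wait_for_azure_cluster() {
  local cluster="$1" location="$2" state i
  for ((i = 1; i <= MAX_POLLS; i++)); do
    state="$("$GCLOUD" container azure clusters describe "$cluster" \
      --location "$location" --format "value(state)")" || return 2
    case "$state" in
      RUNNING) echo "$state"; return 0 ;;
      DEGRADED | ERROR) echo "$state"; return 1 ;;
    esac
    # Any other state (for example, RECONCILING) is treated as still in progress.
    if ((i < MAX_POLLS)); then sleep "$POLL_INTERVAL"; fi
  done
  echo "timed out; last state: ${state}"
  return 2
}
```

When the helper returns 1, you can follow the steps in Cluster creation failures to read the errors and the related operation.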

What's next