Troubleshoot issues with clusters enrolled in the GKE On-Prem API

This page shows you how to investigate issues with creating a GKE on VMware user cluster in the Google Cloud console.

The admin cluster isn't displayed in the Cluster basics drop-down list

The admin cluster must be registered to a fleet before you can create user clusters in the Google Cloud console. If you don't see the admin cluster in the drop-down list in the Cluster basics section of the Google Cloud console, either the admin cluster wasn't registered, or it was registered using the gcloud container hub memberships register command.

Check the registration status in the console (a gcloud CLI check is also shown after these steps):

  • In the Google Cloud console, go to the GKE Enterprise clusters page, and select the same Google Cloud project in which you attempted to create the user cluster.

    Go to the GKE Enterprise clusters page

    • If the admin cluster isn't displayed in the list, see Register an admin cluster.

    • If the admin cluster is displayed in the list, the cluster was registered using the gcloud container hub memberships register command, which doesn't properly register admin clusters.
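
You can also check for the fleet membership from the command line. The following command lists the fleet memberships in the project; it shows whether a membership exists for the admin cluster, but not how the cluster was registered:

gcloud container fleet memberships list \
  --project=PROJECT_ID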

To fix the registration issue:

  1. Delete the fleet membership of the admin cluster.

    gcloud container fleet memberships delete ADMIN_CLUSTER_NAME \
      --project=PROJECT_ID \
      --location=global
    
    • ADMIN_CLUSTER_NAME: the name of the admin cluster.
    • PROJECT_ID: the ID of your fleet host project. This is the project that you selected when you attempted to create the user cluster in the Google Cloud console.
  2. Follow the steps in Register an admin cluster to re-register the cluster.

Cluster creation errors

This section describes some errors that happen during cluster creation in the Google Cloud console.

Resource already exists error

User cluster creation fails with an error message similar to the following:

Resource 'projects/1234567890/locations/europe-west1/vmwareClusters/user-cluster1'
already exists
Request ID: 129290123128705826

This error message indicates that the cluster name is already in use.

To fix the issue:

  1. Delete the cluster.

  2. Create the cluster again with a name that doesn't conflict with an existing cluster.
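
To pick a name that doesn't conflict, you can first list the clusters that are already enrolled in the GKE On-Prem API in the project. This is the same list command that is used later on this page in the Remove unhealthy clusters section:

gcloud container vmware clusters list \
  --project=PROJECT_ID \
  --location=-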

Anti-affinity groups error

User cluster creation fails with an error message similar to the following:

- Validation Category: VCenter
    - [FATAL] Hosts for AntiAffinityGroups: Anti-affinity groups enabled with
    available vsphere host number 1 less than 3, please add more vsphere hosts
    or disable anti-affinity groups.

The VMware Distributed Resource Scheduler (DRS) anti-affinity rules require at least 3 physical hosts in your vSphere environment. To fix the issue, disable Anti-affinity groups in the Features section on the Cluster details page for your cluster, as follows:

  1. In the Google Cloud console, go to the GKE Enterprise clusters page.

    Go to the GKE Enterprise clusters page

  2. Select the Google Cloud project that the user cluster is in.

  3. In the cluster list, click the name of the cluster, and then click View details in the Details panel.

  4. In the Features section, click Edit.

  5. Clear Enable Anti-affinity groups, and click Done.

  6. The Google Cloud console displays Cluster status: changes in progress. Click Show Details to view the Resource status condition and Status messages.
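
If the cluster is enrolled in the GKE On-Prem API, you can also follow the change from the command line. As a rough alternative to watching the console, describe the cluster and inspect its state and status conditions in the output. USER_CLUSTER_NAME, PROJECT_ID, and REGION have the same meanings as in the delete command later on this page:

gcloud container vmware clusters describe USER_CLUSTER_NAME \
  --project=PROJECT_ID \
  --location=REGION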

Conflicting IP addresses error

User cluster creation fails with an error message similar to the following:

- Validation Category: Network Configuration
- [FAILURE] CIDR, VIP and static IP (availability and overlapping): user: user
  cluster control plane VIP "10.251.133.132" overlaps with
  example-cluster1/control plane VIP "10.251.133.132"

Currently, you can't edit fields such as the Control plane VIP and the Ingress VIP in the Load balancer section of the Cluster details page in the Google Cloud console. To fix conflicting IP addresses:

  1. Delete the cluster.

  2. Create the cluster again with IP addresses that don't conflict with an existing cluster.

Remove unhealthy clusters

A cluster can get into an unhealthy state for many reasons, such as:

  • Connectivity issues with the Connect Agent or the on-premises environment.

  • The admin cluster for a user cluster was deleted, or there are connectivity issues between the admin and user clusters.

  • The cluster's VMs were deleted before the cluster was deleted.

If the console is unable to delete a cluster, use gcloud CLI commands to delete Google Cloud resources from unhealthy clusters. If you haven't updated the gcloud CLI recently, run the following command to update the components:

gcloud components update
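
If you want to check the installed gcloud CLI version first, you can run:

gcloud version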

Next, delete the Google Cloud resources.

User cluster

  1. Delete the user cluster:

    gcloud container vmware clusters delete USER_CLUSTER_NAME \
      --project=PROJECT_ID \
      --location=REGION \
      --force \
      --allow-missing \
      --ignore-errors

    Replace the following:

    • USER_CLUSTER_NAME: The name of the user cluster to delete.

    • PROJECT_ID: The ID of the project that the cluster is registered to.

    • REGION: The Google Cloud location associated with the user cluster. The location is displayed in the console.

      The --force flag lets you delete a cluster that has node pools. Without the --force flag, you have to delete the node pools first and then delete the cluster (a node-pool deletion sketch is shown after this list).

      The --allow-missing flag allows the command to continue if the cluster isn't found.

      The --ignore-errors flag removes the Google Cloud resources when the admin and user clusters are unreachable. Some F5 or vSphere resources might be left over. See Clean up resources for information on cleaning up the leftover resources.

      This command deletes the cluster if it exists and removes both GKE On-Prem API and fleet membership resources from Google Cloud.

  2. Confirm that the GKE On-Prem API resources have been deleted:

    gcloud container vmware clusters list \
      --project=PROJECT_ID \
      --location=-

    Setting --location=- lists clusters in all regions. To scope down the list, set --location to a specific region.

  3. Confirm that the fleet membership resources have been deleted:

    gcloud container fleet memberships list \
      --project=PROJECT_ID
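
If you prefer not to use the --force flag from step 1, you can delete the node pools before deleting the cluster. A minimal sketch, where NODE_POOL_NAME is the name of a node pool taken from the list output:

gcloud container vmware node-pools list \
  --cluster=USER_CLUSTER_NAME \
  --project=PROJECT_ID \
  --location=REGION

gcloud container vmware node-pools delete NODE_POOL_NAME \
  --cluster=USER_CLUSTER_NAME \
  --project=PROJECT_ID \
  --location=REGION

Repeat the delete command for each node pool, and then delete the cluster as shown in step 1 without --force.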

Admin cluster

  1. If you enrolled the admin cluster in the GKE On-Prem API, unenroll it:

    gcloud container vmware admin-clusters unenroll ADMIN_CLUSTER_NAME \
     --project=PROJECT_ID \
     --location=REGION \
     --allow-missing

    Replace the following:

    • ADMIN_CLUSTER_NAME: The name of the admin cluster.
    • PROJECT_ID: The ID of the fleet host project.
    • REGION: The Google Cloud region.

    The --allow-missing flag lets the command succeed even if the admin cluster isn't enrolled in the GKE On-Prem API.

    This command removes the GKE On-Prem API resources from Google Cloud.

  2. Remove the cluster from the fleet:

    gcloud container fleet memberships delete ADMIN_CLUSTER_NAME \
     --project=PROJECT_ID \
     --location=global

    This command removes fleet membership resources from Google Cloud.

  3. Confirm that the GKE On-Prem API resources have been deleted:

    gcloud container vmware admin-clusters list \
      --project=PROJECT_ID \
      --location=-

    Setting --location=- lists admin clusters in all regions. To scope down the list, set --location to a specific region.

  4. Confirm that the fleet membership resources have been deleted:

    gcloud container fleet memberships list \
      --project=PROJECT_ID