Troubleshooting managed instance groups

There are several issues that can prevent a managed instance group (MIG) from successfully creating or recreating a VM instance.

If logs are generated for a deleted MIG

The problem might be related to the following situations.

Attached autoscaler still exists

If you deleted a MIG using the Compute Engine API and you did not issue a separate request to delete the attached autoscaler, the Logs Explorer might show logs with the following message.

The resource 'projects/PROJECT/zones/ZONE/instanceGroupManagers/DELETED_INSTANCE_GROUP_NAME' was not found.

Resolution:

To resolve this issue, delete the attached autoscaler using the Compute Engine API methods:

For an autoscaler of a zonal MIG, use the autoscalers.delete method.
For an autoscaler of a regional MIG, use the regionAutoscalers.delete method.
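
As a rough sketch of what those two API calls look like from a shell, the helper below builds the REST URL and sends a DELETE request with curl. The function names, the DRY_RUN switch, and the use of gcloud auth print-access-token for the bearer token are assumptions made for this illustration and are not part of the original guidance. By default the helper only prints the request so that you can review it first.

```shell
# Build the REST URL of an autoscaler.
# scope is "zones" (autoscalers.delete) or "regions" (regionAutoscalers.delete).
autoscaler_url() {
  local project="$1" scope="$2" location="$3" autoscaler="$4"
  printf 'https://compute.googleapis.com/compute/v1/projects/%s/%s/%s/autoscalers/%s\n' \
    "$project" "$scope" "$location" "$autoscaler"
}

# Send the DELETE request. With DRY_RUN=1 (the default in this sketch),
# only print the request instead of sending it.
delete_autoscaler() {
  local url
  url="$(autoscaler_url "$@")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "DELETE ${url}"
  else
    curl -sS -X DELETE \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "${url}"
  fi
}
```

For example, delete_autoscaler PROJECT zones ZONE AUTOSCALER_NAME targets the autoscaler of a zonal MIG, and delete_autoscaler PROJECT regions REGION AUTOSCALER_NAME targets the autoscaler of a regional MIG. Set DRY_RUN=0 to send the request.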

If your MIG cannot create or recreate instances

The problem might be related to the following situations.

The boot disk already exists

By default, a new boot persistent disk is created when you create an instance. The name of the boot disk matches the name of the VM. For example, if you name a VM my-instance, the disk is also named my-instance. If a persistent disk with that name already exists, the request fails. To resolve this issue, delete the existing persistent disk. Optionally, take a snapshot of the disk before you delete it.
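
The snapshot-then-delete sequence might look like the following sketch. The run wrapper, the function name, and the snapshot name (the disk name plus -backup) are hypothetical. The sketch also assumes that the conflicting disk is a zonal disk that is not attached to any VM. Commands are only printed unless you set DRY_RUN=0.

```shell
# Print each command; run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

# Snapshot the conflicting disk, then delete it so that the MIG can
# create a VM (and boot disk) with the same name.
free_disk_name() {
  local disk="$1" zone="$2"
  run gcloud compute disks snapshot "$disk" --zone="$zone" \
    --snapshot-names="${disk}-backup" &&
  run gcloud compute disks delete "$disk" --zone="$zone" --quiet
}
```

For example, free_disk_name my-instance ZONE prints the two commands for the disk named my-instance. Because of the && operator, the delete step runs only if the snapshot step succeeds.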

The instance template is not valid

If you updated your instance template recently, it might contain an invalid property that prevents the MIG from creating VMs. Check the template properties for these common errors:

You specified a resource that doesn't exist, such as a source image.
You misspelled a resource name.
You tried to attach an existing non-boot persistent disk in read/write mode but your group contains more than one VM. For groups with more than one VM, any additional disks you want to share between all of the VMs in the group can be attached only in read-only mode.
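
One way to check for these errors is to look at the errors that the MIG reports and then compare them with the template's properties. The following sketch assumes a zonal MIG and a global instance template. The run wrapper and the function name are hypothetical, and the commands are only printed unless you set DRY_RUN=0.

```shell
# Print each command; run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

# Show recent instance creation errors, then dump the template so that
# resource names (source image, disks, network) can be checked by eye.
inspect_mig_template() {
  local mig="$1" zone="$2" template="$3"
  run gcloud compute instance-groups managed list-errors "$mig" --zone="$zone"
  run gcloud compute instance-templates describe "$template"
}
```

For example, inspect_mig_template INSTANCE_GROUP_NAME ZONE INSTANCE_TEMPLATE_NAME prints both commands for review.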

Limit exceeded for resource type

The following error occurs when your instance group reaches its size limit, that is, when you try to create more than 2,000 VMs in a regional MIG or more than 1,000 VMs in a zonal MIG.

Error message:

ERROR: (gcloud.compute.<INSTANCE_GROUP_TYPE>.<METHOD>) Could not fetch resource:

 - Exceeded limit 'MAX_INSTANCES_IN_INSTANCE_GROUP' on resource 'PROJECT_ID'.
 Limit: NUMBER

Resolution:

To resolve this issue, try one of the following:

If you are using a zonal MIG, use a regional MIG instead.
Create multiple MIGs and split your workload across them—for example by adjusting your load balancing configuration.
If you still need a bigger group, you can increase the size limit of your MIG or contact support to make a request.
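
If you split the workload across multiple MIGs, the number of groups follows from the limits stated above. The sketch below is only a ceiling division. The function name and the figure of 4,500 VMs are hypothetical.

```shell
# Smallest number of MIGs whose combined size limit covers the workload
# (ceiling division of total VMs by the per-group limit).
migs_needed() {
  local total_vms="$1" per_group_limit="$2"
  echo $(( (total_vms + per_group_limit - 1) / per_group_limit ))
}

migs_needed 4500 1000   # zonal MIGs, limit of 1,000 VMs each: prints 5
migs_needed 4500 2000   # regional MIGs, limit of 2,000 VMs each: prints 3
```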

If you cannot delete your MIG or its instances

The problem might be related to the following situation.

Resource not found in zone or region

The following error occurs when you try to delete a regional MIG and you specify the --zone flag, specify no region, or specify the wrong region. A similar error can occur if you try to delete a zonal MIG and you specify the --region flag.

Error message:

ERROR: (gcloud.compute.instance-groups.managed.delete) Some requests did not succeed:
‐ The resource 'projects/PROJECT/zones/ZONE/instanceGroupManagers/INSTANCE_GROUP_NAME' was not found

ERROR: (gcloud.compute.instance-groups.managed.delete) Some requests did not succeed:
‐ The resource 'projects/PROJECT/regions/REGION/instanceGroupManagers/INSTANCE_GROUP_NAME' was not found

Resolution:

To resolve this issue, try one of the following:

Append the appropriate --region or --zone flag to your command.
Set a default region and zone.
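
Setting the defaults in the gcloud CLI configuration might look like the following sketch. The run wrapper and the function name are hypothetical, and the commands are only printed unless you set DRY_RUN=0.

```shell
# Print each command; run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

# Store a default region and zone in the active gcloud configuration.
set_compute_defaults() {
  local region="$1" zone="$2"
  run gcloud config set compute/region "$region"
  run gcloud config set compute/zone "$zone"
}
```

For example, set_compute_defaults REGION ZONE prints the two gcloud config set commands. Even with defaults set, passing an explicit --region or --zone flag on the delete command leaves no doubt about whether a regional or a zonal MIG is targeted.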

Resource is used by a backend service

You cannot delete an instance group while it is used by a load balancer's backend service. You must remove the instance group from the backend service before you can delete the instance group.

Error message:

ERROR: (gcloud.compute.instance-groups.managed.delete) Some requests did not succeed:
‐ The instance_group_manager resource 'projects/PROJECT/zones/ZONE/instanceGroupManagers/INSTANCE_GROUP_NAME' is already being used by 'projects/PROJECT/global/backendServices/BACKEND_SERVICE'

ERROR: (gcloud.compute.instance-groups.managed.delete) Some requests did not succeed:
‐ The instance_group_manager resource 'projects/PROJECT/regions/REGION/instanceGroupManagers/INSTANCE_GROUP_NAME' is already being used by 'projects/PROJECT/global/backendServices/BACKEND_SERVICE'

Resolution:

Optional: Drain the backend instance group.
- For proxy load balancers only, you can set the capacity scaler to 0.0 before removing the instance group from a backend service. You can set the capacity scaler to zero by using the gcloud compute backend-services edit command.
- For both proxy and pass-through load balancers, if you enable connection draining on the backend service, Google Cloud attempts to allow existing connections to persist, complete, and drain whenever an instance group is removed from a backend service.

Remove the MIG from the regional or global backend service.

For a zonal MIG, run the following command:

gcloud compute backend-services remove-backend BACKEND_SERVICE \
    --instance-group=INSTANCE_GROUP_NAME \
    --instance-group-zone=ZONE \
    [--region=REGION | --global]

For a regional MIG, run the following command:

gcloud compute backend-services remove-backend BACKEND_SERVICE \
    --instance-group=INSTANCE_GROUP_NAME \
    --instance-group-region=REGION \
    [--region=REGION | --global]

Delete the MIG:

gcloud compute instance-groups managed delete INSTANCE_GROUP_NAME \
    [--zone=ZONE | --region=REGION]
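
The optional draining step can also be scripted. The following sketch assumes a global backend service and a zonal MIG. It uses the update and update-backend commands instead of the interactive edit command mentioned above, and the 300-second draining timeout is a hypothetical value. The run wrapper and the function name are also hypothetical. Commands are only printed unless you set DRY_RUN=0.

```shell
# Print each command; run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

# Enable connection draining on the backend service, then set the capacity
# scaler of the MIG's backend to 0 (the capacity scaler applies to proxy
# load balancers only).
drain_backend() {
  local backend_service="$1" mig="$2" zone="$3"
  run gcloud compute backend-services update "$backend_service" --global \
    --connection-draining-timeout=300
  run gcloud compute backend-services update-backend "$backend_service" --global \
    --instance-group="$mig" --instance-group-zone="$zone" --capacity-scaler=0
}
```

For example, drain_backend BACKEND_SERVICE INSTANCE_GROUP_NAME ZONE prints both commands. After traffic has drained, continue with the remove-backend and delete commands.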

If your MIG continually tries to recreate instances

The problem might be related to the following situation.

Health check probes cannot reach the instance

If you configured an autohealing policy but you did not configure—or misconfigured—the firewall rule that lets the health check probes reach your application, then your VMs appear unhealthy, and the MIG continuously tries to recreate them. For information about how to configure a health check firewall rule, see Example health check set up.
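
A firewall rule for health check probes might look like the following sketch. The rule name fw-allow-health-check, the network tag, and the port are hypothetical. The source ranges 130.211.0.0/22 and 35.191.0.0/16 are the probe ranges that Google Cloud documents for autohealing and most load balancer health checks, but confirm them against the health check documentation for your setup. The run wrapper and the function name are hypothetical, and the command is only printed unless you set DRY_RUN=0.

```shell
# Print each command; run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

# Allow ingress from the health check probe ranges to VMs that carry the tag.
allow_health_checks() {
  local network="$1" tag="$2" port="$3"
  run gcloud compute firewall-rules create fw-allow-health-check \
    --network="$network" --direction=INGRESS --action=ALLOW \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags="$tag" --rules="tcp:${port}"
}
```

For example, allow_health_checks NETWORK TAG 80 prints a rule that allows probes on TCP port 80. The tag must match a network tag in the instance template, and the port must match the port that the health check uses.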