Performing one-click OS image upgrades in MIGs


By using a combination of custom image families and rolling updates, you can enable one-click OS image upgrades on your managed instance group (MIG).

One-click OS image upgrades provide the following benefits:

  • They work with all VM machine types and all instance group sizes.
  • They support both Windows and Linux images, as well as containers.
  • The MIG recreates instances based on their current instance template or, optionally, a new template, so you can preserve custom startup scripts and metadata.
  • They work with stateful MIGs, so you can optionally preserve data on non-boot disks.
  • The rollout to the new OS version happens automatically, with no additional user input after the initial request.
  • They support batch updates with an optional health check.

Before you begin

  • If you haven't already, set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine as follows:
    1. Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init
    2. Set a default region and zone.

How does one-click OS image upgrade work?

When you invoke an update, the MIG replaces the boot disks for all VMs in the group with the latest available OS image version from your custom image family. The MIG preserves metadata and startup scripts that you set up in the instance template for each VM in the group. Non-boot disks are recreated based on their specification in the instance template. For information about preserving data, see Configuring stateful disks in MIGs.

To limit application disruption, you can perform updates in batches, keeping a specific percent of VMs running during the update. To increase reliability, you can configure an application-based health check for your MIG: the group waits for a healthy response from an application on updated VMs before proceeding with further updates to other VMs.
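
As a minimal sketch of the batching idea above, the following shell helper turns a "keep at least this percent of VMs running" goal into a value for the --max-unavailable flag. The function name max_unavailable_for is hypothetical and not part of the gcloud CLI. The sketch assumes an integer percent from 0 to 100; Compute Engine resolves percentages to whole VMs, so the effective count can differ slightly in small groups.

```shell
# Hypothetical helper (not part of gcloud): convert "keep at least N percent
# of VMs running" into a --max-unavailable percentage.
# Assumption: N is an integer from 0 to 100 without leading zeros.
max_unavailable_for() (
  keep_percent=$1
  case $keep_percent in
    ''|*[!0-9]*|0?*)
      echo "expected an integer percent, got: $keep_percent" >&2
      exit 1
      ;;
  esac
  if [ "$keep_percent" -gt 100 ]; then
    echo "percent must not exceed 100" >&2
    exit 1
  fi
  # Whatever is not kept running may be unavailable at the same time.
  echo "$((100 - keep_percent))%"
)

# Example use with the rolling replace command:
#   gcloud compute instance-groups managed rolling-action replace instance-group-name \
#       --max-unavailable "$(max_unavailable_for 80)"
```

If the result is 0%, the update can only proceed by creating extra VMs, so set --max-surge to a value greater than zero in that case.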

Before you begin

  • Install or update to the latest version of the Google Cloud CLI.

  • Make sure you have created an instance template that points to an image family. Google recommends that you use custom image families to reduce the risk of rolling out an image version that is incompatible with your application. You can ensure that only compatible image versions are rolled out by adding images to your custom image family only after compatibility testing with your application.

    When your instance template points to an image family, the MIG always creates instances from the latest image in the family, for example:

    • When the MIG adds new instances because you or the MIG's autoscaler increased the MIG's size.
    • When the MIG recreates an instance, triggered manually or by autohealing.
  • Test the new image with your app before adding it to your image family and rolling it out.

  • Optionally, create an application-based health check for your MIG. An application-based health check verifies that your application is responding as expected on each of the VMs in the MIG. You can configure your update to allow no more than one unavailable VM. If an application does not respond as expected, then the MIG marks that VM as unavailable, and your rollout does not proceed.
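
Before you roll out an upgrade, it can help to confirm which image your family currently resolves to. The following sketch wraps the gcloud compute images describe-from-family command in a small function. The function name latest_image_in_family is hypothetical, and the family and project arguments are placeholders for your own values.

```shell
# Hypothetical wrapper: print the name of the image that an image family
# currently resolves to.
# Arguments: FAMILY PROJECT (placeholders for your own values).
latest_image_in_family() (
  family=$1
  project=$2
  gcloud compute images describe-from-family "$family" \
      --project "$project" \
      --format "value(name)"
)

# Example use:
#   latest_image_in_family my-image-family my-project
```

The printed name is the image that new or recreated VMs in the MIG receive, so checking it first is a cheap way to catch an image that was added to the family by mistake.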

Performing one-click OS image upgrades for MIGs

To update all the VMs in a MIG to the latest image from a custom image family, complete the following steps:

  1. Start a rolling replace with the following command.

    gcloud compute instance-groups managed rolling-action replace instance-group-name \
        [--max-surge=max-surge] [--max-unavailable=max-unavailable]

    Replace the following:

    • instance-group-name: the name of the MIG to operate on.
    • max-surge: the maximum additional number of VMs that can be temporarily created during the update process. This can be a fixed number (for example, 5) or a percentage of the size of the MIG (for example, 10%).
    • max-unavailable: the maximum number of VMs that can be unavailable during the update process. This can be a fixed number (5) or a percentage of the size of the MIG (10%).

    You can combine health checks with the --max-unavailable and --max-surge options to stop further updates if they cause VMs to become unavailable.

  2. Monitor the update by using the wait-until subcommand to check that the MIG's status.versionTarget.isReached field is set to true.

    gcloud compute instance-groups managed wait-until instance-group-name --version-target-reached

    Replace the following:

    • instance-group-name: the name of the MIG to operate on.

    The command returns when the group is updated.

    You can also list instances to see each instance's status.

    gcloud compute instance-groups managed list-instances instance-group-name

    The command returns a list of instances and their details, including status, health state, and current actions for each VM. When all VMs are RUNNING and have no current action, then the MIG is up-to-date and stable.

  3. If you need to roll back to a previous OS image, create an instance template that specifies the image you want to use, then start a rolling update to move all managed instances to that template. For more information, see Rolling back an update.
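
The stability rule from the monitoring step (every VM is RUNNING and has no current action) can be checked in a script. The following is a sketch only: the function name mig_is_stable is hypothetical, and it assumes the default table output of list-instances with STATUS and ACTION columns and no empty cell to the left of ACTION.

```shell
# Hypothetical helper: read `list-instances` table output on stdin and succeed
# only if every VM row has STATUS RUNNING and ACTION NONE.
# Assumption: no cell to the left of ACTION is empty, so whitespace-separated
# fields line up with the header row.
mig_is_stable() {
  awk '
    NR == 1 {
      # Locate the STATUS and ACTION columns from the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "STATUS") status_col = i
        if ($i == "ACTION") action_col = i
      }
      if (!status_col || !action_col) { bad = 1; exit }
      next
    }
    NF == 0 { next }   # skip blank lines
    { rows++ }
    $status_col != "RUNNING" || $action_col != "NONE" { bad = 1 }
    END { exit (bad || rows == 0) ? 1 : 0 }
  '
}

# Example use:
#   gcloud compute instance-groups managed list-instances instance-group-name \
#       | mig_is_stable && echo "MIG is up-to-date and stable"
```

The function treats an empty listing or a missing column as "not stable", which errs on the side of not declaring success when the output is not what the script expects.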

Example

This example covers the following tasks:

  1. Create an instance template for easy OS image updates.
  2. Create a MIG based on the template.
  3. Set up a health check to limit disruption by an image update.
  4. Add a new image to an image family.
  5. Invoke an OS update with a single command.
  6. Monitor the update.

Use the following steps to enable and perform one-click OS upgrades on a MIG:

  1. Create an instance template that specifies a custom image family. The image family should contain tested and trusted images. Each VM that the MIG creates from the template uses the latest available image from this family.

    gcloud compute instance-templates create example-template \
        --machine-type n1-standard-4 \
        --image-family my-image-family \
        --image-project my-project \
        --tags=http-server
    
  2. Create a MIG based on the instance template. This example starts the MIG with three instances based on example-template. Because the instance template specifies an image family, the MIG creates each VM with the latest image from the family.

    gcloud compute instance-groups managed create example-group \
      --base-instance-name example \
      --size 3 \
      --zone us-east1-b \
      --template example-template
    
  3. Optional: Configure and enable an application-based health check. If your app doesn't respond after an image update, you can use the health check status combined with the maxUnavailable setting to stop the MIG from further rollouts.

    1. Create a health check that looks for an HTTP 200 response on the request path /health. This example assumes that the app running on each instance serves that path.

      gcloud compute health-checks create http example-autohealer-check \
          --check-interval 10 \
          --timeout 5 \
          --healthy-threshold 2 \
          --unhealthy-threshold 3 \
          --request-path "/health"
      
    2. Create a firewall rule to allow the health checker probes to access the instances. The health checker probes come from addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16.

      gcloud compute firewall-rules create default-allow-http-health-check \
          --network default \
          --allow tcp:80 \
          --source-ranges 130.211.0.0/22,35.191.0.0/16
      
    3. Add the health check to your MIG.

      gcloud compute instance-groups managed update example-group \
          --zone us-east1-b --health-check example-autohealer-check
      
  4. When an update is available, tested, and determined to be compatible with your app, create a new image, and use the --family flag to include that image in the custom image family.

    gcloud compute images create my-image-v2 \
        --source-disk boot-disk-1 \
        --source-disk-zone us-central1-f \
        --family my-image-family

    In this example, the latest image in my-image-family is now my-image-v2, which is based on the source disk boot-disk-1.

  5. Invoke a rolling replace to replace all VMs in the MIG. The MIG replaces each VM based on the group's instance template. The instance template specifies my-image-family, so each VM gets the latest image in the family (my-image-v2).

    You can configure the level of disruption that the update causes. In this example, the MIG creates one additional VM above the group's target size, and it does not remove any existing VMs until that one VM is up and running.

    gcloud compute instance-groups managed rolling-action replace example-group \
        --max-surge 1 --max-unavailable 0
    
  6. If you want to monitor the status of the updates, use the wait-until command with the --version-target-reached flag. The command returns when the group is updated.

    gcloud compute instance-groups managed wait-until --version-target-reached example-group \
        --zone us-east1-b
    Waiting for group to reach version target
    ...
    Version target is reached
    

    You can also use the list-instances command to see the status, health state, current actions, instance template, and version for each VM.

    gcloud compute instance-groups managed list-instances example-group \
        --zone us-east1-b
    
    
    NAME       ZONE        STATUS   HEALTH_STATE  ACTION     INSTANCE_TEMPLATE  VERSION_NAME                        LAST_ERROR
    test-211p  us-east1-b  RUNNING  HEALTHY       NONE       example-template   0/2020-01-30 13:34:28.843377+00:00
    test-t5qb  us-east1-b  RUNNING  UNKNOWN       VERIFYING  example-template   0/2020-01-30 13:34:28.843377+00:00
    test-x331  us-east1-b  RUNNING  HEALTHY       NONE       example-template   0/2020-01-20 20:39:51.819399+00:00
    
  7. If you need to roll back to a previous image, use the following steps:

    1. Create a new instance template that specifies the image that you want.
    2. Start a rolling update to apply the instance template.
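
The two rollback steps above can be sketched as one shell function. This is an illustration under stated assumptions, not a definitive procedure: the function name rollback_mig and the example names example-template-v1 and my-image-v1 are hypothetical, and the machine type and tag are copied from the example template earlier in this example.

```shell
# Hypothetical helper: roll a MIG back by pinning a specific image in a new
# instance template, then applying that template with a rolling update.
# Arguments: GROUP ZONE NEW_TEMPLATE IMAGE IMAGE_PROJECT
rollback_mig() (
  group=$1 zone=$2 template=$3 image=$4 image_project=$5

  # Step 1: create a template that names one image (--image) instead of an
  # image family. Machine type and tag mirror example-template (assumption).
  gcloud compute instance-templates create "$template" \
      --machine-type n1-standard-4 \
      --image "$image" \
      --image-project "$image_project" \
      --tags=http-server || exit 1

  # Step 2: start a rolling update that moves every VM to the new template.
  gcloud compute instance-groups managed rolling-action start-update "$group" \
      --version "template=$template" \
      --zone "$zone"
)

# Example use (all names are placeholders):
#   rollback_mig example-group us-east1-b example-template-v1 my-image-v1 my-project
```

Pinning a single image rather than the family is what makes this a rollback: a template that points to my-image-family would still resolve to the newest image in the family.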

What's next