Stateful managed instance groups

You can build highly available deployments of stateful workloads on VM instances using stateful managed instance groups (stateful MIGs). Stateful workloads include applications with stateful data or configurations such as databases, legacy monolith applications, and long-running batch computations with checkpointing.

With stateful MIGs, you can improve the uptime and resiliency of such stateful applications with autohealing (automatic recovery of failed workloads) and multi-zone deployments, and you can simplify updates of stateful instances by controlling update rollouts.

A stateful managed instance group preserves the unique state of each instance (including instance name, attached persistent disks, and metadata) on VM restart, recreation, autohealing, or update.

This page describes when to use stateful MIGs and provides a high-level overview of how they work. For more information, see How stateful MIGs work.

To learn how to set up a stateful MIG, see Configuring stateful MIGs.

How stateful workloads are different from stateless workloads

You can use managed instance groups to support both stateful and stateless workloads. The key difference between stateful and stateless workloads is that stateful workloads preserve individual VM state (for example, a database shard, or app configuration) on the VM's disks, while stateless workloads, like a web frontend, do not retain any state on the individual VMs.

You treat VMs with stateful workloads like custom-built machinery: you care about VM identity (name), metadata, and data on each individual machine. You cannot easily scale stateful workloads horizontally because scaling could require data replication, creation or deletion of data shards, or changing the overall application configuration. When recreating or updating a machine with a stateful workload, you must preserve the VM's unique state. Examples of stateful applications include Cassandra, Elasticsearch, MongoDB, MySQL, PostgreSQL, and Kafka.

You treat VMs with stateless workloads as interchangeable and only care about the number of serving VMs that you have. No one VM is treated any differently than another. You can easily scale stateless workloads horizontally by adding or removing VMs. When updating your application, you can delete machines and replace them with new ones with different names, metadata, and disks. When a stateless VM is deleted or recreated, all data on the machine is lost: the disks are deleted or recreated from scratch. A web frontend is an example of a stateless application.

Comparison of stateful and stateless MIGs:

Workload:
  • Stateful MIG: Stateful workloads where disks and/or metadata are preserved on VM recreate operations.
  • Stateless MIG: Highly available and scalable stateless workloads, where disks are recreated from scratch on horizontal scaling, autohealing, auto-updating, and VM recreation.

MIG features:
  • Stateful MIG: Autohealing, controlled updates of specific instances, multi-zone deployments.
  • Stateless MIG: Autohealing, automated rolling updates, multi-zone deployments, autoscaling.

Preservable items:
  • Stateful MIG: Instance names; persistent disks, including support for disks that are not defined in the instance template; instance-specific metadata.
  • Stateless MIG: Instance names.

All MIGs support custom and preservable instance names.

When to use stateful MIGs

Consider using stateful managed instance groups (stateful MIGs) when you deploy a stateful application or cluster to Compute Engine and want to improve its availability with autohealing and multi-zone deployments. Stateful MIGs also simplify updates of stateful instances: you can orchestrate update rollouts and control the allowed level of disruption to the instances.

Stateful MIGs are intended for applications with stateful data or configuration, such as:

  • Databases. For example: Cassandra, Elasticsearch, MongoDB, and ZooKeeper. Before deciding on stateful MIGs, consider using fully managed services so that you can focus on your applications instead of managing VMs. For example, MySQL and PostgreSQL are available in Cloud SQL.
  • Data processing applications. For example: Kafka and Flink. Before deciding on stateful MIGs, consider using fully managed services, such as Dataflow or Dataproc, so that you can focus on your data processing tasks instead of managing VMs.
  • Other stateful applications. For example: TeamCity, Jenkins, Bamboo, and custom stateful workloads.
  • Legacy monolith applications. These applications store application state on a boot disk or additional persistent disks, or they rely on stateful configuration, such as specific VM instance names or metadata key values.
  • Batch workloads with checkpointing. With this configuration, you can preserve checkpointed results of long-running computation in anticipation of workload or VM failure or instance preemption. Stateful MIGs can recreate a failed machine, while preserving its data disk, so that your computation can continue from the last checkpoint.
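
The checkpointing pattern described in the last item can be sketched as follows. This is a minimal illustration, not part of any Compute Engine API: the function name, the JSON checkpoint format, and the checkpoint path are all assumptions. In practice the checkpoint file would live on a disk that the stateful MIG preserves.

```python
import json
import os


def run_with_checkpoints(total_steps, checkpoint_path):
    """Sum the integers 0..total_steps-1, saving progress after every step.

    checkpoint_path is a hypothetical file on a preserved (stateful) data disk.
    """
    state = {"next_step": 0, "partial_sum": 0}

    # Resume from the last checkpoint if one survived a VM recreation.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for step in range(state["next_step"], total_steps):
        state["partial_sum"] += step
        state["next_step"] = step + 1

        # Write to a temporary file, then rename it, so that a crash
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["partial_sum"]
```

If the VM fails partway through, calling the function again with the same checkpoint path continues from the saved step instead of starting over.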

To achieve resilience against zonal failure, your application must replicate data across multiple instances at the application level. For example, Elasticsearch and Cassandra support such functionality. You can use a regional MIG to make such an application resilient to zonal failure by deploying redundant replicas to multiple zones and relying on your application's data replication functionality. In the event of a zonal failure, your data is served from available replicas in the remaining zones.

Review the limitations to verify whether a stateful MIG fully meets your requirements.

What makes a MIG stateful

A MIG is considered stateful if you have created a stateful configuration.

You can create a stateful configuration when you create your MIG, or you can convert a group from stateless to stateful after its creation by adding stateful configuration.

You create a stateful configuration by setting a non-empty stateful policy and/or one or more non-empty per-instance configs:

  • A stateful policy defines items that you want to preserve for all instances in your MIG.
  • A per-instance config defines items to preserve for a specific VM instance.
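
The rule above (a non-empty stateful policy and/or at least one non-empty per-instance config) can be expressed as a small check. This is a sketch over plain dictionaries; the `disks` and `metadata` keys are an assumed, simplified shape and not the Compute Engine API representation.

```python
def is_stateful(stateful_policy, per_instance_configs):
    """Return True if the group has any non-empty stateful configuration.

    stateful_policy: dict with an optional "disks" mapping (assumed shape).
    per_instance_configs: dict of instance name -> dict with optional
        "disks" and "metadata" mappings (assumed shape).
    """
    has_policy = bool(stateful_policy.get("disks"))
    has_config = any(
        cfg.get("disks") or cfg.get("metadata")
        for cfg in per_instance_configs.values()
    )
    return has_policy or has_config
```

Under this model, a group with an empty policy and only empty per-instance configs is still treated as stateless.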

The configuration is effective after you or the MIG applies it:

  • A MIG automatically applies your stateful policy configuration to new and existing instances.
  • When creating or updating per-instance configs, you can choose whether to apply the new configuration manually or have it applied automatically.

After the stateful configuration (stateful policy and/or per-instance configs) is applied, you can verify it by inspecting the preserved state of each managed instance.

Subsequent changes to your MIG's stateful configuration or size (for example, decreasing the MIG's size, or deleting or abandoning instances from the MIG) can affect the preserved states of the instances.

Stateful configuration

A stateful managed instance group (MIG) takes its instance configuration from a combination of the instance template, stateful policy, and per-instance configs that you set. After you apply the instance template, stateful policy, and/or per-instance configs to your group, the MIG uses that configuration when creating, recreating, autohealing, or auto-updating its VM instances.

Stateful policy

A stateful policy defines common stateful items for all instances in a managed instance group. Each item that you include in the stateful policy must be defined in the MIG's instance template.

You can make the following changes to a stateful policy:

  • Configure disks to become stateful by adding them to the stateful policy.
  • Configure disks to become stateless by removing them from the stateful policy.
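
Because each item in a stateful policy must be defined in the instance template, it can help to check a policy against the template before applying it. The helper below is a hypothetical sketch that compares device names only; it does not call any Google Cloud API.

```python
def undefined_policy_disks(template_device_names, policy_device_names):
    """Return stateful-policy device names that the instance template does not define."""
    return sorted(set(policy_device_names) - set(template_device_names))
```

An empty result means that every disk in the policy is also defined in the template.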

Per-instance configs

A per-instance config defines stateful items that are unique for a specific managed instance, such as instance-specific metadata key-value pairs. These items do not need to be defined in the MIG's instance template.

You can make the following changes to a per-instance config for a specific instance in a MIG:

  • Configure disks that are defined in the instance template to become stateful for the instance (by adding those disks to the per-instance config) or to become stateless (by removing those disks from the per-instance config).
  • Configure existing disks, not defined in the instance template, to be attached and become stateful for the instance (by adding those disks to the per-instance config) or to be detached from the instance (removing disks from the per-instance config).
  • Add or remove stateful metadata key-value pairs that are specific to the instance.
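
The changes listed above can be modeled as operations on a small data structure. The class below is a hypothetical in-memory model for illustration; its name, fields, and methods are assumptions and do not mirror the Compute Engine API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PerInstanceConfig:
    """Hypothetical model of the per-instance config for one managed instance."""

    instance: str
    # Device name -> existing disk to attach, or None for a disk that the
    # instance template already defines.
    disks: Dict[str, Optional[str]] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)

    def add_disk(self, device_name: str, source: Optional[str] = None) -> None:
        """Make a disk stateful for this instance."""
        self.disks[device_name] = source

    def remove_disk(self, device_name: str) -> None:
        """Stop preserving a disk for this instance."""
        self.disks.pop(device_name, None)

    def set_metadata(self, key: str, value: str) -> None:
        """Add or update a stateful metadata key-value pair."""
        self.metadata[key] = value

    def remove_metadata(self, key: str) -> None:
        """Remove a stateful metadata key-value pair."""
        self.metadata.pop(key, None)


# Example: preserve an existing disk and one metadata pair for node-1.
config = PerInstanceConfig("node-1")
config.add_disk("legacy-disk", source="my-legacy-1")
config.set_metadata("node-id", "xyz273")
```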

Example of stateful configuration

Here is an example of a stateful configuration:

Figure: Instance template + stateful policy + per-instance config = managed instance config.

In this chart:

  • The instance template defines a common configuration for all VM instances in a MIG.
  • The stateful policy defines a common stateful configuration for disks with the device name data-disk. These disks are defined by the instance template and are created and attached individually to each VM instance in the MIG.
  • The per-instance config defines a stateful configuration for a specific VM instance named node-1. It attaches an existing disk, my-legacy-1, to the node-1 instance and treats it as stateful. It also specifies one metadata key-value pair to preserve for the node-1 instance: node-id:xyz273.

When creating the node-1 VM, the MIG does the following:

  1. Uses the n2-standard-2 machine type, according to the instance template.
  2. Creates and attaches a boot disk with the auto-generated disk name boot-node-1 and the device name boot-disk, using a Debian GNU/Linux image, according to the instance template. The MIG treats the boot-node-1 boot disk as stateless because it isn't configured in the stateful policy or in the per-instance config.
  3. Creates and attaches an additional disk with the auto-generated disk name data-disk-1 and the device name data-disk, using a custom image, according to the instance template. The MIG treats the data-disk-1 additional disk as stateful because its device name is specified in the stateful policy.
  4. Attaches an existing disk with the disk name my-legacy-1 under the device name legacy-disk, according to the per-instance config. The MIG treats the my-legacy-1 additional disk as stateful because its device name is specified in the per-instance config.
  5. Sets three metadata key-value pairs: two from the instance template (app:example-stateful-app, version:1.0) and one from the per-instance config (node-id:xyz273). The MIG treats the node-id:xyz273 key-value pair as stateful because it is specified in the per-instance config.
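
The creation steps above amount to merging three sources of configuration. The sketch below reproduces that merge for node-1 using plain dictionaries. The dictionary layout and the `effective_config` function are assumptions made for illustration, not the representation the MIG uses.

```python
def effective_config(template, stateful_policy, per_instance_config):
    """Merge template, stateful policy, and per-instance config for one VM."""
    disks = {}
    # Template disks are stateful only if a policy or per-instance config names them.
    for device_name, disk in template["disks"].items():
        stateful = (
            device_name in stateful_policy["disks"]
            or device_name in per_instance_config["disks"]
        )
        disks[device_name] = {**disk, "stateful": stateful}
    # Disks that appear only in the per-instance config are attached and stateful.
    for device_name, disk in per_instance_config["disks"].items():
        if device_name not in template["disks"]:
            disks[device_name] = {**disk, "stateful": True}

    # Template metadata is stateless; per-instance metadata is stateful.
    metadata = {
        key: {"value": value, "stateful": False}
        for key, value in template["metadata"].items()
    }
    for key, value in per_instance_config["metadata"].items():
        metadata[key] = {"value": value, "stateful": True}

    return {
        "machine_type": template["machine_type"],
        "disks": disks,
        "metadata": metadata,
    }


template = {
    "machine_type": "n2-standard-2",
    "disks": {
        "boot-disk": {"image": "Debian GNU/Linux"},
        "data-disk": {"image": "custom image"},
    },
    "metadata": {"app": "example-stateful-app", "version": "1.0"},
}
stateful_policy = {"disks": {"data-disk"}}
node_1_config = {
    "disks": {"legacy-disk": {"source": "my-legacy-1"}},
    "metadata": {"node-id": "xyz273"},
}

node_1 = effective_config(template, stateful_policy, node_1_config)
```

In the resulting `node_1` value, only boot-disk is marked stateless among the disks, and only node-id is marked stateful among the metadata keys.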

When recreating the node-1 VM, assuming the same config is still effective, the MIG recreates the stateless items and preserves the stateful items:

  1. Recreates the boot disk from the original image:

    First, it deletes the boot-node-1 boot disk, and then it recreates it from the Debian GNU/Linux image, as specified in the instance template.

  2. Preserves additional disks, data-disk-1 and my-legacy-1:

    Detaches the additional disks before deleting the VM, and then attaches them to the VM after it has been recreated.

  3. Preserves the individual metadata key-value pair, node-id:xyz273:

    Sets the metadata after the VM has been recreated. Also sets the common key-value pairs from the instance template (app:example-stateful-app and version:1.0).
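
The recreation behavior above can be summarized as a plan that separates stateless items from stateful ones. The function below is an illustrative sketch under assumed inputs (a mapping of disk name to a stateful flag, plus two metadata dictionaries); it does not perform any operations on real VMs.

```python
def plan_recreation(disks, template_metadata, stateful_metadata):
    """Describe what happens to each item when a VM is recreated.

    disks: dict of disk name -> True if the disk is stateful.
    """
    recreate = sorted(name for name, stateful in disks.items() if not stateful)
    preserve = sorted(name for name, stateful in disks.items() if stateful)
    return {
        # Stateless disks are deleted and recreated from their image.
        "recreate_from_image": recreate,
        # Stateful disks are detached before deletion and reattached afterward.
        "detach_and_reattach": preserve,
        # Metadata is set again after recreation, from both sources.
        "metadata": {**template_metadata, **stateful_metadata},
    }


plan = plan_recreation(
    disks={"boot-node-1": False, "data-disk-1": True, "my-legacy-1": True},
    template_metadata={"app": "example-stateful-app", "version": "1.0"},
    stateful_metadata={"node-id": "xyz273"},
)
```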

What's next