Stateful managed instance groups

You can build highly available deployments of stateful workloads on VM instances using stateful managed instance groups (stateful MIGs). Stateful workloads include applications with stateful data or configurations such as databases, legacy monolith applications, and long-running batch computations with checkpointing.

With stateful MIGs, you can improve the uptime and resiliency of such stateful applications with autohealing (automatic recovery of failed workloads), multi-zone deployments, and automated rolling updates.

A stateful managed instance group preserves the unique state of each instance (including instance name, attached persistent disks, IP addresses, and metadata) on VM restart, recreation, autohealing, or update.

This page describes when to use stateful MIGs and provides a high-level overview of how they work. For more information, see How stateful MIGs work.

To learn how to set up a stateful MIG, see Configuring stateful MIGs.

How stateful workloads are different from stateless workloads

You can use managed instance groups to support both stateful and stateless workloads. The key difference is that stateful workloads preserve individual VM state (for example, a database shard or application configuration) on the VM's disks, while stateless workloads, like a web frontend, do not retain any state on the individual VMs.

You treat VMs with stateful workloads like custom-built machinery: you care about VM identity (name), IP address, metadata, and data on each individual machine. You cannot easily scale stateful workloads horizontally because scaling could require data replication, creation or deletion of data shards, or changing the overall application configuration. When recreating or updating a machine with a stateful workload, you must preserve the VM's unique state. Examples of stateful applications include Cassandra, Elasticsearch, MongoDB, MySQL, PostgreSQL, and Kafka.

You treat VMs with stateless workloads as interchangeable and only care about the number of serving VMs that you have. No one VM is treated any differently than another. You can easily scale stateless workloads horizontally by adding or removing VMs. When updating your application, you can delete machines and replace them with new ones with different names, IP addresses, metadata, and disks. When a stateless VM is deleted or recreated, all data on the machine is lost: the disks are deleted or recreated from scratch. A web frontend is an example of a stateless application.

Characteristic	Stateful MIG	Stateless MIG
Workload	Stateful workloads where disks, IP addresses, and/or metadata are preserved on VM recreate operations.	Highly available and scalable stateless workloads, where disks and IP addresses are recreated from scratch on horizontal scaling, autohealing, auto-updating, and VM recreation.
MIG features	Autohealing; automated rolling updates; multi-zone deployments	Autohealing; automated rolling updates; multi-zone deployments; autoscaling
Preservable items	Instance names; persistent disks, including support for disks that are not defined in the instance template; instance-specific metadata; IP addresses	Instance names

All MIGs support custom and preservable instance names.

When to use stateful MIGs

Consider using stateful managed instance groups (stateful MIGs) when you deploy a stateful application or cluster to Compute Engine and want to improve its availability with autohealing and multi-zone deployments, or when you want to simplify updates of stateful instances by orchestrating update rollouts and controlling the allowed level of disruption to the instances.

Stateful MIGs are intended for applications with stateful data or configuration, such as:

Databases. For example: Cassandra, Elasticsearch, MongoDB, and ZooKeeper. Before deciding on stateful MIGs, consider using fully managed services so that you can focus on your applications and not have to deal with VMs. For example, MySQL and PostgreSQL are available in Cloud SQL.
Data processing applications. For example: Kafka and Flink. Before deciding on stateful MIGs, consider using fully managed services, for example, Dataflow or Dataproc, to focus on your data processing tasks and not have to deal with VMs.
Other stateful applications. For example: TeamCity, Jenkins, Bamboo, DNS servers with stateful IP addresses, and custom stateful workloads.
Legacy monolith applications. These applications store application state on a boot disk or additional persistent disks, or they rely on stateful configuration, such as specific VM instance names, IP addresses, or metadata key values.
Batch workloads with checkpointing. With this configuration, you can preserve checkpointed results of long-running computation in anticipation of workload or VM failure or instance preemption. Stateful MIGs can recreate a failed machine, while preserving its data disk, so that your computation can continue from the last checkpoint.
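
As a minimal sketch of the checkpointing pattern described in the last item, the following Python code shows a long-running computation that saves its progress after each step and resumes from the saved state after a restart. The function names, the JSON file format, and the stop_after parameter (which simulates a failure) are hypothetical and not part of any Compute Engine API. The sketch assumes that the checkpoint file lives on a disk that the stateful MIG preserves.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the checkpoint."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(path, items, stop_after=None):
    """Sum the items, saving a checkpoint after each one.

    stop_after simulates a VM failure after that many items have been
    processed in this run (an assumption made for illustration).
    """
    state = load_checkpoint(path)
    processed = 0
    while state["next_item"] < len(items):
        if stop_after is not None and processed >= stop_after:
            return state  # simulated failure: the work stops here
        state["total"] += items[state["next_item"]]
        state["next_item"] += 1
        save_checkpoint(path, state)
        processed += 1
    return state
```

Calling run again with the same path after an interruption continues from the last saved item instead of starting over, which is the behavior that a preserved data disk makes possible.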

To achieve resilience against zonal failure, your application must replicate data across multiple instances at the application level. For example, Elasticsearch and Cassandra support such functionality. You can use a regional MIG to make such an application resilient to zonal failure by deploying redundant replicas to multiple zones and relying on your application's data replication functionality. In the event of a zonal failure, your data is served from available replicas in the remaining zones.
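
The following Python sketch illustrates the reasoning above: it checks whether every data shard still has at least one replica when any single zone becomes unavailable. The function name, the shard names, and the zone names are illustrative assumptions. The actual replica placement is decided by your application's replication functionality, not by the MIG.

```python
def survives_single_zone_failure(replica_zones_by_shard):
    """Return True if every shard keeps a replica when any one zone fails.

    replica_zones_by_shard maps a shard name to the list of zones that
    hold a replica of that shard.
    """
    all_zones = {
        zone for zones in replica_zones_by_shard.values() for zone in zones
    }
    for failed_zone in all_zones:
        for zones in replica_zones_by_shard.values():
            remaining = [zone for zone in zones if zone != failed_zone]
            if not remaining:
                return False
    return True


# Replicas spread over two zones per shard (hypothetical placement).
spread = {
    "shard-a": ["us-central1-a", "us-central1-b"],
    "shard-b": ["us-central1-b", "us-central1-c"],
}

# One shard has all of its replicas in a single zone.
concentrated = {
    "shard-a": ["us-central1-a", "us-central1-a"],
    "shard-b": ["us-central1-b", "us-central1-c"],
}
```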

Review the limitations to verify whether a stateful MIG fully meets your requirements.

What makes a MIG stateful

A MIG is considered stateful if you have created a stateful configuration.

You can create a stateful configuration when you create your MIG, or you can convert a group from stateless to stateful after its creation by adding a configuration.

You create a stateful configuration by setting a non-empty stateful policy and/or one or more non-empty per-instance configurations:

A stateful policy defines items that you want to preserve for all instances in your MIG.
A per-instance configuration defines items to preserve for a specific VM instance.
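
The rule above can be sketched in Python as follows. The class and field names are hypothetical and only model the idea. They are not the Compute Engine API.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulPolicy:
    """Items to preserve for all instances in the group (hypothetical model)."""
    disk_device_names: set = field(default_factory=set)
    network_interfaces: set = field(default_factory=set)

    def is_empty(self):
        return not (self.disk_device_names or self.network_interfaces)


@dataclass
class PerInstanceConfig:
    """Items to preserve for one specific instance (hypothetical model)."""
    instance_name: str
    disks: dict = field(default_factory=dict)      # device name -> disk name
    metadata: dict = field(default_factory=dict)   # key -> value
    network_interfaces: set = field(default_factory=set)

    def is_empty(self):
        return not (self.disks or self.metadata or self.network_interfaces)


def is_stateful(policy, per_instance_configs):
    """True if the policy or any per-instance configuration is non-empty."""
    if policy is not None and not policy.is_empty():
        return True
    return any(not config.is_empty() for config in per_instance_configs)
```

In this model, a group with an empty policy and only empty per-instance configurations is still stateless, while a single non-empty per-instance configuration is enough to make the group stateful.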

The configuration is effective after you or the MIG applies it:

A MIG automatically applies your stateful policy configuration to new and existing instances.
When creating or updating per-instance configurations, you can choose whether to apply the new configuration manually or have it applied automatically.

After the stateful configuration (stateful policy and/or per-instance configurations) is applied, you can verify it by inspecting the preserved state of each managed instance.

Subsequent changes to your MIG's stateful configuration or size (for example, decreasing the MIG's size, or deleting or abandoning instances from the MIG) can affect the preserved states of the instances.

Stateful configuration

A stateful managed instance group (MIG) takes its instance configuration from a combination of the instance template, optional all-instances configuration, optional stateful policy, and optional per-instance configurations that you set. After you set the configuration for your group, the MIG uses that configuration when creating VMs. To apply an updated configuration to existing VMs, see Apply new VM configurations in a MIG.

Stateful policy

A stateful policy defines common stateful items for all instances in a managed instance group. Each item that you include in the stateful policy must be defined in the MIG's instance template.

You can make the following changes to a stateful policy:

Configure disks to become stateful by adding them to the stateful policy.
Configure disks to become stateless by removing them from the stateful policy.
Specify that IP addresses must be stateful by adding network interface configuration to the stateful policy.
Specify that IP addresses must be treated as stateless by removing the configuration from the stateful policy.
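
The four changes above, together with the requirement that each item in the stateful policy is defined in the instance template, can be sketched in Python. The dictionary layout, the function names, and the interface name nic0 are assumptions made for illustration. They do not reproduce the actual API.

```python
def _set_stateful(policy, template, kind, name, stateful):
    """Return a copy of the policy with one item added or removed."""
    if stateful and name not in template[kind]:
        raise ValueError(
            f"{name!r} is not defined in the instance template ({kind})"
        )
    items = set(policy.get(kind, ()))
    if stateful:
        items.add(name)
    else:
        items.discard(name)
    updated = dict(policy)
    updated[kind] = items
    return updated


def set_disk_stateful(policy, template, device_name, stateful=True):
    """Add a disk to the policy (stateful) or remove it (stateless)."""
    return _set_stateful(policy, template, "disks", device_name, stateful)


def set_ip_stateful(policy, template, interface_name, stateful=True):
    """Add a network interface to the policy or remove it."""
    return _set_stateful(
        policy, template, "network_interfaces", interface_name, stateful
    )


# Hypothetical instance template and an initially empty stateful policy.
template = {"disks": {"boot-disk", "data-disk"}, "network_interfaces": {"nic0"}}
empty_policy = {"disks": set(), "network_interfaces": set()}
```

Each function returns a new policy and leaves its input unchanged, which keeps the sketch easy to test. Whether the real service treats updates this way is not implied.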

Per-instance configurations

A per-instance configuration defines stateful items that are unique for a specific managed instance, such as instance-specific disks, metadata key-value pairs, and IP addresses. Instance-specific metadata and disks do not need to be defined in the MIG's instance template; however, network interfaces for stateful IPs must be defined in the MIG's instance template.

You can make the following changes to a per-instance configuration for a specific instance in a MIG:

Configure disks that are defined in the instance template to become stateful for the instance (by adding those disks to the per-instance configuration) or to become stateless (by removing those disks from the per-instance configuration).
Configure existing disks, not defined in the instance template, to be attached and become stateful for the instance (by adding those disks to the per-instance configuration) or to be detached from the instance (removing disks from the per-instance configuration).
Add or remove stateful metadata key-value pairs that are specific to the instance.
Configure IP addresses individually for instances in a MIG to become stateful or stateless.
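
The difference between items that must be defined in the instance template and items that need not be can be sketched as a small validation function. The dictionary layout, the function name, and the interface names are hypothetical and used only for illustration.

```python
def validate_per_instance_config(template, config):
    """Return a list of problems; an empty list means no problem was found.

    Only network interfaces are checked against the instance template.
    Instance-specific disks and metadata are deliberately not checked,
    because they do not need to be defined in the template.
    """
    problems = []
    for interface in sorted(config.get("network_interfaces", ())):
        if interface not in template["network_interfaces"]:
            problems.append(
                f"network interface {interface!r} is not defined "
                "in the instance template"
            )
    return problems


# Hypothetical instance template.
template = {"disks": {"boot-disk", "data-disk"}, "network_interfaces": {"nic0"}}

# Accepted: the disk and metadata are absent from the template, which is fine.
valid_config = {
    "disks": {"legacy-disk": "my-legacy-1"},
    "metadata": {"node-id": "xyz273"},
    "network_interfaces": {"nic0"},
}

# Rejected: nic1 is not defined in the template.
invalid_config = {"network_interfaces": {"nic1"}}
```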

Example of stateful configuration

Here is an example of a stateful configuration:

Figure: Instance template + stateful policy + per-instance configuration = managed instance config.

In this chart:

The instance template defines a common configuration for all VM instances in a MIG.
The stateful policy defines a common stateful configuration for disks with the device name data-disk, which are defined in the instance template and are created and attached individually to each VM instance in the MIG.
The per-instance configuration defines a stateful configuration for a specific VM instance named node-1. It specifies that an existing disk, my-legacy-1, is attached to the node-1 instance and treated as stateful. It also specifies one metadata key-value pair to preserve individuality for the node-1 instance: node-id:xyz273.

When creating the node-1 VM, the MIG does the following:

Uses the n2-standard-2 machine type, according to the instance template.
Creates and attaches a boot disk with the auto-generated disk name boot-node-1 and the device name boot-disk, using a Debian GNU/Linux image, according to the instance template. The MIG treats the boot-node-1 boot disk as stateless because it isn't configured in the stateful policy or in the per-instance configuration.
Creates and attaches an additional disk with the auto-generated disk name data-disk-1 and the device name data-disk, using a custom image, according to the instance template. The MIG treats the data-disk-1 additional disk as stateful because its device name is specified in the stateful policy.
Attaches an existing disk with the disk name my-legacy-1 and uses the device name legacy-disk, according to the per-instance configuration. The MIG treats the my-legacy-1 additional disk as stateful because its device name is specified in the per-instance configuration.
Sets three metadata key-value pairs: two from the instance template (app:example-stateful-app, version:1.0) and one from the per-instance configuration (node-id:xyz273). The MIG treats the node-id:xyz273 key-value pair as stateful because it is specified in the per-instance configuration.
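
The steps above can be sketched as a function that combines the three configuration sources for node-1. The dictionary layout and the function name are assumptions made for illustration. The image labels are placeholders, and disk naming is left out of the model.

```python
def build_instance_config(template, stateful_policy, per_instance_config):
    """Combine the three sources into one managed instance configuration."""
    disks = {}
    # Disks from the template are stateful only if the policy lists them.
    for device_name, image in template["disks"].items():
        disks[device_name] = {
            "created_from": image,
            "stateful": device_name in stateful_policy["disks"],
        }
    # Existing disks from the per-instance configuration are always stateful.
    for device_name, disk_name in per_instance_config["disks"].items():
        disks[device_name] = {"existing_disk": disk_name, "stateful": True}

    # Template metadata is stateless; per-instance metadata is stateful.
    metadata = {
        key: {"value": value, "stateful": False}
        for key, value in template["metadata"].items()
    }
    for key, value in per_instance_config["metadata"].items():
        metadata[key] = {"value": value, "stateful": True}

    return {
        "machine_type": template["machine_type"],
        "disks": disks,
        "metadata": metadata,
    }


template = {
    "machine_type": "n2-standard-2",
    "disks": {"boot-disk": "debian-image", "data-disk": "custom-image"},
    "metadata": {"app": "example-stateful-app", "version": "1.0"},
}
stateful_policy = {"disks": {"data-disk"}}
node_1_config = {
    "disks": {"legacy-disk": "my-legacy-1"},
    "metadata": {"node-id": "xyz273"},
}

node_1 = build_instance_config(template, stateful_policy, node_1_config)
```

In the resulting node_1 dictionary, boot-disk is marked stateless, data-disk and legacy-disk are marked stateful, and only the node-id metadata pair is marked stateful.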

When recreating the node-1 VM, assuming the same config is still effective, the MIG recreates the stateless items and preserves the stateful items:

Recreates the boot disk from the original image. First, it deletes the boot-node-1 boot disk, and then it recreates it from the Debian GNU/Linux image, as specified in the instance template.
Preserves the additional disks, data-disk-1 and my-legacy-1. It detaches the additional disks before deleting the VM, and then attaches them to the VM after it has been recreated.
Preserves the individual metadata key-value pair, node-id:xyz273. It sets the metadata after the VM has been recreated, and also sets the common key-value pairs from the instance template (app:example-stateful-app and version:1.0).
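
The recreation behavior for node-1 can be modeled with a short Python sketch. The dictionary layout, the function name, and the "contents" strings are hypothetical stand-ins for real disks and images. The sketch only shows which items are reset and which are kept.

```python
def recreate_instance(instance, template):
    """Return the instance as it would look after recreation (simplified)."""
    disks = {}
    for device_name, disk in instance["disks"].items():
        if disk["stateful"]:
            # Stateful disk: detached before deletion, reattached afterwards.
            disks[device_name] = dict(disk)
        else:
            # Stateless disk: deleted and recreated from the template image.
            disks[device_name] = {
                "stateful": False,
                "contents": template["images"][device_name],
            }

    # Common metadata comes from the template; stateful pairs are preserved.
    metadata = dict(template["metadata"])
    for key in instance["stateful_metadata_keys"]:
        metadata[key] = instance["metadata"][key]

    return {
        "name": instance["name"],
        "disks": disks,
        "metadata": metadata,
        "stateful_metadata_keys": set(instance["stateful_metadata_keys"]),
    }


template = {
    "images": {"boot-disk": "debian-image"},
    "metadata": {"app": "example-stateful-app", "version": "1.0"},
}
node_1 = {
    "name": "node-1",
    "disks": {
        "boot-disk": {"stateful": False, "contents": "debian-image + local changes"},
        "data-disk": {"stateful": True, "contents": "shard data"},
        "legacy-disk": {"stateful": True, "contents": "legacy data"},
    },
    "metadata": {
        "app": "example-stateful-app",
        "version": "1.0",
        "node-id": "xyz273",
    },
    "stateful_metadata_keys": {"node-id"},
}

recreated = recreate_instance(node_1, template)
```

After the call, the boot disk contents are back to the template image, while the two additional disks, the instance name, and the node-id metadata pair are unchanged.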

Feedback

We want to learn about your use cases, challenges, and feedback about stateful MIGs. Please share your feedback with our team at mig-discuss@google.com.

What's next

Read Configuring stateful MIGs to learn how to support stateful workloads by preserving instance names, persistent disks, and metadata in managed instances.
Learn how to Migrate an existing workload to a stateful MIG.
Learn more about How stateful MIGs work.
Learn more about Managed instance groups.
Read about Working with managed instances.