About multi-cluster GKE upgrades using Multi Cluster Ingress


This document describes how to design, plan, and implement upgrades in a multi-cluster Google Kubernetes Engine (GKE) environment. While this document uses Multi Cluster Ingress for upgrades, the concepts can be applied to other solutions, for example, manually configuring an external load balancer. An accompanying tutorial shows how to upgrade a multi-cluster GKE environment with Multi Cluster Ingress. This document is intended for Google Cloud administrators who are responsible for maintaining fleets of GKE clusters.

The need for multi-cluster architecture

This section discusses various reasons that you might need a multi-cluster Kubernetes architecture.

Kubernetes as infrastructure

This document considers Kubernetes clusters to be components of infrastructure. Infrastructure is disposable. No special treatment must be given to any infrastructure component because components are there to serve a purpose. The purpose of Kubernetes is to provide automation and orchestration to developers and operators to serve container-based applications and services to consumers. Consumers can be internal teams, other services, or external customers.

Common multi-cluster scenarios

In addition to the Kubernetes-as-infrastructure argument, there are a number of reasons to have a multi-cluster environment:

  • Geography. Many Services need to be in multiple regions. Placing a Service closer to the consumer (in their region) provides a better experience due to lower latency than if the Service is served from a single region. A Kubernetes cluster runs in a single region. For multi-regional deployments, multiple Kubernetes clusters in multiple regions are required. Multi-cloud or hybrid cloud environments also require multiple clusters in each environment. Kubernetes clusters are also often colocated with the Services' (stateful) data sources. Certain applications might be required to be in the same location (region and zone) as their backends, for example, a relational database management system (RDBMS).
  • Tenancy and environments. Kubernetes clusters are designed for multi-tenancy. Multiple teams can share a single cluster for their respective Services. Kubernetes provides standard resources, such as namespaces, role-based access control (RBAC), network policies, and authentication, to properly configure access controls in multi-tenant environments. In some cases, certain Services might not be able to co-reside on a cluster with other Services due to company policy, privacy, security, or industry regulation. In such cases, multiple clusters are required to separate certain tenants into their own clusters. Environments (development, staging, and production) are also often created as separate clusters. The scope of access and the types of applications installed in different environments vary tremendously, so environments should be kept as separate clusters.
  • Composition and function. Sometimes a cluster is created to perform a particular function. For example, machine learning workflows that use Kubeflow or data analytics jobs might require nodes with GPUs or other specific hardware, or clusters made up of Spot VMs for batch analytics workloads. These hardware requirements might not apply to other Services. These workflows might not be crucial to running the business and can require ephemeral clusters (short-lived clusters). Shared services, such as observability (logging, metrics, and traces) and CI/CD tooling, are better suited in their own platform admin cluster. Separate function-specific clusters are often seen for non-business-critical workflows.
  • Resiliency. Multiple clusters are often used to increase resiliency in an environment. Each cluster has an impact area. In this context, an impact area is the number of Services that are adversely affected due to a cluster malfunction, misconfiguration, or a cluster going offline due to planned or unplanned maintenance. If you have a large number of smaller clusters, then you have a large number of smaller impact areas. If a Service exists in two clusters, the clusters share the load equally. If one cluster goes offline, 50% of the traffic is affected. If the same Service were served by a single cluster, any event on that cluster would cause a 100% outage for that Service. For this reason, multiple clusters are also often used for disaster recovery.

This document focuses on the resiliency aspect of multi-cluster deployments.

Multi-cluster and distributed Services

A distributed Service is a Kubernetes Service that is deployed to multiple Kubernetes clusters. Distributed Services are stateless Services and act identically across multiple clusters. This means that a distributed Service has the same Kubernetes Service name and is implemented in the same namespace across multiple clusters. Kubernetes Services are tied to the Kubernetes cluster that they run on. If a Kubernetes cluster goes offline, so does the Kubernetes Service. Distributed Services are abstracted from individual Kubernetes clusters. If one or more Kubernetes clusters are down, the distributed Service can remain online and within the service-level objective (SLO).
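The following minimal sketch shows what this looks like in practice: the same Deployment and Service manifests are applied, unchanged, to every cluster that serves the distributed Service. The names (the `frontend` Service and namespace, and the container image path) are illustrative placeholders rather than values from this document.

```yaml
# Applied identically to each participating cluster, for example with
# kubectl apply --context CLUSTER_CONTEXT -f frontend.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend            # same namespace in every cluster
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        # Placeholder image path; replace with your own registry and tag.
        image: us-docker.pkg.dev/PROJECT_ID/REPO/frontend:v1
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: frontend            # same Service name in every cluster
  namespace: frontend
spec:
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 8080
```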

In the following diagram, frontend is a distributed Service running on multiple clusters with the same Service name and namespace.

Distributed Service `frontend` running on multiple clusters.

With this architecture, the frontend Service isn't tied to a single cluster and is represented conceptually in the diagram as a layer that spans the Kubernetes cluster infrastructure layer. If any of the individual clusters that are running the frontend Service goes down, the frontend remains online. There are additional Services that run on single individual clusters: the accounts Service and the ledger Service. Their uptime and availability depend on the uptime of the Kubernetes cluster on which they reside.

Resiliency is one of the reasons for multi-cluster deployments. Distributed Services create resilient Services on a multi-cluster architecture. Stateless Services are prime candidates for distributed Services in a multi-cluster environment. The following requirements and considerations apply when you work with distributed Services:

  • Multi-cluster networking. You can send traffic that is destined to a distributed Service to clusters that are running that Service by using a multi-cluster ingress technology like Multi Cluster Ingress or by using your own external load balancer or proxy solution. Whichever option you use, it must give you control over when, where, and how much traffic is routed to a particular instance of a distributed Service. The following diagram shows a load balancer sending traffic to a distributed Service frontend that is running in two GKE clusters.

    Load balancer distributing traffic to a Service `frontend`.

  • Observability. Use tools to measure your SLOs—typically availability and latency—collectively for a distributed Service. This configuration provides a global view of how each Service is performing across multiple clusters. While a distributed Service is not a well-defined resource in most observability solutions, you can collect and combine individual Kubernetes Service metrics. Solutions like Cloud Monitoring or open source tools like Grafana provide Kubernetes Service metrics. Service mesh solutions like Istio and Anthos Service Mesh also provide Service metrics without any instrumentation required.

  • Service placement. Kubernetes Services provide node-level fault tolerance within a single Kubernetes cluster. This means that a Kubernetes Service can withstand node outages. During node outages, the Kubernetes control plane automatically reschedules Pods to healthy nodes. A distributed Service provides cluster-level fault tolerance. This means that a distributed Service can withstand cluster outages. When you're capacity planning for a distributed Service, you must consider this Service placement. A distributed Service does not need to run on every cluster. Which clusters a distributed Service runs on depends on the following requirements:

    • Where, or in which regions, is the Service required?
    • What is the required SLO for the distributed Service?
    • What type of fault tolerance is required for the distributed Service—cluster, zonal, or regional? For example, do you require multiple clusters in a single zone, or multiple clusters across zones in a single region or multiple regions?
    • What level of outages should the distributed Service withstand in the worst-case scenario? The following options are available at the cluster layer:

      • N+1 (where N represents the number of clusters required to satisfy service capacity needs). A distributed Service can withstand a single cluster failure.
      • N+2. A distributed Service can withstand two concurrent failures. For example, a planned and an unplanned outage of a Kubernetes Service in two clusters at the same time.
  • Rollouts and rollbacks. Distributed Services, like Kubernetes Services, allow for gradual rollouts and rollbacks. Unlike Kubernetes Services, distributed Services enable clusters to be an additional unit of deployment as a means for gradual change. Rollouts and rollbacks also depend upon the Service requirements. In some cases, you might need to upgrade the Service on all the clusters at the same time, for example, for a bug fix. In other cases, you might need to slowly roll out (or stagger) the change one cluster at a time, as shown in the sketch after this list. This gradual rollout lowers the risk to the distributed Service by gradually introducing changes to the Service. However, this might take longer depending on the number of clusters. No one upgrade strategy is best. Often, multiple rollout and rollback strategies are used depending upon the distributed Service requirements. The important point here is that distributed Services must allow for gradual and controlled changes in the environment.

  • Business continuity and disaster recovery (BCDR). These terms are often used together. Business continuity refers to continuation of critical services in case of a major (planned or unplanned) event, whereas disaster recovery refers to the steps taken or needed to return business operations to their normal state after such events. There are many strategies for BCDR that are beyond the scope of this document. BCDR requires some level of redundancy in systems and Services. The key premise of distributed Services is that they run in multiple locations (clusters, zones, and regions).

    BCDR strategies are often dependent upon the previously discussed rollout and rollback strategies. For example, if rollouts are performed in a staggered or controlled manner, the effect of a bug or a bad configuration push can be caught early without affecting a large number of users. At a large scale, and coupled with a rapid rate of change (for example, in modern CI/CD practices), it is common that not all users are served the same version of a distributed Service. BCDR planning and strategies in distributed systems and Services differ from traditional monolithic architectures. In traditional systems, a change is made wholesale, affecting a large number of users—or perhaps all of them—and thus must have a redundant or backup system in place in case of unwanted effects of a rollout. In distributed systems and Services, almost all changes are done in a gradual manner in order to only affect a small number of users.

  • Cluster lifecycle management. Like controlled rollouts and rollbacks, distributed Services allow for controlled cluster lifecycle management. Distributed Services provide cluster level resiliency so clusters can be taken out of rotation for maintenance. Cluster lifecycle management is a tenet of distributed Services that does not apply to Kubernetes Services.
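As an example of treating the cluster as the unit of deployment, the following is a minimal sketch of a staggered rollout of a distributed Service, one cluster at a time. The kubectl contexts (cluster-1 through cluster-3), the namespace, and the image path are illustrative placeholders.

```bash
# Staggered rollout: update the frontend Deployment one cluster at a time,
# verifying the rollout (and your SLOs) before moving on to the next cluster.
for ctx in cluster-1 cluster-2 cluster-3; do
  # Roll out the new image in this cluster only.
  kubectl --context "${ctx}" --namespace frontend \
    set image deployment/frontend frontend=us-docker.pkg.dev/PROJECT_ID/REPO/frontend:v2

  # Wait for the rollout to complete in this cluster before continuing.
  kubectl --context "${ctx}" --namespace frontend \
    rollout status deployment/frontend --timeout=10m

  # Observe error rates and latency here before touching the next cluster.
  # If the SLO is at risk, roll back this cluster only:
  #   kubectl --context "${ctx}" --namespace frontend rollout undo deployment/frontend
done
```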

The remainder of this document focuses on the cluster lifecycle aspect of distributed Services.

GKE cluster lifecycle management

Cluster lifecycle management can be defined as the strategies and the planning required to maintain a healthy and updated fleet of Kubernetes clusters without violating service SLOs. With proper strategies and planning in place, cluster lifecycle management should be routine, expected, and uneventful.

This document focuses on GKE lifecycle management. However, you can apply these concepts to other distributions of Kubernetes.

GKE versioning and upgrades

Before discussing strategies and planning for cluster lifecycle management, it is important to understand what constitutes a cluster upgrade.

A cluster contains two components: control plane nodes and worker nodes. A Kubernetes cluster upgrade requires that all nodes are upgraded to the same version. Kubernetes follows a semantic versioning scheme. Kubernetes versions are expressed as X.Y.Z, where X is the major version, Y is the minor version, and Z is the patch version. Minor releases occur approximately every three to four months, and the Kubernetes project maintains release branches only for the most recent three minor releases. This means that an older Kubernetes minor release eventually falls out of support and might require API changes when you upgrade to the latest version. Kubernetes upgrades must be planned at a regular cadence. We recommend doing planned GKE upgrades quarterly or every two quarters.

GKE clusters support running Kubernetes versions from any supported minor release. At least two minor versions are available at any given time.
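To see which versions are currently available before you plan an upgrade, you can query GKE directly. The following commands are a minimal sketch; the region name is a placeholder.

```bash
# List the Kubernetes versions and release channels currently offered by GKE
# in a given region.
gcloud container get-server-config --region us-central1

# Check which control plane and node versions your existing clusters run.
gcloud container clusters list \
    --format="table(name,location,currentMasterVersion,currentNodeVersion)"
```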

GKE has the following cluster availability types:

  • Single zone clusters. A single control plane node and all node pools are in a single zone in a single region.
  • Multi-zonal clusters. A single control plane node is in one zone and node pools are in multiple zones in a single region.
  • Regional clusters. Multiple control plane nodes and node pools in multiple zones in a single region.
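The following commands sketch how each availability type is created; the cluster names, zones, and region are placeholders.

```bash
# Single zone cluster: control plane and nodes in one zone.
gcloud container clusters create zonal-cluster \
    --zone us-central1-a

# Multi-zonal cluster: one control plane zone, nodes in several zones.
gcloud container clusters create multi-zonal-cluster \
    --zone us-central1-a \
    --node-locations us-central1-a,us-central1-b,us-central1-c

# Regional cluster: control plane replicas and nodes across the region's zones.
gcloud container clusters create regional-cluster \
    --region us-central1
```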

GKE is a managed service and offers auto-upgrades for both control plane nodes and worker nodes.

GKE auto-upgrade

GKE auto-upgrade is a popular and often used cluster lifecycle strategy. GKE auto-upgrade provides a fully managed way to keep your GKE clusters updated to supported versions. GKE auto-upgrades upgrade control plane nodes and worker nodes separately:

  • Control plane auto-upgrades. By default, GKE control plane nodes are automatically upgraded. Single zone and multi-zonal clusters have a single control plane (control plane node). During control plane node upgrades, workloads continue to run. However, you cannot deploy new workloads, modify existing workloads, or make other changes to the cluster's configuration until the upgrade is complete.

    Regional clusters have multiple replicas of the control plane, and only one replica is upgraded at a time. During the upgrade, the cluster remains highly available, and each control plane replica is unavailable only while it is being upgraded.

  • Worker node auto-upgrades. Node pools are upgraded one at a time. Within a node pool, nodes are upgraded one at a time in an undefined order. You can change the number of nodes that are upgraded at a time, but this process can take several hours depending on the number of nodes and their workload configurations.

GKE auto-upgrade lifecycle strategy

We recommend using GKE auto-upgrade where possible. GKE auto-upgrade prioritizes convenience over control. However, GKE auto-upgrade provides many ways to influence when and how your clusters get upgraded within certain parameters. You can use maintenance windows and maintenance exclusions to influence when upgrades happen, release channels to influence version selection, and node upgrade strategies to influence the order and timing of node upgrades (the sketch after the following list shows some of these controls). Even with these controls, and even with regional clusters (which have multiple control plane replicas), GKE auto-upgrade doesn't guarantee your Services' uptime. You can choose not to use the GKE auto-upgrade feature if you require one or more of the following:

  • Control of the exact version of GKE clusters.
  • Control of the exact time to upgrade GKE.
  • Control of the upgrade strategy (discussed in the next section) for your GKE fleet.
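If you do stay with auto-upgrade, the following commands are a hedged sketch of the controls mentioned earlier—release channels, maintenance windows, and maintenance exclusions. The cluster name, region, and timestamps are placeholders.

```bash
# Enroll a cluster in a release channel at creation time.
gcloud container clusters create my-cluster \
    --region us-central1 \
    --release-channel regular

# Restrict automatic upgrades to a recurring weekend maintenance window.
gcloud container clusters update my-cluster \
    --region us-central1 \
    --maintenance-window-start 2024-01-06T04:00:00Z \
    --maintenance-window-end 2024-01-06T12:00:00Z \
    --maintenance-window-recurrence 'FREQ=WEEKLY;BYDAY=SA,SU'

# Block upgrades entirely during a business-critical period.
gcloud container clusters update my-cluster \
    --region us-central1 \
    --add-maintenance-exclusion-name holiday-freeze \
    --add-maintenance-exclusion-start 2024-11-25T00:00:00Z \
    --add-maintenance-exclusion-end 2024-12-02T00:00:00Z
```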

GKE multi-cluster lifecycle management

This section describes various GKE multi-cluster lifecycle management strategies and how to plan for them.

Planning and design considerations

GKE multi-cluster architecture plays a part in selecting a cluster lifecycle management strategy. Before discussing these strategies, it is important to discuss certain design decisions that might affect or be affected by the cluster lifecycle management strategy.

Type of clusters

If you're using GKE auto-upgrade as a cluster lifecycle management strategy, the type of cluster can matter. For example, regional clusters have multiple control plane replicas that are auto-upgraded one at a time, whereas zonal clusters have a single control plane. If you're not using GKE auto-upgrade and you consider all Kubernetes clusters to be disposable infrastructure, then it might not matter what type of cluster you choose when deciding on a cluster lifecycle management strategy. You can apply the strategies discussed in the next section, GKE multi-cluster lifecycle management, to any type of cluster.

Cluster placement and footprint

Consider the following factors when you decide on the cluster placement and footprint:

  • Zones and regions that clusters are required to be in.
  • Number and size of clusters needed.

The first factor is usually easy to address because the zones and regions are dictated by your business and the regions in which you serve your users.

Addressing the number and size of clusters typically falls in the following categories, each with advantages and disadvantages:

  • Small number of large clusters. You can choose to use the redundancy and resiliency provided by regional clusters and place one (or two) large regional clusters per region. The benefit of this approach is the low operational overhead of managing fewer clusters. The downside is that a single cluster failure can affect a large number of Services at once because of its large impact area.
  • Large number of small clusters. You can create a large number of small clusters to reduce the cluster impact area because your Services are split across many clusters. This approach also works well for short-lived ephemeral clusters (for example, clusters running a batch workload). The downside of this approach is higher operational overhead because there are more clusters to upgrade. There can also be additional costs associated with a higher number of control plane nodes. You can offset the costs and the high operational overhead with automation, a predictable schedule and strategy, and careful coordination between the teams and Services that are affected.

This document doesn't recommend one approach over the other; they are options. In some cases, you can choose both design patterns for different categories of services.

The following strategies work with either design choice.

Capacity planning

When planning for capacity, it is important to consider the chosen cluster lifecycle strategy. Capacity planning must account for the normal Service load as well as the following maintenance events:

  • Planned events like cluster upgrades
  • Unplanned events like cluster outages, for example, bad configuration pushes and bad rollouts

When capacity planning, you must consider any total or partial outages. If you design for only planned maintenance events, then all distributed Services must have one more cluster than is required so that you can take one cluster out of rotation at a time for upgrades without degrading the Service. This approach is also referred to as N+1 capacity planning. If you design for planned and unplanned maintenance events, then all distributed Services must have two (or more) clusters beyond what is required to serve the intended capacity—one for the planned event and one for an unplanned event in case it occurs during the planned maintenance window. This approach is also referred to as N+2 capacity planning. For example, if a distributed Service needs three clusters' worth of capacity, N+1 planning provisions four clusters and N+2 planning provisions five.

In multi-cluster architectures, the terms draining and spilling are often used. These terms refer to the process of removing (or draining) traffic from a cluster and redirecting (or spilling) traffic onto other clusters during upgrades and maintenance events. This process is accomplished by using networking solutions like multi-cluster Ingress or other load balancing methods. Careful use of draining and spilling is at the heart of some cluster lifecycle management strategies. When you're capacity planning, you must consider draining and spilling. For example, when a single cluster is drained, you need to consider whether the other clusters have enough capacity to handle the additional spilled traffic. Other considerations include sufficient capacity in the zone or region or a need to send traffic to a different region (if using a single regional cluster per region). The following diagram shows traffic being removed (sometimes referred to as draining a cluster) from one cluster and sent to another cluster running the same distributed service.

Draining traffic from one cluster and sending traffic to another cluster.
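With Multi Cluster Ingress, one way to drain a cluster is to omit it from the MultiClusterService clusters list, which removes that cluster's Pods from the load balancer's backends while leaving its workloads running. The following is a minimal sketch; the Service, namespace, zone, and cluster names are placeholders.

```yaml
# MultiClusterService for the frontend Service. Listing only gke-us-east under
# spec.clusters drains Multi Cluster Ingress traffic away from gke-us-west;
# restoring the entry spills traffic back onto that cluster.
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: frontend
  namespace: frontend
spec:
  template:
    spec:
      selector:
        app: frontend
      ports:
      - name: web
        protocol: TCP
        port: 80
        targetPort: 8080
  clusters:
  - link: "us-east1-b/gke-us-east"
  # - link: "us-west1-a/gke-us-west"   # omitted while this cluster is drained
```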

Clusters and distributed Services

Services-based cluster design dictates that the cluster architecture (number, size, and location of clusters) is determined by the Services that are required to run on the clusters. Therefore, the placement of your clusters is dictated by where the distributed Services are needed. Consider the following when deciding the placement of distributed Services:

  • Location requirement. Which regions does the Service need to be served out of?
  • Criticality. How critical is the availability of a Service to the business?
  • SLO. What are the service-level objectives for the Service (typically based on criticality)?
  • Resilience. How resilient does the Service need to be? Does it need to withstand cluster, zonal, or even regional failures?

When planning for cluster upgrades, you must consider the number of Services a single cluster affects when it is drained, and you must account for spilling each of these Services to other appropriate clusters. Clusters can be single tenant or multi-tenant. Single-tenant clusters only serve a single Service or a product represented by a set of Services. Single-tenant clusters do not share the cluster with other Services or products. Multi-tenant clusters can run many Services and products that are typically partitioned into namespaces.

Impact to teams

A cluster event not only affects Services but can also impact teams. For example, the DevOps team might need to redirect or halt their CI/CD pipelines during a cluster upgrade. Likewise, support teams can get alerted about planned outages. Automation and tooling must be in place to help ease the impact on multiple teams. A cluster or a cluster fleet upgrade should be considered routine and uneventful when all teams are informed.

Timing, scheduling, and coordination

Kubernetes releases a new minor version approximately every three to four months and maintains the last three minor releases. You must carefully plan the timing and scheduling of cluster upgrades. There must be an agreement between the Service owners, Service operators, and platform administrators on when these upgrades take place. When planning for upgrades, consider the following questions:

  • How often do you upgrade? Do you upgrade every quarter or on a different timeline?
  • When do you upgrade? Do you upgrade at the beginning of the quarter when business slows down or during other business downtimes driven by your industry?
  • When shouldn't you upgrade? Do you have clear planning around when not to upgrade, for example, avoiding peak scale events like Black Friday and Cyber Monday, or high-profile conferences and other industry-specific events?

It is important to have a strategy in place that is clearly communicated with the Service owners as well as the operations and support teams. There should be no surprises and everyone should know when and how the clusters are upgraded. This requires clear coordination with all teams involved. A single Service has multiple teams that interact with it. Typically, these teams can be grouped into the following categories:

  • The Service developer, who is responsible for creating and coding the business logic into a Service.
  • The Service operator, who is responsible for safely and reliably running the Service. The operators can consist of multiple teams, such as policy or security administrators, networking administrators, and support teams.

Everyone must be in communication during cluster upgrades so that they can take proper actions during this time. One approach is to plan for upgrades the same way that you plan for an outage incident. You have an incident commander, a chat room, and a retrospective (even if no users were impacted). For more information, see Incident response.

GKE cluster lifecycle strategies

This section discusses the main cluster lifecycle management strategies often used in GKE multi-cluster architecture. It is important to note that one strategy won't work for all scenarios and you might choose multiple strategies for various categories of services and needs of the business.

Rolling upgrades

The following diagram shows the rolling upgrade strategy.

Rolling upgrade strategy where drained traffic is spilled to a different cluster.

Using a load balancer, one GKE cluster is drained of all traffic and upgraded. The drained traffic load is spilled to a different GKE cluster.

Rolling upgrades are the simplest and the most cost-effective strategy out of the strategies discussed in this document. You start with n clusters running the old_ver (or current production) version. You then drain m clusters at a time, where m is less than n. You then either delete the drained clusters and recreate them with the new version, or upgrade the drained clusters in place.

The decision between recreating clusters and upgrading them in place depends upon the size of the clusters and whether you consider clusters to be immutable infrastructure. Immutable infrastructure dictates that instead of constantly upgrading a cluster, which might produce unwanted results over time, you create new clusters and avoid any unforeseen configuration drift.

If you use GKE, you can create a GKE cluster with a single command or an API call. The new-cluster strategy requires that you have the entire cluster configuration (cluster manifests) stored outside of the cluster, typically in Git. You can then use the same configuration template on the new cluster. If this is a new cluster, ensure that your CI/CD pipelines point to the correct cluster. After the cluster is properly configured, you can then push traffic back onto the cluster slowly while monitoring the Services' SLOs.

The process is repeated for all clusters. Depending upon your capacity planning, you can upgrade multiple clusters at a time without violating Service SLOs.
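The following commands are a minimal sketch of the upgrade-in-place variant of this strategy, assuming the cluster has already been drained of traffic (for example, by removing it from your load balancer or MultiClusterService configuration). The cluster name, node pool, location, and version are placeholders; pick a target version from `gcloud container get-server-config`.

```bash
# 1. Upgrade the control plane of the drained cluster to the target version.
gcloud container clusters upgrade cluster-1 \
    --region us-central1 \
    --master \
    --cluster-version TARGET_VERSION

# 2. Upgrade each node pool to the same version.
gcloud container clusters upgrade cluster-1 \
    --region us-central1 \
    --node-pool default-pool \
    --cluster-version TARGET_VERSION

# 3. Return the cluster to rotation, spill traffic back onto it gradually while
#    monitoring the Services' SLOs, and repeat for the next cluster.
```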

If you value simplicity and cost over resiliency, use the rolling upgrades strategy. With this strategy, you never exceed the GKE fleet's required capacity for all distributed Services.

The following diagram compares the timeline and the Service capacity requirement during a GKE cluster upgrade in a multi-cluster architecture.

Graph showing that the Service capacity doesn't exceed requirements.

The preceding diagram shows that throughout the GKE upgrade process, the capacity to support the services never goes below what is required. When the GKE cluster to be upgraded is taken out of rotation, the other clusters are scaled up to support the load.

Blue/green upgrades

The following diagram shows a blue/green upgrade strategy.

Traffic is sent to new cluster before removing the drained cluster.

In the preceding diagram, a new GKE cluster running the new version is added. Then a load balancer is used to send traffic to the new cluster while slowly draining one of the old clusters until no traffic is sent to it. The fully drained old cluster can then be removed. The same process can be followed for the remaining clusters.

The blue/green upgrade strategy provides some added resiliency. This strategy is similar to rolling upgrades, but it is more costly. The only difference is that instead of draining existing clusters first, you first create m new clusters running the new version, where m is less than or equal to n. You add the new clusters to the CI/CD pipelines, and then slowly spill traffic over while monitoring the Service SLOs. When the new clusters are fully taking traffic, you drain and delete the clusters running the older version.

The blue/green strategy for upgrading clusters is similar to the blue/green strategy typically used for Services. Creating multiple new clusters at a time increases the overall cost, but gives you the benefit of speeding up the fleet upgrade time. The added cost applies only for the duration of the upgrade, while the additional clusters are in use. The benefit of creating new clusters first is that in case of a failure, you can roll back. You can also test the new cluster before sending production traffic to it. Because these clusters coexist with their old-version counterparts for a short period of time, the additional costs are minimal.
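The following commands sketch the basic flow under this strategy; the cluster names, region, node count, and version are placeholders, and the traffic-shifting steps depend on the load balancing solution you use.

```bash
# 1. Create a new cluster running the target version.
gcloud container clusters create green-cluster \
    --region us-central1 \
    --cluster-version TARGET_VERSION \
    --num-nodes 3

# 2. Deploy the distributed Services to green-cluster, add it to your CI/CD
#    pipelines and to the load balancer or MultiClusterService configuration,
#    and spill traffic onto it gradually while monitoring SLOs.

# 3. Once green-cluster is serving traffic, drain an old cluster (remove it
#    from the load balancer configuration), then delete it.
gcloud container clusters delete blue-cluster \
    --region us-central1
```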

If you value simplicity and resiliency over cost, use the blue/green upgrade strategy. Additional clusters are added first and exceed the GKE fleet's required capacity for the duration of the upgrades.

Graph showing that capacity is exceeded during the upgrade.

In the preceding diagram, adding a new cluster first temporarily increases the available capacity over the required capacity while another cluster in the fleet is drained and removed from the fleet. However, after removing one of the old (fully drained) clusters, the capacity goes back to what is needed. This capacity change is highlighted because there can be an increase in cost with this model, depending upon the number and size of clusters in the fleet.

Canary cluster upgrades

A canary cluster upgrade is the most resilient and complex strategy of those discussed in this document. This strategy completely abstracts cluster lifecycle management from the Services lifecycle management, thereby offering the lowest risk and highest resilience for your services. In the previous rolling and blue/green upgrade strategies, you maintain your entire GKE fleet on a single version. In this strategy, you maintain two or perhaps three fleets of GKE clusters that are running different versions. Instead of upgrading the clusters, you migrate Services from one fleet of clusters to the other fleet over time. When the oldest GKE fleet is drained (meaning that all Services have been migrated to the next versioned GKE fleet), you delete the fleet.

This strategy requires that you maintain a minimum of two GKE fleets—one for the current production version and one for the next production candidate version. You can also maintain more than two GKE fleets. Extra fleets give you more flexibility, but your cost and operational overhead also go up. These extra fleets are not the same as having clusters in different environments, for example, development, staging, and production environments. Non-production environments are great for testing Kubernetes features and Services with non-production traffic.

This strategy of using canary cluster upgrades dictates that you maintain multiple GKE fleet versions in the production environment. This is similar to canary release strategies that are often used by Services. With canary Service deployments, the Service owner can always pinpoint issues to a particular version of the Service. With canary clusters, the Service owner must also take into account the GKE fleet versions that their Services are running on. A single distributed Service version can potentially run on multiple GKE fleet versions. The migration of a Service can happen gradually so that you can see the effects of the Service on the new fleet before sending all traffic for the Service to the new versioned clusters.

The following diagram shows that managing different fleets of GKE clusters can completely abstract the cluster lifecycle from the services lifecycle.

Migrating Service `frontend` to a new fleet of clusters.

The preceding diagram shows a distributed Service frontend being slowly migrated from one fleet of GKE clusters to the next fleet running the new version, until the older fleet is completely drained over time. After a fleet is drained, it can be removed and a new fleet created. Over time, all Services are migrated to the next fleet, and older fleets are removed as they are drained.
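With Multi Cluster Ingress, one hedged way to implement this migration is to update the MultiClusterService clusters list as the Service is rolled out to clusters in the new fleet. The fleet, cluster, and zone names below are placeholders.

```yaml
# Mid-migration snapshot: the frontend MultiClusterService lists clusters from
# both the current fleet and the next-version fleet. As the migration proceeds,
# old-fleet entries are removed until the old fleet is fully drained and can be
# deleted.
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: frontend
  namespace: frontend
spec:
  template:
    spec:
      selector:
        app: frontend
      ports:
      - name: web
        protocol: TCP
        port: 80
        targetPort: 8080
  clusters:
  - link: "us-central1-a/fleet-v1-cluster-1"   # current production fleet
  - link: "us-central1-b/fleet-v2-cluster-1"   # next-version (canary) fleet
```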

If you value resilience over everything else, use the canary cluster upgrade strategy.

Choose an upgrade strategy

The following diagram can help you determine which strategy is best for you based on the Service and business needs.

Decision tree to help choose an upgrade strategy.

The preceding diagram is a decision tree to help you pick the upgrade strategy that is right for you:

  • If you do not require complete control over the exact version and time of upgrade, you can choose the auto-upgrade feature available in GKE.
  • If your priority is low cost, you can choose the rolling upgrade strategy.
  • If your priority is balancing cost and resilience, you can choose the blue/green strategy.
  • If your priority is resilience over cost, you can choose the canary cluster upgrade strategy.

Using Multi Cluster Ingress for GKE cluster lifecycle management

Almost any strategy depends on the ability to drain and re-route traffic to other clusters during upgrades. A solution that provides this multi-cluster ingress capability is Multi Cluster Ingress. Multi Cluster Ingress is a Google Cloud-hosted multi-cluster ingress controller for GKE clusters that supports deploying shared load balancing resources across clusters and across regions. Multi Cluster Ingress is a solution to get client traffic to a distributed Service running in many clusters across many regions. Like Ingress for GKE, it uses Cloud Load Balancing to send traffic to a backend service, which corresponds to the distributed Service. The backend service sends traffic to multiple backends, which are the Kubernetes Services running on multiple GKE clusters. For Service-to-Service traffic across clusters, you can use service mesh technologies like Anthos Service Mesh or Istio, which provide similar functionality across distributed Services.
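The following is a minimal sketch of the two custom resources that Multi Cluster Ingress uses, applied to the fleet's config cluster. The Service name, namespace, and ports are placeholders.

```yaml
# MultiClusterIngress: the fleet-wide external HTTP(S) load balancer frontend.
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: frontend-ingress
  namespace: frontend
spec:
  template:
    spec:
      backend:
        serviceName: frontend   # refers to the MultiClusterService below
        servicePort: 80
---
# MultiClusterService: selects the frontend Pods in the member clusters and
# exposes them as backends of the load balancer. Adding a spec.clusters list
# restricts which clusters receive traffic, which is how clusters are drained
# and spilled during upgrades.
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: frontend
  namespace: frontend
spec:
  template:
    spec:
      selector:
        app: frontend
      ports:
      - name: web
        protocol: TCP
        port: 80
        targetPort: 8080
```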

For GKE multi-cluster environments, you can use Multi Cluster Ingress to manipulate traffic to multiple clusters for the previously discussed cluster lifecycle management strategies. You can follow a tutorial to use Multi Cluster Ingress for GKE upgrades using the blue/green strategy.

What's next