Introducing fleets

Fleets (formerly known as environs) are a Google Cloud concept for logically organizing clusters and other resources, letting you use and manage multi-cluster capabilities and apply consistent policies across your systems. Fleets form a crucial part of how enterprise multi-cluster functionality works in Google Cloud.

This guide introduces you to fleets: what we mean by a fleet, where fleets are used in our components, and how to set up your systems to take advantage of fleet-level features. We also provide some examples to illustrate how fleets can help simplify your cluster and system management, and best practices to follow when building and operating multi-cluster systems with fleets.

This guide is designed for technical readers, including system architects, platform operators, and service operators, who want to leverage multiple clusters and related infrastructure. These concepts are useful wherever your organization happens to be running multiple clusters, whether in Google Cloud, across multiple cloud providers, or on-premises.

You should be familiar with basic Kubernetes concepts such as clusters; if you're not, see Kubernetes basics, the GKE documentation, and Preparing an application for Anthos Service Mesh.

If you want to learn more about Anthos and the components that use fleets, see our Anthos technical overview and Explore Anthos tutorial. However, you don't need to be familiar with Anthos to follow this guide.

Introduction

Typically, as organizations embrace cloud-native technologies like containers, container orchestration, and service meshes, they reach a point where running a single cluster is no longer sufficient. There are a variety of reasons why organizations choose to deploy multiple clusters to achieve their technical and business objectives; for example, separating production from non-production environments, or separating services across tiers, locales, or teams. You can read more about the benefits and tradeoffs involved in multi-cluster approaches in multi-cluster use cases.

As the number of clusters grows, providing management and governance over these clusters and the resources inside them becomes increasingly difficult. Often at this point, organizations resort to building custom tooling and operational policies to obtain the level of control that they require.

Google Cloud provides the fleet concept to help administrators manage multiple clusters. A fleet provides a way to logically group and normalize clusters, making administration of infrastructure easier. Fleets can be used in the context of both Anthos and GKE; you can see a list of the Anthos and GKE components that can leverage fleets in the fleet-enabled components section later in this document.

Adopting fleets helps your organization uplevel management from individual clusters to entire groups of clusters. Furthermore, the normalization that fleets require can help your teams adopt similar best practices to those used at Google. For comparison, just as the Organization resource is the root node of the Google Cloud resource hierarchy and is used for policy and control over resources grouped under it, the fleet forms the root for managing multiple clusters.

Terminology

The following are some important terms we use when talking about fleets.

Fleet-aware resources

Fleet-aware resources are Google Cloud project resources that can be logically grouped and managed as fleets. Only Kubernetes clusters can currently be fleet members, although we envisage virtual machine (VM) instances and possibly other resources being able to join fleets in future platform iterations. Google Cloud provides a Connect service to register resources as fleet members.

Fleet host project

The implementation of fleets, like many other Google Cloud resources, is rooted in a Google Cloud project, which we refer to as the fleet host project. A given Cloud project can only have a single fleet (or no fleets) associated with it. This restriction reinforces using Cloud projects to provide stronger isolation between resources that are not governed or consumed together.

Fleet-enabled components

The following Anthos and GKE components all leverage fleet concepts such as namespace and identity sameness to provide a simplified way to work with your clusters and services. For any current requirements or limitations for using fleets with each component, see the component requirements.

  • Workload identity pools (Anthos and GKE clusters)
    A fleet offers a common workload identity pool that can be used to authenticate and authorize workloads uniformly within a service mesh and to external services.

  • Anthos Service Mesh (Anthos)
    Anthos Service Mesh is a suite of tools that helps you monitor and manage a reliable service mesh on Google Cloud or on-premises. You can form a service mesh across the resources (such as clusters and VMs) that are part of the same fleet.

  • Anthos Config Management (Anthos) and Config Sync (GKE)
    Anthos Config Management lets you deploy and monitor declarative policy and configuration changes for your system stored in a central Git repository, leveraging core Kubernetes concepts such as namespaces, labels, and annotations. With Anthos Config Management (and its sibling product Config Sync for non-Anthos clusters), policy and configuration is defined across the fleet, but applied and enforced locally in each of the member resources.

  • Multi-cluster Ingress (Anthos)
    Multi-cluster Ingress uses the fleet to define the set of clusters and service endpoints that traffic can be load balanced over, enabling low-latency and high-availability services.

Grouping infrastructure

The first important concept of fleets is grouping: choosing which related fleet-aware resources should be made part of a fleet. Deciding what to group together requires answering the following questions:

  • Are the resources related to one another?
    • Resources that have large amounts of cross-service communication benefit the most from being managed together in a fleet.
    • Resources in the same deployment environment (for example, your production environment) should be managed together in a fleet.
  • Who administers the resources?
    • Having unified (or at least mutually trusted) control over the resources is crucial to ensuring the integrity of the fleet.

To illustrate this point, consider an organization that has multiple lines of business (LOBs). Services rarely communicate across LOB boundaries, services in different LOBs are managed differently (for example, upgrade cycles differ between LOBs), and each LOB might even have a different set of administrators. In this case, it might make sense to have fleets per LOB. Each LOB also likely adopts multiple fleets to separate its production and non-production services.
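The LOB example can be sketched as a small grouping routine. This is only an illustration of the reasoning, not a Google Cloud API: the Cluster type, the plan_fleets function, and all cluster and LOB names are hypothetical, and it assumes that line of business and deployment environment are the only two grouping criteria.

```python
from collections import defaultdict
from typing import NamedTuple


class Cluster(NamedTuple):
    name: str
    lob: str          # line of business that administers the cluster
    environment: str  # for example "prod" or "nonprod"


def plan_fleets(clusters):
    """Group clusters into candidate fleets keyed by (LOB, environment)."""
    fleets = defaultdict(list)
    for cluster in clusters:
        fleets[(cluster.lob, cluster.environment)].append(cluster.name)
    return dict(fleets)


# Hypothetical inventory: two LOBs, with production and non-production clusters.
clusters = [
    Cluster("retail-us", "retail", "prod"),
    Cluster("retail-eu", "retail", "prod"),
    Cluster("retail-dev", "retail", "nonprod"),
    Cluster("payments-us", "payments", "prod"),
]

for (lob, env), members in sorted(plan_fleets(clusters).items()):
    print(f"candidate fleet {lob}-{env}: {members}")
```

Here the two retail production clusters land in one candidate fleet, while the retail non-production cluster and the payments cluster each get their own.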

As other fleet concepts are explored in the following sections, you might find other reasons to create multiple fleets as you consider your specific organizational needs.

Sameness

An important concept in fleets is the concept of sameness. This means that some Kubernetes objects, such as namespaces, that have the same name in different clusters are treated as the same thing. This normalization is done to make administering fleet resources more tractable. It provides some strong guidance about how to set up namespaces, services, and identities. However, it also follows what we find most organizations already implementing themselves.

Namespace sameness

The fundamental example of sameness in a fleet is namespace sameness. Namespaces with the same name in different clusters are considered the same by many components. Another way to think about this property is that a namespace is logically defined across an entire fleet, even if the instantiation of the namespace exists only in a subset of the fleet resources.

Consider the following backend namespace example. Although the namespace is instantiated only in Clusters A and B, it is implicitly reserved in Cluster C, which allows the backend service to also be scheduled into Cluster C if necessary. This means that namespaces are allocated for the entire fleet and not per cluster. As such, namespace sameness requires consistent namespace ownership across the fleet.

Figure: Namespace sameness in a fleet
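One way to picture namespace sameness is as a fleet-wide view built from the union of every cluster's namespaces. The following is a minimal sketch under that assumption; the function names and the cluster-to-namespace mapping are hypothetical and do not correspond to any Google Cloud API.

```python
def fleet_namespaces(cluster_namespaces):
    """Map each namespace name to the sorted list of clusters that instantiate it."""
    fleet_view = {}
    for cluster, namespaces in cluster_namespaces.items():
        for namespace in namespaces:
            fleet_view.setdefault(namespace, []).append(cluster)
    return {ns: sorted(members) for ns, members in fleet_view.items()}


def reserved_in(cluster, cluster_namespaces):
    """Namespaces defined somewhere in the fleet but not instantiated in `cluster`."""
    fleet_wide = set(fleet_namespaces(cluster_namespaces))
    return sorted(fleet_wide - set(cluster_namespaces[cluster]))


# Hypothetical layout: backend exists only in Clusters A and B.
cluster_namespaces = {
    "cluster-a": ["backend"],
    "cluster-b": ["backend", "frontend"],
    "cluster-c": ["frontend"],
}

print(fleet_namespaces(cluster_namespaces))
print(reserved_in("cluster-c", cluster_namespaces))
```

In this model, backend is reported as reserved in cluster-c even though nothing has created it there, which is why one owner needs to control a namespace name across the whole fleet.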

Service sameness

Anthos Service Mesh and Multi-cluster Ingress use the concept of sameness of services within a namespace. Like namespace sameness, this implies that services with the same namespace and service name are considered to be the same service.

The service endpoints can be merged across the mesh in the case of Anthos Service Mesh. With Multi-cluster Ingress, a MultiClusterService (MCS) resource makes the endpoint merging more explicit; however, we recommend similar practices with respect to naming. Because of this, it's important to ensure that identically named services within the same namespace are actually the same thing.

In the following example, internet traffic is load balanced across a same-named service in the frontend namespace present in both Clusters B and C. Similarly, using the service mesh properties within the fleet, the frontend service can reach a same-named service in the auth namespace present in Clusters A and C.

Figure: Service sameness in a fleet
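Endpoint merging under service sameness can be sketched as keying endpoints by namespace and service name while ignoring the cluster. This is a simplified model, not how Anthos Service Mesh or Multi-cluster Ingress is implemented; the service names (web, login), the addresses, and the merge_endpoints function are all hypothetical.

```python
from collections import defaultdict


def merge_endpoints(cluster_services):
    """Merge endpoints across clusters, keyed by (namespace, service name)."""
    merged = defaultdict(list)
    for cluster in sorted(cluster_services):
        for (namespace, service), addresses in cluster_services[cluster].items():
            for address in addresses:
                # The cluster is kept only as metadata; it is not part of the key.
                merged[(namespace, service)].append((cluster, address))
    return dict(merged)


# Hypothetical endpoints: frontend/web in Clusters B and C, auth/login in A and C.
cluster_services = {
    "cluster-a": {("auth", "login"): ["10.0.1.5"]},
    "cluster-b": {("frontend", "web"): ["10.0.2.7"]},
    "cluster-c": {
        ("frontend", "web"): ["10.0.3.9"],
        ("auth", "login"): ["10.0.3.4"],
    },
}

merged = merge_endpoints(cluster_services)
for key, endpoints in sorted(merged.items()):
    print(key, endpoints)
```

Because the key contains no cluster name, two unrelated services that happen to share a namespace and name would be merged as well, which is the reason for the naming guidance above.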

Identity sameness when accessing external resources

Services within a fleet can leverage a common identity as they egress to access external resources such as Google Cloud services, object stores, and so on. This common identity makes it possible to give the services within a fleet access to an external resource once rather than cluster-by-cluster.

To illustrate this point further, consider the following example. Clusters A, B, and C are enrolled in common identity within their fleet. When services in the backend namespace access Google Cloud resources, their identities are mapped to a common Google Cloud service account called back. The Google Cloud service account back can be authorized on any number of managed services, from Cloud Storage to Cloud SQL. As new fleet resources such as clusters join the fleet, workloads in their backend namespace automatically inherit the workload identity sameness properties.

Because of identity sameness, it is important that all resources in a fleet are trusted and well-governed. Revisiting the previous example, if Cluster C is owned by a separate, untrusted team, they too can create a backend namespace and access managed services as if they were the backend in Cluster A or B.

Figure: Identity sameness accessing resources outside a fleet
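The reason an untrusted cluster can act as the backend becomes clearer when you look at how a workload's identity is named. The sketch below assumes the fleet workload identity pool takes the form FLEET_HOST_PROJECT_ID.svc.id.goog and that a workload is referenced in IAM as serviceAccount:POOL[NAMESPACE/KUBERNETES_SERVICE_ACCOUNT]; check the Workload Identity documentation for your version before relying on either form. The project ID, the Kubernetes service account name, and the helper functions are hypothetical.

```python
FLEET_HOST_PROJECT = "example-fleet-host"  # hypothetical project ID


def workload_identity_pool(fleet_host_project):
    """Assumed pool naming: one pool per fleet host project."""
    return f"{fleet_host_project}.svc.id.goog"


def iam_member(fleet_host_project, namespace, ksa_name):
    """Assumed IAM member string for a workload; note it has no cluster component."""
    pool = workload_identity_pool(fleet_host_project)
    return f"serviceAccount:{pool}[{namespace}/{ksa_name}]"


# The same namespace and Kubernetes service account (hypothetical name)
# running in three different clusters of the fleet.
workloads = [
    ("cluster-a", "backend", "backend-ksa"),
    ("cluster-b", "backend", "backend-ksa"),
    ("cluster-c", "backend", "backend-ksa"),
]

members = {
    cluster: iam_member(FLEET_HOST_PROJECT, namespace, ksa)
    for cluster, namespace, ksa in workloads
}
for cluster, member in members.items():
    print(cluster, "->", member)
```

Under these assumptions all three clusters yield the same member string, so a single grant to that member on the back service account covers every cluster, including one run by a team you do not trust.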

Identity sameness within a fleet

Within the fleet, identity sameness is used similarly to the external identity sameness we previously discussed. Just as fleet services are authorized once for an external service, they can be authorized internally as well.

In the following example, we are using Anthos Service Mesh to create a multi-cluster service mesh where frontend has access to backend. With Anthos Service Mesh and fleets, we don't need to specify that frontend in Clusters B and C can access backend in Clusters A and B. Instead, we just specify that frontend in the fleet can access backend in the fleet. This property not only makes authorization simpler, it also makes the resource boundaries more flexible; now workloads can easily be moved from cluster to cluster without affecting how they are authorized. As with workload identity sameness, governance over the fleet resources is crucial to ensuring the integrity of service-to-service communication.

Figure: Identity sameness inside a fleet
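Fleet-level authorization can be modeled as a rule that names only the source and destination, never the cluster. The following sketch is a toy model of that idea rather than an Anthos Service Mesh policy; the Workload type, the ALLOWED rule set, and is_allowed are hypothetical names, and namespaces stand in for identities to keep the example short.

```python
from typing import NamedTuple


class Workload(NamedTuple):
    cluster: str
    namespace: str


# One fleet-wide rule: (source namespace, destination namespace).
ALLOWED = {("frontend", "backend")}


def is_allowed(source, destination, rules=ALLOWED):
    """Authorize on namespace identity only; the cluster is deliberately ignored."""
    return (source.namespace, destination.namespace) in rules


frontend_b = Workload("cluster-b", "frontend")
frontend_c = Workload("cluster-c", "frontend")
backend_a = Workload("cluster-a", "backend")
backend_b = Workload("cluster-b", "backend")

print(is_allowed(frontend_b, backend_a))  # frontend in B calling backend in A
print(is_allowed(frontend_c, backend_b))  # frontend in C calling backend in B
print(is_allowed(backend_a, frontend_b))  # reverse direction has no rule

# Moving a workload to another cluster does not change the decision.
moved = frontend_b._replace(cluster="cluster-a")
print(is_allowed(moved, backend_a))
```

A per-cluster model would need one rule for every pair of clusters; here a single rule covers all of them and survives workload moves.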

Exclusivity

Fleet-aware resources can only be members of a single fleet at any given time, a restriction that is enforced by Google Cloud tools and components. This restriction ensures that there is only one source of truth governing a cluster. Without exclusivity, even the simplest components would become complex to use, requiring your organization to reason about and configure how multiple components from multiple fleets would interact.
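The exclusivity rule amounts to a one-to-one mapping from each cluster to at most one fleet. The sketch below models that constraint in memory; FleetRegistry and its methods are hypothetical and are not the interface of the Connect service or any Google Cloud tool.

```python
class FleetRegistry:
    """Toy membership registry that enforces one fleet per cluster."""

    def __init__(self):
        self._membership = {}  # cluster name -> fleet name

    def register(self, cluster, fleet):
        current = self._membership.get(cluster)
        if current is not None and current != fleet:
            raise ValueError(
                f"{cluster} is already a member of {current}; "
                f"unregister it before joining {fleet}"
            )
        self._membership[cluster] = fleet

    def unregister(self, cluster):
        self._membership.pop(cluster, None)

    def fleet_of(self, cluster):
        return self._membership.get(cluster)


registry = FleetRegistry()
registry.register("cluster-a", "prod-fleet")

try:
    registry.register("cluster-a", "dev-fleet")
except ValueError as error:
    print("rejected:", error)

# Moving a cluster between fleets requires leaving the first one.
registry.unregister("cluster-a")
registry.register("cluster-a", "dev-fleet")
print(registry.fleet_of("cluster-a"))
```

Registering the same cluster in the same fleet twice is treated as a no-op in this model, which is a design choice of the sketch rather than documented behavior.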

High trust

Service sameness, workload identity sameness, and mesh identity sameness are built on top of a principle of high trust between members of a fleet. This trust makes it possible to uplevel management of these resources to the fleet, rather than managing resource-by-resource (that is, cluster-by-cluster for Kubernetes resources), and ultimately makes the cluster boundary less important.

Put another way, within a fleet, clusters still provide protection against blast radius concerns, availability problems (of both the control plane and underlying infrastructure), noisy neighbors, and so on. However, they are not a strong isolation boundary for policy and governance, because administrators of any member in a fleet can potentially affect the operations of services in other members of the fleet.

For this reason, we recommend that resources that are not trusted by the fleet administrator be placed in their own fleets to keep them isolated. Then, as necessary, individual services can be authorized across the fleet boundary.

What's next?

Ready to think about applying these concepts to your own systems? See our fleet requirements and best practices.