GKE Enterprise (Anthos) technical overview

GKE Enterprise is Google's cloud-centric container platform for running modern apps anywhere consistently at scale. This guide provides an overview of how GKE Enterprise works and how it can help you deliver manageable, scalable, reliable applications.

Why GKE Enterprise?

Typically, as organizations embrace cloud-native technologies like containers, container orchestration, and service meshes, they reach a point where running a single cluster is no longer sufficient. There are a variety of reasons why organizations choose to deploy multiple clusters to achieve their technical and business objectives; for example, separating production from non-production environments, complying with varying regulatory restrictions, or separating services across tiers, locales, or teams. However, using multiple clusters has its own difficulties and overheads in terms of consistent configuration, security, and management. For example, manually configuring one cluster at a time risks breakages, and it can be challenging to see exactly where errors are happening.

Things can become even more complex (and expensive) when the clusters aren't all in one place. Many organizations using Google Cloud also want or need to run workloads in their own data centers, factory floors, retail stores, and even in other public clouds. However, they don't want to build new container platforms themselves in all these locations, or rethink how they configure, secure, monitor, and optimize container workloads depending on where they're running. Doing so risks inconsistent environments, security and misconfiguration issues, and operational toil.

For example:

  • A financial institution is building a digital banking platform on Google Cloud and requires consistent configurations, strong security policy enforcement, and deep visibility into how multiple apps communicate. A large retail company building a modern ecommerce platform has the same requirements. Both companies manage multiple clusters in multiple regions in Google Cloud using GKE.
  • Another global financial institution is building complex risk management apps, inter-bank transfer apps, and many other sensitive workloads, some of which must remain behind the corporate firewall and some of which are deployed on GKE on Google Cloud.
  • A major pharmacy retailer is creating new vaccine scheduling, customer messaging, and digital engagement apps to modernize pharmacy operations and create a more personalized in-store experience. These apps require in-store container platforms that are integrated with Google Cloud-hosted services like BigQuery and Retail Search.
  • A media and entertainment company requires a consistent container environment in 30 ballparks - all connected to and managed from Google Cloud - to gather and analyze terabytes of game statistics and to fuel fan engagement both inside the ballpark and virtually.
  • A hardware manufacturing company needs to test and optimize factory floor product quality and worker safety by analyzing data with very low latency to make decisions in near real-time, while also consolidating data in Google Cloud for longer-term analysis.
  • A software and internet company that offers an integration platform in a software as a service (SaaS) model needs to offer its platform on several major public clouds to run where its customers need proximity to native cloud services. The company needs a unified and consistent way to provision, configure, secure, and monitor container environments in multiple public clouds from one management plane, to avoid the operational overhead of managing each cloud environment with different native management tools.

GKE Enterprise can help all these organizations by providing a consistent platform that lets them:

  • Modernize applications and infrastructure in-place
  • Create a unified cloud operating model (single pane of glass) to create, update, and optimize container clusters wherever they are
  • Scale large multi-cluster applications as fleets - logical groupings of similar environments - with consistent security, configuration, and service management
  • Enforce consistent governance and security from a unified control plane

It does this with opinionated tools and features that help them govern, manage, and operate containerized workloads at enterprise scale, enabling them to adopt best practices and principles that we've learned from running services at Google.

GKE Enterprise basics

Diagram showing the features of the GKE Enterprise platform

GKE Enterprise capabilities are built around the idea of the fleet: a logical grouping of Kubernetes clusters that can be managed together. A fleet can be entirely made up of GKE clusters on Google Cloud, or include clusters outside Google Cloud running on-premises and on other public clouds such as AWS and Azure.

Once you have created a fleet, you can use GKE Enterprise fleet-enabled features to add further value and simplify working across multiple clusters and infrastructure providers:

  • Configuration and policy management tools help you work more easily at scale, automatically adding and updating the same configuration, features, and security policies consistently across your fleet, wherever your clusters are.
  • Fleet-wide networking features help you manage traffic across your entire fleet, including Multi-Cluster Ingress for applications that span multiple clusters, and service mesh traffic management features.
  • Identity management features help you consistently configure authentication for fleet workloads and users.
  • Observability features let you monitor and troubleshoot your fleet clusters and applications, including their health, resource utilization, and security posture.
  • Team management tools enable you to make sure that your teams have access to the infrastructure resources they need to run their workloads, and give teams a team-scoped view of their resources and workloads.
  • For microservice-based applications running in your fleet, Anthos Service Mesh provides powerful tools for application security, networking, and observability across your mesh.

You can enable the entire GKE Enterprise platform to use all available features, including multicloud and hybrid cloud capabilities, or you can create a fleet on Google Cloud only and pay for additional enterprise features as you need them. GKE Enterprise uses industry-standard open source technologies, and supports multiple infrastructure providers, providing flexibility to use GKE Enterprise in a way that meets your business and organizational needs.

How fleets work

Fleets are how GKE Enterprise lets you logically group and normalize Kubernetes clusters, making administration of infrastructure easier. Adopting fleets helps your organization uplevel management from individual clusters to groups of clusters, with a single view on your entire fleet in the Google Cloud console. However, fleets are more than just groups of clusters. The principles of sameness and trust that are assumed within a fleet are what enable you to use the full range of fleet-enabled features.

The first of these fleet principles is sameness. This means that, within a fleet of clusters, some Kubernetes objects such as namespaces in different clusters are treated as if they were the same thing when they have the same name. This normalization makes it simpler to manage many clusters at once and is used by GKE Enterprise fleet-enabled features. For example, you can apply a security policy with Policy Controller to all fleet services in namespace foo, regardless of which clusters they happen to be in, or where those clusters are.

Fleets also assume service sameness (all services in a namespace with the same name can be treated as the same service, for example for traffic management purposes) and identity sameness (services and workloads within a fleet can leverage a common identity for authentication and authorization). The fleet sameness principle also provides some strong guidance about how to set up namespaces, services, and identities, following what many organizations and Google already implement themselves as best practices.
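To make the sameness principle more concrete, the following Python sketch models how a fleet-wide policy on a namespace name could resolve to every cluster that contains a namespace with that name. It is only an illustration of the idea, not how GKE Enterprise implements it; the cluster names, namespace names, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: cluster name -> namespaces present in that cluster.
fleet = {
    "gke-us-east": ["foo", "payments"],
    "onprem-dc1": ["foo"],
    "eks-eu-west": ["payments", "batch"],
}


def namespaces_by_name(clusters):
    """Group clusters by namespace name, mirroring namespace sameness."""
    grouped = defaultdict(list)
    for cluster, namespaces in clusters.items():
        for ns in namespaces:
            grouped[ns].append(cluster)
    return {ns: sorted(members) for ns, members in grouped.items()}


def policy_targets(clusters, namespace):
    """List every (cluster, namespace) pair that a policy on `namespace` covers."""
    members = namespaces_by_name(clusters).get(namespace, [])
    return [(cluster, namespace) for cluster in members]


# A policy written once for namespace "foo" applies wherever "foo" exists.
print(policy_targets(fleet, "foo"))
```

In this model, the policy author never names a cluster: the namespace name alone determines where the policy applies, which is why consistent namespace naming across the fleet matters.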

Another key principle is trust - service sameness, Workload Identity sameness, and mesh identity sameness are all built on top of a principle of high trust between members of a fleet. This trust makes it possible to uplevel management of these resources to the fleet, rather than managing cluster by cluster, and ultimately makes the cluster boundary less important.

How you organize your fleets depends on your organizational and technical needs. Each fleet is associated with a specific Google Cloud project, known as your fleet host project, which you use to manage and view your fleet, but can include clusters from other projects. You could, for example, have separate fleets for your prod, test, and dev environments, or separate fleets for different lines of business (different teams as tenants on your infrastructure can be handled within fleets using scopes). Clusters that have large amounts of cross-service communication benefit the most from being managed together in a fleet. Clusters in the same environment (for example, your production environment) should be in the same fleet. We generally recommend the largest fleet size that allows for trust and sameness among services, while keeping in mind that Anthos Service Mesh, if you choose to use it, lets you enable finer-grained service access control within your fleet.


Kubernetes clusters everywhere

Kubernetes is at the core of GKE Enterprise, with a variety of Kubernetes cluster options to choose from when building your fleet:

  • Google Kubernetes Engine (GKE) is Google's managed Kubernetes implementation, with the following options available for GKE Enterprise users:
    • On Google Cloud, GKE has a cloud-hosted control plane and clusters made up of Compute Engine instances. While GKE on Google Cloud on its own helps you automatically deploy, scale, and manage Kubernetes, grouping GKE clusters in a fleet lets you work more easily at scale, and allows you to use GKE Enterprise features in addition to the powerful cluster management features already offered by GKE.
    • Outside Google Cloud, GKE is extended for use with other infrastructure providers, including Azure, AWS, and on your own hardware on-premises (either on VMware or on bare metal). In these options, the Google-provided Kubernetes control plane runs in your data center or cloud provider along with your cluster nodes, with your clusters connected to your fleet host project in Google Cloud.
  • Google Distributed Cloud Edge also lets you add on-premises GKE clusters to your fleet, this time running on Google-provided and maintained hardware and supporting a subset of GKE Enterprise features.
  • GKE clusters are not your only option. GKE Enterprise also provides the ability to register conformant third-party Kubernetes clusters to your fleet, such as EKS and AKS clusters, known as attached clusters. With this option you continue to run existing workloads where they are while adding value with a subset of GKE Enterprise features. GKE Enterprise does not manage the Kubernetes control plane or node components—only the GKE Enterprise services that run on those clusters.

For all GKE-based clusters, including on-premises and public clouds, GKE Enterprise provides tools for cluster management and lifecycle (create, update, delete, and upgrade), including command line utilities and, for some cluster types, management from the Google Cloud console.

Cluster configuration

Wherever your clusters are, Config Sync provides a consistent way to manage cluster configuration across your entire fleet, including attached clusters. Config Sync uses the approach of "configuration as data": the desired state of your environment is defined declaratively, maintained as a single source of truth under version control, and applied directly with repeatable results. Config Sync monitors a central Git repository containing your configuration and automatically applies any changes to its specified target clusters, wherever they happen to be running. Any YAML or JSON that can be applied with kubectl commands can be managed with Config Sync and applied to any Kubernetes cluster.
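The "configuration as data" approach can be pictured as a comparison between the desired state in the repository and the actual state of a cluster. The Python sketch below is a simplified model of that comparison, not Config Sync's actual implementation; the object keys, specs, and the `plan_changes` function are hypothetical, and the model assumes that every object in `actual` is one that the sync tool manages.

```python
def plan_changes(desired, actual):
    """Compare desired state (from the repository) with actual cluster state.

    Both arguments map an object key such as "Namespace/foo" to its spec.
    Returns the actions needed to converge the cluster onto the desired state.
    """
    actions = []
    for key, spec in sorted(desired.items()):
        if key not in actual:
            actions.append(("create", key))
        elif actual[key] != spec:
            actions.append(("update", key))
    # Managed objects that no longer appear in the repository are removed.
    for key in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", key))
    return actions


# Hypothetical desired state, as it might be declared in version control.
desired = {
    "Namespace/foo": {"labels": {"env": "prod"}},
    "ConfigMap/foo/settings": {"data": {"mode": "strict"}},
}

# Hypothetical current state of one cluster that has drifted.
actual = {
    "Namespace/foo": {"labels": {"env": "prod"}},
    "ConfigMap/foo/settings": {"data": {"mode": "lenient"}},
    "ConfigMap/foo/legacy": {"data": {}},
}

print(plan_changes(desired, actual))
```

Because the plan is derived only from the two states, running the same comparison against every target cluster gives repeatable results regardless of where each cluster runs.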

Migration and VMs

For organizations that want to migrate their applications to containers and Kubernetes as part of their modernization process, GKE Enterprise includes Migrate to Containers, with tools to convert VM-based workloads into containers that run on GKE. On bare metal GKE Enterprise platforms (GKE on Bare Metal and Distributed Cloud Edge), organizations can also use VM Runtime on Google Distributed Cloud to run VMs on top of Kubernetes in the same way that they run containers, letting them continue to use existing VM-based workloads as they also develop and run new container-based applications. When they're ready, they can migrate these VM-based workloads to containers and continue using the same GKE Enterprise management tools.


GKE Enterprise features

The rest of this guide introduces you to the features that GKE Enterprise provides to help you manage your fleets and the applications that run on them. You can see a complete list of available features for each supported Kubernetes cluster type in GKE Enterprise deployment options.

Networking, authentication, and security

After you have built your fleet, GKE Enterprise helps you manage traffic, manage authentication and access control, and consistently enforce security and compliance policies across your fleet.

Connecting to your fleet

To manage the connection to Google in hybrid and multicloud fleets, Google provides a Kubernetes deployment called the Connect Agent. Once installed in a cluster as part of fleet registration, the agent establishes a connection between your cluster outside Google Cloud and its Google Cloud fleet host project, letting you manage your clusters and workloads from Google and use Google services.

In on-premises environments, connectivity to Google can use the public internet, a high-availability VPN, Partner Interconnect, or Dedicated Interconnect, depending on your applications' latency, security, and bandwidth requirements when interacting with Google Cloud.


Load balancing

For managing traffic to and within your fleet, GKE Enterprise provides the following load balancing solutions:

  • GKE clusters on Google Cloud can use the following options:
  • GKE clusters on-premises let you choose from a variety of load balancing modes to suit your needs, including a bundled MetalLB load balancer and the option to manually configure load balancing to use your existing solutions.
  • Distributed Cloud Edge includes bundled MetalLB load balancing.
  • GKE clusters on other public clouds use platform-native load balancers.

Authentication and access control

A significant challenge when working with multiple clusters across multiple infrastructure providers is managing authentication and authorization. For authenticating to your fleet's clusters, GKE Enterprise provides you with options for consistent, simple, and secured authentication when interacting with clusters from the command line with kubectl, and from the Google Cloud console.

  • Use Google identity: The Connect Gateway lets users and service accounts authenticate to clusters across your fleet with their Google IDs, wherever the clusters live. You can use this feature to connect directly to clusters, or leverage it with build pipelines and other DevOps automation.
  • Use third-party identity: GKE Enterprise's GKE Identity Service lets you configure authentication with third-party identity providers, letting your teams continue to use existing usernames, passwords, and security groups from OIDC (and LDAP where supported) providers such as Microsoft AD FS and Okta across your entire fleet.

You can configure as many supported identity providers as you want for a cluster.

Once you have set up authentication, you can then use standard Kubernetes role-based access control (RBAC) to authorize authenticated users to interact with your clusters, as well as Identity and Access Management to control access to Google services such as the Connect Gateway.

For workloads running on your clusters, GKE Enterprise provides fleet-wide workload identity. This feature lets workloads on fleet member clusters use identities from a fleet-wide workload identity pool when authenticating to external services such as Cloud APIs. This makes it simpler to set up an application's access to these services versus having to configure access cluster by cluster. For example, if you have an application with a backend deployed across multiple clusters in the same fleet, and which needs to authenticate to a Google API, you can configure your application so that all services in the "backend" namespace can use that API.
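As a rough illustration of why identity sameness simplifies access setup, the following Python sketch builds the IAM member string for a Kubernetes service account in a fleet workload identity pool. It assumes the pool is named after the fleet host project (`PROJECT_ID.svc.id.goog`) and that members take the form `serviceAccount:POOL[NAMESPACE/SERVICE_ACCOUNT]`; verify both against the current workload identity documentation. The project ID, namespace, and service account name are hypothetical.

```python
def fleet_workload_identity_pool(fleet_project_id):
    """Return the assumed workload identity pool name for a fleet host project."""
    return f"{fleet_project_id}.svc.id.goog"


def iam_member(fleet_project_id, namespace, service_account):
    """Build the assumed IAM member string for a Kubernetes service account."""
    pool = fleet_workload_identity_pool(fleet_project_id)
    return f"serviceAccount:{pool}[{namespace}/{service_account}]"


# The member string contains no cluster name, so one IAM binding covers the
# "backend" namespace's service account on every cluster in the fleet.
print(iam_member("example-fleet-project", "backend", "default"))
```

The point of the sketch is what the string leaves out: because the cluster does not appear in the identity, access granted to this member applies to the matching workload on every fleet member cluster.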


Policy management

Another challenge when working with multiple clusters is enforcing consistent security and regulatory compliance policies across your fleet. Many organizations have stringent security and compliance requirements, such as those protecting consumer information in financial service applications, and need to be able to meet these at scale.

To help you do this, Policy Controller enforces custom business logic against every Kubernetes API request to the relevant clusters. These policies act as "guardrails" and prevent any changes to the configuration of the Kubernetes API from violating security, operational, or compliance controls. You can set policies to actively block non-compliant API requests across your fleet, or simply to audit the configuration of your clusters and report violations. Common security and compliance rules can easily be expressed using Policy Controller's built-in set of rules, or you can write your own rules using the extensible policy language, based on the open source Open Policy Agent project.
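The difference between blocking and auditing can be sketched in a few lines of Python. This is a conceptual model only: Policy Controller rules are not written in Python, and the `admit` function, the mode names, and the required-labels rule shown here are hypothetical stand-ins for a real constraint.

```python
def required_labels_violations(obj, required):
    """Return a violation message for each required label the object lacks."""
    labels = obj.get("metadata", {}).get("labels", {})
    return [f"missing label: {name}" for name in required if name not in labels]


def admit(obj, required, mode="deny"):
    """Evaluate an object against the rule.

    In "deny" mode a violating request is rejected; in "audit" mode it is
    allowed, but the violations are still reported.
    Returns a tuple of (allowed, violations).
    """
    violations = required_labels_violations(obj, required)
    allowed = not violations or mode == "audit"
    return allowed, violations


# Hypothetical incoming API request for a new namespace.
namespace = {
    "kind": "Namespace",
    "metadata": {"name": "foo", "labels": {"team": "payments"}},
}

print(admit(namespace, ["team", "cost-center"], mode="deny"))
print(admit(namespace, ["team", "cost-center"], mode="audit"))
```

Starting a new rule in an audit-style mode lets you see how many existing resources would violate it before you switch to blocking requests.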


Application-level security

For applications running on your fleet, GKE Enterprise provides defense-in-depth access control and authentication features, including:

  • Binary Authorization, which lets you ensure that only trusted images are deployed on your fleet's clusters.
  • Kubernetes network policy, which lets you specify which Pods are allowed to communicate with each other and other network endpoints.
  • Anthos Service Mesh service access control, which lets you configure fine-grained access control for your mesh services based on service accounts and request contexts.
  • Anthos Service Mesh certificate authority (Mesh CA), which automatically generates and rotates certificates so you can enable mutual TLS authentication (mTLS) easily between your services.

Observability

A key part of operating and managing clusters at scale is being able to easily monitor your fleet's clusters and applications, including their health, resource utilization, and security posture.

GKE Enterprise in the Google Cloud console

The Google Cloud console is Google Cloud's web interface that you can use to manage your projects and resources. GKE Enterprise brings enterprise features and a structured view of your entire fleet into the GKE Google Cloud console pages, providing an integrated interface that helps you manage your applications and resources all in one place. Dashboard pages let you view high level details, as well as letting you drill down as far as necessary to identify issues.

  • Overview: The top-level overview summarizes your fleet's resource usage based on information provided through Cloud Monitoring, showing CPU, memory, and disk utilization aggregated by fleet and by cluster, as well as fleet-wide Policy Controller and Config Sync coverage.
  • Cluster management: The GKE Enterprise Clusters view provides a secure console to view the state of all of your project's and fleet's clusters including cluster health, register clusters to your fleet, and create new clusters for your fleet (Google Cloud only). For information about specific clusters, you can drill down from this view or visit other GKE dashboards to get further details about your cluster nodes and workloads.
  • Team overview: If you have set up teams for your fleet, the Teams overview provides resource utilization, error rates, and other metrics aggregated by team, making it easier for admins and team members to view and troubleshoot errors.
  • Feature management: The Feature Management view lets you view the state of GKE Enterprise features for your fleet clusters.
  • Service Mesh: If you're using Anthos Service Mesh on Google Cloud, the Service Mesh view provides observability into the health and performance of your services. Anthos Service Mesh collects and aggregates data about each service request and response, meaning you don't have to instrument your code to collect telemetry data or manually set up dashboards and charts. Anthos Service Mesh automatically uploads metrics and logs to Cloud Monitoring and Cloud Logging for all traffic within your cluster. This detailed telemetry lets operators observe service behavior, and empowers them to troubleshoot, maintain, and optimize their applications.
  • Security posture: The Security Posture view shows you opinionated, actionable recommendations to improve your fleet's security posture.
  • Configuration management: The Config view gives you an at-a-glance overview of the configuration state of all fleet clusters with Config Sync enabled, and lets you quickly add the feature to clusters that haven't been set up yet. You can easily track configuration changes and see which branch and commit tag has been applied to each cluster. Flexible filters make it simple to view configuration rollout status by cluster, branch, or tag.
  • Policy management: The Policy view shows you how many clusters in your fleet have Policy Controller enabled, provides an overview of any compliance violations, and lets you add the feature to fleet clusters.

Logging and monitoring

For more in-depth information about your clusters and their workloads, you can use Cloud Logging and Cloud Monitoring. Cloud Logging provides a unified place to store and analyze log data, while Cloud Monitoring automatically collects and stores performance data, as well as providing data visualization and analysis tools. Most GKE Enterprise cluster types send logging and monitoring information for system components (such as workloads in the kube-system and gke-connect namespaces) to Cloud Monitoring and Cloud Logging by default. You can further configure Cloud Monitoring and Cloud Logging to get information about your own application workloads, build dashboards that combine multiple metric types, create alerts, and more.

Depending on your organization and project needs, GKE Enterprise also supports integration with other observability tools, including open source Prometheus and Grafana, and third-party tools such as Elastic and Splunk.


Service management

In Kubernetes, a service is an abstract way to expose an application running on a set of Pods as a network service, with a single DNS address for traffic to the service workloads. In a modern microservices architecture, a single application may consist of numerous services, and each service may have multiple versions deployed concurrently. Service-to-service communication in this kind of architecture occurs over the network, so services must be able to deal with network idiosyncrasies and other underlying infrastructure issues.

To make it easier to manage services in your fleet, you can use Anthos Service Mesh. Anthos Service Mesh is based on Istio, which is an open-source implementation of a service mesh infrastructure layer. Service meshes factor out common concerns of running a service such as monitoring, networking, and security, with consistent, powerful tools, making it easier for service developers and operators to focus on creating and managing their applications. With Anthos Service Mesh, these functions are abstracted away from the application's primary container and implemented in a common out-of-process proxy delivered as a separate container in the same Pod. This pattern decouples application or business logic from network functions, and enables developers to focus on the features that the business needs. Service meshes also let operations teams and development teams decouple their work from one another.

Anthos Service Mesh provides you with many features along with all of Istio's functionality:

  • Service metrics and logs for all traffic within your mesh's cluster are automatically ingested to Google Cloud.
  • Automatically generated dashboards display in-depth telemetry in the Anthos Service Mesh dashboard, to let you dig deep into your metrics and logs, filtering and slicing your data on a wide variety of attributes.
  • Service-to-service relationships at a glance: understand what connects to each service and the services it depends on.
  • Secure your inter-service traffic: Anthos Service Mesh certificate authority (Mesh CA) automatically generates and rotates certificates so you can enable mutual TLS authentication (mTLS) easily with Istio policies.
  • Quickly see the communication security posture not only of your service, but its relationships to other services.
  • Dig deeper into your service metrics and combine them with other Google Cloud metrics using Cloud Monitoring.
  • Gain clear and simple insight into the health of your service with service level objectives (SLOs), which allow you to easily define and alert on your own standards of service health.
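The arithmetic behind a request-based SLO is worth seeing once. The Python sketch below computes an error budget from a target success ratio and request counts. It is a generic illustration of the SLO concept rather than the Anthos Service Mesh API; the function name, the returned fields, and the alerting rule (alert once the budget is fully consumed) are assumptions made for the example.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Summarize error budget consumption for a request-based SLO.

    slo_target is the fraction of requests that must succeed, such as 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of requests that may fail without breaching the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures
    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": consumed,
        "alert": consumed >= 1.0,
    }


# Hypothetical compliance period: one million requests, 250 of which failed.
print(error_budget(0.999, 1_000_000, 250))
```

With a 99.9% target over one million requests, roughly 1,000 failures are allowed, so 250 failures consume about a quarter of the budget. Real alerting policies often trigger on how fast the budget is being consumed rather than waiting until it is exhausted.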

Anthos Service Mesh lets you choose between a fully-managed service mesh control plane in Google Cloud (for meshes running on fleet member clusters on Google Cloud only) or an in-cluster control plane that you install yourself. You can find out more about the features available for each option in the Anthos Service Mesh documentation.

