Multi-tenancy in Google Kubernetes Engine (GKE) refers to one or more clusters that are shared between tenants. In Kubernetes, a tenant can be defined as any of the following:
- A team responsible for developing and operating one or more workloads.
- A set of related workloads, whether operated by one or more teams.
- A single workload, such as a Deployment.
Cluster multi-tenancy is often implemented to reduce costs or to consistently apply administration policies across tenants. However, incorrectly configuring a GKE cluster or its associated GKE resources can result in unachieved cost savings, incorrect policy application, or destructive interactions between different tenants' workloads.
This guide provides best practices to safely and efficiently set up multiple multi-tenant clusters for an enterprise organization.
Assumptions and requirements
The best practices in this guide are based on a multi-tenant use case for an enterprise environment, which has the following assumptions and requirements:
- The organization is a single company that has many tenants (two or more application/service teams) that use Kubernetes and would like to share computing and administrative resources.
- Each tenant is a single team developing a single workload.
- In addition to the application/service teams, other teams also use and manage clusters, such as platform team members, cluster administrators, and auditors.
- The platform team owns the clusters and defines the amount of resources each tenant team can use; each tenant can request more.
- Each tenant team should be able to deploy their application through the Kubernetes API without having to communicate with the platform team.
- Each tenant should not be able to affect other tenants in the shared cluster, except via explicit design decisions like API calls, shared data sources, etc.
This setup will serve as a model from which we can demonstrate multi-tenant best practices. While this setup might not perfectly describe all enterprise organizations, it can be easily extended to cover similar scenarios.
Setting up folders, projects and clusters
Best practices:
- Establish a folder and project hierarchy.
- Assign roles using IAM.
- Centralize network control with Shared VPCs.
- Create one cluster admin project per cluster.
- Make clusters private.
- Ensure the control plane for the cluster is regional.
- Ensure nodes in your cluster span at least three zones.
- Autoscale cluster nodes and resources.
- Schedule maintenance windows for off-peak hours.
- Set up an external Application Load Balancer with Ingress.
Enterprise organizations that deploy multi-tenant clusters in GKE need additional configuration in other Google Cloud systems to manage complexity that doesn't exist in simpler single-application, single-team Kubernetes deployments. This includes configuring projects to isolate administrative concerns, mapping your organization structure to cloud identities and accounts, and managing additional Google Cloud resources, such as databases, logging and monitoring, storage, and networking.
Establish a folder and project hierarchy
To capture how your organization manages Google Cloud resources and to enforce a separation of concerns, use folders and projects. Folders allow different teams to set policies that cascade across multiple projects, while projects can be used to segregate environments (for example, production vs. staging) and teams from each other. For example, most organizations have a team to manage network infrastructure and a different team to manage clusters. Each technology is considered a separate piece of the stack requiring its own level of expertise, troubleshooting and access.
A parent folder can contain up to 300 folders, and you can nest folders up to 10 levels deep. If you have over 300 tenants, you can arrange the tenants into nested hierarchies to stay within the limit. For more information about folders, see Creating and Managing Folders.
Assign roles using IAM
You can control access to Google Cloud resources through IAM policies. Start by identifying the groups needed for your organization and their scope of operations, then assign the appropriate IAM role to the group.
Use Google Groups to efficiently assign and manage IAM for users.
Centralize network control
To maintain centralized control over network resources, such as subnets, routes, and firewalls, use Shared VPC networks. Resources in a Shared VPC can communicate with each other securely and efficiently across project boundaries using internal IPs. Each Shared VPC network is defined and owned by a centralized host project, and can be used by one or more service projects.
Using Shared VPC and IAM, you can separate network administration from project administration. This separation helps you implement the principle of least privilege. For example, a centralized network team can administer the network without having any permissions into the participating projects. Similarly, the project admins can manage their project resources without any permissions to manipulate the shared network.
When you set up a Shared VPC, you must configure the subnets and their secondary IP ranges in the VPC. To determine the subnet size, you need to know the expected number of tenants, the number of Pods and Services they are expected to run, and the maximum and average Pod size. From the total cluster capacity needed, you can choose an instance size, which in turn gives you the total node count. From the total number of nodes, you can calculate the total IP address space consumed, which gives you the subnet size.
Here are some factors that you should also consider when setting up your network:
- The maximum number of service projects that can be attached to a host project is 1,000, and the maximum number of Shared VPC host projects in a single organization is 100.
- The Node, Pod, and Services IP ranges must all be unique. You cannot create a subnet whose primary and secondary IP address ranges overlap.
- The maximum number of Pods and Services for a given GKE cluster is limited by the size of the cluster's secondary ranges.
- The maximum number of nodes in the cluster is limited by the size of the cluster's subnet's primary IP address range and the cluster's Pod address range.
- For flexibility and more control over IP address management, you can configure the maximum number of Pods that can run on a node. By reducing the number of Pods per node, you also reduce the CIDR range allocated per node, requiring fewer IP addresses.
For information on network ranges in a VPC cluster, see Creating a VPC-native cluster.
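As a rough worked example of how the Pod address range limits the node count, assume the documented GKE behavior that each node receives a Pod CIDR block with at least twice as many addresses as its maximum number of Pods (a /24 block for the default of 110 Pods per node). The /17 Pod range and the reduced maximum of 32 Pods per node are hypothetical values chosen only for illustration. With a Pod secondary range of prefix length p and a per-node block of prefix length n:

$$
N_{\text{nodes}} \le 2^{\,n - p}
$$

$$
\text{110 Pods per node: } n = 24,\ p = 17 \;\Rightarrow\; 2^{24-17} = 128 \text{ nodes}
$$

$$
\text{32 Pods per node: } n = 26,\ p = 17 \;\Rightarrow\; 2^{26-17} = 512 \text{ nodes}
$$

Under these assumptions, lowering the maximum Pods per node lets the same Pod range serve four times as many nodes. The subnet's primary range must still have enough addresses for that node count.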
Tenants that require further isolation for resources that run outside the shared clusters (such as dedicated Compute Engine VMs) may use their own VPC, which is peered to the Shared VPC run by the networking team. This provides additional security at the cost of increased complexity and numerous other limitations. For more information on peering, see Using VPC Network Peering. In the example below, all tenants have chosen to share a single (per-environment) tenant VPC.
Creating reliable and highly available clusters
Design your cluster architecture for high availability and reliability by implementing the following recommendations:
- Create one cluster admin project per cluster to reduce the risk of project-level configurations (for example, IAM bindings) adversely affecting many clusters, and to help provide separation for quota and billing. Cluster admin projects are separate from tenant projects, which individual tenants use to manage, for example, their Google Cloud resources.
- Configure your network isolation to disable access to the nodes and manage access to the control plane. We also recommend configuring your network isolation for development and staging environments.
- Ensure the control plane for the cluster is regional to provide high availability for multi-tenancy; any disruptions to the control plane will impact tenants. Please note, there are cost implications with running regional clusters. Autopilot clusters are pre-configured as regional clusters.
- Ensure the nodes in your cluster span at least three zones to achieve zonal reliability. For information about the cost of egress between zones in the same region, see the network pricing documentation.
Autoscale cluster nodes and resources
To accommodate the demands of your tenants, automatically scale nodes in your cluster by enabling autoscaling. Autoscaling helps systems appear responsive and healthy when various tenants deploy heavy workloads in their namespaces, and it helps the cluster respond to zonal outages.
With Autopilot clusters, node pools are automatically scaled to meet the requirements of your workloads.
When you enable autoscaling, you specify the minimum and maximum number of nodes in a cluster based on the expected workload sizes. By specifying the maximum number of nodes, you can ensure there is enough space for all Pods in the cluster, regardless of the namespace they run in. Cluster autoscaling rescales node pools based on the min/max boundary, helping to reduce operational costs when the system load decreases, and avoid Pods going into a pending state when there aren't enough available cluster resources. To determine the maximum number of nodes, identify the maximum amount of CPU and memory that each tenant requires, and add those amounts together to get the total capacity that the cluster should be able to handle if all tenants were at the limit. Using the maximum number of nodes, you can then choose instance sizes and counts, taking into consideration the IP subnet space made available to the cluster.
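As a rough illustration of this calculation, assume 10 tenants that are each capped at 32 CPU and 72 GiB of memory, and nodes that each offer about 15 CPU and 56 GiB of allocatable capacity. All four numbers are hypothetical. With C_t and M_t as the CPU and memory ceilings of tenant t, and c_node and m_node as the allocatable capacity of one node:

$$
N_{\max} = \max\left(\left\lceil \frac{\sum_t C_t}{c_{\text{node}}} \right\rceil,\ \left\lceil \frac{\sum_t M_t}{m_{\text{node}}} \right\rceil\right)
= \max\left(\left\lceil \frac{320}{15} \right\rceil,\ \left\lceil \frac{720}{56} \right\rceil\right)
= \max(22,\ 13) = 22
$$

In this sketch CPU is the binding constraint, so the autoscaling maximum would be about 22 nodes, provided that the subnet has IP address space for that many nodes.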
Use Pod autoscaling to automatically scale Pods based on resource demands. The Horizontal Pod Autoscaler (HPA) scales the number of Pod replicas based on CPU or memory utilization or on custom metrics. The Vertical Pod Autoscaler (VPA) automatically adjusts the resource requests of Pods. Don't use VPA together with HPA unless custom metrics are available, because the two autoscalers can compete with each other. For this reason, start with HPA and add VPA later only when needed.
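A minimal sketch of an HPA for a tenant workload might look like the following. The Deployment name `web`, the replica bounds, and the 70% CPU target are hypothetical values, not recommendations from this guide.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # hypothetical name
  namespace: tenant-a
spec:
  # The workload whose replica count is scaled.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Add or remove replicas to keep average CPU utilization,
  # measured against the Pods' CPU requests, near 70%.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Utilization targets are computed relative to the containers' resource requests, so the Pods in the target Deployment need CPU requests for this metric to work.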
Determine the size of your cluster
When determining the size of your cluster, here are some important factors to consider:
- The sizing of your cluster is dependent on the type of workloads you plan to run. If your workloads have greater density, the cost efficiency is higher but there is also a greater chance for resource contention.
- The minimum size of a cluster is defined by the number of zones it spans: one node for a zonal cluster and three nodes for a regional cluster.
- Per project, there is a maximum of 50 clusters per zone, plus 50 regional clusters per region.
Per cluster, the following maximum values apply to nodes:
- 1,000 nodes per node pool
- 1,000 nodes per cluster (if you use the GKE Ingress controller)
- 5,000 nodes per cluster by default. You can increase this limit to 15,000 or to 65,000 nodes. To learn more, see Clusters with more than 5,000 nodes.
- 256 Pods per node
- 150,000 Pods per cluster, and 300,000 containers per cluster
Refer to the Quotas and limits page for additional information.
Schedule maintenance windows
To reduce downtimes during cluster/node upgrades and maintenance, schedule maintenance windows to occur during off-peak hours. During upgrades, there can be temporary disruptions when workloads are moved to recreate nodes. To ensure minimal impact of such disruptions, schedule upgrades for off-peak hours and design your application deployments to handle partial disruptions seamlessly, if possible.
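One common way to help an application tolerate partial disruptions is a PodDisruptionBudget, which limits how many replicas can be evicted at the same time while nodes are drained. The following is a sketch only. The `app: web` label and the `minAvailable` value are hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb              # hypothetical name
  namespace: tenant-a
spec:
  # Keep at least two matching Pods running during voluntary
  # disruptions such as node drains.
  minAvailable: 2
  selector:
    matchLabels:
      app: web               # hypothetical Pod label
```

For the budget to be satisfiable, the workload needs more replicas than `minAvailable`. Otherwise node drains can stall.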
Set up an external Application Load Balancer with Ingress
To help with the management of your tenants' published Services and the management of incoming traffic to those Services, create an HTTP(S) load balancer to allow a single ingress per cluster, where each tenant's Services are registered with the cluster's Ingress resource. You can create and configure an HTTP(S) load balancer by creating a Kubernetes Ingress resource, which defines how traffic reaches your Services and how the traffic is routed to your tenant's application. By registering Services with the Ingress resource, the Services' naming convention becomes consistent, showing a single ingress, such as tenanta.example.com and tenantb.example.com.
Securing the cluster for multi-tenancy
Best practices:
- Control Pod communication with network policies.
- Run workloads with GKE Sandbox.
- Set up policy-based admission controls.
- Use Workload Identity Federation for GKE to grant access to Google Cloud services.
- Restrict network access to the control plane.
Control Pod communication with network policies
To control network communication between Pods in each of your cluster's namespaces, create network policies based on your tenants' requirements. As an initial recommendation, you should block traffic between namespaces that host different tenants' applications. Your cluster administrator can apply a deny-all network policy to deny all ingress traffic, which prevents Pods in one namespace from accidentally sending traffic to Services or databases in other namespaces.
As an example, here's a network policy that restricts ingress from all other namespaces to the tenant-a namespace:
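
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-all
      namespace: tenant-a
    spec:
      # An empty podSelector applies the policy to every Pod in tenant-a.
      podSelector: {}
      ingress:
      # Allow ingress only from Pods in the same namespace (tenant-a);
      # traffic from all other namespaces is denied.
      - from:
        - podSelector: {}
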
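The policy above still allows traffic between Pods inside tenant-a. If a tenant needs a stricter starting point, a default-deny ingress policy that admits no traffic at all could look like the following sketch. The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
  namespace: tenant-a
spec:
  # Select every Pod in the namespace.
  podSelector: {}
  # Declare the Ingress policy type without any ingress rules,
  # so no inbound traffic is allowed.
  policyTypes:
  - Ingress
```

Network policies are additive, so tenants can layer narrower allow rules on top of this baseline for the traffic they actually need.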
Run workloads with GKE Sandbox
Clusters that run untrusted workloads are more exposed to security vulnerabilities than other clusters. Using GKE Sandbox, you can harden the isolation boundaries between workloads for your multi-tenant environment. For security management, we recommend starting with GKE Sandbox and then using policy-based admission controls to fill in any gaps.
GKE Sandbox is based on gVisor, an open source container sandboxing project, and provides additional isolation for multi-tenant workloads by adding an extra layer between your containers and the host OS. Container runtimes often run as a privileged user on the node and have access to most system calls into the host kernel. In a multi-tenant cluster, one malicious tenant could gain access to the host kernel and to other tenants' data. GKE Sandbox mitigates these threats by reducing the need for containers to interact with the host, which shrinks the attack surface of the host and restricts the movement of malicious actors.
GKE Sandbox provides two isolation boundaries between the container and the host OS:
- A user-space kernel, written in Go, that handles system calls and limits interaction with the host kernel. Each Pod has its own isolated user-space kernel.
- The user-space kernel itself runs inside Linux namespaces and uses seccomp to filter system calls.
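To opt a workload into the sandbox, the Pod specification sets the `gvisor` RuntimeClass. The sketch below assumes the cluster already has a node pool with GKE Sandbox enabled. The Deployment name, labels, and image are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: untrusted-app          # hypothetical name
  namespace: tenant-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: untrusted-app
  template:
    metadata:
      labels:
        app: untrusted-app
    spec:
      # Run these Pods under gVisor instead of the default runtime.
      runtimeClassName: gvisor
      containers:
      - name: app
        image: registry.example.com/tenant-a/app:1.0   # hypothetical image
```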
Set up policy-based admission controls
To prevent Pods that violate your security boundaries from running in your cluster, use an admission controller. Admission controllers can check Pod specifications against policies that you define, and can prevent Pods that violate those policies from running in your cluster.
GKE supports the following types of admission control:
- Policy Controller: Declare pre-defined or custom policies and enforce them in clusters at scale using fleets. Policy Controller is based on the open source Open Policy Agent Gatekeeper project and is a feature of GKE Enterprise.
- PodSecurity admission controller: Enforce pre-defined policies that correspond to the Pod Security Standards in individual clusters or in specific namespaces.
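With the PodSecurity admission controller, policies are selected per namespace through labels. The following sketch enforces the Baseline standard and reports violations of the Restricted standard for a tenant namespace. The choice of levels is an assumption for illustration, not a recommendation from this guide.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    # Reject Pods that violate the Baseline standard.
    pod-security.kubernetes.io/enforce: baseline
    # Warn users and add audit annotations for Pods that would
    # violate the stricter Restricted standard.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Starting with `warn` and `audit` at a stricter level lets tenants see what would break before the platform team tightens `enforce`.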
Use Workload Identity Federation for GKE to grant access to Google Cloud services
To securely grant workloads access to Google Cloud services, enable Workload Identity Federation for GKE in the cluster. Workload Identity Federation for GKE helps administrators manage Kubernetes service accounts that Kubernetes workloads use to access Google Cloud services. When you create a cluster with Workload Identity Federation for GKE enabled, an identity namespace is established for the project that the cluster is housed in. The identity namespace allows the cluster to automatically authenticate service accounts for GKE applications by mapping the Kubernetes service account name to a virtual Google service account handle, which is used for IAM binding of tenant Kubernetes service accounts.
Restrict network access to the control plane
To protect your control plane, restrict access to authorized networks. In GKE, when you enable authorized networks, you can authorize up to 50 CIDR ranges and allow IP addresses only in those ranges to access your control plane. GKE already uses Transport Layer Security (TLS) and authentication to provide secure access to your control plane endpoint from the public internet. By using authorized networks, you can further restrict access to specified sets of IP addresses.
Tenant provisioning
Best practices:
- Create tenant projects.
- Use RBAC to refine tenant access.
- Create namespaces for isolation between tenants.
Create tenant projects
To host a tenant's non-cluster resources, create a service project for each tenant. These service projects contain logical resources specific to the tenant applications (for example, logs, monitoring, storage buckets, service accounts, etc.). All tenant service projects are connected to the Shared VPC in the tenant host project.
Use RBAC to refine tenant access
Define finer-grained access to cluster resources for your tenants by using Kubernetes RBAC. On top of the read-only access initially granted with IAM to tenant groups, define namespace-wide Kubernetes RBAC roles and bindings for each tenant group.
Earlier we identified two tenant groups: tenant admins and tenant developers. For those groups, we define the following RBAC roles and access:
Group | Kubernetes RBAC role | Description |
---|---|---|
Tenant Admin | namespace admin | Grants access to list and watch deployments in their namespace. Grants access to add and remove users in the tenant group. |
Tenant Developer | namespace editor, namespace viewer | Grants access to create/edit/delete Pods, deployments, Services, configmaps in their namespace. |
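As a sketch of how the Tenant Developer row could translate into a namespaced Role, the following manifest grants read and write access to Pods, Deployments, Services, and ConfigMaps in tenant-a. The Role name `tenant-developer` and the exact verb list are assumptions. Adjust them to your own policy.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer       # hypothetical name
  namespace: tenant-a
rules:
# Core API group: Pods, Services, and ConfigMaps.
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# apps API group: Deployments.
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

Because a Role is namespaced, these permissions apply only inside tenant-a and take effect once a RoleBinding in that namespace references the Role.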
In addition to creating RBAC roles and bindings that assign Google Workspace or Cloud Identity groups various permissions inside their namespace, Tenant admins often require the ability to manage users in each of those groups. Based on your organization's requirements, this can be handled by either delegating Google Workspace or Cloud Identity permissions to the Tenant admin to manage their own group membership or by the Tenant admin engaging with a team in your organization that has Google Workspace or Cloud Identity permissions to handle those changes.
You can use IAM and RBAC permissions together with namespaces to restrict user interactions with cluster resources on Google Cloud console. For more information, see Enable access and view cluster resources by namespace.
Use Google Groups to bind permissions
To efficiently manage tenant permissions in a cluster, you can bind RBAC permissions to your Google Groups. The membership of those groups is maintained by your Google Workspace administrators, so your cluster administrators do not need detailed information about your users.
As an example, suppose you have a Google Group named tenant-admins@mydomain.com and a user named admin1@mydomain.com who is a member of that group. The following binding gives the user admin access to the tenant-a namespace:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      namespace: tenant-a
      name: tenant-admin-rolebinding
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: tenant-admin
    subjects:
    - apiGroup: rbac.authorization.k8s.io
      kind: Group
      name: "tenant-admins@mydomain.com"

Create namespaces
To provide logical isolation between tenants that are on the same cluster, implement namespaces. As part of the Kubernetes RBAC process, the cluster admin creates namespaces for each tenant group. The Tenant admin manages users (tenant developers) within their respective tenant namespace. Tenant developers are then able to use cluster and tenant-specific resources to deploy their applications.
Avoid reaching namespace limits
The theoretical maximum number of namespaces in a cluster is 10,000, though in practice there are many factors that could prevent you from reaching this limit. For example, you might reach the cluster-wide maximum number of Pods (150,000) and nodes (5,000) before you reach the maximum number of namespaces; other factors (such as the number of Secrets) can further reduce the effective limits. As a result, a good initial rule of thumb is to only attempt to approach the theoretical limit of one constraint at a time, and stay approximately one order of magnitude away from the other limits, unless experimentation shows that your use cases work well. If you need more resources than can be supported by a single cluster, you should create more clusters. For information about Kubernetes scalability, see the Kubernetes Scalability thresholds article.
Standardize namespace naming
To ease deployments across multiple environments that are hosted in different clusters, standardize the namespace naming convention you use. For example, avoid tying the environment name (development, staging, and production) to the namespace name and instead use the same name across environments. By using the same name, you avoid having to change the config files across environments.
Create service accounts for tenant workloads
Create a tenant-specific Google service account for each distinct workload in a tenant namespace. This provides a form of security, ensuring that tenants can manage service accounts for the workloads that they own/deploy in their respective namespaces. The Kubernetes service account for each namespace is mapped to one Google service account by using Workload Identity Federation for GKE.
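One way to express this mapping is to annotate the Kubernetes service account with the Google service account that it should act as. In the sketch below, the service account names and the `TENANT_PROJECT_ID` placeholder are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-ksa                # hypothetical Kubernetes service account
  namespace: tenant-a
  annotations:
    # Google service account that Pods using app-ksa act as.
    iam.gke.io/gcp-service-account: tenant-a-app@TENANT_PROJECT_ID.iam.gserviceaccount.com
```

The annotation alone is not sufficient. The Google service account also needs an IAM binding that allows this Kubernetes service account to impersonate it, typically the `roles/iam.workloadIdentityUser` role.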
Enforce resource quotas
To ensure all tenants that share a cluster have fair access to the cluster resources, enforce resource quotas. Create a resource quota for each namespace based on the number of Pods deployed by each tenant, and the amount of memory and CPU required by each Pod.
The following example defines a resource quota where Pods in the tenant-a namespace can request up to 16 CPU and 64 GiB of memory in total, and their combined limits can't exceed 32 CPU and 72 GiB of memory.
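
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: tenant-a
      namespace: tenant-a
    spec:
      hard:
        # Sum of resource requests across all Pods in the namespace.
        requests.cpu: "16"
        requests.memory: 64Gi
        # Sum of resource limits across all Pods in the namespace.
        limits.cpu: "32"
        limits.memory: 72Gi
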
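When a quota like this covers CPU and memory, Pods that omit requests or limits for those resources can be rejected at creation time. A LimitRange that supplies per-container defaults is one way to avoid that. The sketch below uses hypothetical default values and a hypothetical object name.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-a-defaults      # hypothetical name
  namespace: tenant-a
spec:
  limits:
  - type: Container
    # Applied when a container does not set its own requests.
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    # Applied when a container does not set its own limits.
    default:
      cpu: 500m
      memory: 512Mi
```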
Monitoring, logging and usage
Track usage metrics
To obtain cost breakdowns on individual namespaces and labels in a cluster, you can enable GKE cost allocation. GKE cost allocation tracks information about resource requests and resource usage of a cluster's workloads, which you can further break down by namespaces and labels. With GKE cost allocation, you can approximate the cost breakdown for departments/teams that are sharing a cluster, understand the usage patterns of individual applications (or even components of a single application), help cluster admins triage spikes in usage, and provide better capacity planning and budgeting.
When you enable GKE cost allocation, the cluster name and namespace of your GKE workloads appear in the labels field of the billing export to BigQuery.
Provide tenant-specific logs
To provide tenants with log data specific to their project workloads, use Cloud Logging's Log Router. To create tenant-specific logs, the cluster admin creates a sink to export log entries to a log bucket created in the tenant's Google Cloud project.
For details on how to configure these types of logs, see Multi-tenant logging on GKE.
Checklist summary
The following table summarizes the tasks that are recommended for creating multi-tenant clusters in an enterprise organization:
What's next
- For more information on security, see Hardening your cluster's security.
- For more information on VPC networks, see Best practices and reference architectures for VPC design.
- For more enterprise best practices, see Google Cloud Architecture Framework.