Best practices for enterprise multi-tenancy

Multi-tenancy in Google Kubernetes Engine (GKE) refers to one or more clusters that are shared between tenants. In Kubernetes, a tenant can be defined as any of the following:

  • A team responsible for developing and operating one or more workloads.
  • A set of related workloads, whether operated by one or more teams.
  • A single workload, such as a Deployment.

Cluster multi-tenancy is often implemented to reduce costs or to consistently apply administration policies across tenants. However, incorrectly configuring a GKE cluster or its associated GKE resources can result in unachieved cost savings, incorrect policy application, or destructive interactions between different tenants' workloads.

This guide provides best practices to safely and efficiently set up multiple multi-tenant clusters for an enterprise organization.

Assumptions and requirements

The best practices in this guide are based on a multi-tenant use case for an enterprise environment, which has the following assumptions and requirements:

  • The organization is a single company that has many tenants (two or more application/service teams) that use Kubernetes and would like to share computing and administrative resources.
  • Each tenant is a single team developing a single workload.
  • Other than the application/service teams, there are other teams that also utilize and manage clusters, including platform team members, cluster administrators, auditors, etc.
  • The platform team owns the clusters and defines the amount of resources each tenant team can use; each tenant can request more.
  • Each tenant team should be able to deploy their application through the Kubernetes API without having to communicate with the platform team.
  • Each tenant should not be able to affect other tenants in the shared cluster, except via explicit design decisions like API calls, shared data sources, etc.

This setup will serve as a model from which we can demonstrate multi-tenant best practices. While this setup might not perfectly describe all enterprise organizations, it can be easily extended to cover similar scenarios.

Setting up folders, projects and clusters

For enterprise organizations deploying multi-tenant clusters in GKE, additional configuration is needed in other Google Cloud systems to manage complexity that does not exist in simpler single-application, single-team Kubernetes deployments. This includes configuring projects to isolate administrative concerns, mapping organization structure to cloud identities and accounts, and managing additional Google Cloud resources, such as databases, logging and monitoring, storage, and networking.

Establish a folder and project hierarchy

To capture how your organization manages Google Cloud resources and to enforce a separation of concerns, use folders and projects. Folders allow different teams to set policies that cascade across multiple projects, while projects can be used to segregate environments (for example, production vs. staging) and teams from each other. For example, most organizations have a team to manage network infrastructure and a different team to manage clusters. Each technology is considered a separate piece of the stack requiring its own level of expertise, troubleshooting and access.

A parent folder can contain up to 300 folders, and you can nest folders up to 10 levels deep. If you have over 300 tenants, you can arrange the tenants into nested hierarchies to stay within the limit. For more information about folders, see Creating and Managing Folders.

Assign roles using Cloud IAM

You can control access to Google Cloud resources through Cloud Identity and Access Management (Cloud IAM) policies. Start by identifying the groups needed for your organization and their scope of operations, then assign the appropriate Cloud IAM role to the group. Use Google Groups to efficiently assign and manage Cloud IAM for users.
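
As a minimal sketch of this pattern, the following Cloud IAM policy fragment grants a Google Group read-only access to the GKE clusters in a project. The group address tenant-a-developers@mydomain.com is a hypothetical placeholder, and roles/container.clusterViewer is only one possible role; choose roles that match each group's scope of operations.

```yaml
# Fragment of a project-level Cloud IAM policy (illustrative only).
# The group address below is an assumption, not a value from this guide.
bindings:
- role: roles/container.clusterViewer
  members:
  - group:tenant-a-developers@mydomain.com
```

Because the role is bound to a group rather than to individual users, membership changes are handled in Google Groups and the policy itself does not need to change.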

Centralize network control

To maintain centralized control over network resources, such as subnets, routes, and firewalls, use Shared VPC networks. Resources in a Shared VPC can communicate with each other securely and efficiently across project boundaries using internal IPs. Each Shared VPC network is defined and owned by a centralized host project, and can be used by one or more service projects.

Using Shared VPC and Cloud IAM, you can separate network administration from project administration. This separation helps you implement the principle of least privilege. For example, a centralized network team can administer the network without having any permissions into the participating projects. Similarly, the project admins can manage their project resources without any permissions to manipulate the shared network.

When you set up a Shared VPC, you must configure the subnets and their secondary IP ranges in the VPC. To determine the subnet size, you need to know the expected number of tenants, the number of Pods and Services they are expected to run, and the maximum and average Pod size. From the total cluster capacity needed, you can choose an instance size, which in turn gives you the total node count. From the total number of nodes, you can calculate the total IP space consumed, which gives you the required subnet size.

Here are some factors that you should also consider when setting up your network:

  • The maximum number of service projects that can be attached to a host project is 1,000, and the maximum number of Shared VPC host projects in a single organization is 100.
  • The Node, Pod, and Services IP ranges must all be unique. You cannot create a subnet whose primary and secondary IP address ranges overlap.
  • The maximum number of Pods and Services for a given GKE cluster is limited by the size of the cluster's secondary ranges.
  • The maximum number of nodes in the cluster is limited by the size of the cluster's subnet's primary IP address range and the cluster's Pod address range.
  • For flexibility and more control over IP address management, you can configure the maximum number of Pods that can run on a node. By reducing the number of Pods per node, you also reduce the CIDR range allocated per node, requiring fewer IP addresses.
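
The effect of the per-node Pod limit on address consumption can be worked through as follows. This sketch assumes a hypothetical /17 Pod secondary range. It also assumes that GKE allocates a /24 per node at the default maximum of 110 Pods per node and a /26 per node at a maximum of 32 Pods per node; verify these values against the current GKE documentation for your cluster.

```latex
\begin{align*}
\text{addresses per node (max 110 Pods, /24)} &= 2^{32-24} = 256 \\
\text{nodes supported by a /17 Pod range} &= 2^{24-17} = 128 \\
\text{addresses per node (max 32 Pods, /26)} &= 2^{32-26} = 64 \\
\text{nodes supported by a /17 Pod range} &= 2^{26-17} = 512
\end{align*}
```

Under these assumptions, lowering the per-node Pod maximum from 110 to 32 lets the same Pod range support four times as many nodes.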

To help calculate subnets for your clusters, you can use the GKE IPAM calculator open source tool. IP Address Management (IPAM) enables efficient use of IP space and subnets and avoids overlapping ranges, which would otherwise limit your connectivity options down the road. For information on network ranges in a VPC cluster, see Creating a VPC-native cluster.

Tenants that require further isolation for resources that run outside the shared clusters (such as dedicated Compute Engine VMs) may use their own VPC, which is peered to the Shared VPC run by the networking team. This provides additional security at the cost of increased complexity and numerous other limitations. For more information on peering, see Using VPC Network Peering. In the example below, all tenants have chosen to share a single (per-environment) tenant VPC.

Creating reliable and highly available clusters

Design your cluster architecture for high availability and reliability by implementing the following recommendations:

  • Create one cluster admin project per cluster to reduce the risk of project-level configurations (for example, Cloud IAM bindings) adversely affecting many clusters ("blast radius"), and to help provide separation for quota and billing. Cluster admin projects are separate from tenant projects, which individual tenants use to manage, for example, their Google Cloud resources.
  • Make the production cluster private to disable access to the nodes and manage access to the control plane. We also recommend using private clusters for development and staging environments.
  • Ensure the control plane for the cluster is regional to provide high availability for multi-tenancy; any disruptions to the control plane will impact tenants. Please note, there are cost implications with running regional clusters.
  • Ensure the nodes in your cluster span at least three zones to achieve zonal reliability. For information about the cost of egress between zones in the same region, see the network pricing documentation.

Figure 3: A private regional cluster with a regional control plane running in three zones.

Autoscale cluster nodes and resources

To accommodate the demands of your tenants, automatically scale nodes in your cluster by enabling autoscaling. Autoscaling helps systems stay responsive and healthy when various tenants deploy heavy workloads in their namespaces, and helps the cluster respond to zonal outages.

When you enable autoscaling, you specify the minimum and maximum number of nodes in a cluster based on the expected workload sizes. By specifying the maximum number of nodes, you can ensure there is enough space for all Pods in the cluster, regardless of the namespace they run in. Cluster autoscaling rescales node pools based on the min/max boundary, helping to reduce operational costs when the system load decreases, and avoid Pods going into a pending state when there aren't enough available cluster resources. To determine the maximum number of nodes, identify the maximum amount of CPU and memory that each tenant requires, and add those amounts together to get the total capacity that the cluster should be able to handle if all tenants were at the limit. Using the maximum number of nodes, you can then choose instance sizes and counts, taking into consideration the IP subnet space made available to the cluster.
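
The capacity calculation described above can be worked through with made-up numbers. This sketch assumes three hypothetical tenants whose CPU limits are 32, 16, and 16 vCPU and whose memory limits are 72, 32, and 32 GiB. It also assumes a hypothetical node type with 8 vCPU and 32 GiB allocatable. None of these figures come from a real deployment.

```latex
\begin{align*}
C_{\text{total}} &= 32 + 16 + 16 = 64 \text{ vCPU} \\
M_{\text{total}} &= 72 + 32 + 32 = 136 \text{ GiB} \\
N_{\text{max}} &= \max\left( \left\lceil \frac{64}{8} \right\rceil,\ \left\lceil \frac{136}{32} \right\rceil \right) = \max(8, 5) = 8
\end{align*}
```

In this example CPU is the binding constraint, so the autoscaler's maximum would be at least 8 nodes. Allocatable capacity is lower than the machine type's nominal size because each node reserves resources for system components, so base the calculation on allocatable values.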

Use Pod autoscaling to automatically scale Pods based on resource demands. Vertical Pod Autoscaler (VPA) adjusts the CPU and memory allocated to existing Pods, and Horizontal Pod Autoscaler (HPA) scales the number of Pod replicas based on CPU/memory utilization or custom metrics. Do not use VPA with HPA on the same Pods unless they scale on different metrics.
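
As an illustration of horizontal Pod autoscaling, the following manifest is a minimal sketch that scales a Deployment between 2 and 10 replicas to hold average CPU utilization near 70%. The Deployment name web, the object name, and the thresholds are assumptions chosen for the example, and the manifest assumes a cluster version that serves the autoscaling/v2 API.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical name
  namespace: tenant-a
spec:
  # The workload to scale; "web" is an assumed Deployment in tenant-a.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Scale on average CPU utilization relative to the Pods' CPU requests.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Utilization targets are computed against the Pods' resource requests, so the target Deployment needs CPU requests set for this to work.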

Determine the size of your cluster

When determining the size of your cluster, here are some important factors to consider:

  • The sizing of your cluster is dependent on the type of workloads you plan to run. If your workloads have greater density, the cost efficiency is higher but there is also a greater chance for resource contention.
  • The minimum size of a cluster is defined by the number of zones it spans: one node for a zonal cluster and three nodes for a regional cluster.
  • Per project, there is a maximum of 50 clusters per zone, plus 50 regional clusters per region.
  • Each cluster supports a maximum of 5,000 nodes (1,000 nodes if you use the GKE ingress controller), 1,000 nodes per node pool, 110 Pods per node, and 300,000 containers.

Learn more about quotas for projects and clusters in Choose Size and Scope of Google Kubernetes Engine Clusters.

Schedule maintenance windows

To reduce downtime during cluster and node upgrades and maintenance, schedule maintenance windows to occur during off-peak hours. During upgrades, there can be temporary disruptions when workloads are moved to recreate nodes. To minimize the impact of such disruptions, design your application deployments to handle partial disruptions seamlessly, if possible.
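
One way for a tenant to make a workload tolerate node upgrades is a PodDisruptionBudget, which limits how many replicas can be evicted at the same time. The following is a minimal sketch. The label app: web, the object name, and the minAvailable value are assumptions for illustration.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # hypothetical name
  namespace: tenant-a
spec:
  # Keep at least two matching Pods running during voluntary
  # disruptions such as node drains.
  minAvailable: 2
  selector:
    matchLabels:
      app: web             # assumed label on the tenant's Pods
```

A budget like this only helps if the workload runs more replicas than minAvailable. Otherwise no Pod can be evicted voluntarily, which can delay node drains during upgrades.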

Set up HTTP(S) Load Balancing with Ingress

To help manage your tenants' published Services and the incoming traffic to those Services, create an HTTP(S) load balancer to allow a single ingress per cluster, where each tenant's Services are registered with the cluster's Ingress resource. You can create and configure an HTTP(S) load balancer by creating a Kubernetes Ingress resource, which defines how traffic reaches your Services and how the traffic is routed to your tenant's application. By registering Services with the Ingress resource, the Services' naming convention becomes consistent, showing a single ingress, such as tenanta.example.com and tenantb.example.com.

Securing the cluster for multi-tenancy

Control Pod communication with network policies

To control network communication between Pods in each of your cluster's namespaces, create network policies based on your tenants' requirements. As an initial recommendation, you should block traffic between namespaces that host different tenants' applications. Your cluster administrator can apply a deny-all network policy to deny all ingress traffic, which prevents Pods in one namespace from accidentally sending traffic to Services or databases in other namespaces.

As an example, here's a network policy that restricts ingress from all other namespaces to the tenant-a namespace:

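apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: tenant-a
spec:
  # An empty podSelector applies the policy to every Pod in tenant-a.
  podSelector: {}
  ingress:
  - from:
    # Allow ingress only from Pods in the same namespace.
    - podSelector: {}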
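
For comparison, a stricter variant denies all ingress to the namespace, including traffic between Pods inside it. The following is a minimal sketch. The policy name default-deny-ingress is an assumption, and tenants would need additional policies to allow the traffic they actually require.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
  namespace: tenant-a
spec:
  # Select every Pod in the namespace.
  podSelector: {}
  # Declare that this policy governs ingress; because no ingress
  # rules are listed, all inbound traffic to the Pods is denied.
  policyTypes:
  - Ingress
```

Network policies only take effect on clusters where network policy enforcement is enabled.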

Run workloads with GKE Sandbox

Clusters that run untrusted workloads are more exposed to security vulnerabilities than other clusters. Using GKE Sandbox, you can harden the isolation boundaries between workloads for your multi-tenant environment. For security management, we recommend starting with GKE Sandbox and then using Pod security policies to fill in any gaps.

GKE Sandbox is based on gVisor, an open source container sandboxing project, and provides additional isolation for multi-tenant workloads by adding an extra layer between your containers and the host OS. Container runtimes often run as a privileged user on the node and have access to most system calls into the host kernel. In a multi-tenant cluster, one malicious tenant could gain access to the host kernel and to other tenants' data. GKE Sandbox mitigates these threats by reducing the need for containers to interact with the host, which shrinks the attack surface of the host and restricts the movement of malicious actors.

GKE Sandbox provides two isolation boundaries between the container and the host OS:

  • A user-space kernel, written in Go, that handles system calls and limits interaction with the host kernel. Each Pod has its own isolated user-space kernel.
  • The user-space kernel itself runs inside Linux namespaces and under seccomp filters that restrict the system calls it can make.
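
As a brief illustration, a workload opts in to GKE Sandbox by requesting the gvisor RuntimeClass in its Pod spec. The following is a minimal sketch. The Pod name and the nginx image are placeholders, and the manifest assumes the cluster has a node pool with GKE Sandbox enabled.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app        # hypothetical name
  namespace: tenant-a
spec:
  # Run this Pod's containers under gVisor instead of directly
  # on the host kernel.
  runtimeClassName: gvisor
  containers:
  - name: app
    image: nginx             # placeholder image
```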

Create Pod security policies

To prevent Pods that don't meet your security requirements from running in a cluster, create a Pod Security Policy (PSP), which specifies conditions that Pods must meet to be admitted to the cluster. You implement Pod Security Policy control by enabling the admission controller and by authorizing the target Pod's service account to use the policy. You can authorize the use of policies for a Pod in Kubernetes Role-Based Access Control (RBAC) by binding the Pod's serviceAccount to a role that has access to use the policies.

When defining a PSP, we recommend defining the most restrictive policy bound to system:authenticated and more permissive policies bound as needed for exceptions.

As an example, here's a restrictive PSP that requires users to run as unprivileged users, blocks possible escalations to root, and requires the use of several security mechanisms:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # The following is redundant with non-root + disallow privilege
  # escalation, but we can provide it for defense in depth.
  requiredDropCapabilities:
    - ALL
  # Allow core volume types.
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    # Assume that persistentVolumes set up by the cluster admin
    # are safe to use.
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # Assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535

Set the following parameters to avoid privilege escalations on the containers:

  • To ensure that no child process of a container can gain more privileges than its parent, set the allowPrivilegeEscalation parameter to false.
  • To disallow privilege escalation outside of the container, disable access to the components of the Host namespaces (hostNetwork, hostIPC, and hostPID). This also blocks snooping on network activity of other Pods on the same node.
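
The authorization step described earlier in this section can be sketched with RBAC as follows. The ClusterRole grants the use verb on the PSP named restricted from the example above, and the ClusterRoleBinding binds that role to the system:authenticated group. The object names psp-restricted and psp-restricted-authenticated are assumptions for illustration.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-restricted                  # hypothetical name
rules:
# Allow subjects to use only the "restricted" policy.
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["restricted"]
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-restricted-authenticated    # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-restricted
subjects:
# Every authenticated user and service account gets the
# restrictive policy by default.
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
```

More permissive policies can then be bound to specific service accounts in the same way, one exception at a time.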

Use Workload Identity to grant access to Google Cloud services

To securely grant workloads access to Google Cloud services, enable Workload Identity in the cluster. Workload Identity helps administrators manage Kubernetes service accounts that Kubernetes workloads use to access Google Cloud services. When you create a cluster with Workload Identity enabled, an Identity Namespace is established for the project that the cluster is housed in. The Identity Namespace allows the cluster to automatically authenticate service accounts for GKE applications by mapping the Kubernetes service account name to a virtual Google service account handle, which is used for Cloud IAM binding of tenant Kubernetes service accounts.

Restrict network access to the control plane

To protect your control plane, restrict access to authorized networks. In GKE, when you enable master authorized networks, you can add up to 50 CIDR ranges to an allowlist and allow only IP addresses in those ranges to access your control plane. GKE already uses Transport Layer Security (TLS) and authentication to provide secure access to your cluster master endpoint from the public internet. By using authorized networks, you can further restrict access to specified sets of IP addresses.

Tenant provisioning

Create tenant projects

To host a tenant's non-cluster resources, create a service project for each tenant. These service projects contain logical resources specific to the tenant applications (for example, logs, monitoring, storage buckets, service accounts, etc.). All tenant service projects are connected to the Shared VPC in the tenant host project.

Use RBAC to refine tenant access

Define finer-grained access to cluster resources for your tenants by using Kubernetes RBAC. On top of the read-only access initially granted with Cloud IAM to tenant groups, define namespace-wide Kubernetes RBAC roles and bindings for each tenant group.

Earlier we identified two tenant groups: tenant admins and tenant developers. For those groups, we define the following RBAC roles and access:

Group             Kubernetes RBAC role               Description
Tenant Admin      namespace admin                    Grants access to list and watch deployments in their namespace. Grants access to add and remove users in the tenant group.
Tenant Developer  namespace admin, namespace viewer  Grants access to create/edit/delete Pods, deployments, Services, configmaps in their namespace.

In addition to creating RBAC roles and bindings that assign G Suite or Cloud Identity groups various permissions inside their namespace, Tenant admins often require the ability to manage users in each of those groups. Based on your organization's requirements, this can be handled by either delegating G Suite or Cloud Identity permissions to the Tenant admin to manage their own group membership or by the Tenant admin engaging with a team in your organization that has G Suite or Cloud Identity permissions to handle those changes.
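
As a minimal sketch of a namespace-scoped role, the following Role grants the create/edit/delete access described for tenant developers. The role name tenant-developer and the exact verb list are assumptions; adjust them to your organization's requirements.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer     # hypothetical name
  namespace: tenant-a
rules:
# Pods, Services, and ConfigMaps live in the core ("") API group.
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Deployments live in the "apps" API group.
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

Because this is a Role rather than a ClusterRole, the permissions apply only inside the tenant-a namespace.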

Use Google Groups to bind permissions

To efficiently manage tenant permissions in a cluster, you can bind RBAC permissions to your Google Groups. The membership of those groups is maintained by your G Suite administrators, so your cluster administrators do not need detailed information about your users.

As an example, suppose you have a Google Group named tenant-admins@mydomain.com and a user named admin1@mydomain.com who is a member of that group. The following binding gives that user admin access to the tenant-a namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: tenant-a
  name: tenant-admin-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: tenant-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: "tenant-admins@mydomain.com"

Create namespaces

To provide a logical isolation between tenants that are on the same cluster, implement namespaces. As part of the Kubernetes RBAC process, the cluster admin creates namespaces for each tenant group. The Tenant admin manages users (tenant developers) within their respective tenant namespace. Tenant developers are then able to use cluster and tenant specific resources to deploy their applications.

Avoid reaching namespace limits

The theoretical maximum number of namespaces in a cluster is 10,000, though in practice there are many factors that could prevent you from reaching this limit. For example, you might reach the cluster-wide maximum number of Pods (150,000) and nodes (5,000) before you reach the maximum number of namespaces; other factors (such as the number of Secrets) can further reduce the effective limits. As a result, a good initial rule of thumb is to only attempt to approach the theoretical limit of one constraint at a time, and stay approximately one order of magnitude away from the other limits, unless experimentation shows that your use cases work well. If you need more resources than can be supported by a single cluster, you should create more clusters. For information about Kubernetes scalability, see the Kubernetes Scalability thresholds article.

Standardize namespace naming

To ease deployments across multiple environments that are hosted in different clusters, standardize the namespace naming convention you use. For example, avoid tying the environment name (development, staging, and production) to the namespace name and instead use the same name across environments. By using the same name, you avoid having to change the config files across environments.
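
For illustration, a namespace manifest that follows this convention contains no environment-specific name, so the same file can be applied unchanged to the development, staging, and production clusters. The tenant label is an assumption added for the example.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  # Same name in every environment; the cluster, not the name,
  # identifies the environment.
  name: tenant-a
  labels:
    tenant: tenant-a         # hypothetical label
```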

Create service accounts for tenant workloads

Create a tenant-specific Google service account for each distinct workload in a tenant namespace. This provides a form of security, ensuring that tenants can manage service accounts for the workloads that they own/deploy in their respective namespaces. The Kubernetes service account for each namespace is mapped to one Google service account by using Workload Identity.
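
With Workload Identity, the mapping is expressed as an annotation on the Kubernetes service account. The following is a minimal sketch. The service account name tenant-a-app and the project ID tenant-a-project are hypothetical placeholders.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tenant-a-app         # hypothetical Kubernetes service account
  namespace: tenant-a
  annotations:
    # Google service account that this Kubernetes service account
    # acts as; both the account and the project are assumptions.
    iam.gke.io/gcp-service-account: tenant-a-app@tenant-a-project.iam.gserviceaccount.com
```

The annotation alone is not sufficient. The Google service account also needs a Cloud IAM binding that grants the Kubernetes service account the roles/iam.workloadIdentityUser role.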

Enforce resource quotas

To ensure all tenants that share a cluster have fair access to the cluster resources, enforce resource quotas. Create a resource quota for each namespace based on the number of Pods deployed by each tenant, and the amount of memory and CPU required by each Pod.

The following example defines a resource quota where Pods in the tenant-a namespace can request up to 16 CPU and 64 GiB of memory in total, and their combined limits cannot exceed 32 CPU and 72 GiB of memory.

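apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a
  namespace: tenant-a
spec:
  hard:
    # Total CPU and memory that all Pods in the namespace can request.
    requests.cpu: "16"
    requests.memory: 64Gi
    # Total CPU and memory limits across all Pods in the namespace.
    limits.cpu: "32"
    limits.memory: 72Gi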
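
When a quota constrains requests and limits, Pods that omit those values are rejected at admission. A LimitRange can supply defaults so that tenants' Pods are still admitted. The following is a minimal sketch; the object name and the default values are assumptions for illustration.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-a-defaults    # hypothetical name
  namespace: tenant-a
spec:
  limits:
  - type: Container
    # Applied when a container does not specify requests.
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    # Applied when a container does not specify limits.
    default:
      cpu: 500m
      memory: 512Mi
```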

Monitoring, logging and usage

Track usage metrics

To obtain cost breakdowns on individual namespaces and labels in a cluster, you can enable GKE usage metering. GKE usage metering tracks information about resource requests and resource usage of a cluster's workloads, which you can further break down by namespaces and labels. With GKE usage metering, you can approximate the cost breakdown for departments/teams that are sharing a cluster, understand the usage patterns of individual applications (or even components of a single application), help cluster admins triage spikes in usage, and provide better capacity planning and budgeting.

When you enable GKE usage metering on the multi-tenant cluster, resource usage records are written to a BigQuery table. You can export tenant-specific metrics to BigQuery datasets in the corresponding tenant project, which auditors can then analyze to determine cost breakdowns. Auditors can visualize GKE usage metering data by creating dashboards with plug-and-play Google Data Studio templates.

Provide tenant-specific logs

You can provide tenants with log data specific to their project workloads by using Kubernetes Engine Monitoring. Kubernetes Engine Monitoring manages both the Cloud Monitoring and Cloud Logging services together and provides a dashboard customized for GKE clusters. To create tenant-specific logs, the cluster admin creates a sink to export log entries to BigQuery datasets, filtered by tenant namespace. The exported data in BigQuery can then be accessed by the tenants.

Provide tenant-specific monitoring

To provide tenant-specific monitoring, the cluster admin can use a dedicated namespace that contains a Prometheus to Stackdriver adapter (prometheus-to-sd) with a per-namespace config. This configuration ensures tenants can only monitor their own metrics in their projects. However, the downside to this design is the extra cost of managing your own Prometheus deployment(s).

Here are other options you could consider for providing tenant-specific monitoring:

  • Teams accept shared tenancy within the Cloud Monitoring environment and allow tenants to have visibility into all metrics in the project.
  • Deploy a single Grafana instance per tenant, which communicates with the shared Cloud Monitoring environment. Configure the Grafana instance to only view the metrics from a particular namespace. The downside to this option is the cost and overhead of managing these additional deployments of Grafana.

Checklist summary

The following table summarizes the tasks that are recommended for creating multi-tenant clusters in an enterprise organization:

Area Tasks
Organizational setup
Identity and access management
Networking
High availability and reliability
Security
Logging and monitoring

What's next