Optimizing resource usage in a multi-tenant GKE cluster using node auto-provisioning

This tutorial shows how to use node auto-provisioning to scale a multi-tenant Google Kubernetes Engine (GKE) cluster, and how to use Workload Identity to control tenant access to resources like Cloud Storage buckets. This guide is for developers and architects; it assumes basic knowledge of Kubernetes and GKE. If you need an introduction, see GKE overview.

Cluster multi-tenancy is often implemented to reduce costs or to standardize operations across tenants. To fully realize cost savings, you should size your cluster so that cluster resources are used efficiently. You should also minimize resource waste during autoscaling by making sure that newly added nodes are appropriately sized.

In this tutorial, you use node auto-provisioning to scale the cluster. Node auto-provisioning can help optimize your cluster resource usage, and therefore control your costs, by adding cluster nodes that best fit your pending workloads.

Objectives

  • Create a GKE cluster that has node auto-provisioning and Workload Identity enabled.
  • Set up the cluster for multi-tenancy.
  • Submit jobs to the cluster to demonstrate how node auto-provisioning creates and destroys nodes of optimized sizes.
  • Use taints and labels to instruct node auto-provisioning to create dedicated node pools for each tenant.
  • Use Workload Identity to control access to tenant-specific resources like Cloud Storage buckets.

Costs

This tutorial uses billable components of Google Cloud.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. In the Cloud Console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Cloud Console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

  5. In Cloud Shell, enable the APIs for GKE and Cloud Build:
    gcloud services enable container.googleapis.com \
        cloudbuild.googleapis.com
    

    This operation can take a few minutes to complete.

Preparing your environment

In this section, you get the code you need for this tutorial and you set up your environment with values that you use throughout the tutorial.

  1. In Cloud Shell, define the environment variables that you use for this tutorial:

    export PROJECT_ID=$(gcloud config get-value project)
    
  2. Clone the GitHub repository that contains the code for this tutorial:

    git clone https://github.com/GoogleCloudPlatform/solutions-gke-autoprovisioning
    
  3. Change to the repository directory:

    cd solutions-gke-autoprovisioning
    
  4. Update the Kubernetes YAML job configuration file with your Google project ID:

    sed -i "s/MY_PROJECT/$PROJECT_ID/" manifests/bases/job/base-job.yaml
    
  5. Submit a Cloud Build job to build a container image:

    gcloud builds submit pi/ --tag gcr.io/$PROJECT_ID/generate-pi
    

    The image is a Go program that generates an approximation of pi. You use this container image later.

    Cloud Build exports the image to your project's Container Registry.
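The sed command in step 4 rewrites the manifest in place. If you want to see exactly what that substitution does before running it against the repository, here is a self-contained sketch that applies the same pattern to a stand-in file (the file path and project ID below are illustrative, not the repository's actual manifest):

```shell
# Create a stand-in manifest that contains the MY_PROJECT placeholder.
cat > /tmp/sample-job.yaml <<'EOF'
image: gcr.io/MY_PROJECT/generate-pi
EOF

# Replace the placeholder with a project ID, as the tutorial's sed command does.
PROJECT_ID=my-demo-project
sed -i "s/MY_PROJECT/$PROJECT_ID/" /tmp/sample-job.yaml

cat /tmp/sample-job.yaml
# → image: gcr.io/my-demo-project/generate-pi
```

Note that sed -i edits the file in place with no backup; if you prefer a dry run, omit -i to print the result to stdout instead.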

Creating a GKE cluster

In this section, you create a GKE cluster that has node auto-provisioning and Workload Identity enabled. Note the following details of the cluster creation process:

  • You specify CPU and memory limits for the cluster. Node auto-provisioning respects these limits when it adds or removes nodes from the cluster. For more information, see Enabling node auto-provisioning in the GKE documentation.
  • You specify the default service account and scopes that are used by the nodes within the auto-provisioned node pools. Using these settings, you can control the provisioned node's access permissions. For more information, see Setting identity defaults for auto-provisioned nodes in the GKE documentation.
  • You set an autoscaling profile that prioritizes utilization. This profile tells the cluster autoscaler to quickly scale down the cluster to minimize unused resources. This can help with resource efficiency for batch or job-centric workloads. The setting applies to all node pools in the cluster.
  • You enable Workload Identity by specifying the workload pool.

To create the cluster:

  1. Create a service account:

    gcloud iam service-accounts create nap-sa
    

    This service account is used by the auto-provisioned nodes.

  2. Grant the new service account permissions to pull images from the Cloud Storage bucket that's used by Container Registry:

    gsutil iam ch \
        serviceAccount:nap-sa@$PROJECT_ID.iam.gserviceaccount.com:objectViewer \
        gs://artifacts.$PROJECT_ID.appspot.com
    
  3. Create a GKE cluster that has node auto-provisioning and Workload Identity enabled:

    gcloud beta container clusters create multitenant \
        --release-channel=regular \
        --zone=us-central1-c \
        --num-nodes=2 \
        --machine-type=n1-standard-2 \
        --workload-pool=${PROJECT_ID}.svc.id.goog \
        --autoscaling-profile=optimize-utilization \
        --enable-autoprovisioning \
        --autoprovisioning-service-account=nap-sa@${PROJECT_ID}.iam.gserviceaccount.com \
        --autoprovisioning-scopes=https://www.googleapis.com/auth/devstorage.read_write,https://www.googleapis.com/auth/cloud-platform \
        --min-cpu 1 \
        --min-memory 1 \
        --max-cpu 50 \
        --max-memory 256 \
        --enable-network-policy \
        --enable-ip-alias
    
  4. Set the default cluster name and compute zone:

    gcloud config set container/cluster multitenant
    gcloud config set compute/zone us-central1-c
    

Setting up the cluster for multi-tenancy

When you operate a multi-tenant software-as-a-service (SaaS) app, you typically should separate your tenants. Separating tenants can help minimize any damage from a compromised tenant. It can also help you allocate cluster resources evenly across tenants, and track how many resources each tenant is consuming. Kubernetes cannot guarantee perfectly secure isolation between tenants, but it does offer features that might be sufficient for specific use cases. For more information about GKE multi-tenancy features, see the overview and best practices guides in the GKE documentation.

In the example app, you create two tenants, tenant1 and tenant2. You separate each tenant and its Kubernetes resources into its own namespace. You create a simple network policy that enforces tenant isolation by preventing communication from other namespaces. Later, you use node taints and nodeSelector fields to prevent Pods from different tenants from being scheduled on the same node. You can provide an additional degree of separation by running tenant workloads on dedicated nodes.

You use Kustomize to manage the Kubernetes manifests that you submit to the cluster. Kustomize lets you combine and customize YAML files for multiple purposes.
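The policy that step 1 creates denies ingress from other namespaces. A minimal sketch of such a policy, assuming the standard deny-from-other-namespaces recipe (the repository's actual manifest may differ in its details):

```shell
# An empty podSelector matches every Pod in the namespace; the single
# ingress rule then allows traffic only from Pods in the same namespace.
cat > /tmp/tenant1-netpol.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant1-deny-from-other-namespaces
  namespace: tenant1-ns
spec:
  podSelector: {}        # applies to all Pods in tenant1-ns
  ingress:
  - from:
    - podSelector: {}    # allows only peers within the same namespace
EOF
```

Because network policies are additive, Pods selected by this policy reject any ingress that no policy explicitly allows, which is what isolates the tenant namespaces from each other.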

  1. Create a namespace, a service account, and a network policy resource for tenant1:

    kubectl apply -k manifests/setup/tenant1
    

    The output looks like the following:

    namespace/tenant1-ns created
    serviceaccount/tenant1-ksa created
    networkpolicy.networking.k8s.io/tenant1-deny-from-other-namespaces created
    
  2. Create the cluster resources for tenant2:

    kubectl apply -k manifests/setup/tenant2
    

Verifying the behavior of node auto-provisioning

A GKE cluster consists of one or more node pools. All nodes within a node pool have the same machine type, which means that they have the same amount of CPU and memory. If your workload resource demands are variable, you might benefit from having multiple node pools that have different machine types within your cluster. In this way, the cluster autoscaler can add nodes of the most suitable type, which can improve your resource efficiency and therefore lower costs. However, maintaining many node pools adds management overhead. It also might not be practical in a multi-tenant cluster if you want to execute tenant workloads in dedicated node pools.

Instead, you can use node auto-provisioning to extend the cluster autoscaler. When node auto-provisioning is enabled, the cluster autoscaler can create new node pools automatically based on the specifications of pending Pods. As a result, the cluster autoscaler can create nodes of the most suitable type, but you don't have to create or manage the node pools yourself. Using node auto-provisioning, your cluster can efficiently autoscale without over-provisioning, which can help lower your costs.

Furthermore, if pending Pods have workload separation constraints, node auto-provisioning can create nodes that satisfy the constraints. In this way, you can use node auto-provisioning to automatically create node pools that will be used by only a single tenant.
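To make the efficiency argument concrete, here is a rough back-of-the-envelope calculation (the machine shapes use published vCPU counts; real allocatable capacity is lower once system overhead is subtracted):

```shell
# A Pod requesting 1.5 vCPU strands 0.5 vCPU on an n1-standard-2 (2 vCPU)
# but 2.5 vCPU on an n1-standard-4 (4 vCPU) if nothing else fits there.
pod_mcpu=1500
for node_mcpu in 2000 4000; do
  waste=$(( node_mcpu - pod_mcpu ))
  echo "node ${node_mcpu}m: ${waste}m idle"
done
# → node 2000m: 500m idle
# → node 4000m: 2500m idle
```

Node auto-provisioning performs this kind of fitting for you by picking a machine type close to the pending Pods' aggregate requests.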

In this section, you submit various jobs to the cluster to verify the behavior of node auto-provisioning. The jobs use the generate-pi image that you created earlier.

Submit a simple job

First, you submit a simple job to the cluster. The job does not specify any tenant-specific constraints. There is enough spare capacity in the cluster to handle the job's CPU and memory requests. Therefore, you expect the job to be scheduled into one of the existing nodes in the default node pool. No additional nodes are provisioned.

  1. List the node pools in the cluster:

    gcloud container node-pools list
    

    You see a single default pool.

  2. Print the job's configuration to the console:

    kubectl kustomize manifests/jobs/simple-job/
    

    The output looks like the following:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pi-job
    spec:
    ...
    

    The configuration does not specify any node taints or selectors.

  3. Submit the job:

    kubectl apply -k manifests/jobs/simple-job/
    
  4. Watch the node pools in the cluster:

    watch -n 5 gcloud container node-pools list
    

    You still see a single default pool. No new node pools are created.

  5. After about 30 seconds, press Control+C to stop watching the node pools.

  6. Watch the nodes in the cluster:

    kubectl get nodes -w
    

    You do not see any new nodes being created.

  7. After watching for 1 minute, press Control+C to stop watching.

  8. List the jobs in the cluster:

    kubectl get jobs --all-namespaces
    

    The output looks like the following:

    NAMESPACE   NAME     COMPLETIONS   DURATION   AGE
    default     pi-job   1/1           14s        21m
    

    The 1/1 value in the COMPLETIONS column indicates that the job has finished 1 of its 1 required completions.

Submit a job that has tenant-specific constraints

In this section, you submit another job to confirm that node auto-provisioning obeys workload separation constraints. The job configuration includes a tenant-specific node selector and a tenant-specific toleration. The job can be scheduled only onto a node that has labels that match the selector's key-value pairs. A toleration works in conjunction with node taints, which also limit which jobs can be scheduled onto a node. A best practice with node auto-provisioning is to include both a node selector and a toleration for workload separation.

This job cannot be scheduled into the default node pool, because that pool does not have any nodes that satisfy the selector constraint. Therefore, node auto-provisioning creates a new node pool with node labels that satisfy the selector requirement. Node auto-provisioning also adds a tenant-specific taint to the nodes that matches the toleration in the job configuration. Only Pods that have a matching toleration can be scheduled onto the nodes in the pool, which lets you further separate tenant workloads.
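Combined, the selector and toleration in a job spec look roughly like the following sketch (the field names follow the Kubernetes Job and Pod specs; the tenant key and job details are illustrative and may differ from the repository's actual manifest):

```shell
# Workload separation: a nodeSelector that requires the tenant label,
# plus a toleration for the matching NoSchedule taint.
cat > /tmp/tenant1-job-sketch.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: tenant1-pi-job
  namespace: tenant1-ns
spec:
  template:
    spec:
      nodeSelector:
        tenant: tenant1        # schedule only on tenant1-labeled nodes
      tolerations:
      - key: tenant
        operator: Equal
        value: tenant1
        effect: NoSchedule     # tolerate the tenant1 taint
      containers:
      - name: generate-pi
        image: gcr.io/MY_PROJECT/generate-pi
      restartPolicy: Never
EOF
```

The selector keeps tenant1's Pods off other tenants' nodes, and the taint plus toleration keeps other tenants' Pods off tenant1's nodes; you need both halves for full separation.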

  1. List the node pools in the cluster:

    gcloud container node-pools list
    

    You see a single default pool.

  2. Print the job's configuration to the console:

    kubectl kustomize manifests/jobs/one-tenant/
    

    The configuration includes a tenant-specific node selector requirement and a toleration. The output looks like the following:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tenant1-pi-job
    spec:
    ...
    
  3. Submit the job:

    kubectl apply -k manifests/jobs/one-tenant/
    
  4. Watch the node pools in the cluster:

    watch -n 5 gcloud container node-pools list
    

    After some time, you see a new node pool. The output looks like the following:

    NAME                            MACHINE_TYPE       DISK_SIZE_GB
    default-pool                    n1-standard-2      100
    nap-n1-standard-1-15jwludl      n1-standard-1      100
    

    The node pool name is prefixed with nap-, which indicates that it was created by node auto-provisioning. The node pool name also includes the machine type of the nodes in the pool, for example, n1-standard-1.

  5. Watch the nodes in the cluster:

    kubectl get nodes -w
    

    After about a minute, you see a new node appear in the list. The node name includes the name of the nap- node pool. The new node initially has a NotReady status. After some time, the status of the new node changes to Ready, which means that the node can now accept pending work.

  6. To stop watching the nodes, press Control+C.

  7. List the node taints:

    kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
    

    You see that the new node has a NoSchedule taint for the key-value pair tenant: tenant1. Therefore, only Pods that have a corresponding toleration for tenant: tenant1 can be scheduled onto the node.

  8. Watch the jobs in the cluster:

    kubectl get jobs -w --all-namespaces
    

    After some time, you see that tenant1-pi-job has 1/1 completion, which indicates that it finished successfully.

  9. To stop watching the jobs, press Control+C.

  10. Watch the node pools in the cluster:

    watch -n 5 gcloud container node-pools list
    

    After some time, you see that the nap- pool is deleted, and the cluster once again has only the single default node pool. Node auto-provisioning has deleted the nap- node pool, because there is no more pending work that matches the pool's constraints.

  11. To stop watching the node pools, press Control+C.
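The auto-provisioned pool names you saw follow an observable pattern of nap-<machine-type>-<random-suffix>. Assuming that convention holds (it is not a documented contract), you can extract the machine type from a pool name with plain shell parameter expansion:

```shell
pool="nap-n1-standard-1-15jwludl"

# Strip the "nap-" prefix, then strip the trailing random suffix.
machine_type=${pool#nap-}          # n1-standard-1-15jwludl
machine_type=${machine_type%-*}    # n1-standard-1
echo "$machine_type"
# → n1-standard-1
```

This can be handy when scripting checks against gcloud container node-pools list output.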

Submit two larger jobs that have tenant constraints

In this section, you submit two jobs that have tenant-specific constraints, and you also increase the resource requests for each job. Once again, these jobs cannot be scheduled into the default node pool due to the node selector constraints. Because each job has its own selector constraint, node auto-provisioning creates two new node pools. In this way, you can use node auto-provisioning to keep the tenant jobs separated. Because the jobs request more resources than the previous job did, node auto-provisioning creates node pools that have larger machine types than last time.

  1. List the node pools in the cluster:

    gcloud container node-pools list
    

    You see a single default pool.

  2. Print the combined configuration:

    kubectl kustomize manifests/jobs/two-tenants/
    

    The configuration includes two separate jobs, each with a tenant-specific node selector and toleration, and with increased resource requests.

    The output looks like the following:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tenant1-larger-pi-job
    spec:
    ...
    
  3. Submit the jobs:

    kubectl apply -k manifests/jobs/two-tenants/
    
  4. Watch the node pools in the cluster:

    watch -n 5 gcloud container node-pools list
    

    After some time, you see two additional node pools. The output looks like the following:

    NAME                            MACHINE_TYPE       DISK_SIZE_GB
    default-pool                    n1-standard-2      100
    nap-n1-standard-2-6jxjqobt      n1-standard-2      100
    nap-n1-standard-2-z3s06luj      n1-standard-2      100
    

    The node pool names are prefixed with nap-, which indicates that they were created by node auto-provisioning. The node pool names also include the machine type of the nodes in the pool, for example, n1-standard-2.

  5. To stop watching the nodes, press Control+C.

  6. Watch the nodes in the cluster:

    kubectl get nodes -w
    

    After about a minute, you see two new nodes appear in the list. The node names include the name of their associated nap- node pool. The new nodes initially have a NotReady status. After some time, the status of the new nodes changes to Ready, which means that the nodes can now accept pending work.

  7. To stop watching the nodes, press Control+C.

  8. List the node taints:

    kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
    

    You see that the new nodes have NoSchedule taints, one with the key-value pair tenant: tenant1, and the other with tenant: tenant2. Only Pods that have corresponding tenant tolerations can be scheduled onto the nodes.

  9. Watch the jobs in the cluster:

    kubectl get jobs -w --all-namespaces
    

    After some time, you see that tenant1-larger-pi-job and tenant2-larger-pi-job change to have 1/1 completion each, which indicates that the jobs finished successfully.

  10. To stop watching the jobs, press Control+C.

  11. Watch the node pools in the cluster:

    watch -n 5 gcloud container node-pools list
    

    After some time, you see that both nap- pools are deleted, and the cluster once again has only a single default node pool. Node auto-provisioning has deleted the nap- node pools, because there is no more pending work that matches the pools' constraints.

  12. To stop watching the node pools, press Control+C.

Controlling access to Google Cloud resources

In addition to maintaining separation of tenants within the cluster, you typically want to control tenant access to Google Cloud resources such as Cloud Storage buckets or Pub/Sub topics. For example, each tenant might require a Cloud Storage bucket that shouldn't be accessible by other tenants.

Using Workload Identity, you can create a mapping between Kubernetes service accounts and Google Cloud service accounts. You can then assign appropriate Identity and Access Management (IAM) roles to the Google Cloud service account. In this way, you can enforce the principle of least privilege so that tenant jobs can access their assigned resources, but they're prevented from accessing the resources that are owned by other tenants.
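The IAM binding you create next identifies the Kubernetes service account with a specially formatted member string that encodes the workload identity pool, namespace, and service account name. The format is easy to get wrong, so here is a small sketch assembling it (the project, namespace, and account values are illustrative):

```shell
PROJECT_ID=my-demo-project
NAMESPACE=tenant1-ns
KSA=tenant1-ksa

# Member format: serviceAccount:<project>.svc.id.goog[<namespace>/<ksa>]
member="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA}]"
echo "$member"
# → serviceAccount:my-demo-project.svc.id.goog[tenant1-ns/tenant1-ksa]
```

Note that the square brackets are literal, and that the pool name is always <project>.svc.id.goog, matching the --workload-pool flag used when you created the cluster.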

Set up GKE workload identity

Configure the mapping between your Kubernetes service account and a Google Cloud service account that you create.

  1. Create a Google Cloud service account for tenant1:

    gcloud iam service-accounts create tenant1-gsa
    
  2. Grant the Kubernetes service account for tenant1 IAM permissions to use the corresponding Google Cloud service account for tenant1:

    gcloud iam service-accounts add-iam-policy-binding \
        tenant1-gsa@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:${PROJECT_ID}.svc.id.goog[tenant1-ns/tenant1-ksa]"
    
  3. Complete the mapping between the service accounts by annotating the Kubernetes service account with the Google Cloud service account:

    kubectl annotate serviceaccount tenant1-ksa -n tenant1-ns \
        iam.gke.io/gcp-service-account=tenant1-gsa@${PROJECT_ID}.iam.gserviceaccount.com
    

Submit a job that writes to a Cloud Storage bucket

In this section, you confirm that a job that's executing as a particular Kubernetes service account can use the IAM permissions of its mapped Google Cloud service account.

  1. Create a new Cloud Storage bucket for tenant1:

    export BUCKET=tenant1-$PROJECT_ID
    gsutil mb -b on -l us-central1 gs://$BUCKET
    

    You use your project ID as a suffix on the bucket name to make the name unique.

  2. Update the job's configuration file to use the Cloud Storage bucket:

    sed -i "s/MY_BUCKET/$BUCKET/" \
        manifests/jobs/write-gcs/bucket-write.yaml
    
  3. Grant the tenant1 service account permissions to read and write objects in the bucket:

    gsutil iam ch \
        serviceAccount:tenant1-gsa@$PROJECT_ID.iam.gserviceaccount.com:objectAdmin \
        gs://$BUCKET
    
  4. Print the job configuration:

    kubectl kustomize manifests/jobs/write-gcs/
    

    The output looks like the following:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tenant1-pi-job-gcs
    spec:
    ...
    

    The new bucket name is passed as an argument to the generate-pi container, and the job specifies the appropriate tenant1-ksa Kubernetes service account.

  5. Submit the job:

    kubectl apply -k manifests/jobs/write-gcs/
    

    As in the previous section, node auto-provisioning creates a new node pool and a new node to execute the job.

  6. Watch the job's Pod:

    kubectl get pods -n tenant1-ns -w
    

    In this case, you watch the Pod rather than watching the node pool. You see the Pod transition through different statuses. After a couple of minutes, the status changes to Completed. This status indicates that the job has successfully finished.

  7. To stop watching, press Control+C.

  8. Confirm that a file has been written to the Cloud Storage bucket:

    gsutil ls -l gs://$BUCKET
    

    You see a single file.

  9. To clean up, delete the job:

    kubectl delete job tenant1-pi-job-gcs -n tenant1-ns
    

    You will resubmit this job in the next section.

Revoke IAM permissions

Finally, you confirm that revoking IAM permissions from the Google Cloud service account prevents the mapped Kubernetes service account from accessing the Cloud Storage bucket.

  1. Revoke the Google Cloud service account's permissions to write to the Cloud Storage bucket:

    gsutil iam ch -d \
        serviceAccount:tenant1-gsa@$PROJECT_ID.iam.gserviceaccount.com:objectAdmin \
        gs://$BUCKET
    
  2. Submit the same job as previously:

    kubectl apply -k manifests/jobs/write-gcs/
    
  3. Once again watch the job's Pod status:

    kubectl get pods -n tenant1-ns -w
    

    After a couple of minutes, the status changes to Error, which indicates that the job failed. This error is expected, because the job is executing as a Kubernetes service account that maps to a Google Cloud service account that in turn no longer has write permissions to the Cloud Storage bucket.

  4. To stop watching the Pod, press Control+C.

  5. List the files in the bucket:

    gsutil ls -l gs://$BUCKET
    

    You see a single file in the bucket; a new file hasn't been written.

Cleaning up

The easiest way to avoid incurring further charges is to delete the Cloud project that you created for the tutorial.

Delete the project

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project that you want to delete and then click Delete.
  3. In the dialog, type the project ID and then click Shut down to delete the project.

What's next