Preparing a Google Kubernetes Engine Environment for Production

This solution provides a blueprint and methodology for onboarding your workloads more securely, reliably, and cost-effectively to Google Kubernetes Engine. It provides guidance for configuring administrative and network access to clusters. This article assumes a working understanding of Kubernetes resources and cluster administration as well as familiarity with Google Cloud networking features.

Structuring projects, Virtual Private Cloud (VPC) networks, and clusters

The following diagram shows the best structure for projects, VPC networks, regions, subnetworks, zones, and clusters.

Project, Network, and Cluster Structure


Google Cloud creates all of its resources within a project entity. Projects are the unit of billing and allow administrators to associate Cloud Identity and Access Management (Cloud IAM) roles with users. When roles are applied at the project level, they apply to all resources encapsulated within the project.

You should use projects to encapsulate your various operating environments. For example, you might have production and staging projects for operations teams as well as a test-dev project for developers. You can apply more granular and strict policies to the projects that hold your most mission-critical and sensitive data and workloads while applying permissive and flexible policies for developers in the test-dev environment to experiment.


A project may contain multiple clusters. If you have multiple workloads to deploy, you can choose to use either a single, shared cluster or separate clusters for these workloads. To help you decide, consider our best practices on choosing the size and scope of a GKE cluster.

Networks and subnetworks

Within each project, you can have one or more VPC networks, which are virtual versions of physical networks. Each VPC network is a global resource that contains other networking-related resources, such as subnetworks, external IP addresses, firewall rules, routes, your VPN, and Cloud Router. Within a VPC network, you can use subnetworks, which are regional resources, to isolate and control traffic into or out of each region between your GKE clusters.

Each project comes with a single default network. You can create and configure an additional network to map to your existing IP address management (IPAM) convention. You can then apply firewall rules to this network to filter traffic to and from your GKE nodes. By default, all internet traffic to your GKE nodes is denied.

To control communication between subnetworks, you need to create firewall rules that allow traffic to pass between the subnetworks. Use the --tags flag during cluster or node-pool creation to appropriately tag your GKE nodes for the firewall rules to take effect. You can also use tags to create routes between your subnetworks if needed.

Multi-zone and regional clusters

By default, a cluster creates its cluster master and its nodes in a single zone that you specify at the time of creation. You can improve your clusters' availability and resilience by creating multi-zone or regional clusters. Multi-zone and regional clusters distribute Kubernetes resources across multiple zones within a region.

Multi-zone clusters:

  • Create a single cluster master in one zone.
  • Create nodes in multiple zones.

Regional clusters:

  • Create three cluster masters across three zones.
  • By default, create nodes in three zones, or in as many zones as you want.

The primary difference between regional and multi-zone clusters is that regional clusters create three masters and multi-zone clusters create only one. Note that in both cases, you are charged for node-to-node traffic across zones.

You can choose to create multi-zone or regional clusters at the time of cluster creation. You can add new zones to an existing cluster to make it multi-zone. However, you cannot modify an existing cluster to be regional. You also cannot make a regional cluster non-regional.

To learn more about multi-zone and regional clusters, see the GKE documentation.

Managing identity and access

Project-level access

The previous section noted that you can bind IAM roles to users at the project level. In addition to granting roles to individual users, you can also use groups to simplify the application of roles.

The following illustration shows an IAM policy layout that applies the principle of least privilege to two projects: a dev project, where developers build and test upcoming features and bug fixes, and a prod project that serves production traffic:

Identity and Access Management

As the following table shows, there are 4 groups of users within the organization with varying levels of permissions, granted through IAM roles across the 2 projects:

Team | IAM Role | Project | Permissions
Developers | container.developer | dev | Can create Kubernetes resources for the existing clusters within the project, is not allowed to create or delete clusters.
Operations | container.admin | prod | Full administrative access to the clusters and Kubernetes resources running within the project.
Security | container.viewer | prod | Create, modify, and delete firewall rules and SSL certificates as well as view resources that were created within each cluster including the logs of the running pods.
Network | network.admin | prod | Create, modify, and delete networking resources, except for firewall rules and SSL certificates.

In addition to the 3 teams with access to the prod project, an additional service account is given the container.developer role for prod, allowing it to create, list, and delete resources within the cluster. Service accounts can be used to give automation scripts or deployment frameworks the ability to act on your behalf. Deployments to your production project and clusters should go through an automated pipeline.

In the dev project there are multiple developers working on the same application within the same cluster. This is facilitated by namespaces, which the cluster user can create. Each developer can create resources within their own namespace, therefore avoiding naming conflicts. They can also reuse the same YAML configuration files for their deployments so that their configurations stay as similar as possible during development iterations. Namespaces can also be used to create quotas on CPU, memory, and storage usage within the cluster, ensuring that one developer isn't using too many resources within the cluster. The next section discusses restricting users to operating within certain namespaces.
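As a minimal sketch of the quota mechanism described above, a ResourceQuota object caps the total CPU, memory, and storage that workloads in one namespace can claim. The namespace name dev-alice, the object name dev-quota, and every value below are hypothetical and would need to match your own conventions.

```yaml
# Hypothetical per-developer quota; the names and all values are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev-alice
spec:
  hard:
    requests.cpu: "4"         # total CPU requested by all pods in the namespace
    requests.memory: 8Gi      # total memory requested by all pods
    limits.cpu: "8"           # total CPU limit across all pods
    limits.memory: 16Gi       # total memory limit across all pods
    requests.storage: 100Gi   # total storage requested by all PersistentVolumeClaims
```

Once a quota covers CPU or memory in a namespace, pods created there must declare the corresponding requests and limits (or inherit defaults from a LimitRange), otherwise the API server rejects them.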

RBAC authorization

GKE clusters running Kubernetes 1.6 and above can take advantage of further restrictions to what users are authorized to do in individual clusters. Cloud IAM can provide users access to full clusters and the resources within them, but Kubernetes Role-Based Access Control (RBAC) allows you to use the Kubernetes API to further constrain the actions users can perform inside their clusters.

With RBAC, cluster administrators apply fine-grained policies to individual namespaces within their clusters or to the cluster as a whole. The Kubernetes command line interface kubectl uses the active credentials from the gcloud tool, allowing cluster admins to map roles to Google Cloud identities (users and service accounts) as subjects in RoleBindings.

For example, in the figure below there are two users, user-a and user-b, who have been granted the config-reader and pod-reader roles on the app-a namespace.

RBAC Authorization

As another example, there are Google Cloud project-level IAM roles that give certain users access to all clusters in a project. In addition, individual namespace- and cluster-level role bindings are added through RBAC to give fine-grained access to resources within particular clusters or namespaces.

IAM Role Bindings

Kubernetes includes some default roles, but as a cluster administrator, you can create your own that map more closely to your organizational needs. Below is an example role that allows users to view, create, and update ConfigMaps but not delete them, because the delete verb is not included:

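kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: config-editor
rules:
- apiGroups: [""]  # "" is the core API group, which contains ConfigMaps
  resources: ["configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]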

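After you have defined roles, you can apply those roles to the cluster or namespace through bindings. Bindings associate roles with users, groups, or service accounts. Below is an example that binds the previously created role (config-editor) to a user in the development namespace.

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: config-editors
  namespace: development
subjects:
- kind: User
  name: "[USER_EMAIL]"  # placeholder for the user's Google account email
roleRef:
  kind: ClusterRole  # must be a ClusterRole, or a Role in the development namespace
  name: config-editor
  apiGroup: rbac.authorization.k8s.io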

For more information on RBAC, see the GKE documentation.

Image access and sharing

Images in Container Registry are stored in Cloud Storage. This section discusses two ways to share images. One way is to make the images public, and the other is to share images between projects.

Making images public

You can make images public by making the objects and buckets backing them public. For more detailed instructions, see the Container Registry Access Control documentation.

Accessing images across projects

You can share container images between projects by granting the service account that your Kubernetes nodes use access to the registry in the other project. By default, nodes run as the Compute Engine default service account, whose address has the form [PROJECT_NUMBER]-compute@developer.gserviceaccount.com. After you have this identifier, you can grant it the storage.objectViewer role on projects where you want to use the Container Registry. Use a custom service account that has restricted permissions, however, because the default has editor access to the entire project.

To use a different service account for your clusters, provide the service account at cluster or node-pool creation by using the --service-account flag. For example, to use the gke-sa service account in the project my-project:

gcloud container clusters create west --service-account \
  gke-sa@my-project.iam.gserviceaccount.com

Configuring networking

Kubernetes provides the Service abstraction that provides load-balancing and service discovery across sets of pods within a cluster as well as to legacy systems running outside the cluster. The sections below describe best practices for communication between Kubernetes pods and with other systems, including other Kubernetes clusters.

Communicating within the same cluster

Service discovery

Kubernetes allows you to define services that group pods that are running in the cluster based on a set of labels. This group of pods can be discovered within your cluster using DNS. For more information on service discovery in Kubernetes, go to the Connecting Applications with Services documentation.


Each GKE cluster deploys a cluster-local DNS server, kube-dns, that maps service names to healthy pod IPs. By default, the Kubernetes DNS server returns the service's cluster IP address. This IP address is static throughout the lifetime of the service. When traffic is sent to this IP, the iptables rules on the node load balance packets across the ready pods that match the selectors of the service. The kube-proxy service running on each node programs these iptables rules automatically.

If you want service discovery and health monitoring but would rather have the DNS service return the IPs of pods rather than a virtual IP, you can provision the service with the clusterIP field set to "None", which makes the service headless. In this case, the DNS server returns a list of A records that map the DNS name of your service to the IP addresses of the ready pods that match the label selectors defined by the service. The records in the response rotate to help spread load across the various pods. Some client-side DNS resolvers might cache DNS replies, rendering the A record rotation ineffective. The advantages of using the ClusterIP are listed in the Kubernetes documentation.
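A headless service differs from a standard one only in its clusterIP setting. The following is a minimal sketch; the service name, label, and port numbers are hypothetical.

```yaml
# Hypothetical headless Service; DNS returns the IPs of ready pods
# labeled app: my-app instead of a single virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: my-app-headless
spec:
  clusterIP: None        # this setting is what makes the Service headless
  selector:
    app: my-app
  ports:
  - port: 80             # port that clients connect to
    targetPort: 8080     # port that the pod's container listens on
```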

One typical use case for headless services is with StatefulSets. StatefulSets are well-suited to run stateful applications that require stable storage and networking among their replicas. This type of deployment provisions pods that have a stable network identity, meaning their hostnames can be resolved in the cluster. Although the pod's IP address may change, its hostname DNS entry will be kept up to date and resolvable.
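As a rough sketch of how a StatefulSet relies on a headless service for stable hostnames, the manifest below assumes that a headless Service named db-headless already exists in the same namespace. All names, the image path, and the storage size are hypothetical.

```yaml
# Hypothetical StatefulSet; assumes a headless Service named db-headless exists.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless   # headless Service that governs the pods' DNS names
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: registry.example.com/db:1.0   # placeholder image
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:      # one PersistentVolumeClaim is created per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

With this configuration, the pods are named db-0, db-1, and db-2. In the default namespace with the default cluster domain, the first replica resolves as db-0.db-headless.default.svc.cluster.local even after it is rescheduled and receives a new IP address.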

Packet flow: ClusterIP

The following diagram shows the DNS response and packet flow of a standard Kubernetes service. While pod IP addresses are routable from outside the cluster, a service's cluster IP address is only accessible within the cluster. These virtual IP addresses are implemented by doing destination network address translation (DNAT) in each Kubernetes node. The kube-proxy service running on each node keeps forwarding rules up to date on each node that map the cluster IP address to the IP addresses of healthy pods across the cluster. If there is a pod of the service running on the local node, then that pod is used, otherwise a random pod in the cluster is chosen.

Cluster IP Service

For more information on how Service IPs are implemented, go to the Kubernetes documentation. For a deep dive into GKE networking, watch the Next 2017 talk on YouTube.

Headless services

Below is an example of the DNS response and traffic pattern for a headless service. Pod IP addresses are routable through the default GCP subnetwork route tables and are accessed by your application directly.

Example DNS response and traffic pattern for headless service

Network policies

You can use Kubernetes Engine's network policy enforcement to control the communication between your cluster's Pods and Services. To define a network policy on Kubernetes Engine, you can use the Kubernetes Network Policy API to create Pod-level firewall rules. These firewall rules determine which Pods and Services can access one another inside your cluster.

Network policies are a kind of defense in depth that enhances the security of the workloads running on your cluster. For example, you can create a network policy to ensure that a compromised front-end service in your application cannot communicate directly with a billing or accounting service several levels down.

Network policies can also be used to isolate workloads belonging to different tenants. For example, you can provide secure multi-tenancy by defining a tenant-per-namespace model. In such a model, network policy rules can ensure that Pods and Services in a given namespace cannot access other Pods or Services in a different namespace.
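One way to sketch the tenant-per-namespace model is a policy that admits ingress only from pods in the same namespace. The namespace name tenant-a is hypothetical, and the policy takes effect only on clusters where network policy enforcement is enabled.

```yaml
# Hypothetical policy for a tenant namespace: pods accept traffic
# only from other pods in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: tenant-a
spec:
  podSelector: {}          # empty selector: applies to every pod in tenant-a
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}      # no namespaceSelector, so only pods in tenant-a match
```

Applying an equivalent policy in each tenant namespace blocks cross-namespace pod traffic while leaving traffic inside each namespace unaffected.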

To learn more about network policies, see the GKE documentation.

Connecting to a GKE cluster from inside Google Cloud

To connect to your services from outside of your cluster but within the GCP network's private IP space, use internal load balancing. When you create a Service with type: LoadBalancer and an Internal load balancer annotation in Kubernetes, an internal Network Load Balancer is created in your GCP project and configured to distribute TCP and UDP traffic among pods.
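The following sketch shows what such a Service might look like. The annotation key is not spelled out in the text above; this example assumes the networking.gke.io/load-balancer-type key that the GKE documentation lists for recent cluster versions (older versions used cloud.google.com/load-balancer-type), so verify it against your cluster version. The service name, label, and ports are hypothetical.

```yaml
# Hypothetical internal load balancer Service.
# The annotation key is an assumption; check the GKE documentation for your version.
apiVersion: v1
kind: Service
metadata:
  name: internal-app
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: internal-app
  ports:
  - protocol: TCP
    port: 80             # port exposed on the internal load balancer
    targetPort: 8080     # port that the pod's container listens on
```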

Connecting from inside a cluster to external services

In many cases it is necessary to connect your applications running inside of Kubernetes with a service, database, or application that lives outside of the cluster. You have 3 options, as outlined below.

Stub domains

In Kubernetes 1.6 and above, you can configure the cluster internal DNS service (kube-dns) to forward DNS queries for a certain domain to an external DNS server. This is useful when you have authoritative DNS servers that should be queried for a domain that your Kubernetes pods will need to leverage.
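As an illustration, stub domains for kube-dns are set through the kube-dns ConfigMap in the kube-system namespace. The domain corp.example.com and the nameserver address 10.150.0.1 are hypothetical.

```yaml
# Hypothetical stub domain: queries for names under corp.example.com
# are forwarded to the DNS server at 10.150.0.1.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"corp.example.com": ["10.150.0.1"]}
```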

External name services

External name services allow you to map a DNS record to a service name within the cluster. In this case, DNS lookups for the in-cluster service return a CNAME record of your choosing. You should use this if you only have a few records that you want to map back to existing DNS services.
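A minimal sketch of an external name service follows. The service name billing-db and the target db.example.com are hypothetical.

```yaml
# Hypothetical ExternalName Service: lookups for billing-db inside the
# cluster return a CNAME record pointing at db.example.com.
apiVersion: v1
kind: Service
metadata:
  name: billing-db
spec:
  type: ExternalName
  externalName: db.example.com
```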

Services without selectors

You can create services without a selector and then manually add endpoints to them to populate service discovery with the correct values. This allows you to use the same service discovery mechanism for your in-cluster services while ensuring that systems without service discovery through DNS are still reachable. While this approach is the most flexible, it also requires the most configuration and maintenance in the long term.
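To sketch this approach, the manifest below pairs a Service that has no selector with a manually maintained Endpoints object of the same name. The name legacy-api, the IP address, and the ports are hypothetical.

```yaml
# Hypothetical Service with no selector; Kubernetes does not manage its endpoints.
apiVersion: v1
kind: Service
metadata:
  name: legacy-api
spec:
  ports:
  - port: 443
    targetPort: 8443
---
# Manually maintained Endpoints object; its name must match the Service name.
apiVersion: v1
kind: Endpoints
metadata:
  name: legacy-api
subsets:
- addresses:
  - ip: 10.0.0.50        # address of the system outside the cluster
  ports:
  - port: 8443           # matches the Service's targetPort
```

Because nothing updates the Endpoints object automatically, the listed addresses must be edited by hand or by your own automation whenever the external system moves.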

For more information on DNS, go to the Kubernetes DNS Pods and Services documentation page.

Receiving traffic from the internet to your cluster

Traffic from the internet can be directed to your services running in Kubernetes by using two different methods: network load balancing or HTTP(S) load balancing.

Kubernetes services should be created as LoadBalancer type for external TCP/UDP load balancing. Kubernetes creates a Network Load Balancer in your GCP project and maps it to the nodes of your Kubernetes Engine cluster. This is an easy way to get load balancing for your TCP and UDP workloads with minimal configuration. The Network Load Balancer is scoped regionally, so it can only balance traffic against nodes running within the same region.

Network load balancer in region us-west1
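A minimal sketch of a Service that requests an external Network Load Balancer is shown below. The service name, label, and ports are hypothetical.

```yaml
# Hypothetical Service exposed through an external TCP load balancer.
apiVersion: v1
kind: Service
metadata:
  name: game-server
spec:
  type: LoadBalancer     # asks the cloud provider to provision a load balancer
  selector:
    app: game-server
  ports:
  - protocol: TCP
    port: 7000           # port exposed on the load balancer
    targetPort: 7000     # port that the pod's container listens on
```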

For HTTP(S) Load Balancing, you should take advantage of Google's Global HTTP(S) Load Balancer, which can load balance traffic across multiple regions using a single anycast IP address. To balance traffic to a single cluster, you can create an Ingress resource in Kubernetes that allows you to map hostnames and paths to services within your cluster. Kubernetes creates an HTTP(S) Load Balancer and configures it to forward traffic to the nodes in your GKE cluster. For Ingress to work properly, your services must be created with type: NodePort. You can optionally add the '{"ingress": true}' annotation to enable container-native load balancing, which lets the load balancer send traffic directly to pods running in your cluster and distribute traffic more evenly.

load-balancing to a single cluster
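The single-cluster setup can be sketched as a NodePort Service plus an Ingress that routes a hostname to it. The annotation key cloud.google.com/neg is an assumption taken from the GKE documentation on container-native load balancing, since the text above gives only the annotation value. The names, hostname, and ports are hypothetical.

```yaml
# Hypothetical backend Service for Ingress.
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    cloud.google.com/neg: '{"ingress": true}'   # optional; key is an assumption
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
---
# Hypothetical Ingress that maps a hostname and path to the Service above.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /*
        pathType: ImplementationSpecific
        backend:
          service:
            name: web
            port:
              number: 80
```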

To balance traffic across multiple regions, we recommend that you create and configure an HTTP(S) Load Balancer to direct traffic to standalone network endpoint groups exposing services in your clusters. For multi-regional load balancing, you must create your services with type: ClusterIP and include the annotation to specify which service and port to associate with a network endpoint group. You must also create a corresponding firewall rule, health check, backend service, and forwarding rule to configure your load balancer to begin routing traffic to your backend services.

load-balancing across multiple regions
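For the multi-region case, a Service that exposes a standalone network endpoint group might look like the sketch below. The annotation key and the exposed_ports value format are assumptions based on the GKE documentation for standalone NEGs, so check them against the current reference. The service name, label, and ports are hypothetical.

```yaml
# Hypothetical ClusterIP Service that asks GKE to create a standalone
# network endpoint group for port 80. The annotation is an assumption.
apiVersion: v1
kind: Service
metadata:
  name: web-neg
  annotations:
    cloud.google.com/neg: '{"exposed_ports": {"80": {}}}'
spec:
  type: ClusterIP
  selector:
    app: web
  ports:
  - port: 80             # Service port associated with the NEG
    targetPort: 8080     # port that the pod's container listens on
```

The firewall rule, health check, backend service, and forwarding rule mentioned above are not created by this manifest and still have to be configured separately.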


GKE nodes are provisioned as instances in Compute Engine. As such, they adhere to the same stateful firewall mechanism as other instances. These firewall rules are applied within your network to instances by using tags. Each node pool receives its own set of tags that you can use in rules. By default, each instance belonging to a node pool receives a tag that identifies a specific Kubernetes Engine cluster that this node pool is a part of. This tag is used in firewall rules that Kubernetes Engine creates automatically for you. You can add your own custom tags at either cluster or node pool creation time using the --tags flag in the gcloud command line.

For example, to allow an internal load balancer to access port 8080 on all your nodes, you would use the following commands:

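# Replace [SOURCE_RANGE] with the CIDR range that the load balancer traffic comes from.
gcloud compute firewall-rules create \
  allow-8080-fwr --target-tags allow-8080 --allow tcp:8080 \
  --network gke --source-ranges [SOURCE_RANGE]
# Tag the cluster's nodes so that the rule above applies to them.
gcloud container clusters create my-cluster --tags allow-8080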

The following example shows how to tag one cluster so that internet traffic can access nodes on port 30000 while the other cluster is tagged to allow traffic from the VPN to port 40000. This is useful when exposing a service through a NodePort that should only be accessible using privileged networks like a VPN back to a corporate data center, or from another cluster within your project.

tagging two clusters differently

Connecting to an on-premises data center

There are several Cloud Interconnect options for connecting to on-premises data centers. These options are not mutually exclusive, so you may have a combination, based on workload and requirements:

  1. Internet for workloads that aren't data intensive or latency sensitive. Google has more than 100 points of presence (PoPs) connecting to service providers across the world.
  2. Direct Peering for workloads that require dedicated bandwidth, are latency sensitive, and need access to all Google services, including the full suite of Google Cloud products. Direct Peering is a Layer 3 connection, done by exchanging BGP routes, and thus requires a registered ASN.
  3. Carrier Peering is the same as Direct Peering, but done through a service provider. This is a great option if you don't have a registered ASN, or have existing relationships with a preferred service provider.
  4. Cloud VPN is configured over Layer 3 interconnect and internet options (1, 2, and 3), if IPsec encryption is required, or if you want to extend your private network into your private Compute Engine network.

What's next

  • Try out other Google Cloud features for yourself. Have a look at our tutorials.