Preparing a Google Kubernetes Engine environment for production

This solution provides a blueprint and methodology for onboarding your workloads to Google Kubernetes Engine (GKE) more securely, reliably, and cost-effectively. It provides guidance for configuring administrative and network access to clusters. This article assumes a working understanding of Kubernetes resources and cluster administration, as well as familiarity with Google Cloud networking features.

Structuring projects, Virtual Private Cloud (VPC) networks, and clusters

The following diagram shows an example of a flexible and highly available structure for projects, VPC networks, regions, subnets, zones, and clusters.

Project, network, and cluster structure.

Projects

Google Cloud creates all of its resources within a project entity. Projects are the unit of billing and allow administrators to associate Identity and Access Management (IAM) roles with users. When roles are applied at the project level, they apply to all resources encapsulated within the project.

You should use projects to encapsulate your various operating environments. For example, you might have production and staging projects for operations teams as well as a test-dev project for developers. You can apply more granular and strict policies to the projects that hold your most mission-critical and sensitive data and workloads while applying permissive and flexible policies for developers in the test-dev environment to experiment.

Clusters

A project might contain multiple clusters. If you have multiple workloads to deploy, you can choose to use either a single, shared cluster or separate clusters for these workloads. To help you decide, consider the best practices on choosing the size and scope of a GKE cluster.

Networks and subnets

Within each project, you can have one or more VPC networks, which are virtual versions of physical networks. Each VPC network is a global resource that contains other networking-related resources, such as subnets, external IP addresses, firewall rules, routes, your VPN, and Cloud Router. Within a VPC network, you can use subnets, which are regional resources, to isolate and control traffic into or out of each region between your GKE clusters.

Each project comes with a single default network. You can create and configure an additional network to map to your existing IP address management (IPAM) convention. You can then apply firewall rules to this network to filter traffic to and from your GKE nodes. By default, all internet traffic to your GKE nodes is denied.
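
For example, the following commands sketch how you might create a custom-mode VPC network and a regional subnet to use with your GKE clusters. The network name, subnet name, region, and IP address range shown here are illustrative placeholders.

gcloud compute networks create gke-network --subnet-mode custom
gcloud compute networks subnets create gke-subnet-us-central1 \
    --network gke-network \
    --region us-central1 \
    --range 10.128.0.0/20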

To control communication between subnets, you must create firewall rules that allow traffic to pass between the subnets. Use the --tags flag during cluster or node-pool creation to appropriately tag your GKE nodes for the firewall rules to take effect. You can also use tags to create routes between your subnets if needed.

Multi-zone and regional clusters

By default, a cluster creates its cluster master and its nodes in a single zone that you specify at the time of creation. You can improve your clusters' availability and resilience by creating multi-zone or regional clusters. Multi-zone and regional clusters distribute Kubernetes resources across multiple zones within a region.

Multi-zone clusters:

  • Create a single cluster master in one zone.
  • Create nodes in multiple zones.

Regional clusters:

  • Create three cluster masters across three zones.
  • By default, create nodes in three zones, or in as many zones as you want.

The primary difference between regional and multi-zone clusters is that regional clusters create three masters and multi-zone clusters create only one. Note that in both cases, you are charged for node-to-node traffic across zones.

You can choose to create multi-zone or regional clusters at the time of cluster creation. You can add new zones to an existing cluster to make it multi-zone. However, you cannot modify an existing cluster to be regional. You also cannot make a regional cluster non-regional.
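
For example, the following commands sketch both options at creation time. The cluster names, region, and zones are placeholders; choose locations that match your availability requirements.

# Regional cluster: three masters, with nodes in each zone that you list.
gcloud container clusters create regional-cluster \
    --region us-central1 \
    --node-locations us-central1-a,us-central1-b,us-central1-c

# Multi-zone cluster: a single master in us-central1-a, with nodes in the listed zones.
gcloud container clusters create multi-zone-cluster \
    --zone us-central1-a \
    --node-locations us-central1-a,us-central1-b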

The service availability of nodes in your GKE-managed clusters is covered by the Compute Engine Service Level Agreement (SLA). Additionally, the SLA for GKE guarantees a monthly uptime of 99.5% for your Kubernetes cluster masters for zonal clusters and 99.95% for regional clusters.

As of June 6, 2020, GKE charges a cluster management fee of $0.10 per cluster per hour. For details, see the pricing page.

To learn more about multi-zone and regional clusters, see the GKE documentation.

Master authorized networks

An additional security measure that you can enforce in your cluster is to enable master authorized networks. This feature restricts access to the API server to the CIDR ranges that you specify, helping to ensure that only teams within your network can administer your cluster.

When you enable this feature, keep the following in mind:

  • Only 50 CIDR ranges are allowed.
  • If you're using a CI/CD pipeline, ensure that your CI/CD tools have access to the cluster's API server by allowing (whitelisting) their IP addresses or CIDR range.

You can also use this feature in conjunction with Cloud Interconnect or Cloud VPN to enable access to the master node only from within your private data center.
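
As a minimal sketch, the following command enables master authorized networks at cluster creation time. The cluster name, zone, and CIDR ranges are illustrative; substitute the ranges that your administrators and CI/CD tools actually use.

gcloud container clusters create secure-cluster \
    --zone us-central1-a \
    --enable-master-authorized-networks \
    --master-authorized-networks 203.0.113.0/24,198.51.100.0/24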

Private clusters

By default, all nodes in a GKE cluster have public IP addresses. A good practice is to create private clusters, which gives all worker nodes only private RFC 1918 IP addresses. Private clusters enforce network isolation, reducing the risk exposure surface for your clusters. Using private clusters means that by default only clients inside your network can access services in the cluster. In order to allow external services to reach services in your cluster, you can use an HTTP(S) load balancer or a network load balancer.

When you want to open up access to the master node outside your VPC network, you can use private clusters with master authorized networks. When you enable master authorized networks, your cluster master endpoint gets two IP addresses, an internal (private) one and a public one. The internal IP address can be used by anything internal to your network that's within the same region. The public IP address of the master can be used by any user or process that's external to your network and that's from an allowed CIDR range or IP address.

Private nodes don't have external IP addresses, and therefore by default they don't have outbound internet access. This also implies that by default, your cluster's container runtime can't pull container images from an external container registry, because that requires egress (outbound) connectivity. You can consider hosting your container images in Container Registry and accessing these images using Private Google Access. Alternatively, you can use Cloud NAT or deploy a NAT gateway to provide outbound access for your private nodes.
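
The following command is a minimal sketch of creating a private cluster with master authorized networks. The cluster name, zone, master CIDR range, and authorized range are placeholders; private clusters also require alias IP addresses (VPC-native networking), which is why --enable-ip-alias is included.

gcloud container clusters create private-cluster \
    --zone us-central1-a \
    --enable-ip-alias \
    --enable-private-nodes \
    --master-ipv4-cidr 172.16.0.32/28 \
    --enable-master-authorized-networks \
    --master-authorized-networks 203.0.113.0/24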

Additionally, you can use VPC Service Controls to help mitigate the risk of data exfiltration. VPC Service Controls help you protect managed Google Cloud services in one or more projects by letting you define a service perimeter for access to these services. You can give applications that run in your GKE clusters access to reach these managed services by setting up appropriate access levels. You can also use VPC Service Controls to protect the GKE cluster-creation control plane.

Managing identity and access

Project-level access

The previous section noted that you can bind IAM roles to users at the project level. In addition to granting roles to individual users, you can also use groups to simplify the application of roles.

The following illustration of an IAM policy layout shows the principle of least privilege for a dev project that's set up for developers to develop and test their upcoming features and bug fixes, as well as a prod project for production traffic:

Identity and access management.

As the following table shows, there are 4 groups of users within the organization with varying levels of permissions, granted through IAM roles across the 2 projects:

Team | IAM role | Project | Permissions
Developers | container.developer | dev | Can create Kubernetes resources for the existing clusters within the project; not allowed to create or delete clusters.
Operations | container.admin | prod | Full administrative access to the clusters and Kubernetes resources running within the project.
Security | container.viewer, security.admin | prod | Create, modify, and delete firewall rules and SSL certificates, as well as view resources that were created within each cluster, including the logs of the running Pods.
Network | network.admin | prod | Create, modify, and delete networking resources, except for firewall rules and SSL certificates.

In addition to the 3 teams with access to the prod project, an additional service account is given the container.developer role for prod, allowing it to create, list, and delete resources within the cluster. Service accounts can be used to give automation scripts or deployment frameworks the ability to act on your behalf. Deployments to your production project and clusters should go through an automated pipeline.

In the dev project there are multiple developers working on the same application within the same cluster. This is facilitated by namespaces, which the cluster user can create. Each developer can create resources within their own namespace, therefore avoiding naming conflicts. They can also reuse the same YAML configuration files for their deployments so that their configurations stay as similar as possible during development iterations. Namespaces can also be used to create quotas on CPU, memory, and storage usage within the cluster, ensuring that one developer isn't using too many resources within the cluster. The next section discusses restricting users to operating within certain namespaces.

Role-based access control (RBAC)

GKE clusters running Kubernetes 1.6 and later can take advantage of further restrictions to what users are authorized to do in individual clusters. IAM can provide users access to full clusters and the resources within them, but Kubernetes Role-Based Access Control (RBAC) allows you to use the Kubernetes API to further constrain the actions users can perform inside their clusters.

With RBAC, cluster administrators apply fine-grained policies to individual namespaces within their clusters or to the cluster as a whole. The Kubernetes kubectl tool uses the active credentials from the gcloud tool, allowing cluster admins to map roles to Google Cloud identities (users, service accounts, and Google Groups) as subjects in RoleBindings.

Google Groups for GKE (beta) enables you to use Groups with Kubernetes RBAC. To use this feature, you must configure Google Workspace Google Groups, create a cluster with the feature enabled, and use RoleBindings to associate your Groups with the roles that you want to bind them to. For more information, see Role-based access control.
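
As a sketch, the following beta command creates a cluster with the Google Groups for GKE feature enabled. The cluster name, zone, and group address are placeholders; the group that you reference must follow the naming convention that the feature requires in your Google Workspace domain.

gcloud beta container clusters create rbac-cluster \
    --zone us-central1-a \
    --security-group="gke-security-groups@example.com"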

For example, in the following figure, there are two users, user-a and user-b, who have been granted the config-reader and pod-reader roles on the app-a namespace.

RBAC authorization.

As another example, there are Google Cloud project-level IAM roles that give certain users access to all clusters in a project. In addition, individual namespace- and cluster-level role bindings are added through RBAC to give fine-grained access to resources within particular clusters or namespaces.

IAM RoleBindings.

Kubernetes includes some default roles, but as a cluster administrator, you can create your own roles that map more closely to your organizational needs. The following example role allows users to view, create, and update ConfigMaps, but not to delete them, because the delete verb is not included:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: development
  name: config-editor
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]

After you have defined roles, you can apply those roles to the cluster or namespace through bindings. Bindings associate roles to their users, groups, or service accounts. The following example shows how to bind a previously created role (config-editor) to the bob@example.org user and to the development namespace.

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: config-editors
  namespace: development
subjects:
- kind: User
  name: bob@example.org
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: config-editor
  apiGroup: rbac.authorization.k8s.io

For more information about RBAC, see the GKE documentation.

Image access and sharing

Images in Container Registry or Artifact Registry (beta) are stored in Cloud Storage. This section discusses two ways to share images. One way is to make the images public, and the other is to share images between projects.

Making images public in Container Registry

You can make images public by making the objects and buckets backing them public. For more detailed instructions, see the Container Registry Access Control documentation.
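
As a sketch, for images hosted on the gcr.io hostname, the backing bucket follows the artifacts.project-id.appspot.com naming pattern, so you could make all images in a project publicly readable with a command like the following (the project ID is a placeholder):

gsutil iam ch allUsers:objectViewer gs://artifacts.my-project.appspot.com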

Accessing images across projects in Container Registry

You can share container images between projects by ensuring that your Kubernetes nodes use a service account. The default Compute Engine service account associated with your project is in the following form:

project-number-compute@developer.gserviceaccount.com

After you have this identifier, you can grant it access as a storage.viewer on projects where you want to use the Container Registry. Use a custom service account that has restricted permissions, because the default service account has editor access to the entire project.
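
For example, the following sketch grants the Storage Object Viewer role at the project level on the project that hosts the registry. The project name and service account are placeholders; substitute the custom service account that your nodes actually use.

gcloud projects add-iam-policy-binding registry-project \
    --member serviceAccount:project-number-compute@developer.gserviceaccount.com \
    --role roles/storage.objectViewer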

To use a different service account for your clusters, provide the service account at cluster or node-pool creation by using the --service-account flag. For example, to use the gke-sa service account in the project my-project:

gcloud container clusters create west --service-account \
    gke-sa@my-project.iam.gserviceaccount.com

For information about migrating from Container Registry to Artifact Registry for your container images, see Transitioning from Container Registry.

Determining the right image pull policy

The imagePullPolicy property determines whether Kubelet attempts to pull an image while it's starting up a Pod. You must consider an appropriate imagePullPolicy setting to specify for your container images. For example, you might specify the following image pull policy:

imagePullPolicy: IfNotPresent

In this case, Kubelet retrieves a copy of the image only if the image isn't available in the node's cache.

For more information about the possible image pull policies that you can specify, see Container Images in the Kubernetes documentation.

Using dynamic admission webhooks to enforce policies

Dynamic admission webhooks are a part of the Kubernetes control plane. They can intercept incoming requests made to the API server. Admission webhooks are a powerful tool that can help you enforce enterprise-specific custom policies in your GKE clusters.

Kubernetes supports two types of admission webhooks: mutating admission webhooks and validating admission webhooks.

Mutating admission webhooks intercept admission requests and can mutate (alter) the request. The request is then passed on to the API server.

Validating admission webhooks examine a request and determine if it's valid according to rules that you specify. If any validating admission webhooks are configured in the cluster, they're invoked after the request has been validated by the API server. Validating admission webhooks can reject requests in order to ensure conformance to policies that are defined in the webhook.

For example, you can enforce an image pull policy by using a mutating admission webhook to make sure that the policy is set to Always regardless of the imagePullPolicy setting that was specified by developers who submitted pod-creation requests.
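
The following manifest is a minimal sketch of registering a mutating admission webhook for Pod-creation requests. The webhook name, the Service that hosts the webhook server, the path, and the CA bundle are all placeholders, and the webhook server that would actually rewrite imagePullPolicy is not shown.

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: image-pull-policy-webhook
webhooks:
  - name: image-pull-policy.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        # Placeholder Service that exposes your webhook server over TLS.
        name: image-pull-policy-webhook
        namespace: webhooks
        path: /mutate
      caBundle: <base64-encoded-CA-certificate>
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]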

Other image deployment considerations

It's a best practice to use a private container registry such as Container Registry to hold your organization's curated set of images. This helps to reduce the risk of introducing vulnerabilities into your deployment pipeline (and eventually into application workloads). If possible, enable container analysis, such as vulnerability scanning, to help further reduce your security risks.

If you must use public images, consider validating the set of public images that are allowed to be deployed into your clusters. (For more information, see the Binary Authorization section.) You can also consider deploying prepackaged Kubernetes apps from Google Cloud Marketplace. The Kubernetes apps listed on Google Cloud Marketplace are tested and vetted by Google, including vulnerability scanning and partner agreements for maintenance and support.

In addition, make sure that you use good image versioning practices—use good tagging conventions, and consider using digests instead of tags when applicable.
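
For example, in a Pod spec you can pin an image by digest instead of by tag; the repository path and digest here are placeholders:

image: gcr.io/my-project/my-app@sha256:<digest>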

Using Workload Identity to interact with Google Cloud service APIs

Often, enterprise architectures involve architectural components that span cloud services—cloud-managed services and hosted services. It's a common pattern for your GKE applications or services to have to communicate with Google Cloud managed services such as Cloud Storage and BigQuery. As an example, you might need to store customer records after they're processed by batch jobs in GKE into BigQuery for later analysis.

Workload Identity is a GKE feature that lets your GKE services interact with the wider Google Cloud ecosystem without having to store service account credentials as Kubernetes secrets. This feature allows you to map a Kubernetes service account to a Google Cloud service account with the help of an IAM binding. Subsequently, when Pods run using the Kubernetes service account, they can assume the identity that's required in order to access the Google Cloud service. Note that this assumes that you have granted the required level of access for the service to the Google Cloud service account.
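
The following commands are a sketch of wiring a Kubernetes service account to a Google Cloud service account with Workload Identity. The cluster, project, namespace, and service account names are placeholders, and the Google Cloud service account (app-gsa) is assumed to already have the IAM roles it needs (for example, on BigQuery).

# Create a cluster with Workload Identity enabled.
gcloud container clusters create wi-cluster \
    --zone us-central1-a \
    --workload-pool=my-project.svc.id.goog

# Create the Kubernetes service account that your Pods will run as.
kubectl create serviceaccount app-ksa --namespace default

# Allow the Kubernetes service account to impersonate the Google Cloud service account.
gcloud iam service-accounts add-iam-policy-binding \
    app-gsa@my-project.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:my-project.svc.id.goog[default/app-ksa]"

# Annotate the Kubernetes service account with the Google Cloud service account to use.
kubectl annotate serviceaccount app-ksa --namespace default \
    iam.gke.io/gcp-service-account=app-gsa@my-project.iam.gserviceaccount.com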

For more information about Workload Identity, see the GKE documentation.

Managing cluster security

Security is a multifaceted discipline that's of paramount importance in enterprise deployments of GKE clusters. This section covers several factors that you can use to harden your cluster's security.

Vulnerability scanning for images

Container Registry can scan images that are pushed to it, looking for known security vulnerabilities for images based on Ubuntu, Alpine, Debian, CentOS, and RedHat. We recommend that you take advantage of this feature to scan images that you plan to use in your Kubernetes clusters.

You can view vulnerabilities for an image in the Google Cloud console or by running the following gcloud command:

gcloud beta container images describe \
    hostname/project-id/image-id:tag  \
    --show-package-vulnerability

Replace the following:

  • hostname: one of the following hostname locations:
    • gcr.io, which currently hosts the images in the United States.
    • us.gcr.io, which hosts the image in the United States in a separate storage bucket from images hosted by gcr.io.
    • eu.gcr.io, which hosts the images in the European Union.
    • asia.gcr.io, which hosts the images in Asia.
  • project-id: the ID of the project that contains the images.
  • image-id: the ID of the image for which you want to view vulnerabilities.
  • tag: the image tag that you want to get information about.

Your organization can benefit from automating the tracking and receipt of notifications when changes are made to your Container Registry repository. For example, you can be notified when a new image is created or one is deleted. You can build a pipeline where application listeners are subscribed to a Pub/Sub topic to which Container Registry events are published. You can then use these events to trigger builds or automated deployments. For more information, see the Container Registry documentation.
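
As a sketch, Container Registry publishes these events to a Pub/Sub topic named gcr in the project that hosts the registry; you create the topic and a subscription yourself. The subscription name here is illustrative.

gcloud pubsub topics create gcr
gcloud pubsub subscriptions create registry-events --topic gcr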

Binary Authorization

With Kubernetes, you must determine if and when an image should be considered valid for deployment into your cluster. For this task you can use Binary Authorization. This is a deploy-time construct that lets you define a workflow that enforces the signatures (attestations) that an image must have in order to be deployable to your cluster.

The workflow is defined in terms of policies. As you move your code and therefore your container image through a CI/CD pipeline, Binary Authorization records attestations for each of these stages as defined in your Binary Authorization policy. These attestations validate that an image has successfully passed the defined milestones.

Binary Authorization integrates with the GKE deployment API and can ensure that deployment of an image is subject to the image having all the required attestations. Failed deployment attempts are automatically logged, and cluster administrators can review and audit them.
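
As a minimal sketch, you can enable Binary Authorization when you create a cluster and then import a policy that you maintain in source control. The cluster name, zone, and policy file name are placeholders, and the policy contents are not shown here.

gcloud container clusters create attested-cluster \
    --zone us-central1-a \
    --enable-binauthz

gcloud container binauthz policy import policy.yaml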

For a tutorial about how to implement Binary Authorization for GKE using Cloud Build, see Implementing Binary Authorization using Cloud Build and GKE.

Secure access with gVisor in GKE Sandbox

A container provides a layer of security and kernel isolation, but it might still be susceptible to breaches that lead to attackers gaining access to the host's operating system (OS). A more resilient approach to security isolation between a container and its host OS is to create another layer of separation. One approach is to use GKE Sandbox.

GKE Sandbox uses gVisor, an open source container runtime that was released by Google. Internally, gVisor creates a virtual kernel for containers to interact with, which abstracts the reach that a container has to the host kernel. Additionally, it enforces control on the file and network operations that the container can perform.

Because GKE Sandbox creates an additional layer of isolation, it might incur additional memory and CPU overhead. Before you use GKE Sandbox, consider which workloads need this elevated level of security. Typically, good candidates are services that are based on external images.

The following gcloud command shows how to create a node pool with GKE Sandbox enabled:

gcloud container node-pools create node-pool-name \
    --cluster=cluster \
    --image-type=cos_containerd \
    --sandbox type=gvisor \
    --enable-autoupgrade

Replace the following:

  • node-pool-name: The name of the node pool to create.
  • cluster: The cluster to add the node pool to.

To specify which application Pods run using GKE Sandbox, incorporate gVisor in the Pod spec as shown in the following example:

apiVersion: v1
kind: Pod
metadata:
  name: sample-saas-app
  labels:
    app: saas-v1
spec:
  runtimeClassName: gvisor
  containers:
    - name: sample-node-app-v1
      image: [image]

For more information about GKE Sandbox, see GKE Sandbox: Bring defense in depth to your Pods on the Google Cloud blog. For more information about whether your application is suited for GKE Sandbox, see the GKE documentation.

Audit logging

Kubernetes audit logging records all API requests that are made to the Kubernetes API server. This logging is useful for helping you detect anomalies and unusual patterns of access and configuration setup. Examples of what you might want to check and alert on are the following:

  • Deleting a deployment.
  • Attaching to or using exec to access a container that has privileged access.
  • Modifying ClusterRole objects or creating role bindings for the cluster roles.
  • Creating service accounts in the kube-system namespace.

GKE integrates Kubernetes audit logging with Cloud Logging. You can access these logs the same way you access logs for resources that run in your Cloud project. API requests made to the Kubernetes API server can be logged, and you can use them to review API activity patterns.
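
For example, the following command is a sketch of reading recent admin activity audit entries for clusters in a project with the gcloud tool; the project ID is a placeholder, and you can extend the filter (for example, on the request method) to narrow the results.

gcloud logging read \
    'resource.type="k8s_cluster" AND logName:"cloudaudit.googleapis.com%2Factivity"' \
    --project my-project \
    --limit 10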

Each request (event) that's captured by the Kubernetes API server is processed using one or more policies that you define. These can be either Kubernetes audit policies that determine which events are logged, or they can be Google Kubernetes Engine audit policies that determine whether the events are logged in the admin activity log or the data log. Admin activity logs are enabled by default. You can also enable data access logging if you need to log details about what metadata and data was read or written within the clusters. Note that enabling data access logging can incur additional charges. For more information, see the pricing documentation.

PodSecurityPolicies

A common attack vector is to deploy Pods that have escalated privileges in an attempt to gain access to a Kubernetes cluster. PodSecurityPolicies define a set of rules in the Pod specification that outline what a Pod is allowed to do. You implement a PodSecurityPolicy in Kubernetes as an admission controller resource. You can use it to restrict the use of host namespaces and volume types, and to limit the underlying OS capabilities that are available to Pods.

To create a GKE cluster with a PodSecurityPolicy enabled, use the following command. Replace cluster-name with the name of the cluster that you're adding a PodSecurityPolicy to.

gcloud beta container clusters create cluster-name \
    --enable-pod-security-policy

The following example shows a PodSecurityPolicy that restricts the ability to create privileged Pods.

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default-pod-security-policy
spec:
  privileged: false
  hostPID: false
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: MustRunAsNonRoot

Container security considerations

The fundamental building block for Kubernetes services is the container. This makes container security a key factor when you plan cluster security and policies. Carefully consider the following:

  • The images that you build your containers from.
  • The privileges that you assign to containers.
  • How containers interact with the host OS and other services.
  • How containers access and log sensitive information.
  • How you manage the lifecycle of the containers in your clusters.

For more information and best practices, see the documentation for building and operating containers.

Configuring networking

Kubernetes provides a service abstraction that includes load-balancing and service discovery across sets of Pods within a cluster as well as to legacy systems that are running outside the cluster. The following sections describe best practices for communication between Kubernetes Pods and with other systems, including other Kubernetes clusters.

VPC-native clusters compared to routes-based clusters

Based on how GKE clusters route traffic from one Pod to another, the clusters can be categorized into two types. The first is a cluster that uses alias IP ranges for routing traffic; this is called a VPC-native cluster. The second is a cluster that uses Google Cloud routes; this is called a routes-based cluster.

VPC-native clusters use alias IP ranges for Pod networking. This means that the control plane automatically manages the routing configuration for Pods instead of configuring and maintaining static routes for each node in the GKE cluster. Using alias IP ranges, you can configure multiple internal IP addresses that represent containers or applications hosted in a VM, without having to define a separate network interface. Google Cloud automatically installs VPC network routes for primary and alias IP ranges for the subnet of the primary network interface. This greatly simplifies Pod-to-Pod traffic routing.

Additionally, VPC-native clusters are not subject to route quotas. Leveraging alias IP ranges in the cluster fabric provides direct access to Google services like Cloud Storage and BigQuery; this access otherwise would be possible only with a NAT gateway.

It's a common pattern for enterprises to have their GKE clusters communicate securely with their on-premises ecosystem of applications and services. Alias IP ranges permit this, because the alias IP addresses are discoverable over Cloud VPN or Cloud Interconnect. This helps give you secure connectivity across your on-premises and Google Cloud infrastructure.

You need to decide which cluster type is best suited to your network topology. Key factors are the availability of IP addresses in your network, cluster (node) expansion plans in your enterprise, and connectivity to other applications in the ecosystem. VPC-native clusters tend to consume more IP addresses in the network, so you should take that into account. Note that after creation you cannot migrate a VPC-native cluster to a routes-based cluster, nor a routes-based cluster to a VPC-native cluster, so it's important to understand the implications of your choice before you implement it.
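
The following command is a sketch of creating a VPC-native cluster by enabling alias IP ranges. The cluster, network, subnet, and secondary range names are placeholders and assume that you created the subnet with those secondary ranges beforehand.

gcloud container clusters create vpc-native-cluster \
    --zone us-central1-a \
    --network gke-network \
    --subnetwork gke-subnet-us-central1 \
    --enable-ip-alias \
    --cluster-secondary-range-name pod-range \
    --services-secondary-range-name service-range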

Communicating within the same cluster

Service discovery

Kubernetes allows you to define services that group Pods that are running in the cluster based on a set of labels. This group of Pods can be discovered within your cluster using DNS. For more information about service discovery in Kubernetes, go to the Connecting Applications with Services documentation.

DNS

A cluster-local DNS server, kube-dns, is deployed in each GKE cluster; it handles mapping service names to healthy Pod IP addresses. By default, the Kubernetes DNS server returns the service's cluster IP address. This IP address is static throughout the lifetime of the service. When traffic is sent to this IP address, the iptables rules on the node load balance packets across the ready Pods that match the selectors of the service. These iptables rules are programmed automatically by the kube-proxy service running on each node.

If you want service discovery and health monitoring but would rather have the DNS service return you the IP addresses of Pods rather than a virtual IP address, you can provision the service with the ClusterIP field set to "None," which makes the service headless. In this case, the DNS server returns a list of A records that map the DNS name of your service to the A records of the ready Pods that match the label selectors defined by the service. The records in the response rotate to facilitate spreading load across the various Pods. Some client-side DNS resolvers might cache DNS replies, rendering the A record rotation ineffective. The advantages of using the ClusterIP are listed in the Kubernetes documentation.

One typical use case for headless services is with StatefulSets. StatefulSets are well-suited to run stateful applications that require stable storage and networking among their replicas. This type of deployment provisions Pods that have a stable network identity, meaning their hostnames can be resolved in the cluster. Although the Pod's IP address might change, its hostname DNS entry is kept up to date and resolvable.
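
The following manifest is a minimal sketch of a headless service; the service name, label selector, and port are illustrative.

apiVersion: v1
kind: Service
metadata:
  name: my-stateful-app
spec:
  # Setting clusterIP to None makes the service headless.
  clusterIP: None
  selector:
    app: my-stateful-app
  ports:
    - port: 5432
      targetPort: 5432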

Packet flow: ClusterIP

The following diagram shows the DNS response and packet flow of a standard Kubernetes service. While Pod IP addresses are routable from outside the cluster, a service's cluster IP address is only accessible within the cluster. These virtual IP addresses are implemented by doing destination network address translation (DNAT) in each Kubernetes node. The kube-proxy service running on each node keeps forwarding rules up to date on each node that map the cluster IP address to the IP addresses of healthy Pods across the cluster. If there is a Pod of the service running on the local node, then that Pod is used, otherwise a random Pod in the cluster is chosen.

ClusterIP service.

For more information about how ClusterIP is implemented, go to the Kubernetes documentation. For a deep dive into GKE networking, watch the Next 2017 talk on YouTube:

Headless services

The following is an example of the DNS response and traffic pattern for a headless service. Pod IP addresses are routable through the default Google Cloud subnet route tables and are accessed by your application directly.

Example DNS response and traffic pattern for headless service.

Network policies

You can use GKE network policy enforcement to control the communication between your cluster's Pods and services. To define a network policy on GKE, you can use the Kubernetes Network Policy API to create Pod-level firewall rules. These firewall rules determine which Pods and services can access one another inside your cluster.

Network policies are a kind of defense in depth that enhances the security of the workloads running on your cluster. For example, you can create a network policy to ensure that a compromised front-end service in your application cannot communicate directly with a billing or accounting service several levels down.

Network policies can also be used to isolate workloads belonging to different tenants. For example, you can provide secure multi-tenancy by defining a tenant-per-namespace model. In such a model, network policy rules can ensure that Pods and services in a given namespace cannot access other Pods or services in a different namespace.
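
The following manifest is a sketch of a network policy that allows the billing Pods to accept traffic only from Pods labeled as the API tier. It assumes that network policy enforcement is enabled on the cluster, and the namespace and labels are illustrative.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: billing-allow-api-only
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: billing
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api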

To learn more about network policies, see the GKE documentation.

Connecting to a GKE cluster from inside Google Cloud

To connect to your services from outside of your cluster but within the Google Cloud network's private IP address space, use internal load balancing. When you create a Service with type: LoadBalancer and the cloud.google.com/load-balancer-type: Internal annotation in Kubernetes, an internal network load balancer is created in your Google Cloud project and configured to distribute TCP and UDP traffic among your Pods.
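
The following manifest is a minimal sketch of such a service; the name, selector, and ports are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: internal-app
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: internal-app
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP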

Connecting from inside a cluster to external services

In many cases it is necessary to connect your applications running inside of Kubernetes with a service, database, or application that lives outside of the cluster. You have 3 options, as outlined in the following sections.

Stub domains

In Kubernetes 1.6 and later, you can configure the cluster internal DNS service (kube-dns) to forward DNS queries for a certain domain to an external DNS server. This is useful when you have authoritative DNS servers that should be queried for a domain that your Kubernetes Pods must use.
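
As a sketch, you configure stub domains through the kube-dns ConfigMap in the kube-system namespace; the domain and the DNS server IP address shown here are illustrative.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"corp.example.com": ["10.100.0.10"]}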

External name services

External name services allow you to map a DNS record to a service name within the cluster. In this case, DNS lookups for the in-cluster service return a CNAME record of your choosing. Use this if you only have a few records that you want to map back to existing DNS services.
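
The following manifest is a minimal sketch; the service name and the external hostname are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: billing-database
spec:
  type: ExternalName
  externalName: db.example.internal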

Services without selectors

You can create services without a selector and then manually add endpoints to them to populate service discovery with the correct values. This allows you to use the same service discovery mechanism for your in-cluster services while ensuring that systems without service discovery through DNS are still reachable. While this approach is the most flexible, it also requires the most configuration and maintenance in the long term.
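
The following manifests are a sketch of a service without a selector and the Endpoints object that you maintain for it; the service name, IP address, and port are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: legacy-backend
spec:
  ports:
    - port: 443
      targetPort: 443
---
apiVersion: v1
kind: Endpoints
metadata:
  # The Endpoints object must have the same name as the service.
  name: legacy-backend
subsets:
  - addresses:
      - ip: 10.0.1.15
    ports:
      - port: 443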

For more information about DNS, go to the Kubernetes DNS Pods and Services documentation page.

Configuring your services in Kubernetes to receive internet traffic

Kubernetes services can be exposed using NodePort, ClusterIP, and LoadBalancer.

However, when you have many external-facing services, you can consider using Kubernetes Ingress resources. Ingress provides an entry point for your cluster and lets you define routing rules that route incoming requests to one or more backend services in your cluster. In GKE, the GKE Ingress controller implements an Ingress resource as a Google Cloud HTTP(S) load balancer and configures it according to the information in the Ingress resource and its associated services.

A Kubernetes Ingress resource can be used only when your applications serve traffic over HTTP(S). If your backend services use TCP or UDP protocols, you must use a network load balancer instead. This might be necessary, for example, if you need to expose your database as a service.
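
The following manifest is a sketch of an Ingress that routes requests to two existing services named api and web; the names, paths, and ports are placeholders, and older cluster versions might require the v1beta1 form of the Ingress API instead.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8080
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80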

Backend configuration

A BackendConfig is a custom resource definition that can provide additional prescriptive configuration that's used by the Kubernetes Ingress controller. When you deploy an Ingress object in your GKE cluster, the Kubernetes Ingress controller configures an HTTP(S) load balancer that routes incoming requests to the backend services as you specified in the Ingress manifest.

You can supplement the configuration of the load balancer with specifications like the following:

  • Enabling caching with Cloud CDN.
  • Adding IP address or CIDR allowlists (whitelists) with Google Cloud Armor.
  • Controlling application-level access with Identity-Aware Proxy (IAP).
  • Configuring service timeouts and connection-draining timeouts for services that are governed by the Ingress object in a cluster.
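
As a sketch, the following manifests define a BackendConfig that enables Cloud CDN and sets timeouts, along with the annotation that attaches it to a service behind your Ingress. The names and values are illustrative, and older cluster versions might require the v1beta1 form of the BackendConfig API.

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: web-backendconfig
spec:
  timeoutSec: 60
  connectionDraining:
    drainingTimeoutSec: 60
  cdn:
    enabled: true
---
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    cloud.google.com/backend-config: '{"default": "web-backendconfig"}'
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080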

For more information about configuring the BackendConfig custom resource in GKE, see the GKE documentation.

Using a service mesh

A service mesh provides a uniform way to connect, secure, and manage microservices that are running in your Kubernetes clusters. For example, the Istio service mesh that you can add as a GKE add-on can manage service-to-service authentication and communication, enforce access policies, and collect rich telemetry data points that you can use to audit and administer your GKE clusters.

Key features that a service mesh provides are the following:

  • Traffic management. The service mesh allows you to define granular rules that determine how traffic is routed and split among services or among different versions of the same service. This makes it easier to roll out canary and blue-green deployments.

  • Built-in observability. The mesh records network traffic (Layer 4 and Layer 7) metrics in a uniform manner without requiring you to write code to instrument your services.

  • Security. The mesh enables mutual TLS (mTLS) between services. It not only provides secure channels for data in transit but also helps you manage the authentication and authorization of services within the mesh.

In summary, service meshes like Istio allow you to delegate system-level tasks to the mesh infrastructure. This improves the overall agility, robustness, and loose coupling of services that are running in your Kubernetes clusters.

For more information, see Istio on Google Kubernetes Engine.

Firewalling

GKE nodes are provisioned as instances in Compute Engine. As such, they adhere to the same stateful firewall mechanism as other instances. These firewall rules are applied within your network to instances by using tags. Each node pool receives its own set of tags that you can use in rules. By default, each instance belonging to a node pool receives a tag that identifies a specific Kubernetes Engine cluster that this node pool is a part of. This tag is used in firewall rules that Kubernetes Engine creates automatically for you. You can add your own custom tags at either cluster or node pool creation time using the --tags flag in the gcloud tool.

For example, to allow an internal load balancer to access port 8080 on all your nodes, you would use the following commands:

gcloud compute firewall-rules create allow-8080-fwr \
    --target-tags allow-8080 \
    --allow tcp:8080 \
    --network gke \
    --source-ranges 130.211.0.0/22
gcloud container clusters create my-cluster --tags allow-8080

The following example shows how to tag one cluster so that internet traffic can access nodes on port 30000 while the other cluster is tagged to allow traffic from the VPN to port 40000. This is useful when exposing a service through a NodePort that should only be accessible using privileged networks like a VPN back to a corporate data center, or from another cluster within your project.

Tagging two clusters differently.

Connecting to an on-premises data center

There are several Cloud Interconnect options for connecting to on-premises data centers. These options are not mutually exclusive, so you might have a combination, based on workload and requirements:

  1. Internet for workloads that aren't data intensive or latency sensitive. Google has more than 100 points of presence (PoPs) connecting to service providers across the world.
  2. Direct Peering for workloads that require dedicated bandwidth, are latency sensitive, and need access to all Google services, including the full suite of Google Cloud products. Direct Peering is a Layer 3 connection, done by exchanging BGP routes, and thus requires a registered ASN.
  3. Carrier Peering is the same as Direct Peering, but done through a service provider. This is a great option if you don't have a registered ASN, or have existing relationships with a preferred service provider.
  4. Cloud VPN is configured on top of the Layer 3 connectivity and internet options (1, 2, and 3) if IPsec encryption is required, or if you want to extend your private network into your private Compute Engine network.

Managing cluster operability

This section discusses the key factors to consider when you're administrating and operating your GKE clusters.

Resource quotas

Kubernetes resource quotas provide constraints that limit the aggregate permissible resource consumption for each namespace in a cluster. If you have clusters with Kubernetes namespaces that isolate business functions or development stages, you can use quotas to limit a wide array of resources, such as CPU utilization, memory, or the number of Pods and services that can be created within a namespace. To ensure stability of the control plane of your GKE clusters, Kubernetes automatically applies default, non-overridable resource quotas to each namespace in any GKE clusters that have five nodes or fewer.
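
The following manifest is a minimal sketch of a ResourceQuota for a namespace; the namespace name and the limits are illustrative.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    pods: "50"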

Resource limits

You can use the Kubernetes LimitRange object to enforce granular constraints on the minimum and maximum resource boundaries that containers and Pods can be created with. The following example shows how to use LimitRange:

apiVersion: v1
kind: LimitRange
metadata:
  name: sample-limits
spec:
  limits:
    - max:
        cpu: "400m"
        memory: "1Gi"
      defaultRequest:
        cpu: "200m"
        memory: "500Mi"
      type: Container

Pod disruption budgets

Pod disruption budgets (PDBs) help guard against voluntary or accidental deletion of Pods or Deployments by your team. PDBs cannot prevent involuntary disruptions that can be caused by a node going down or restarting. Typically, an operator creates a PDB for an application that defines the minimum number of replicas of the Pods for the application.

In an enterprise where developers work on multiple applications, mistakes do happen, and a developer or an administrator might accidentally run a script that deletes Pods or Deployments—in other words, that deletes your Kubernetes resources. But by defining a PDB, you help ensure that you maintain a minimum viable set of resources for your Kubernetes applications at all times.

PDBs that you configure for your GKE clusters are honored during GKE upgrades. This means that you can control the availability of your applications during an upgrade. The following example shows how you can configure a PDB.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: nginx
Managing Kubernetes upgrades

Keep your Kubernetes clusters on GKE updated to the latest version of Kubernetes that fits your requirements. This allows you to leverage new features that are rolled out and to make sure that the underlying operating system of your cluster nodes is patched and up to date.

When an upgrade is needed, you can consider the following types:

  • Major and minor Kubernetes version upgrades of your GKE cluster for the master and worker nodes.
  • OS patches and upgrades of the virtual machines (nodes) that constitute your cluster.

Upgrading the Kubernetes version

You have two options for upgrading your GKE master nodes. The first is to let Google Cloud automatically upgrade your GKE cluster master. The second is to initiate a manual upgrade when a newer version becomes available.

You can review notifications in the console that show up against your GKE clusters when upgrades are available. We recommend that you trigger the version upgrade after you've reviewed the release content, and after you've tested your applications in a sandboxed cluster that's running on the version that you want to upgrade to.

When the master node of a zonal cluster is undergoing an upgrade, the control plane is unavailable. This means that you are not able to interact with the API server to add or remove resources in your cluster. If you can't afford the downtime for upgrading a master node in your zonal cluster, you can make the master node highly available by deploying regional GKE clusters instead. With this approach, you can have multiple master nodes that are spread across zones. When one master node is being upgraded, any control plane requests to the API server are routed to the other master node or nodes.

As with master nodes, you have two options for upgrading your GKE worker nodes to the same version as the cluster's master node:

  • You can have GKE manage the worker node upgrades for you. You do this by enabling automatic node upgrade for the node pools in your GKE cluster.
  • You can manually upgrade your GKE worker nodes. When an upgrade is available, the GKE console shows an alert. When you see that alert, you can apply the upgrade to your GKE worker nodes.

In both cases, when an upgrade is applied, GKE applies a rolling update to the worker nodes: it systematically drains, shuts down, and upgrades one node at a time, continuing only when the replacement node is available to respond to incoming requests.
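
The following commands are a sketch of the manual upgrade path and of enabling automatic node upgrades; the cluster name, zone, node pool name, and version string are placeholders.

# Upgrade the cluster master to a specific version.
gcloud container clusters upgrade my-cluster \
    --zone us-central1-a \
    --master \
    --cluster-version 1.17.9-gke.1504

# Upgrade the nodes in one node pool to the master's version.
gcloud container clusters upgrade my-cluster \
    --zone us-central1-a \
    --node-pool default-pool

# Enable automatic upgrades for a node pool.
gcloud container node-pools update default-pool \
    --cluster my-cluster \
    --zone us-central1-a \
    --enable-autoupgrade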

Node auto-repair

The GKE node auto-repair feature manages the health checks of your GKE nodes. If any of the nodes are found to be unhealthy, GKE initiates the node-repair process.

The managed node-repair process involves draining and recreating the node. If multiple nodes in your GKE cluster need to be repaired at the same time, Google Cloud internally determines how many nodes can be repaired in parallel.

If you create clusters in the Google Cloud console, the auto-repair feature is automatically enabled. For the GKE clusters that you create using the gcloud tool, you can explicitly enable auto-repair by including the --enable-autorepair flag in the cluster creation command.

If your GKE cluster has multiple node pools, the auto-repair feature gives you granular control over which node pools you want to enable node auto-repair for.

Autoscaling GKE clusters

Enterprises often experience varying incoming load on the applications that are running in their Kubernetes clusters. To respond to these business-driven changes, you can enable your GKE clusters to respond automatically and to scale up and down based on metrics.

Autoscaling includes multiple dimensions, as discussed in the following sections.

Cluster autoscaler

The GKE cluster autoscaler automatically adds and removes nodes from your cluster depending on the demand of your workloads. Cluster autoscaler is enabled for individual node pools. For each node pool, GKE checks whether there are Pods that are waiting to be scheduled due to lack of capacity. If so, the cluster autoscaler adds nodes to that node pool.

A combination of factors influences how GKE decides to scale down. If a node is utilized less than 50% by the Pods running on it, and if the running Pods can be scheduled on other nodes that have capacity, the underutilized node is drained and terminated.

You can set the boundaries for a node pool by specifying the minimum and maximum nodes that the cluster autoscaler can scale to.
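
For example, the following command is a sketch of creating a cluster whose default node pool scales between 1 and 10 nodes; the cluster name, zone, and node counts are placeholders.

gcloud container clusters create scalable-cluster \
    --zone us-central1-a \
    --num-nodes 3 \
    --enable-autoscaling \
    --min-nodes 1 \
    --max-nodes 10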

Horizontal Pod Autoscaling (HPA)

Kubernetes lets you create a Horizontal Pod Autoscaler (HPA) to configure how your Kubernetes Deployments or ReplicaSets should scale, and which metrics the scaling decision should be based on. By default, the HPA controller bases autoscaling decisions on CPU utilization. However, the HPA controller can also compute how Pods should scale based on custom metrics, such as an HTTP request count. For an HPA to respond to custom metrics, additional monitoring instrumentation is usually required.
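
The following manifest is a minimal sketch of an HPA that scales a Deployment named web based on CPU utilization; the names and thresholds are illustrative.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60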

For more information, see the Kubernetes and GKE documentation.

Vertical Pod Autoscaling (VPA)

The Vertical Pod Autoscaling (VPA) feature in GKE clusters lets you offload the task of specifying optimal CPU and memory requests for containers. When necessary, VPA tunes the resource allocations that are made to containers in your cluster. VPA lets you optimize resource utilization for clusters by optimizing at the container level on each node. It also frees up administrative time that you would otherwise have to invest in maintaining your resources.

VPA works in tandem with the node auto-provisioning feature that's described in the next section.

Due to Kubernetes limitations, the resource requests on a Pod can be changed only when a Pod is restarted. Therefore, to make changes, VPA evicts the Pod. For more information, see the GKE and Kubernetes documentation.
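
As a sketch, you enable VPA on the cluster and then define a VerticalPodAutoscaler object for each workload that you want it to manage; the cluster and Deployment names here are placeholders.

gcloud container clusters create vpa-cluster \
    --zone us-central1-a \
    --enable-vertical-pod-autoscaling

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"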

Node auto-provisioning

Node auto-provisioning enables the GKE cluster autoscaler to automatically provision additional node pools when the autoscaler determines that they're required. The cluster autoscaler can also delete auto-provisioned node pools when there are no nodes in those node pools.

Node auto-provisioning decisions are made by the GKE cluster autoscaler based on a number of factors. These include the quantity of resources requested by Pods, Pod affinities that you have specified, and node taints and tolerations that are defined in your GKE cluster.

Node auto-provisioning is useful if you have a variety of workloads running in your GKE clusters. As an example, if your GKE cluster has a GPU-dependent workload, you can run it in a dedicated node pool that's provisioned with GPU-capable nodes. You can define the node pool scaling boundaries by specifying a minimum and maximum node pool size.
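
For example, the following command is a sketch of enabling node auto-provisioning with cluster-wide resource boundaries; the cluster name, zone, and CPU and memory limits are placeholders.

gcloud container clusters create auto-provisioned-cluster \
    --zone us-central1-a \
    --enable-autoprovisioning \
    --min-cpu 1 \
    --max-cpu 64 \
    --min-memory 1 \
    --max-memory 256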

For more information about node auto-provisioning and when to enable it, see Using node auto-provisioning.

What's next

  • Learn about best practices for building and operating containers.
  • Learn about authenticating end users to Cloud Run on GKE using Istio in this tutorial.
  • Try out other Google Cloud features for yourself. Have a look at our tutorials.