Hardening your cluster's security

With the speed of development in Kubernetes, there are often new security features for you to use. This document describes how to harden your Google Distributed Cloud clusters.

This document prioritizes high-value security mitigations that require your action at cluster creation time. Less critical features, secure-by-default settings, and those that can be enabled after cluster creation are mentioned later in the document. For a general overview of security topics, review Security.

Checklist

The following deployment checklist highlights best practices for hardening your Google Distributed Cloud clusters. For more information about each practice, see the sections in this document.

Deployment checklist Description
Identity and access control

Use vSphere account privileges:
Use a vSphere administrator account with minimal privileges.

Secure Google Cloud service accounts:
Minimize Google Cloud service account privileges.

Configure authentication for cluster users:
Configure OpenID Connect (OIDC) or LDAP for user authentication.

Use Kubernetes namespaces and RBAC to restrict access:
Use namespaces with RBAC for administrative isolation and least privilege roles and entitlements.

Data protection

Encrypt vSphere virtual machines:
Set vSphere to encrypt the volumes used by Google Distributed Cloud.

Manage secrets:
Encrypt secrets at rest.

Network protection

Restrict network access to the control plane and nodes:
Set up controls to isolate and protect the control plane networks and nodes.

Use network policies to restrict traffic:
Implement network policies to restrict intra-cluster traffic.

Declarative security

Use Config Management policy controller:
Install Config Management Policy Controller for declarative security policy within your clusters.

Maintenance

Upgrade GKE Enterprise:
Make sure that you're running the latest version of GKE Enterprise for your platform.

Monitor security bulletins:
Check GKE Enterprise security bulletins for the latest advice and guidance on versioning.

Monitoring and logging

Set options for GKE clusters logging:
Ensure logging is enabled and integrated into a SIEM solution.

Identity and access control

Use vSphere account privileges

The vCenter user account that you use to install Google Distributed Cloud must have sufficient privileges. For example, a user account that is assigned the vCenter's Administrator role has privileges for complete access to all vCenter objects and provides a Google Distributed Cloud cluster administrator with full access.

We recommend following the principle of least privilege: grant only the privileges that are necessary to install GKE Enterprise. We have predefined the minimum set of privileges needed to perform the installation, as well as the commands needed to grant these permissions.

Secure Google Cloud service accounts

Google Distributed Cloud requires three Google Cloud service accounts:

  • A predefined service account for accessing the Google Distributed Cloud software. You create this when you purchase GKE Enterprise.
  • A register service account to be used by Connect for registering Google Distributed Cloud clusters with Google Cloud.
  • A Cloud Logging service account for collecting cluster logs for use by Cloud Logging.

During installation, you bind Identity and Access Management roles to these service accounts. Those roles grant the service accounts specific privileges within your project and can be generated for you during installation.

Configure authentication for cluster users

To configure user authentication for your cluster, you can use OpenID Connect (OIDC) or Lightweight Directory Access Protocol (LDAP).

For more information, see GKE Identity Service.

Use Kubernetes namespaces and RBAC to restrict access

To give teams least-privilege access to Kubernetes, create Kubernetes Namespaces or environment-specific clusters. Assign cost centers and appropriate labels to each namespace for accountability and chargeback. Only give developers the level of access to their Namespaces that they need to deploy and manage their applications, especially in production.
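
As an illustration of the labeling advice above, the following is a minimal sketch of a Namespace manifest. The namespace name, label keys, and label values are hypothetical; substitute the conventions that your organization uses for accountability and chargeback.

```yaml
# Hypothetical example: a per-team, per-environment namespace.
# The name and all label keys and values are assumptions, not product defaults.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a-prod
  labels:
    team: team-a
    environment: production
    cost-center: cc-1234
```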

Map the tasks that users need to complete against the cluster and define the permissions required to complete each task. To grant cluster- and namespace-level permissions, use Kubernetes RBAC.
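
The mapping of tasks to permissions can be sketched with a namespaced Role and RoleBinding. This is an example under stated assumptions, not a recommended policy: the namespace (team-a-prod), the role name, and the group name (team-a-developers) are hypothetical, and the group is assumed to come from your configured identity provider.

```yaml
# Hypothetical Role: lets holders deploy and inspect applications
# in one namespace only. Adjust resources and verbs to the tasks you mapped.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: team-a-prod
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
# Hypothetical RoleBinding: grants the Role to a group of developers.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-app-deployer
  namespace: team-a-prod
subjects:
- kind: Group
  name: team-a-developers   # assumed group name from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-deployer
  apiGroup: rbac.authorization.k8s.io
```

Binding to a group rather than to individual users keeps access reviews simpler, because membership is managed in the identity provider instead of in cluster manifests.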

Beyond the permissions for Google Cloud service accounts used to install Google Distributed Cloud, IAM does not apply to Google Distributed Cloud clusters.

Data protection

Encrypt vSphere virtual machines

Google Distributed Cloud cluster nodes run on virtual machines (VMs) in your vSphere cluster. Google strongly recommends that you encrypt all data at rest. To do so on vSphere, follow the VMware vSphere security PDF and best practice guidance for encrypting VMs.

Encrypt the VMs before you install GKE Enterprise.

Manage secrets

To provide an extra layer of protection for sensitive data, such as Kubernetes Secrets stored in etcd, configure a secrets manager that is integrated with Google Distributed Cloud clusters.

If you are running workloads across multiple environments, you might prefer a solution that works for both Google Kubernetes Engine and Google Distributed Cloud. If you choose to use an external secrets manager, such as HashiCorp Vault, set it up prior to integrating your Google Distributed Cloud clusters.

You have several options for secrets management:

  • You can use Kubernetes Secrets natively in Google Distributed Cloud. We expect clusters to be using vSphere encryption for VMs as described earlier, which provides basic encryption-at-rest protection for secrets. Secrets are not further encrypted by default.
  • You can use an external secrets manager, such as HashiCorp Vault. You can authenticate to HashiCorp Vault using either a Kubernetes service account or a Google Cloud service account.

Network protection

Restrict network access to the control plane and nodes

Limit exposure of your cluster control plane and nodes to the internet. These choices cannot be changed after cluster creation. By default, Google Distributed Cloud cluster nodes are created using RFC 1918 addresses, and it is best practice to not change this. Implement firewall rules in your on-premises network to restrict access to the control plane.

Use network policies to restrict traffic

By default, all Services in a Google Distributed Cloud cluster can communicate with each other. For information about controlling Service-to-Service communication as needed for your workloads, see the following sections.

Restricting network access to services makes it much more difficult for attackers to move laterally within your cluster and offers services some protection against accidental or deliberate denial of service. There are two recommended ways to control traffic:

  • To control L7 traffic to your applications' endpoints, use Istio. Choose this if you're interested in load balancing, service authorization, throttling, quota, and metrics.
  • To control L4 traffic between Pods, use Kubernetes network policies. Choose this if you're looking for the basic access control functionality managed by Kubernetes.

You can enable both Istio and Kubernetes network policy after you create your Google Distributed Cloud clusters. You can use them together if you need to.
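
The L4 option can be sketched with two Kubernetes NetworkPolicy objects: one that denies all ingress to Pods in a namespace, and one that allows a single path back in. The namespace, Pod labels, and port are hypothetical. The sketch also assumes that the cluster's network plugin enforces NetworkPolicy.

```yaml
# Hypothetical default-deny policy: selects every Pod in the namespace
# and allows no ingress, so all inbound traffic is blocked unless
# another policy permits it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a-prod
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Hypothetical allow policy: Pods labeled app=backend accept TCP 8080
# only from Pods labeled app=frontend in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: team-a-prod
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```

Network policies are additive, so starting from a default-deny policy and then adding narrow allow rules keeps the permitted paths explicit.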

Declarative security

Use Config Management policy controller

Kubernetes admission controllers are plugins that govern and enforce how a Kubernetes cluster is used. Admission controllers are an important part of the defense-in-depth approach to hardening your cluster.

The best practice is to use Config Management's Policy Controller. Policy Controller uses the OPA Constraint Framework to describe and enforce policy as CRDs. The constraints that you apply to your cluster are defined in constraint templates, which are deployed in your clusters.

For information about how to use Policy Controller constraints to achieve many of the same protections as PodSecurityPolicies, with the added ability to test your policies before enforcing them, see Using constraints to enforce Pod security.
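
As a sketch of testing a policy before enforcing it, the following constraint uses the dryrun enforcement action, which records violations without rejecting requests. It assumes that the K8sPSPPrivilegedContainer constraint template from the constraint template library is installed in the cluster. The constraint name and the excluded namespace are illustrative.

```yaml
# Hypothetical constraint: reports privileged containers without blocking them.
# Assumes the K8sPSPPrivilegedContainer constraint template is installed.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: audit-privileged-containers   # illustrative name
spec:
  enforcementAction: dryrun           # report only; remove this line to enforce
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]   # illustrative exemption
```

After you review the reported violations and fix or exempt the affected workloads, removing the enforcementAction field switches the constraint to its default behavior of denying violating requests.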

Restrict the ability for workloads to self-modify

Certain Kubernetes workloads, especially system workloads, have permission to self-modify. For example, some workloads vertically autoscale themselves. While convenient, this can allow an attacker who has already compromised a node to escalate further in the cluster. For example, an attacker could have a workload on the node change itself to run as a more privileged service account that exists in the same namespace.

Ideally, workloads should not be granted the permission to modify themselves in the first place. When self-modification is necessary, you can limit permissions by applying Gatekeeper or Policy Controller constraints, such as NoUpdateServiceAccount from the open source Gatekeeper library, which provides several useful security policies.

When you deploy policies, you usually need to allow the controllers that manage the cluster lifecycle, as well as the logging and monitoring pipelines, to bypass the policies. This exemption lets the controllers make changes to the cluster, such as applying cluster upgrades. For example, if you deploy the NoUpdateServiceAccount policy on Google Distributed Cloud, you must set the following parameters in the Constraint:

parameters:
  allowedGroups:
  - system:masters
  allowedUsers:
  - system:serviceaccount:kube-system:monitoring-operator
  - system:serviceaccount:kube-system:stackdriver-operator
  - system:serviceaccount:kube-system:metrics-server-operator
  - system:serviceaccount:kube-system:logmon-operator
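
For context, the parameters above belong under spec in a Constraint object. The following is a sketch that assumes the NoUpdateServiceAccount constraint template from the open source Gatekeeper library is installed. The constraint name and the match section are illustrative, so adjust them to the namespaces and workload kinds that you want to protect.

```yaml
# Sketch of a complete Constraint; the metadata name and match block are
# assumptions. The parameters are the ones listed above.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: NoUpdateServiceAccount
metadata:
  name: no-update-kube-system-service-account   # illustrative name
spec:
  match:
    namespaces: ["kube-system"]                 # illustrative scope
    kinds:
    - apiGroups: ["apps"]
      kinds: ["Deployment", "StatefulSet", "DaemonSet", "ReplicaSet"]
    - apiGroups: ["batch"]
      kinds: ["CronJob"]
  parameters:
    allowedGroups:
    - system:masters
    allowedUsers:
    - system:serviceaccount:kube-system:monitoring-operator
    - system:serviceaccount:kube-system:stackdriver-operator
    - system:serviceaccount:kube-system:metrics-server-operator
    - system:serviceaccount:kube-system:logmon-operator
```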

Maintenance

Upgrade GKE Enterprise

Kubernetes regularly introduces new security features and provides security patches.

You are responsible for keeping your Google Distributed Cloud clusters up to date. For each release, review the release notes. Additionally, plan to update to new patch releases every month and minor versions every three months. Learn how to upgrade your clusters.

You are also responsible for upgrading and securing the vSphere infrastructure.

Monitor security bulletins

The GKE Enterprise security team publishes security bulletins for high and critical severity vulnerabilities.

These bulletins follow a common Google Cloud vulnerability numbering scheme and are linked from the main Google Cloud bulletins page and the Google Distributed Cloud release notes. Each security bulletin page has an RSS feed where users can subscribe to updates.

When customer action is required to address these high and critical vulnerabilities, Google contacts customers by email. Google might also contact customers with support contracts through support channels.

Monitoring and logging

Set options for GKE clusters logging

Google Distributed Cloud includes multiple options for cluster logging and monitoring, including cloud-based managed services, open source tools, and validated compatibility with third-party commercial solutions:

  • Cloud Logging and Cloud Monitoring, enabled by in-cluster agents deployed with Google Distributed Cloud
  • Prometheus and Grafana, disabled by default
  • Validated configurations with third-party solutions

Whichever logging solution you choose based on business requirements, we strongly advise forwarding relevant events and alerts to a centralized security information and event management (SIEM) service for management of security incidents.
