Harden your cluster security

This document describes how to harden the security of your GKE on Bare Metal clusters.

Secure your containers using SELinux

You can secure your containers by enabling SELinux, which is supported on Red Hat Enterprise Linux (RHEL). If your host machines run RHEL and you want to enable SELinux for your cluster, you must enable SELinux on all of your host machines. For details, see Secure your containers using SELinux.

Use seccomp to restrict containers

Secure computing mode (seccomp) is available in GKE on Bare Metal version 1.11 and higher. Running containers with a seccomp profile improves the security of your cluster because it restricts the system calls that containers can make to the kernel. This reduces the chance that kernel vulnerabilities can be exploited.

The default seccomp profile contains a list of system calls that a container is allowed to make. Any system calls not on the list are disallowed. seccomp is enabled by default in version 1.11 of GKE on Bare Metal. This means that all system containers and customer workloads are run with the container runtime's default seccomp profile. Even containers and workloads that don't specify a seccomp profile in their configuration files are subject to seccomp restrictions.
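Because seccomp is on by default, workloads normally don't need any seccomp settings. If you want a manifest to state its profile explicitly, the standard Kubernetes seccompProfile field accepts the type RuntimeDefault, which selects the container runtime's default profile. The following is a minimal sketch; the Pod name, container name, and image are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-runtime-default          # hypothetical name
spec:
  securityContext:
    seccompProfile:
      # Apply the container runtime's default seccomp profile to every
      # container in this Pod.
      type: RuntimeDefault
  containers:
  - name: app                            # hypothetical container name
    image: registry.example.com/app:1.0  # hypothetical image
```

Setting the profile explicitly records the intended restriction in the manifest itself, so readers of the manifest don't have to rely on knowing the cluster's default.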

How to disable seccomp cluster-wide or on particular workloads

You can disable seccomp only during cluster creation or cluster upgrade; you can't use bmctl update to disable it. To disable seccomp for a cluster, add the following clusterSecurity section to the cluster's configuration file:

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: example
  namespace: cluster-example
spec:
...
  clusterSecurity:
    enableSeccomp: false
...

In the unlikely event that some of your workloads need to execute system calls that seccomp blocks by default, you don't have to disable seccomp on the whole cluster. Instead, you can single out particular workloads to run in unconfined mode. Running a workload in unconfined mode frees that workload from the restrictions that the seccomp profile imposes on the rest of the cluster.

To run all containers in a Pod in unconfined mode, add the following securityContext section to the Pod manifest:

apiVersion: v1
kind: Pod
....
spec:
  securityContext:
    seccompProfile:
      type: Unconfined
....
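The Pod-level setting above applies to every container in the Pod. If only one container needs the blocked system calls, you can narrow the exception by setting seccompProfile in that container's own securityContext instead. Kubernetes applies a container-level value to that container only, and it overrides any Pod-level value. A minimal sketch; all names and images are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mixed-seccomp-example                # hypothetical name
spec:
  containers:
  - name: needs-extra-syscalls               # hypothetical container
    image: registry.example.com/tool:1.0     # hypothetical image
    securityContext:
      seccompProfile:
        # Only this container runs without seccomp restrictions.
        type: Unconfined
  - name: regular-app                        # hypothetical container
    image: registry.example.com/app:1.0      # hypothetical image
    # No seccompProfile here, so this container keeps the cluster's
    # default seccomp behavior.
```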

Don't run containers as the root user

By default, processes in containers execute as root. This poses a potential security problem, because if a process breaks out of the container, that process runs as root on the host machine. It's therefore advisable to run all your workloads as a non-root user.

The following sections describe two ways of running containers as a non-root user.

Method #1: add a USER instruction to the Dockerfile

This method uses a Dockerfile to ensure that containers don't run as a root user. In a Dockerfile, you can specify which user the process inside a container should be run as. The following snippet from a Dockerfile shows how to do this:

....

# Add a user with user ID 8877 and name nonroot
RUN useradd -u 8877 nonroot

# Run the container as nonroot
USER nonroot
....

In this example, the Linux command useradd -u creates a user called nonroot inside the container, with a user ID (UID) of 8877.

The USER nonroot instruction that follows specifies that, from this point on in the image, commands run as the user nonroot. This includes the container's main process at runtime.

Make sure that UID 8877 has the permissions it needs, for example on the files and directories that the process reads or writes, so that the container processes can run properly as nonroot.
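If you also want Kubernetes to check that an image doesn't run as root, you can set runAsNonRoot: true in the Pod's securityContext. The kubelet then refuses to start any container that would run as UID 0. Because the Dockerfile above sets USER to the name nonroot rather than a numeric UID, the kubelet can't verify that user, and with runAsNonRoot alone the container would fail to start. The following sketch therefore also sets runAsUser to the UID that useradd created; alternatively, the Dockerfile could use USER 8877. The Pod name and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-image-example            # hypothetical name
spec:
  securityContext:
    # Refuse to start any container that would run as UID 0.
    runAsNonRoot: true
    # The image's USER is a name, which the kubelet can't check against
    # UID 0, so state the numeric UID created by useradd explicitly.
    runAsUser: 8877
  containers:
  - name: app
    # Hypothetical image built from the Dockerfile above.
    image: registry.example.com/nonroot-app:1.0
```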

Method #2: add securityContext fields to the Kubernetes manifest file

This method uses a Kubernetes manifest file to ensure that containers don't run as a root user. Security settings are specified for a Pod, and those security settings are in turn applied to all containers within the Pod.

The following example shows an excerpt of a manifest file for a given Pod:

apiVersion: v1
kind: Pod
metadata:
  name: name-of-pod
spec:
  securityContext:
    runAsUser: 8877
    runAsGroup: 8877
....

The runAsUser field specifies that all processes in every container in the Pod run with user ID 8877. The runAsGroup field specifies that these processes have a primary group ID (GID) of 8877. Because UID 8877 has fewer privileges than root, a process that breaks out of its container gains less access to the host. Grant UID 8877 the permissions it needs, and no more, so that the container processes can run properly.
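Pod-level settings can be refined for individual containers. In the following sketch, which extends the previous excerpt, one container overrides the Pod-level UID in its own securityContext, and both containers set allowPrivilegeEscalation: false. That standard Kubernetes field prevents a process from gaining more privileges than its parent process, for example through setuid binaries. The container names, images, and UID 9000 are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: name-of-pod
spec:
  securityContext:
    # Defaults for every container in the Pod.
    runAsUser: 8877
    runAsGroup: 8877
  containers:
  - name: app                                 # hypothetical
    image: registry.example.com/app:1.0       # hypothetical
    securityContext:
      # Block privilege gains, for example through setuid binaries.
      allowPrivilegeEscalation: false
  - name: helper                              # hypothetical
    image: registry.example.com/helper:1.0    # hypothetical
    securityContext:
      # Overrides the Pod-level runAsUser for this container only;
      # runAsGroup 8877 is still inherited from the Pod.
      runAsUser: 9000
      allowPrivilegeEscalation: false
```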

System containers in GKE on Bare Metal help install and manage clusters. You can control the UIDs and GIDs that these containers use with the startUIDRangeRootlessContainers field in the cluster specification. This field is optional; if you don't specify it, its value is 2000. Allowed values are 1000-57000, and you can change the value only during upgrades. The system containers use the UIDs and GIDs in the range from startUIDRangeRootlessContainers to startUIDRangeRootlessContainers + 2999. For example, the default value of 2000 gives the range 2000-4999.

The following example shows an excerpt of a manifest file for a Cluster resource:

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: name-of-cluster
spec:
  clusterSecurity:
    startUIDRangeRootlessContainers: 5000
...

Choose the value for startUIDRangeRootlessContainers so that the UID and GID spaces used by the system containers don't overlap with those assigned to user workloads.
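As a worked example using the excerpt above: with startUIDRangeRootlessContainers set to 5000, system containers use UIDs and GIDs 5000 through 5000 + 2999 = 7999. A workload that runs as UID and GID 8877, as in the earlier Pod example, falls outside that range, so the two don't overlap. If you instead chose 6000, the system range would be 6000-8999, which includes 8877 and would overlap. The comments in the following excerpt record this check; the cluster name and workload UID are the illustrative values used earlier on this page:

```yaml
# System container range = start .. start + 2999
#   start = 5000  ->  5000 .. 7999   (workload UID/GID 8877 is outside: OK)
#   start = 6000  ->  6000 .. 8999   (contains 8877: overlap, avoid)
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: name-of-cluster
spec:
  clusterSecurity:
    startUIDRangeRootlessContainers: 5000
```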

How to disable rootless mode

Starting with GKE on Bare Metal release 1.10, Kubernetes control plane containers and system containers run as non-root users by default. GKE on Bare Metal assigns these users UIDs and GIDs in the range 2000-4999. However, this assignment can cause problems if those UIDs and GIDs have already been allocated to processes running inside your environment.

Starting with GKE on Bare Metal release 1.11, you can disable rootless mode when you upgrade your cluster. When rootless mode is disabled, Kubernetes control plane containers and system containers run as the root user.

To disable rootless mode, perform the following steps:

  1. Add the following clusterSecurity section to the cluster's configuration file:

    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: example
      namespace: cluster-example
    spec:
    ...
      clusterSecurity:
        enableRootlessContainers: false
    ...
    
  2. Upgrade your cluster. For details, see Upgrade clusters.

Restrict the ability for workloads to self-modify

Certain Kubernetes workloads, especially system workloads, have permission to self-modify. For example, some workloads vertically autoscale themselves. While convenient, this can allow an attacker who has already compromised a node to escalate further in the cluster. For example, an attacker could have a workload on the node change itself to run as a more privileged service account that exists in the same namespace.

Ideally, workloads shouldn't be granted permission to modify themselves in the first place. When self-modification is necessary, you can limit permissions by applying Gatekeeper or Policy Controller constraints, such as NoUpdateServiceAccount. This constraint comes from the open source Gatekeeper library, which provides several useful security policies.

When you deploy policies, you usually need to let the controllers that manage the cluster lifecycle bypass them, so that the controllers can still make changes to the cluster, such as applying cluster upgrades. For example, if you deploy the NoUpdateServiceAccount policy on GKE on Bare Metal, you must set the following parameters in the Constraint:

parameters:
  allowedGroups:
  - system:masters
  allowedUsers: []

Disable kubelet read-only port

Starting with release 1.15.0, GKE on Bare Metal disables port 10255, the kubelet read-only port, by default. Migrate any customer workloads that read data from the insecure kubelet port 10255 to the secure kubelet port 10250.

Only clusters created with version 1.15.0 or higher have this port disabled by default. The kubelet read-only port 10255 remains accessible for clusters created with a version lower than 1.15.0, even after a cluster upgrade to version 1.15.0 or higher.

This change was made because the kubelet leaks low sensitivity information over port 10255, which is unauthenticated. The information includes the full configuration information for all Pods running on a Node, which can be valuable to an attacker. It also exposes metrics and status information, which can provide business-sensitive insights.

Disabling the kubelet read-only port is recommended by the CIS Kubernetes Benchmark.
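As an illustration of that migration, suppose one of your workloads is a Prometheus server that scrapes kubelet metrics. Its scrape job must move from plain HTTP on port 10255 to HTTPS with authentication on port 10250. The following Prometheus configuration excerpt is a sketch under several assumptions: Prometheus runs in the cluster with its service account token mounted at the default path, that service account is authorized to get the nodes/metrics subresource, and the kubelet serving certificates are trusted by the cluster CA bundle. The job name is hypothetical:

```yaml
# Excerpt of a Prometheus configuration file.
scrape_configs:
- job_name: kubelet                # hypothetical job name
  # The secure kubelet port serves HTTPS and requires authentication.
  scheme: https
  kubernetes_sd_configs:
  # The node role targets each node's kubelet endpoint as reported by the
  # Node object, which is normally port 10250.
  - role: node
  tls_config:
    # Assumes the kubelet serving certificates chain to the cluster CA.
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  authorization:
    # Service account token mounted into the Prometheus Pod.
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
```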

Maintenance

Monitoring security bulletins and upgrading your clusters are important security measures to take once your clusters are up and running.

Monitor security bulletins

The GKE security team publishes security bulletins for high and critical severity vulnerabilities.

These bulletins follow a common Google Cloud vulnerability numbering scheme and are linked from the main Google Cloud bulletins page and the GKE on Bare Metal release notes.

Use this XML feed to subscribe to security bulletins for GKE on Bare Metal and related products.

When customer action is required to address these high and critical vulnerabilities, Google contacts customers by email. Google might also contact customers with support contracts through support channels.

For more information about how Google manages security vulnerabilities and patches for GKE and GKE Enterprise, see Security patching.

Upgrade clusters

Kubernetes regularly introduces new security features and provides security patches. GKE on Bare Metal releases incorporate Kubernetes security enhancements that address security vulnerabilities that may affect your clusters.

You are responsible for keeping your GKE on Bare Metal clusters up to date. For each release, review the release notes. To minimize security risks to your clusters, plan to update to new patch releases every month and minor versions every four months.

Upgrading a cluster also automatically renews the cluster's kubeconfig file, which authenticates a user to the cluster. When you create a cluster with bmctl, the kubeconfig file is added to your cluster directory; the default name and path is bmctl-workspace/CLUSTER_NAME/CLUSTER_NAME-kubeconfig. If the kubeconfig file isn't renewed by an upgrade, it expires one year after it was created.

For information about how to upgrade your clusters, see upgrade your clusters.

Use VPC Service Controls with Cloud Interconnect or Cloud VPN

Cloud Interconnect provides low-latency, high-availability connections that let you transfer data reliably between your on-premises bare metal machines and Google Cloud Virtual Private Cloud (VPC) networks. To learn more about Cloud Interconnect, see Dedicated Interconnect provisioning overview.

Cloud VPN securely connects your peer network to your Virtual Private Cloud (VPC) network through an IPsec VPN connection. To learn more about Cloud VPN, see Cloud VPN overview.

VPC Service Controls works with either Cloud Interconnect or Cloud VPN to provide additional security for your clusters. VPC Service Controls helps to mitigate the risk of data exfiltration. Using VPC Service Controls, you can add projects to service perimeters that protect resources and services from requests that originate outside the perimeter. To learn more about service perimeters, see Service perimeter details and configuration.

To fully protect GKE on Bare Metal, you need to use the restricted VIP and add the following APIs to the service perimeter:

  • Artifact Registry API (artifactregistry.googleapis.com)
  • Resource Manager API (cloudresourcemanager.googleapis.com)
  • Compute Engine API (compute.googleapis.com)
  • Connect gateway API (connectgateway.googleapis.com)
  • Google Container Registry API (containerregistry.googleapis.com)
  • GKE Connect API (gkeconnect.googleapis.com)
  • GKE Hub API (gkehub.googleapis.com)
  • GKE On-Prem API (gkeonprem.googleapis.com)
  • Identity and Access Management (IAM) API (iam.googleapis.com)
  • Cloud Logging API (logging.googleapis.com)
  • Cloud Monitoring API (monitoring.googleapis.com)
  • Config Monitoring for Ops API (opsconfigmonitoring.googleapis.com)
  • Service Control API (servicecontrol.googleapis.com)
  • Cloud Storage API (storage.googleapis.com)

When you use bmctl to create or upgrade a cluster, use the --skip-api-check flag to skip calls to the Service Usage API (serviceusage.googleapis.com), because the Service Usage API isn't supported by VPC Service Controls.