Resource hierarchy and access control best practices

This document describes best practices for designing access control and boundaries between workloads on Google Distributed Cloud Hosted (GDCH). This document is intended for an architect or technical lead designing workloads on GDCH who wants to balance efficient resource usage, isolation of workloads, and ease of operations.

GDCH resource hierarchy overview

This section introduces key concepts of the resource hierarchy and how that hierarchy relates to access control in the GDCH platform. For more information on the GDCH resource hierarchy, see the Overview page.

GDCH implements Kubernetes Role-Based Access Control (RBAC) but also has a unique resource model. Some aspects of the resource hierarchy are relevant for any workload type, and some aspects like Kubernetes clusters are only required for Kubernetes-based workloads. If you are deploying only virtual machine (VM) based workloads, you can skip the sections related to nodes and cluster design.

Logical view of resources

This section introduces the GDCH resource hierarchy of organizations and projects and the logical relationship between resources. The purpose of the resource hierarchy is two-fold:

  • Provide a hierarchy of ownership, which binds the lifecycle of a resource to its immediate parent in the hierarchy.
  • Provide attach points and inheritance for access control policies.

The following diagram shows a logical view of the resource hierarchy.

GDCH Platform Instance

One instance of GDCH can contain multiple customer organizations. Each organization can contain multiple projects. Each project contains multiple resources such as VM-based workloads and container-based workloads. Each organization has separate physical and administrative boundaries and configures identity authentication separately.

Design access boundaries between resources

This section introduces Google recommended best practices for designing the hierarchy and segregation for workloads in GDCH using organizations, projects, and Kubernetes clusters. This guidance balances efficient resource usage, isolation of workloads, and ease of operations.

Design organizations for physical and logical isolation between customers

The Organization resource is the root for all the resources owned by a single customer. Granular access control between workloads within an organization can be defined through IAM allow policies and network policies. See Identity and access management for more information.

Each organization within a GDCH instance provides physical isolation for compute infrastructure, and logical isolation for networking, storage, and other services. Users in one organization have no access to resources in another organization unless explicitly granted access. Network connectivity from one organization to another is not allowed by default, unless explicitly configured to allow data transfer out from one organization and data transfer in to another.

Define the scope of workloads that can share an organization

The scope of an organization in your company context might vary based on how your company defines trust boundaries. Some companies might prefer to create multiple organization resources for different entities in the company. For example, each company department might be an independent customer of GDCH with an independent organization if the departments require complete physical and administrative separation of their workloads.

In general, we recommend that you group multiple workloads into a single organization based on the following signals:

  • Workloads can share dependencies. For example, this might be a shared data source, connectivity between workloads, or a shared monitoring tool.
  • Workloads can share an administrative root of trust. The same Platform Administrator (PA) can be trusted with privileged access over all workloads in the organization.
  • Workloads are allowed to share underlying physical infrastructure with other workloads in the same organization, as long as sufficient logical segregation is in place.
  • The same budget holder has accountability for workload budgets in aggregate. For details on viewing aggregate costs for the organization or granular analysis per workload, see the Billing page.
  • Workload availability requirements can be met in a single site GDCH instance. When you require high availability that spans geographic distance, you must configure organizations across multiple GDCH instances and external network connectivity.

Design projects for logical isolation between workloads

Within an organization, we recommend provisioning multiple projects to create a logical separation between resources. Projects in the same organization might share the underlying physical infrastructure, but projects are used to separate workloads with a logical boundary based on Identity and Access Management (IAM) policies and network policies.

When designing project boundaries, think of the largest set of functionality that can be shared by resources, such as IAM allow policies, network policies, or observability requirements. Group the resources that can share this functionality into a project, and move resources that cannot share this functionality to another project.

In Kubernetes terms, a project is a Kubernetes namespace that is reserved across all clusters in an organization. Though a namespace is reserved across multiple clusters, that does not mean a pod is automatically scheduled across all clusters. A pod scheduled to a particular cluster remains scheduled to that particular cluster.

The following diagram shows how an IAM allow policy is applied to a project that spans multiple clusters.

GDCH RBAC policy

IAM allow policies are set at the project level to define who can do what to which resource type. User workloads like VMs or pods are deployed into a project and access to these workloads is governed by the IAM allow policy. The IAM allow policy applies consistently to VM-based workloads and container-based workloads, regardless of which cluster they are deployed on.

The following diagram shows how network policies manage access between projects. Inter-project communication between the Backend Project, Frontend Project, and Database Project is disabled. However, resources within each project can communicate with each other.

Network policies are set at the project level to selectively allow network access between resources. By default, all resources within a single project are allowed to communicate with each other on the internal network, and a resource in one project cannot communicate with a resource in another project. This behavior for network policies applies whether resources are deployed to the same cluster or not.

You can also define a ProjectNetworkPolicy custom resource to enable inter-project communication. This policy is defined for each project to allow ingress traffic from other projects. The following diagram illustrates a ProjectNetworkPolicy custom resource defined for the Backend Project to enable data transfer in from the Frontend Project and Database Project.

GDCH project level policy

Additionally, the observability stack collects metrics across the organization, but you can filter and query at various levels of the resource hierarchy. As needed for your operations, you can query metrics per cluster, per namespace, and so on.

Create projects per deployment environment

For each workload, we recommend creating separate projects for production, development, and any other deployment environments you require. Separating your production and development deployment environments enables you to define IAM allow policies and network policies granularly, so that changes made in a project used for development do not impact the production environment.

Grant resource-level IAM allow policies within projects

Depending on your team structure and requirements, you might allow Application Operators (AO) to modify any resource within the project they manage, or you might require more granular access control. Within a project, you grant granular IAM allow policies to allow individual developers access to some but not all resources in the project. For example, a team might have a database administrator who must manage the database but not modify other code, while the software developers on the team must not have permission to modify the database.

Design clusters for logical isolation of Kubernetes operations

A Kubernetes cluster is not a hard tenant boundary because IAM allow policies and network policies apply to projects, not Kubernetes clusters. Kubernetes clusters and projects have a many-to-many relationship. You might have multiple Kubernetes clusters in a single project, or a single Kubernetes cluster that spans multiple projects.

Best practices for designing Kubernetes clusters

To deploy container-based workloads, you must first create a Kubernetes cluster. Kubernetes clusters are not required if you plan to only have VM-based workloads.

This section introduces best practices for designing Kubernetes clusters:

Create separate clusters per deployment environment

In addition to separate projects per deployment environment, we recommend that you design separate Kubernetes clusters per deployment environment. By segregating both the Kubernetes cluster and project per environment, you isolate resource consumption, access policies, maintenance events, and cluster-level configuration changes between your production and non-production workloads.

The following diagram shows a sample Kubernetes cluster design for multiple workloads that span projects, clusters, deployment environments, and machine classes.

GDCH configuration

This sample architecture assumes that workloads within a deployment environment are allowed to share clusters. Each deployment environment has a separate set of Kubernetes clusters. You then assign projects to the Kubernetes cluster of the appropriate deployment environment. A Kubernetes cluster might be further subdivided into multiple node pools for different machine class requirements.

Alternatively, designing multiple Kubernetes clusters is useful for container operations like the following scenarios:

  • You have some workloads pinned to a specific Kubernetes version, so you maintain different clusters at different versions.
  • You have some workloads that require different GDCH Kubernetes cluster configurations, such as the backup policy, so you create multiple clusters with different configurations.
  • You run copies of a cluster in parallel to facilitate disruptive version upgrades or a blue-green deployment strategy.
  • You build an experimental workload that risks throttling the API server or other single point of failures within a cluster, so you isolate it from existing workloads.

The following diagram shows an example where multiple clusters are configured per deployment environment due to requirements such as the container operations described in the previous section.

GDCH configuration

Create fewer clusters

For efficient resource utilization, we recommend designing the fewest number of Kubernetes clusters that meet your requirements for segregating deployment environments and container operations. Each additional cluster incurs additional overhead resource consumption, such as additional control plane nodes required. Therefore, a larger cluster with many workloads utilizes underlying compute resources more efficiently than many small clusters.

If users are responsible for choosing where to deploy between multiple clusters of similar configuration, it creates additional complexity for users to monitor cluster capacity and plan for cross-cluster dependencies.

If a cluster is approaching capacity, we recommend that you add additional nodes to a cluster instead of creating a new cluster.

Create fewer node pools within a cluster

For efficient resource utilization, we recommend designing fewer, larger node pools within a Kubernetes cluster.

Configuring multiple node pools is useful when you need to schedule pods that require a different machine class or different operating system (OS) image than others. Create a node pool for each combination of machine class and OS image your workloads require, and set the node capacity to autoscaling to allow for efficient usage of compute resources.

Where to deploy user workloads

On the GDCH platform, operations to deploy VM-based workloads and operations to deploy container-based workloads are different. This section introduces the differences and where you deploy each resource.

Connect to the org admin cluster for VM-based workloads

To deploy VM-based workloads, connect to the org admin cluster.

Projects containing only VM-based workloads do not require a Kubernetes cluster. Therefore, you do not need to provision Kubernetes clusters for these workloads.

Connect and deploy to the cluster for container-based workloads

For container-based workloads, pods are deployed to a single Kubernetes cluster. You are responsible for creating Kubernetes clusters, and for assigning Kubernetes clusters to a project. We recommend only allocating clusters to projects in the appropriate deployment environment. For example, a cluster for production workloads is assigned to a project for production workloads.

For pod scheduling within a Kubernetes cluster, GDCH adopts the general Kubernetes concepts of scheduling, preemption, and eviction. Best practices on scheduling pods within a cluster vary based on the requirements of your workload.

Identity and access management

Configure an identity provider per organization

As an Operator, you configure one or more identity providers per organization.

You might have a scenario where your company has multiple departments with separate organizations, and each organization connects to the same identity provider for authentication. In that case, it is your responsibility to understand and audit the combination of privileges a user has across organizations. Ensure that a user with privileges in multiple organizations does not violate the requirements for separating workloads into distinct organizations.

Alternatively, you might have a scenario where different sets of users use different identity providers to authenticate within a single organization, such as when multiple vendor teams work together in a single organization. Consider whether consolidating user identities into a single identity provider or maintaining separate identity providers works best with your company's approach to identity management.

Configure multi-factor authentication for your identity provider

GDCH relies on your Identity Platform for Authentication, including additional security settings such as multi-factor authentication. It is good practice to configure multi-factor authentication with a physical key for any user that might potentially access sensitive resources.

Restrict managed services and marketplace services

You might prefer to block some projects from certain services, to either limit the potential attack surface in a project or avoid use of unapproved services. By default, managed services like Artificial Intelligence and Machine Learning are available to use in any project. In comparison to managed services, marketplace services must first be enabled for the organization.

To deny service access from projects, apply Gatekeeper constraints against the custom resource definition of a service and a list of namespaces. The approach to deny access with Gatekeeper applies to managed and marketplace services.

Managing kubeconfig files for multiple clusters

Different operational tasks require a connection to different clusters. For example, you perform tasks like binding an IAM role to a project on the org admin cluster, and tasks like deploying a Kubernetes Pod resource - on a Kubernetes cluster.

When using the user interface (UI), you do not need to be aware of which underlying cluster performs a task, as the UI abstracts away the low-level operations like connecting to a cluster.

However, when working with the gdcloud CLI or kubectl CLI, a single user might have multiple kubeconfig files for accomplishing their tasks. Ensure that you sign in using kubeconfig credentials for the appropriate cluster for your task.

Best practices for Kubernetes service accounts

For Kubernetes service accounts, authorization is based on a secret token. To mitigate the risk of service account tokens, consider the following best practices:

  • Avoid downloading persistent service account credentials for use outside of GDCH.
  • Be aware of Kubernetes escalation paths for users or service accounts who have the ability to create and edit pods.
  • Set the expirationSeconds field to a short time period for the service account token projection of your workloads.
  • Regularly rotate service account credentials.

Consider principles of least privilege when granting IAM allow policies

Consider the principle of least privilege (PoLP) when granting IAM allow policies to users. In accordance with PoLP, you should give a subject only privileges required to complete its task.

For example, you grant the Project IAM Admin role within a single project to a user, so that this user delegates authority to grant roles within that project. This user then grants granular roles to other developers in the project based on the specific services they use. The Project IAM Admin role must be restricted to a trusted lead because this role could be used to escalate privilege, granting oneself or others additional roles in the project.

Audit regularly for excessive privilege

Be sure to review roles granted within your organization and audit against excessive privilege. You must ensure that the roles granted are necessary for an individual user to complete their job, and that combinations of roles across projects do not lead to an escalation or exfiltration risk.

If your company uses multiple organizations, we do not recommend that an individual user has highly privileged roles across multiple organizations, as this might violate the reason for segregating organizations in the first place.