Enterprise foundations blueprint

Last reviewed 2023-12-20 UTC

This content was last updated in December 2023, and represents the status quo as of the time it was written. Google's security policies and systems may change going forward, as we continually improve protection for our customers.

This document describes the best practices that let you deploy a foundational set of resources in Google Cloud. A cloud foundation is the baseline of resources, configurations, and capabilities that enable companies to adopt Google Cloud for their business needs. A well-designed foundation enables consistent governance, security controls, scale, visibility, and access to shared services across all workloads in your Google Cloud environment. After you deploy the controls and governance that are described in this document, you can deploy workloads to Google Cloud.

The enterprise foundations blueprint (formerly known as the security foundations blueprint) is intended for architects, security practitioners, and platform engineering teams who are responsible for designing an enterprise-ready environment on Google Cloud. The blueprint consists of this guide, which describes the architecture, design, and controls, and the accompanying Terraform code that you can deploy to implement them.

You can use this guide in one of two ways:

  • To create a complete foundation based on Google's best practices. You can deploy all the recommendations from this guide as a starting point, and then customize the environment to address your business' specific requirements.
  • To review an existing environment on Google Cloud. You can compare specific components of your design against Google-recommended best practices.

Supported use cases

The enterprise foundation blueprint provides a baseline layer of resources and configurations that help enable all types of workloads on Google Cloud. Whether you're migrating existing compute workloads to Google Cloud, building containerized web applications, or creating big data and machine learning workloads, the enterprise foundation blueprint helps you build your environment to support enterprise workloads at scale.

After you deploy the enterprise foundation blueprint, you can deploy workloads directly or deploy additional blueprints to support complex workloads that require additional capabilities.

A defense-in-depth security model

Google Cloud services benefit from the underlying Google infrastructure security design. It is your responsibility to design security into the systems that you build on top of Google Cloud. The enterprise foundation blueprint helps you to implement a defense-in-depth security model for your Google Cloud services and workloads.

The following diagram shows a defense-in-depth security model for your Google Cloud organization that combines architecture controls, policy controls, and detective controls.

The defense-in-depth security model.

The diagram describes the following controls:

  • Policy controls are programmatic constraints that enforce acceptable resource configurations and prevent risky configurations. The blueprint uses a combination of policy controls including infrastructure-as-code (IaC) validation in your pipeline and organization policy constraints.
  • Architecture controls are the configuration of Google Cloud resources like networks and resource hierarchy. The blueprint architecture is based on security best practices.
  • Detective controls let you detect anomalous or malicious behavior within the organization. The blueprint uses platform features such as Security Command Center, integrates with your existing detective controls and workflows such as a security operations center (SOC), and provides capabilities to enforce custom detective controls.

Key decisions

This section summarizes the high-level architectural decisions of the blueprint.

Key Google Cloud services in the blueprint.

The diagram describes how Google Cloud services contribute to key architectural decisions:

  • Cloud Build: Infrastructure resources are managed using a GitOps model. Declarative IaC is written in Terraform and managed in a version control system for review and approval, and resources are deployed using Cloud Build as the continuous integration and continuous deployment (CI/CD) automation tool. The pipeline also enforces policy-as-code checks to validate that resources meet expected configurations before deployment.
  • Cloud Identity: Users and group membership are synchronized from your existing identity provider. Controls for user account lifecycle management and single sign-on (SSO) rely on the existing controls and processes of your identity provider.
  • Identity and Access Management (IAM): Allow policies (formerly known as IAM policies) grant access to resources and are applied to groups based on job function. Users are added to the appropriate groups to receive view-only access to foundation resources. All changes to foundation resources are deployed through the CI/CD pipeline, which uses privileged service account identities.
  • Resource Manager: All resources are managed under a single organization, with a resource hierarchy of folders that organizes projects by environments. Projects are labeled with metadata for governance including cost attribution.
  • Networking: Network topologies use Shared VPC to provide network resources for workloads across multiple regions and zones, separated by environment, and managed centrally. All network paths between on-premises hosts, Google Cloud resources in the VPC networks, and Google Cloud services are private. No outbound traffic to or inbound traffic from the public internet is permitted by default.
  • Cloud Logging: Aggregated log sinks are configured to collect logs relevant for security and auditing into a centralized project for long-term retention, analysis, and export to external systems.
  • Cloud Monitoring: Monitoring scoping projects are configured to view application performance metrics across multiple projects in one place.
  • Organization Policy Service: Organization policy constraints are configured to prevent various high-risk configurations.
  • Secret Manager: Centralized projects are created for a team responsible for managing and auditing the use of sensitive application secrets to help meet compliance requirements.
  • Cloud Key Management Service (Cloud KMS): Centralized projects are created for a team responsible for managing and auditing encryption keys to help meet compliance requirements.
  • Security Command Center: Threat detection and monitoring capabilities are provided using a combination of built-in security controls from Security Command Center and custom solutions that let you detect and respond to security events.

For alternatives to these key decisions, see alternatives.
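
For example, the organization policy constraints mentioned in the preceding list are declared in Terraform and deployed through the foundation pipeline. The following sketch shows two representative constraints applied at the organization level; the constraints and variable names are illustrative and don't reproduce the blueprint's actual constraint set.

    # Prevent VM instances from being created with external IP addresses.
    resource "google_organization_policy" "restrict_vm_external_ips" {
      org_id     = var.org_id
      constraint = "constraints/compute.vmExternalIpAccess"

      list_policy {
        deny {
          all = true
        }
      }
    }

    # Prevent automatic creation of the default VPC network in new projects.
    resource "google_organization_policy" "skip_default_network" {
      org_id     = var.org_id
      constraint = "constraints/compute.skipDefaultNetworkCreation"

      boolean_policy {
        enforced = true
      }
    }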

What's next

  • Read about authentication and authorization (next document in this series).

Authentication and authorization

This section introduces how to use Cloud Identity to manage the identities that your employees use to access Google Cloud services.

External identity provider as the source of truth

We recommend federating your Cloud Identity account with your existing identity provider. Federation helps you ensure that your existing account management processes apply to Google Cloud and other Google services.

If you don't have an existing identity provider, you can create user accounts directly in Cloud Identity.

The following diagram shows a high-level view of identity federation and single sign-on (SSO). It uses Microsoft Active Directory, located in the on-premises environment, as the example identity provider.

External identity provider federation.

This diagram describes the following best practices:

  • User identities are managed in an Active Directory domain that is located in the on-premises environment and federated to Cloud Identity. Active Directory uses Google Cloud Directory Sync to provision identities to Cloud Identity.
  • Users attempting to sign in to Google services are redirected to the external identity provider for single sign-on with SAML, using their existing credentials to authenticate. No passwords are synchronized with Cloud Identity.

Setup guidance is available for the following identity providers:

  • Active Directory
  • Microsoft Entra ID (formerly Azure AD)
  • Other external identity providers (for example, Ping or Okta)

We strongly recommend that you enforce multi-factor authentication at your identity provider with a phishing-resistant mechanism such as a Titan Security Key.

The recommended settings for Cloud Identity aren't automated through the Terraform code in this blueprint. See administrative controls for Cloud Identity for the recommended security settings that you must configure in addition to deploying the Terraform code.

Groups for access control

A principal is an identity that can be granted access to a resource. Principals include Google Accounts for users, Google groups, Google Workspace accounts, Cloud Identity domains, and service accounts. Some services also let you grant access to all users who authenticate with a Google Account, or to all users on the internet. For a principal to interact with Google Cloud services, you must grant them roles in Identity and Access Management (IAM).

To manage IAM roles at scale, we recommend that you assign users to groups based on their job functions and access requirements, then grant IAM roles to those groups. You should add users to groups using the processes in your existing identity provider for group creation and membership.

We don't recommend granting IAM roles to individual users because individual assignments can increase the complexity of managing and auditing roles.
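
The following minimal Terraform sketch illustrates this pattern by granting a predefined role to one of the groups described later in this section; the variable name is a placeholder.

    # Grant the Security Reviewer role to a group instead of to individual users.
    resource "google_organization_iam_member" "security_reviewer_group" {
      org_id = var.org_id
      role   = "roles/iam.securityReviewer"
      member = "group:grp-gcp-security-reviewer@example.com"
    }

With this approach, access changes are made by updating group membership in your identity provider rather than by editing IAM bindings for individual users.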

The blueprint configures groups and roles for view-only access to foundation resources. We recommend that you deploy all resources in the blueprint through the foundation pipeline, and that you don't grant roles to users or groups that would let them modify foundation resources outside of the pipeline.

The following table shows the groups that are configured by the blueprint for viewing foundation resources.

Name Description Roles Scope
grp-gcp-org-admin@example.com Highly privileged administrators who can grant IAM roles at the organization level. They can access any other role. This privilege is not recommended for daily use. Organization Administrator organization
grp-gcp-billing-admin@example.com Highly privileged administrators who can modify the Cloud Billing account. This privilege is not recommended for daily use. Billing Account Admin organization
grp-gcp-billing-viewer@example.com The team who is responsible for viewing and analyzing the spending across all projects. Billing Account Viewer organization; BigQuery User billing project
grp-gcp-audit-viewer@example.com The team who is responsible for auditing security-related logs. Logs Viewer, BigQuery User logging project
grp-gcp-monitoring-users@example.com The team who is responsible for monitoring application performance metrics. Monitoring Viewer monitoring project
grp-gcp-security-reviewer@example.com The team who is responsible for reviewing cloud security. Security Reviewer organization
grp-gcp-network-viewer@example.com The team who is responsible for viewing and maintaining network configurations. Compute Network Viewer organization
grp-gcp-scc-admin@example.com The team who is responsible for configuring Security Command Center. Security Center Admin Editor organization
grp-gcp-secrets-admin@example.com The team who is responsible for managing, storing, and auditing credentials and other secrets that are used by applications. Secret Manager Admin secrets projects
grp-gcp-kms-admin@example.com The team who is responsible for enforcing encryption key management to meet compliance requirements. Cloud KMS Viewer kms projects

As you build your own workloads on top of the foundation, you create additional groups and grant IAM roles that are based on the access requirements for each workload.

We strongly recommend that you avoid basic roles (such as Owner, Editor, or Viewer) and use predefined roles instead. Basic roles are overly permissive and a potential security risk. Owner and Editor roles can lead to privilege escalation and lateral movement, and the Viewer role includes access to read all data. For best practices on IAM roles, see Use IAM securely.

Super admin accounts

Cloud Identity users with the super admin account bypass the organization's SSO settings and authenticate directly to Cloud Identity. This exception is by design, so that the super admin can still access the Cloud Identity console in the event of an SSO misconfiguration or outage. However, it means you must consider additional protection for super admin accounts.

To protect your super admin accounts, we recommend that you always enforce 2-step verification with security keys in Cloud Identity. For more information, see Security best practices for administrator accounts.

Issues with consumer user accounts

If you didn't use Cloud Identity or Google Workspace before you onboarded to Google Cloud, it's possible that your organization's employees are already using consumer accounts that are associated with their corporate email identities to access other Google services such as Google Marketing Platform or YouTube. Consumer accounts are accounts that are fully owned and managed by the individuals who created them. Because those accounts aren't under your organization's control and might include both personal and corporate data, you must decide how to consolidate these accounts with other corporate accounts.

We recommend that you consolidate existing consumer user accounts as part of onboarding to Google Cloud. If you aren't using Google Workspace for all your user accounts already, we recommend blocking the creation of new consumer accounts.

Administrative controls for Cloud Identity

Cloud Identity has various administrative controls that are not automated by Terraform code in the blueprint. We recommend that you enforce each of these best practice security controls early in the process of building your foundation.

Control Description
Deploy 2-step verification

User accounts might be compromised through phishing, social engineering, password spraying, or various other threats. 2-step verification helps mitigate these threats.

We recommend that you enforce 2-step verification for all user accounts in your organization with a phishing-resistant mechanism such as Titan Security Keys or other keys that are based on the phishing-resistant FIDO U2F (CTAP1) standards.

Set session length for Google Cloud services

Persistent OAuth tokens on developer workstations can be a security risk if exposed. We recommend that you set a reauthentication policy to require authentication every 16 hours using a security key.
Set session length for Google Services (Google Workspace customers only)

Persistent web sessions across other Google services can be a security risk if exposed. We recommend that you enforce a maximum web session length and align this with session length controls in your SSO provider.

Share data from Cloud Identity with Google Cloud services

Admin Activity audit logs from Google Workspace or Cloud Identity are ordinarily managed and viewed in the Admin Console, separately from your logs in your Google Cloud environment. These logs contain information that is relevant for your Google Cloud environment, such as user login events.

We recommend that you share Cloud Identity audit logs to your Google Cloud environment to centrally manage logs from all sources.

Set up post-SSO verification

The blueprint assumes that you set up SSO with your external identity provider.

We recommend that you enable an additional layer of control based on Google's sign-in risk analysis. After you apply this setting, users might see additional risk-based login challenges at sign-in if Google deems that a user sign-in is suspicious.

Remediate issues with consumer user accounts

Users with a valid email address at your domain but no Google Account can sign up for unmanaged consumer accounts. These accounts might contain corporate data, but are not controlled by your account lifecycle management processes.

We recommend that you take steps to ensure that all user accounts are managed accounts.

Disable account recovery for super admin accounts

Super admin account self-recovery is off by default for all new customers (existing customers might have this setting on). Turning this setting off helps to mitigate the risk that a compromised phone, compromised email, or social engineering attack could let an attacker gain super admin privileges over your environment.

Plan an internal process for a super admin to contact another super admin in your organization if they have lost access to their account, and ensure that all super admins are familiar with the process for support-assisted recovery.

Enforce and monitor password requirements for users

In most cases, user passwords are managed through your external identity provider, but super admin accounts bypass SSO and must use a password to sign in to Cloud Identity. Disable password reuse and monitor password strength for any users who use a password to log in to Cloud Identity, particularly super admin accounts.
Set organization-wide policies for using groups

By default, external user accounts can be added to groups in Cloud Identity. We recommend that you configure sharing settings so that group owners can't add external members.

Note that this restriction doesn't apply to the super admin account or other delegated administrators with Groups admin permissions. Because federation from your identity provider runs with administrator privileges, the group sharing settings don't apply to this group synchronization. We recommend that you review controls in the identity provider and synchronization mechanism to ensure that non-domain members aren't added to groups, or that you apply group restrictions.

What's next

  • Read about organization structure (next document in this series).

Organization structure

The root node for managing resources in Google Cloud is the organization. The Google Cloud organization provides a resource hierarchy that defines an ownership structure for resources and attachment points for organization policies and access controls. The resource hierarchy consists of folders, projects, and resources, and it defines the structure and use of Google Cloud services within an organization.

Resources lower in the hierarchy inherit policies such as IAM allow policies and organization policies. All access permissions are denied by default, until you apply allow policies directly to a resource or the resource inherits the allow policies from a higher level in the resource hierarchy.

The following diagram shows the folders and projects that are deployed by the blueprint.

The example.com organization structure.

The following sections describe the folders and projects in the diagram.

Folders

The blueprint uses folders to group projects based on their environment. This logical grouping is used to apply configurations like allow policies and organization policies at the folder level, so that all resources within the folder inherit the policies. The following table describes the folders that are part of the blueprint.

Folder Description
bootstrap Contains the projects that are used to deploy foundation components.
common Contains projects with resources that are shared by all environments.
production Contains projects with production resources.
nonproduction Contains a copy of the production environment to let you test workloads before you promote them to production.
development Contains the cloud resources that are used for development.
networking Contains the networking resources that are shared by all environments.
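
As an illustration, a minimal Terraform sketch of this folder structure might look like the following. The variable name is a placeholder, and only three folders are shown; the remaining folders follow the same pattern.

    resource "google_folder" "bootstrap" {
      display_name = "bootstrap"
      parent       = "organizations/${var.org_id}"
    }

    resource "google_folder" "common" {
      display_name = "common"
      parent       = "organizations/${var.org_id}"
    }

    resource "google_folder" "production" {
      display_name = "production"
      parent       = "organizations/${var.org_id}"
    }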

Projects

The blueprint uses projects to group individual resources based on their functionality and intended boundaries for access control. The following table describes the projects that are included in the blueprint.

Folder Project Description
bootstrap prj-b-cicd Contains the deployment pipeline that's used to build out the foundation components of the organization. For more information, see deployment methodology.
prj-b-seed Contains the Terraform state of your infrastructure and the Terraform service account that is required to run the pipeline. For more information, see deployment methodology.
common prj-c-secrets Contains organization-level secrets. For more information, see store application credentials with Secret Manager.
prj-c-logging Contains the aggregated log sources for audit logs. For more information, see centralized logging for security and audit.
prj-c-scc Contains resources to help configure Security Command Center alerting and other custom security monitoring. For more information, see threat monitoring with Security Command Center.
prj-c-billing-logs Contains a BigQuery dataset with the organization's billing exports. For more information, see allocate costs between internal cost centers.
prj-c-infra-pipeline Contains an infrastructure pipeline for deploying resources like VMs and databases to be used by workloads. For more information, see pipeline layers.
prj-c-kms Contains organization-level encryption keys. For more information, see manage encryption keys.
networking prj-net-{env}-shared-base Contains the host project for a Shared VPC network for workloads that don't require VPC Service Controls. For more information, see network topology.
prj-net-{env}-shared-restricted Contains the host project for a Shared VPC network for workloads that do require VPC Service Controls. For more information, see network topology.
prj-net-interconnect Contains the Cloud Interconnect connections that provide connectivity between your on-premises environment and Google Cloud. For more information, see hybrid connectivity.
prj-net-dns-hub Contains resources for a central point of communication between your on-premises DNS system and Cloud DNS. For more information, see centralized DNS setup.
environment folders (production, non-production, and development) prj-{env}-monitoring Contains a scoping project to aggregate metrics from projects in that environment. For more information, see alerting on log-based metrics and performance metrics.
prj-{env}-secrets Contains folder-level secrets. For more information, see store and audit application credentials with Secret Manager.
prj-{env}-kms Contains folder-level encryption keys. For more information, see manage encryption keys.
application projects Contains various projects in which you create resources for applications. For more information, see project deployment patterns and pipeline layers.

Governance for resource ownership

We recommend that you apply labels consistently to your projects to assist with governance and cost allocation. The following table describes the project labels that are added to each project for governance in the blueprint.

Label Description
application The human-readable name of the application or workload that is associated with the project.
businesscode A short code that describes which business unit owns the project. The code shared is used for common projects that are not explicitly tied to a business unit.
billingcode A code that's used to provide chargeback information.
primarycontact The username of the primary contact that is responsible for the project. Because project labels can't include special characters such as the at sign (@), this label is set to the username without the @example.com suffix.
secondarycontact The username of the secondary contact that is responsible for the project. Because project labels can't include special characters such as the at sign (@), set only the username without the @example.com suffix.
environment A value that identifies the type of environment, such as bootstrap, common, production, non-production, development, or network.
envcode A value that identifies the type of environment, shortened to b, c, p, n, d, or net.
vpc The ID of the VPC network that this project is expected to use.
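
The following sketch shows how these labels might be applied when a workload project is created in Terraform. The project name, IDs, folder, and label values are hypothetical placeholders rather than values from the blueprint code.

    resource "google_project" "example_workload" {
      name            = "example-workload"
      project_id      = "prj-p-example-workload"
      folder_id       = var.production_folder_id   # numeric ID of the production folder
      billing_account = var.billing_account_id

      labels = {
        application      = "example-application"
        businesscode     = "abcd"
        billingcode      = "1234"
        primarycontact   = "jane"
        secondarycontact = "joe"
        environment      = "production"
        envcode          = "p"
        vpc              = "vpc-p-shared-base"
      }
    }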

Google might occasionally send important notifications such as account suspensions or updates to product terms. The blueprint uses Essential Contacts to send those notifications to the groups that you configure during deployment. Essential Contacts is configured at the organization node and inherited by all projects in the organization. We recommend that you review these groups and ensure that emails are monitored reliably.

Essential Contacts is used for a different purpose than the primarycontact and secondarycontact fields that are configured in project labels. The contacts in project labels are intended for internal governance. For example, if you identify non-compliant resources in a workload project and need to contact the owners, you could use the primarycontact field to find the person or team responsible for that workload.

What's next

  • Read about networking (next document in this series).

Networking

Networking is required for resources to communicate within your Google Cloud organization and between your cloud environment and on-premises environment. This section describes the structure in the blueprint for VPC networks, IP address space, DNS, firewall policies, and connectivity to the on-premises environment.

Network topology

The blueprint repository provides the following options for your network topology:

  • Use separate Shared VPC networks for each environment, with no network traffic directly allowed between environments.
  • Use a hub-and-spoke model that adds a hub network to connect each environment in Google Cloud, with the network traffic between environments gated by a network virtual appliance (NVA).

Choose the dual Shared VPC network topology when you don't want direct network connectivity between environments. Choose the hub-and-spoke network topology when you want to allow network connectivity between environments that is filtered by an NVA, such as when you rely on existing tools that require a direct network path to every server in your environment.

Both topologies use Shared VPC as a principal networking construct because Shared VPC allows a clear separation of responsibilities. Network administrators manage network resources in a centralized host project, and workload teams deploy their own application resources and consume the network resources in service projects that are attached to the host project.
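
A minimal Terraform sketch of this separation of responsibilities follows; the project variables are placeholders.

    # Designate the centrally managed networking project as a Shared VPC host project.
    resource "google_compute_shared_vpc_host_project" "host" {
      project = var.host_project_id
    }

    # Attach a workload project as a service project of that host project.
    resource "google_compute_shared_vpc_service_project" "workload" {
      host_project    = google_compute_shared_vpc_host_project.host.project
      service_project = var.workload_project_id
    }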

Both topologies include a base and restricted version of each VPC network. The base VPC network is used for resources that contain non-sensitive data, and the restricted VPC network is used for resources with sensitive data that require VPC Service Controls. For more information on implementing VPC Service Controls, see Protect your resources with VPC Service Controls.

Dual Shared VPC network topology

If you require network isolation between your development, non-production, and production networks on Google Cloud, we recommend the dual Shared VPC network topology. This topology uses separate Shared VPC networks for each environment, with each environment additionally split between a base Shared VPC network and a restricted Shared VPC network.

The following diagram shows the dual Shared VPC network topology.

The blueprint VPC network.

The diagram describes these key concepts of the dual Shared VPC topology:

  • Each environment (production, non-production, and development) has one Shared VPC network for the base network and one Shared VPC network for the restricted network. This diagram shows only the production environment, but the same pattern is repeated for each environment.
  • Each Shared VPC network has two subnets, with each subnet in a different region.
  • Connectivity with on-premises resources is enabled through four VLAN attachments to the Dedicated Interconnect instance for each Shared VPC network, using four Cloud Router services (two in each region for redundancy). For more information, see Hybrid connectivity between on-premises environment and Google Cloud.

By design, this topology doesn't allow network traffic to flow directly between environments. If you do require network traffic to flow directly between environments, you must take additional steps to allow this network path. For example, you might configure Private Service Connect endpoints to expose a service from one VPC network to another VPC network. Alternatively, you might configure your on-premises network to let traffic flow from one Google Cloud environment to the on-premises environment and then to another Google Cloud environment.

Hub-and-spoke network topology

If you deploy resources in Google Cloud that require a direct network path to resources in multiple environments, we recommend the hub-and-spoke network topology.

The hub-and-spoke topology uses several of the concepts that are part of the dual Shared VPC topology, but modifies the topology to add a hub network. The following diagram shows the hub-and-spoke topology.

The example.com VPC network structure when using hub-and-spoke
connectivity based on VPC peering

The diagram describes these key concepts of hub-and-spoke network topology:

  • This model adds a hub network, and each of the development, non-production, and production networks (spokes) are connected to the hub network through VPC Network Peering. Alternatively, if you anticipate exceeding the quota limit, you can use an HA VPN gateway instead.
  • Connectivity to on-premises networks is allowed only through the hub network. All spoke networks can communicate with shared resources in the hub network and use this path to connect to on-premises networks.
  • The hub networks include an NVA for each region, deployed redundantly behind internal Network Load Balancer instances. This NVA serves as the gateway to allow or deny traffic to communicate between spoke networks.
  • The hub network also hosts tooling that requires connectivity to all other networks. For example, you might deploy tools on VM instances for configuration management to the common environment.
  • The hub-and-spoke model is duplicated for a base version and restricted version of each network.

To enable spoke-to-spoke traffic, the blueprint deploys NVAs on the hub Shared VPC network that act as gateways between networks. Routes are exchanged between the hub and spoke VPC networks through custom route exchange. In this scenario, connectivity between spokes must be routed through the NVA because VPC Network Peering is non-transitive, and therefore spoke VPC networks can't exchange data with each other directly. You must configure the virtual appliances to selectively allow traffic between spokes.

For more information on using NVAs to control traffic between spokes, see centralized network appliances on Google Cloud.
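
As a rough illustration of the peering and custom route exchange between the hub and a spoke, the following sketch peers a hypothetical development spoke with the hub network. The variables are placeholders, and the custom routes that direct traffic through the NVAs are configured separately.

    resource "google_compute_network_peering" "hub_to_dev" {
      name                 = "peer-hub-to-dev"
      network              = var.hub_network_self_link
      peer_network         = var.dev_spoke_network_self_link
      export_custom_routes = true   # advertise the routes that point to the NVAs
    }

    resource "google_compute_network_peering" "dev_to_hub" {
      name                 = "peer-dev-to-hub"
      network              = var.dev_spoke_network_self_link
      peer_network         = var.hub_network_self_link
      import_custom_routes = true   # accept the custom routes exported by the hub
    }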

Project deployment patterns

When creating new projects for workloads, you must decide how resources in this project connect to your existing network. The following table describes the patterns for deploying projects that are used in the blueprint.

Pattern Description Example usage
Shared base projects

These projects are configured as service projects to a base Shared VPC host project.

Use this pattern when resources in your project have the following criteria:

  • Require network connectivity to the on-premises environment or resources in the same Shared VPC topology.
  • Require a network path to the Google services that are contained on the private virtual IP address.
  • Don't require VPC Service Controls.
example_base_shared_vpc_project.tf
Shared restricted projects

These projects are configured as service projects to a restricted Shared VPC host project.

Use this pattern when resources in your project have the following criteria:

  • Require network connectivity to the on-premises environment or resources in the same Shared VPC topology.
  • Require a network path to the Google services contained on the restricted virtual IP address.
  • Require VPC Service Controls.
example_restricted_shared_vpc_project.tf
Floating projects

Floating projects are not connected to other VPC networks in your topology.

Use this pattern when resources in your project have the following criteria:

  • Don't require full mesh connectivity to an on-premises environment or resources in the Shared VPC topology.
  • Don't require a VPC network, or you want to manage the VPC network for this project independently of your main VPC network topology (such as when you want to use an IP address range that clashes with the ranges already in use).

You might have a scenario where you want to keep the VPC network of a floating project separate from the main VPC network topology but also want to expose a limited number of endpoints between networks. In this case, publish services by using Private Service Connect to share network access to an individual endpoint across VPC networks without exposing the entire network.

example_floating_project.tf
Peering projects

Peering projects create their own VPC networks and peer to other VPC networks in your topology.

Use this pattern when resources in your project have the following criteria:

  • Require network connectivity in the directly peered VPC network, but don't require transitive connectivity to an on-premises environment or other VPC networks.
  • Must manage the VPC network for this project independently of your main network topology.

If you create peering projects, it's your responsibility to allocate non-conflicting IP address ranges and plan for peering group quota.

example_peering_project.tf

IP address allocation

This section introduces how the blueprint architecture allocates IP address ranges. You might need to change the specific IP address ranges used based on the IP address availability in your existing hybrid environment.

The following table provides a breakdown of the IP address space that's allocated for the blueprint. The hub environment only applies in the hub-and-spoke topology.

Purpose VPC type Region Hub environment Development environment Non-production environment Production environment
Primary subnet ranges Base Region 1 10.0.0.0/18 10.0.64.0/18 10.0.128.0/18 10.0.192.0/18
Region 2 10.1.0.0/18 10.1.64.0/18 10.1.128.0/18 10.1.192.0/18
Unallocated 10.{2-7}.0.0/18 10.{2-7}.64.0/18 10.{2-7}.128.0/18 10.{2-7}.192.0/18
Restricted Region 1 10.8.0.0/18 10.8.64.0/18 10.8.128.0/18 10.8.192.0/18
Region 2 10.9.0.0/18 10.9.64.0/18 10.9.128.0/18 10.9.192.0/18
Unallocated 10.{10-15}.0.0/18 10.{10-15}.64.0/18 10.{10-15}.128.0/18 10.{10-15}.192.0/18
Private services access Base Global 10.16.0.0/21 10.16.8.0/21 10.16.16.0/21 10.16.24.0/21
Restricted Global 10.16.32.0/21 10.16.40.0/21 10.16.48.0/21 10.16.56.0/21
Private Service Connect endpoints Base Global 10.17.0.1/32 10.17.0.2/32 10.17.0.3/32 10.17.0.4/32
Restricted Global 10.17.0.5/32 10.17.0.6/32 10.17.0.7/32 10.17.0.8/32
Proxy-only subnets Base Region 1 10.18.0.0/23 10.18.2.0/23 10.18.4.0/23 10.18.6.0/23
Region 2 10.19.0.0/23 10.19.2.0/23 10.19.4.0/23 10.19.6.0/23
Unallocated 10.{20-25}.0.0/23 10.{20-25}.2.0/23 10.{20-25}.4.0/23 10.{20-25}.6.0/23
Restricted Region 1 10.26.0.0/23 10.26.2.0/23 10.26.4.0/23 10.26.6.0/23
Region 2 10.27.0.0/23 10.27.2.0/23 10.27.4.0/23 10.27.6.0/23
Unallocated 10.{28-33}.0.0/23 10.{28-33}.2.0/23 10.{28-33}.4.0/23 10.{28-33}.6.0/23
Secondary subnet ranges Base Region 1 100.64.0.0/18 100.64.64.0/18 100.64.128.0/18 100.64.192.0/18
Region 2 100.65.0.0/18 100.65.64.0/18 100.65.128.0/18 100.65.192.0/18
Unallocated 100.{66-71}.0.0/18 100.{66-71}.64.0/18 100.{66-71}.128.0/18 100.{66-71}.192.0/18
Restricted Region 1 100.72.0.0/18 100.72.64.0/18 100.72.128.0/18 100.72.192.0/18
Region 2 100.73.0.0/18 100.73.64.0/18 100.73.128.0/18 100.73.192.0/18
Unallocated 100.{74-79}.0.0/18 100.{74-79}.64.0/18 100.{74-79}.128.0/18 100.{74-79}.192.0/18

The preceding table demonstrates these concepts for allocating IP address ranges:

  • IP address allocation is subdivided into ranges for each combination of base Shared VPC, restricted Shared VPC, region, and environment.
  • Some resources are global and don't require subdivisions for each region.
  • By default, for regional resources, the blueprint deploys in two regions. In addition, there are unused IP address ranges so that you can expand into six additional regions.
  • The hub network is only used in the hub-and-spoke network topology, while the development, non-production, and production environments are used in both network topologies.

The following table introduces how each type of IP address range is used.

Purpose Description
Primary subnet ranges Resources that you deploy to your VPC network, such as virtual machine instances, use internal IP addresses from these ranges.
Private services access Some Google Cloud services such as Cloud SQL require you to preallocate a subnet range for private services access. The blueprint reserves a /21 range globally for each of the Shared VPC networks to allocate IP addresses for services that require private services access. When you create a service that depends on private services access, you allocate a regional /24 subnet from the reserved /21 range.
Private Service Connect The blueprint provisions each VPC network with a Private Service Connect endpoint to communicate with Google Cloud APIs. This endpoint lets your resources in the VPC network reach Google Cloud APIs without relying on outbound traffic to the internet or publicly advertised internet ranges.
Proxy-based load balancers Some types of Application Load Balancers require you to preallocate proxy-only subnets. Although the blueprint doesn't deploy Application Load Balancers that require this range, allocating ranges in advance helps reduce friction for workloads when they need to request a new subnet range to enable certain load balancer resources.
Secondary subnet ranges Some use cases, such as container-based workloads, require secondary ranges. The blueprint allocates ranges from the RFC 6598 IP address space for secondary ranges.

Centralized DNS setup

For DNS resolution between Google Cloud and on-premises environments, we recommend that you use a hybrid approach with two authoritative DNS systems. In this approach, Cloud DNS handles authoritative DNS resolution for your Google Cloud environment and your existing on-premises DNS servers handle authoritative DNS resolution for on-premises resources. Your on-premises environment and Google Cloud environment perform DNS lookups between environments through forwarding requests.

The following diagram demonstrates the DNS topology across the multiple VPC networks that are used in the blueprint.

Cloud DNS setup for the blueprint.

The diagram describes the following components of the DNS design that is deployed by the blueprint:

  • The DNS hub project in the common folder is the central point of DNS exchange between the on-premises environment and the Google Cloud environment. DNS forwarding uses the same Dedicated Interconnect instances and Cloud Routers that are already configured in your network topology.
    • In the dual Shared VPC topology, the DNS hub uses the base production Shared VPC network.
    • In the hub-and-spoke topology, the DNS hub uses the base hub Shared VPC network.
  • Servers in each Shared VPC network can resolve DNS records from other Shared VPC networks through DNS forwarding, which is configured between Cloud DNS in each Shared VPC host project and the DNS hub.
  • On-premises servers can resolve DNS records in Google Cloud environments using DNS server policies that allow queries from on-premises servers. The blueprint configures an inbound server policy in the DNS hub to allocate IP addresses, and the on-premises DNS servers forward requests to these addresses. All DNS requests to Google Cloud reach the DNS hub first, which then resolves records from DNS peers.
  • Servers in Google Cloud can resolve DNS records in the on-premises environment using forwarding zones that query on-premises servers. All DNS requests to the on-premises environment originate from the DNS hub. The DNS request source is 35.199.192.0/19.

Firewall policies

Google Cloud has multiple firewall policy types. Hierarchical firewall policies are enforced at the organization or folder level to inherit firewall policy rules consistently across all resources in the hierarchy. In addition, you can configure network firewall policies for each VPC network. The blueprint combines these firewall policies to enforce common configurations across all environments by using hierarchical firewall policies and to enforce more specific configurations at each individual VPC network by using network firewall policies.

The blueprint doesn't use legacy VPC firewall rules. We recommend that you use only firewall policies and avoid mixing them with legacy VPC firewall rules.

Hierarchical firewall policies

The blueprint defines a single hierarchical firewall policy and attaches the policy to each of the production, non-production, development, bootstrap, and common folders. This hierarchical firewall policy contains the rules that should be enforced broadly across all environments, and delegates the evaluation of more granular rules to the network firewall policy for each individual environment.

The following table describes the hierarchical firewall policy rules deployed by the blueprint.

Rule description Direction of traffic Filter (IPv4 range) Protocols and ports Action
Delegate the evaluation of inbound traffic from RFC 1918 to lower levels in the hierarchy. Ingress

192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12

all Go to next
Delegate the evaluation of outbound traffic to RFC 1918 to lower levels in the hierarchy. Egress

192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12

all Go to next
IAP for TCP forwarding Ingress

35.235.240.0/20

tcp:22,3389 Allow
Windows server activation Egress

35.190.247.13/32

tcp:1688 Allow
Health checks for Cloud Load Balancing Ingress

130.211.0.0/22, 35.191.0.0/16, 209.85.152.0/22, 209.85.204.0/22

tcp:80,443 Allow
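
The following Terraform sketch shows how one of these rules (IAP for TCP forwarding) might be expressed as a hierarchical firewall policy that is attached to a folder. The names and variables are placeholders, not the blueprint's actual resource definitions.

    resource "google_compute_firewall_policy" "hierarchical" {
      parent      = "organizations/${var.org_id}"
      short_name  = "fp-hierarchical-base"
      description = "Rules enforced broadly across all environments"
    }

    # Allow IAP for TCP forwarding (SSH and RDP) from the IAP address range.
    resource "google_compute_firewall_policy_rule" "allow_iap" {
      firewall_policy = google_compute_firewall_policy.hierarchical.id
      priority        = 1000
      direction       = "INGRESS"
      action          = "allow"
      enable_logging  = true

      match {
        src_ip_ranges = ["35.235.240.0/20"]
        layer4_configs {
          ip_protocol = "tcp"
          ports       = ["22", "3389"]
        }
      }
    }

    # Attach the policy to an environment folder, for example the production folder.
    resource "google_compute_firewall_policy_association" "production" {
      name              = "fpa-production"
      firewall_policy   = google_compute_firewall_policy.hierarchical.id
      attachment_target = "folders/${var.production_folder_id}"
    }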

Network firewall policies

The blueprint configures a network firewall policy for each network. Each network firewall policy starts with a minimum set of rules that allow access to Google Cloud services and deny egress to all other IP addresses.

In the hub-and-spoke model, the network firewall policies contain additional rules to allow communication between spokes. The network firewall policy allows outbound traffic from one spoke to the hub or to another spoke, and allows inbound traffic from the NVA in the hub network.

The following table describes the rules in the global network firewall policy deployed for each VPC network in the blueprint.

Rule description Direction of traffic Filter Protocols and ports
Allow outbound traffic to Google Cloud APIs. Egress The Private Service Connect endpoint that is configured for each individual network. See Private access to Google APIs. tcp:443
Deny outbound traffic not matched by other rules. Egress all all

Allow outbound traffic from one spoke to another spoke (for hub-and-spoke model only).

Egress The aggregate of all IP addresses used in the hub-and-spoke topology. Traffic that leaves a spoke VPC is routed to the NVA in the hub network first. all

Allow inbound traffic to a spoke from the NVA in the hub network (for hub-and-spoke model only).

Ingress Traffic originating from the NVAs in the hub network. all

When you first deploy the blueprint, a VM instance in a VPC network can communicate with Google Cloud services, but not to other infrastructure resources in the same VPC network. To allow VM instances to communicate, you must add additional rules to your network firewall policy and tags that explicitly allow the VM instances to communicate. Tags are added to VM instances, and traffic is evaluated against those tags. Tags additionally have IAM controls so that you can define them centrally and delegate their use to other teams.

The following diagram shows an example of how you can add custom tags and network firewall policy rules to let workloads communicate inside a VPC network.

Firewall rules in example.com.

The diagram demonstrates the following concepts of this example:

  • The network firewall policy contains Rule 1 that denies outbound traffic from all sources at priority 65530.
  • The network firewall policy contains Rule 2 that allows inbound traffic from instances with the service=frontend tag to instances with the service=backend tag at priority 999.
  • The instance-2 VM can receive traffic from instance-1 because the traffic matches the tags allowed by Rule 2. Rule 2 is matched before Rule 1 is evaluated, based on the priority value.
  • The instance-3 VM doesn't receive traffic. The only firewall policy rule that matches this traffic is Rule 1, so outbound traffic from instance-1 is denied.
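
A hedged Terraform sketch of Rule 2 from this example follows. The policy name and tag value IDs are placeholders, and the tag keys and values themselves would be created and IAM-scoped separately.

    # Allow inbound HTTPS traffic from frontend-tagged instances to backend-tagged instances.
    resource "google_compute_network_firewall_policy_rule" "allow_frontend_to_backend" {
      project         = var.host_project_id
      firewall_policy = var.network_firewall_policy_name
      rule_name       = "allow-frontend-to-backend"
      priority        = 999
      direction       = "INGRESS"
      action          = "allow"
      enable_logging  = true

      match {
        src_secure_tags {
          name = "tagValues/111111111111"   # the service=frontend tag value (placeholder ID)
        }
        layer4_configs {
          ip_protocol = "tcp"
          ports       = ["443"]
        }
      }

      target_secure_tags {
        name = "tagValues/222222222222"     # the service=backend tag value (placeholder ID)
      }
    }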

Private access to Google Cloud APIs

To let resources in your VPC networks or on-premises environment reach Google Cloud services, we recommend private connectivity instead of outbound internet traffic to public API endpoints. The blueprint configures Private Google Access on every subnet and creates internal endpoints with Private Service Connect to communicate with Google Cloud services. Used together, these controls allow a private path to Google Cloud services, without relying on internet outbound traffic or publicly advertised internet ranges.

The blueprint configures Private Service Connect endpoints with API bundles to differentiate which services can be accessed in which network. The base network uses the all-apis bundle and can reach any Google service, and the restricted network uses the vpcsc bundle which allows access to a limited set of services that support VPC Service Controls.
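
The following sketch suggests how one such endpoint might be declared in Terraform, using the base development endpoint address from the tables in this document. The project and network variables and resource names are placeholders.

    # Reserve a global internal address for the Private Service Connect endpoint.
    resource "google_compute_global_address" "psc_endpoint" {
      project      = var.host_project_id
      name         = "psc-d-base"
      purpose      = "PRIVATE_SERVICE_CONNECT"
      address_type = "INTERNAL"
      network      = var.network_self_link
      address      = "10.17.0.2"
    }

    # Forward traffic sent to that address to the all-apis bundle of Google APIs.
    resource "google_compute_global_forwarding_rule" "psc_endpoint" {
      project               = var.host_project_id
      name                  = "pscdbase"   # endpoint names use lowercase letters and numbers only
      network               = var.network_self_link
      ip_address            = google_compute_global_address.psc_endpoint.id
      target                = "all-apis"
      load_balancing_scheme = ""
    }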

For access from hosts that are located in an on-premises environment, we recommend that you use a convention of custom FQDNs for each endpoint, as described in the following table. The blueprint uses a unique Private Service Connect endpoint for each VPC network, configured for access to a different set of API bundles. Therefore, you must consider how to route service traffic from the on-premises environment to the VPC network with the correct API endpoint, and if you're using VPC Service Controls, ensure that traffic to Google Cloud services reaches the endpoint inside the intended perimeter. Configure your on-premises controls for DNS, firewalls, and routers to allow access to these endpoints, and configure on-premises hosts to use the appropriate endpoint. For more information, see access Google APIs through endpoints.

The following table describes the Private Service Connect endpoints created for each network.

VPC Environment API bundle Private Service Connect endpoint IP address Custom FQDN
Base Common all-apis 10.17.0.1/32 c.private.googleapis.com
Development all-apis 10.17.0.2/32 d.private.googleapis.com
Non-production all-apis 10.17.0.3/32 n.private.googleapis.com
Production all-apis 10.17.0.4/32 p.private.googleapis.com
Restricted Common vpcsc 10.17.0.5/32 c.restricted.googleapis.com
Development vpcsc 10.17.0.6/32 d.restricted.googleapis.com
Non-production vpcsc 10.17.0.7/32 n.restricted.googleapis.com
Production vpcsc 10.17.0.8/32 p.restricted.googleapis.com

To ensure that traffic for Google Cloud services has a DNS lookup to the correct endpoint, the blueprint configures private DNS zones for each VPC network. The following table describes these private DNS zones.

Private zone name DNS name Record type Data
googleapis.com. *.googleapis.com. CNAME private.googleapis.com. (for base networks) or restricted.googleapis.com. (for restricted networks)
private.googleapis.com. (for base networks) or restricted.googleapis.com. (for restricted networks) A The Private Service Connect endpoint IP address for that VPC network.
gcr.io. *.gcr.io. CNAME gcr.io.
gcr.io. A The Private Service Connect endpoint IP address for that VPC network.
pkg.dev. *.pkg.dev. CNAME pkg.dev.
pkg.dev. A The Private Service Connect endpoint IP address for that VPC network.
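
For example, the googleapis.com zone from this table might be declared in Terraform roughly as follows, assuming placeholder project and network variables and the base development endpoint address.

    resource "google_dns_managed_zone" "googleapis" {
      project    = var.host_project_id
      name       = "dz-googleapis"
      dns_name   = "googleapis.com."
      visibility = "private"

      private_visibility_config {
        networks {
          network_url = var.network_self_link
        }
      }
    }

    resource "google_dns_record_set" "googleapis_cname" {
      project      = var.host_project_id
      managed_zone = google_dns_managed_zone.googleapis.name
      name         = "*.googleapis.com."
      type         = "CNAME"
      ttl          = 300
      rrdatas      = ["private.googleapis.com."]
    }

    resource "google_dns_record_set" "googleapis_a" {
      project      = var.host_project_id
      managed_zone = google_dns_managed_zone.googleapis.name
      name         = "private.googleapis.com."
      type         = "A"
      ttl          = 300
      rrdatas      = ["10.17.0.2"]   # the Private Service Connect endpoint for this VPC network
    }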

The blueprint has additional configurations to enforce that these Private Service Connect endpoints are used consistently. Each Shared VPC network also enforces the following:

  • A network firewall policy rule that allows outbound traffic from all sources to the IP address of the Private Service Connect endpoint on TCP:443.
  • A network firewall policy rule that denies outbound traffic to 0.0.0.0/0, which includes the default domains that are used for access to Google Cloud services.

Internet connectivity

The blueprint doesn't allow inbound or outbound traffic between its VPC networks and the internet. For workloads that require internet connectivity, you must take additional steps to design the access paths required.

For workloads that require outbound traffic to the internet, we recommend that you manage outbound traffic through Cloud NAT to allow outbound traffic without unsolicited inbound connections, or through Secure Web Proxy for more granular control to allow outbound traffic to trusted web services only.
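
A minimal sketch of the Cloud NAT option follows, assuming placeholder project, region, and network variables.

    resource "google_compute_router" "egress" {
      project = var.host_project_id
      name    = "cr-egress-nat"
      region  = var.region
      network = var.network_self_link
    }

    resource "google_compute_router_nat" "egress" {
      project                            = var.host_project_id
      name                               = "nat-egress"
      router                             = google_compute_router.egress.name
      region                             = var.region
      nat_ip_allocate_option             = "AUTO_ONLY"
      source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"

      log_config {
        enable = true
        filter = "ERRORS_ONLY"
      }
    }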

For workloads that require inbound traffic from the internet, we recommend that you design your workload with Cloud Load Balancing and Google Cloud Armor to benefit from DDoS and WAF protections.

We don't recommend that you design workloads that allow direct connectivity between the internet and a VM using an external IP address on the VM.

Hybrid connectivity between an on-premises environment and Google Cloud

To establish connectivity between the on-premises environment and Google Cloud, we recommend that you use Dedicated Interconnect to maximize security and reliability. A Dedicated Interconnect connection is a direct link between your on-premises network and Google Cloud.

The following diagram introduces hybrid connectivity between the on-premises environment and a Google Virtual Private Cloud network.

The hybrid connection structure.

The diagram describes the following components of the pattern for 99.99% availability for Dedicated Interconnect:

  • Four Dedicated Interconnect connections, with two connections in one metropolitan area (metro) and two connections in another metro.
  • The connections are divided into two pairs, with each pair connected to a separate on-premises data center.
  • VLAN attachments are used to connect each Dedicated Interconnect instance to Cloud Routers that are attached to the Shared VPC topology. These VLAN attachments are hosted in the prj-net-interconnect project.
  • Each Shared VPC network has four Cloud Routers, two in each region, with the dynamic routing mode set to global so that every Cloud Router can announce all subnets, independent of region.

With global dynamic routing, Cloud Router advertises routes to all subnets in the VPC network. Cloud Router advertises routes to remote subnets (subnets outside of the Cloud Router's region) with a lower priority compared to local subnets (subnets that are in the Cloud Router's region). Optionally, you can change advertised prefixes and priorities when you configure the BGP session for a Cloud Router.

Traffic from Google Cloud to an on-premises environment uses the Cloud Router closest to the cloud resources. Within a single region, multiple routes to on-premises networks have the same multi-exit discriminator (MED) value, and Google Cloud uses equal cost multi-path (ECMP) routing to distribute outbound traffic between all possible routes.

On-premises configuration changes

To configure connectivity between the on-premises environment and Google Cloud, you must configure additional changes in your on-premises environment. The Terraform code in the blueprint automatically configures Google Cloud resources but doesn't modify any of your on-premises network resources.

Some of the components for hybrid connectivity from your on-premises environment to Google Cloud are automatically enabled by the blueprint, including the following:

  • Cloud DNS is configured with DNS forwarding between all Shared VPC networks to a single hub, as described in DNS setup. A Cloud DNS server policy is configured with inbound forwarder IP addresses.
  • Cloud Router is configured to export routes for all subnets and custom routes for the IP addresses used by the Private Service Connect endpoints.

To enable hybrid connectivity, you must take the following additional steps:

  1. Order a Dedicated Interconnect connection.
  2. Configure on-premises routers and firewalls to allow outbound traffic to the internal IP address space defined in IP address allocation.
  3. Configure your on-premises DNS servers to forward DNS lookups bound for Google Cloud to the inbound forwarder IP addresses that are already configured by the blueprint.
  4. Configure your on-premises DNS servers, firewalls, and routers to accept DNS queries from the Cloud DNS forwarding zone (35.199.192.0/19).
  5. Configure your on-premises DNS servers to respond to queries from on-premises hosts for Google Cloud services with the IP addresses defined in private access to Google Cloud APIs.
  6. For encryption in transit over the Dedicated Interconnect connection, configure MACsec for Cloud Interconnect or configure HA VPN over Cloud Interconnect for IPsec encryption.

For more information, see Private Google Access for on-premises hosts.

What's next

  • Read about detective controls (next document in this series).

Detective controls

Threat detection and monitoring capabilities are provided using a combination of built-in security controls from Security Command Center and custom solutions that let you detect and respond to security events.

Centralized logging for security and audit

The blueprint configures logging capabilities to track and analyze changes to your Google Cloud resources with logs that are aggregated to a single project.

The following diagram shows how the blueprint aggregates logs from multiple sources in multiple projects into a centralized log sink.

Logging structure for example.com.

The diagram describes the following:

  • Log sinks are configured at the organization node to aggregate logs from all projects in the resource hierarchy.
  • Multiple log sinks are configured to send logs that match a filter to different destinations for storage and analytics.
  • The prj-c-logging project contains all the resources for log storage and analytics.
  • Optionally, you can configure additional tooling to export logs to a SIEM.

The blueprint uses different log sources and includes these logs in the log sink filter so that the logs can be exported to a centralized destination. The following table describes the log sources.

Log source Description
Admin Activity audit logs You cannot configure, disable, or exclude Admin Activity audit logs.
System Event audit logs You cannot configure, disable, or exclude System Event audit logs.
Policy Denied audit logs You cannot configure or disable Policy Denied audit logs, but you can optionally exclude them with exclusion filters.
Data Access audit logs By default, the blueprint doesn't enable data access logs because the volume and cost of these logs can be high. To determine whether you should enable data access logs, evaluate where your workloads handle sensitive data and consider whether you have a requirement to enable data access logs for each service and environment working with sensitive data.
VPC Flow Logs The blueprint enables VPC Flow Logs for every subnet. The blueprint configures log sampling to sample 50% of logs to reduce cost. If you create additional subnets, you must ensure that VPC Flow Logs are enabled for each subnet.
Firewall Rules Logging The blueprint enables Firewall Rules Logging for every firewall policy rule. If you create additional firewall policy rules for workloads, you must ensure that Firewall Rules Logging is enabled for each new rule.
Cloud DNS logging The blueprint enables Cloud DNS logs for managed zones. If you create additional managed zones, you must enable those DNS logs.
Google Workspace audit logging Requires a one-time enablement step that is not automated by the blueprint. For more information, see Share data with Google Cloud services.
Access Transparency logs Requires a one-time enablement step that is not automated by the blueprint. For more information, see Enable Access Transparency.

The following table describes the log sinks and how they are used with supported destinations in the blueprint.

Sink Destination Purpose
sk-c-logging-la Logs routed to Cloud Logging buckets with Log Analytics and a linked BigQuery dataset enabled Actively analyze logs. Run ad hoc investigations by using Logs Explorer in the console, or write SQL queries, reports, and views using the linked BigQuery dataset.
sk-c-logging-bkt Logs routed to Cloud Storage Store logs long-term for compliance, audit, and incident-tracking purposes. Optionally, if you have compliance requirements for mandatory data retention, we recommend that you additionally configure Bucket Lock.
sk-c-logging-pub Logs routed to Pub/Sub Export logs to an external platform such as your existing SIEM. This requires additional work to integrate with your SIEM, such as a mechanism that reads logs from the Pub/Sub topic.

For guidance on enabling additional log types and writing log sink filters, see the log scoping tool.
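
For example, an organization-level aggregated sink that routes audit logs to a Cloud Storage bucket, similar in spirit to sk-c-logging-bkt, might look like the following minimal Terraform sketch. The organization ID, project, bucket name, and filter are illustrative and don't reflect the blueprint's exact module structure.

# Minimal sketch of an organization-level aggregated log sink.
# The org ID, project, bucket name, and filter are illustrative.
resource "google_storage_bucket" "log_archive" {
  name     = "bkt-prj-c-logging-archive"
  project  = "prj-c-logging"
  location = "US"

  # Optional: use a retention policy (and Bucket Lock) for mandatory
  # retention requirements.
  retention_policy {
    retention_period = 31536000 # 365 days, in seconds
  }
}

resource "google_logging_organization_sink" "audit_to_bucket" {
  name             = "sk-c-logging-bkt"
  org_id           = "123456789012"
  destination      = "storage.googleapis.com/${google_storage_bucket.log_archive.name}"
  include_children = true

  # Matches all Cloud Audit Logs; refine this filter per the table above.
  filter = "logName:cloudaudit.googleapis.com"
}

# The sink's writer identity needs permission to write to the destination.
resource "google_storage_bucket_iam_member" "sink_writer" {
  bucket = google_storage_bucket.log_archive.name
  role   = "roles/storage.objectCreator"
  member = google_logging_organization_sink.audit_to_bucket.writer_identity
}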

Threat monitoring with Security Command Center

We recommend that you activate Security Command Center Premium for your organization to automatically detect threats, vulnerabilities, and misconfigurations in your Google Cloud resources. Security Command Center creates security findings from multiple sources including the following:

  • Security Health Analytics: detects common vulnerabilities and misconfigurations across Google Cloud resources.
  • Attack path exposure: shows a simulated path of how an attacker could exploit your high-value resources, based on the vulnerabilities and misconfigurations that are detected by other Security Command Center sources.
  • Event Threat Detection: applies detection logic and proprietary threat intelligence against your logs to identify threats in near-real time.
  • Container Threat Detection: detects common container runtime attacks.
  • Virtual Machine Threat Detection: detects potentially malicious applications that are running on virtual machines.
  • Web Security Scanner: scans for OWASP Top Ten vulnerabilities in your web-facing applications on Compute Engine, App Engine, or Google Kubernetes Engine.

For more information on the vulnerabilities and threats addressed by Security Command Center, see Security Command Center sources.

You must activate Security Command Center after you deploy the blueprint. For instructions, see Activate Security Command Center for an organization.

After you activate Security Command Center, we recommend that you export the findings that are produced by Security Command Center to your existing tools or processes for triaging and responding to threats. The blueprint creates the prj-c-scc project with a Pub/Sub topic to be used for this integration. Depending on your existing tools, use one of the following methods to export findings:
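
For example, one such method is a Security Command Center notification config (continuous export) that publishes findings to the Pub/Sub topic. The following is a minimal Terraform sketch with illustrative names, filter, and organization ID.

# Minimal sketch of a notification config that publishes active,
# high-severity findings to a Pub/Sub topic in the prj-c-scc project.
resource "google_pubsub_topic" "scc_findings" {
  name    = "top-scc-findings"
  project = "prj-c-scc"
}

resource "google_scc_notification_config" "findings_export" {
  config_id    = "scc-findings-export"
  organization = "123456789012"
  description  = "Export SCC findings to Pub/Sub for the SIEM"
  pubsub_topic = google_pubsub_topic.scc_findings.id

  streaming_config {
    # Adjust the filter to the findings that your SOC wants to ingest.
    filter = "state = \"ACTIVE\" AND severity = \"HIGH\""
  }
}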

Alerting on log-based metrics and performance metrics

When you begin to deploy workloads on top of your foundation, we recommend that you use Cloud Monitoring to measure performance metrics.

The blueprint creates a monitoring project such as prj-p-monitoring for each environment. This project is configured as a scoping project to gather aggregated performance metrics across multiple projects. The blueprint deploys an example with log-based metrics and an alerting policy to generate email notifications if there are any changes to the IAM policy that is applied to Cloud Storage buckets. This helps monitor for suspicious activities on sensitive resources such as the bucket in the prj-b-seed project that contains the Terraform state.
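
The following is a minimal Terraform sketch of a similar log-based metric and alerting policy. The project, metric name, filter, and notification channel ID are illustrative, not the blueprint's exact configuration.

# Minimal sketch: count IAM policy changes on Cloud Storage buckets and
# alert when any occur. Names, project, and channel ID are illustrative.
resource "google_logging_metric" "bucket_iam_changes" {
  name    = "storage-bucket-iam-changes"
  project = "prj-p-monitoring"
  filter  = "resource.type=\"gcs_bucket\" AND protoPayload.methodName=\"storage.setIamPermissions\""

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}

resource "google_monitoring_alert_policy" "bucket_iam_changes" {
  project      = "prj-p-monitoring"
  display_name = "Cloud Storage bucket IAM change"
  combiner     = "OR"

  conditions {
    display_name = "IAM policy changed on a bucket"

    condition_threshold {
      filter          = "metric.type=\"logging.googleapis.com/user/${google_logging_metric.bucket_iam_changes.name}\" AND resource.type=\"gcs_bucket\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "0s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_COUNT"
      }
    }
  }

  # Replace with the notification channel for your central team.
  notification_channels = ["projects/prj-p-monitoring/notificationChannels/CHANNEL_ID"]
}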

More generally, you can also use Cloud Monitoring to measure the performance metrics and health of your workload applications. Depending on the operational responsibility for supporting and monitoring applications in your organization, you might make more granular monitoring projects for different teams. Use these monitoring projects to view performance metrics, create dashboards of application health, and trigger alerts when your expected SLO is not met.

The following diagram shows a high-level view of how Cloud Monitoring aggregates performance metrics.

Monitoring of performance.

For guidance on how to monitor workloads effectively for reliability and availability, see the Site Reliability Engineering book by Google, particularly the chapter on monitoring distributed systems.

Custom solution for automated log analysis

You might have requirements to create alerts for security events that are based on custom queries against logs. Custom queries can help supplement the capabilities of your SIEM by analyzing logs on Google Cloud and exporting only the events that merit investigation, especially if you don't have the capacity to export all cloud logs to your SIEM.

The blueprint helps enable this log analysis by setting up a centralized source of logs that you can query using a linked BigQuery dataset. To automate this capability, you must implement the code sample at bq-log-alerting and extend the foundation capabilities. The sample code lets you regularly query a log source and send a custom finding to Security Command Center.

The following diagram introduces the high-level flow of the automated log analysis.

Automated logging analysis.

The diagram shows the following concepts of automated log analysis:

  • Logs from various sources are aggregated into a centralized logs bucket with log analytics and a linked BigQuery dataset.
  • BigQuery views are configured to query logs for the security event that you want to monitor.
  • Cloud Scheduler pushes an event to a Pub/Sub topic every 15 minutes and triggers Cloud Functions.
  • Cloud Functions queries the views for new events. If it finds events, it pushes them to Security Command Center as custom findings.
  • Security Command Center publishes notifications about new findings to another Pub/Sub topic.
  • An external tool such as a SIEM subscribes to the Pub/Sub topic to ingest new findings.

The sample has several use cases to query for potentially suspicious behavior. Examples include a login from a list of super admins or other highly privileged accounts that you specify, changes to logging settings, or changes to network routes. You can extend the use cases by writing new query views for your requirements. Write your own queries or reference security log analytics for a library of SQL queries to help you analyze Google Cloud logs.
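
The scheduling mechanism in this flow might be expressed in Terraform as the following minimal sketch; the project, names, and region are illustrative, and the bq-log-alerting sample defines its own resources.

# Minimal sketch: a Cloud Scheduler job publishes to a Pub/Sub topic every
# 15 minutes to trigger the log-analysis function. Names are illustrative.
resource "google_pubsub_topic" "log_analysis_trigger" {
  name    = "top-log-analysis-trigger"
  project = "prj-c-logging"
}

resource "google_cloud_scheduler_job" "log_analysis" {
  name     = "log-analysis-every-15-minutes"
  project  = "prj-c-logging"
  region   = "us-central1"
  schedule = "*/15 * * * *"

  pubsub_target {
    topic_name = google_pubsub_topic.log_analysis_trigger.id
    data       = base64encode("run")
  }
}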

Custom solution to respond to asset changes

To respond to events in real time, we recommend that you use Cloud Asset Inventory to monitor asset changes. In this custom solution, an asset feed is configured to trigger notifications to Pub/Sub about changes to resources in real time, and then Cloud Functions runs custom code to enforce your own business logic based on whether the change should be allowed.

The blueprint has an example of this custom governance solution that monitors for IAM changes that add highly sensitive roles including Organization Admin, Owner, and Editor. The following diagram describes this solution.

Automatically reverting an IAM policy change and sending a notification.

The previous diagram shows these concepts:

  • Changes are made to an allow policy.
  • The Cloud Asset Inventory feed sends a real-time notification about the allow policy change to Pub/Sub.
  • Pub/Sub triggers a function.
  • Cloud Functions runs custom code to enforce your policy. The example function has logic to assess if the change has added the Organization Admin, Owner, or Editor roles to an allow policy. If so, the function creates a custom security finding and sends it to Security Command Center.
  • Optionally, you can use this model to automate remediation efforts. Write additional business logic in Cloud Functions to automatically take action on the finding, such as reverting the allow policy to its previous state.

In addition, you can extend the infrastructure and logic used by this sample solution to add custom responses to other events that are important to your business.
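
The feed portion of this solution might be expressed as the following minimal Terraform sketch; the organization ID, project, and names are illustrative, and the Cloud Functions code itself isn't shown.

# Minimal sketch: publish IAM policy changes for the whole organization to
# a Pub/Sub topic that triggers the remediation function.
resource "google_pubsub_topic" "iam_policy_changes" {
  name    = "top-asset-iam-changes"
  project = "prj-c-scc"
}

resource "google_cloud_asset_organization_feed" "iam_policy_changes" {
  billing_project = "prj-c-scc"
  org_id          = "123456789012"
  feed_id         = "iam-policy-changes"
  content_type    = "IAM_POLICY"

  # Watch all asset types for allow policy changes.
  asset_types = [".*"]

  feed_output_config {
    pubsub_destination {
      topic = google_pubsub_topic.iam_policy_changes.id
    }
  }
}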

What's next

Preventative controls for acceptable resource configurations

We recommend that you define policy constraints that enforce acceptable resource configurations and prevent risky configurations. The blueprint uses a combination of organization policy constraints and infrastructure-as-code (IaC) validation in your pipeline. These controls prevent the creation of resources that don't meet your policy guidelines. Enforcing these controls early in the design and build of your workloads helps you to avoid remediation work later.

Organization policy constraints

The Organization Policy service enforces constraints to ensure that certain resource configurations can't be created in your Google Cloud organization, even by someone with a sufficiently privileged IAM role.

The blueprint enforces policies at the organization node so that these controls are inherited by all folders and projects within the organization. This bundle of policies is designed to prevent certain high-risk configurations, such as exposing a VM to the public internet or granting public access to storage buckets, unless you deliberately allow an exception to the policy.

The following table introduces the organization policy constraints that are implemented in the blueprint:

Organization policy constraint Description

compute.disableNestedVirtualization

Nested virtualization on Compute Engine VMs can evade monitoring and other security tools for your VMs if it is poorly configured. This constraint prevents the creation of VMs with nested virtualization enabled.

compute.disableSerialPortAccess

IAM roles like compute.instanceAdmin allow privileged access to an instance's serial port using SSH keys. If the SSH key is exposed, an attacker could access the serial port and bypass network and firewall controls. This constraint prevents serial port access.

compute.disableVpcExternalIpv6

External IPv6 subnets can be exposed to unauthorized internet access if they are poorly configured. This constraint prevents the creation of external IPv6 subnets.

compute.requireOsLogin

The default behavior of setting SSH keys in metadata can allow unauthorized remote access to VMs if keys are exposed. This constraint enforces the use of OS Login instead of metadata-based SSH keys.

compute.restrictProtocolForwardingCreationForTypes

VM protocol forwarding for external IP addresses can lead to unauthorized internet egress if forwarding is poorly configured. This constraint allows VM protocol forwarding for internal addresses only.

compute.restrictXpnProjectLienRemoval

Deleting a Shared VPC host project can be disruptive to all the service projects that use networking resources. This constraint prevents accidental or malicious deletion of the Shared VPC host projects by preventing the removal of the project lien on these projects.

compute.setNewProjectDefaultToZonalDNSOnly

A legacy setting for global (project-wide) internal DNS is not recommended because it reduces service availability. This constraint prevents the use of the legacy setting.

compute.skipDefaultNetworkCreation

A default VPC network and overly permissive default VPC firewall rules are created in every new project that enables the Compute Engine API. This constraint skips the creation of the default network and default VPC firewall rules.

compute.vmExternalIpAccess

By default, a VM is created with an external IPv4 address that can lead to unauthorized internet access. This constraint configures an empty allowlist of external IP addresses that the VM can use and denies all others.

essentialcontacts.allowedContactDomains

By default, Essential Contacts can be configured to send notifications about your domain to any other domain. This constraint enforces that only email addresses in approved domains can be set as recipients for Essential Contacts.

iam.allowedPolicyMemberDomains

By default, allow policies can be granted to any Google Account, including unmanaged accounts, and accounts belonging to external organizations. This constraint ensures that allow policies in your organization can only be granted to managed accounts from your own domain. Optionally, you can allow additional domains.

iam.automaticIamGrantsForDefaultServiceAccounts

Default service accounts are automatically granted overly permissive roles when they are created. This constraint prevents automatic IAM role grants to default service accounts.

iam.disableServiceAccountKeyCreation

Service account keys are a high-risk persistent credential, and in most cases a more secure alternative to service account keys can be used. This constraint prevents the creation of service account keys.

iam.disableServiceAccountKeyUpload

Uploading service account key material can increase risk if key material is exposed. This constraint prevents the uploading of service account keys.

sql.restrictAuthorizedNetworks

Cloud SQL instances can be exposed to unauthenticated internet access if the instances are configured to use authorized networks without a Cloud SQL Auth Proxy. This policy prevents the configuration of authorized networks for database access and forces the use of the Cloud SQL Auth Proxy instead.

sql.restrictPublicIp

Cloud SQL instances can be exposed to unauthenticated internet access if the instances are created with public IP addresses. This constraint prevents public IP addresses on Cloud SQL instances.

storage.uniformBucketLevelAccess

By default, objects in Cloud Storage can be accessed through legacy Access Control Lists (ACLs) instead of IAM, which can lead to inconsistent access controls and accidental exposure if misconfigured. Legacy ACL access is not affected by the iam.allowedPolicyMemberDomains constraint. This constraint enforces that access can only be configured through IAM uniform bucket-level access, not legacy ACLs.

storage.publicAccessPrevention

Cloud Storage buckets can be exposed to unauthenticated internet access if misconfigured. This constraint prevents ACLs and IAM permissions that grant access to allUsers and allAuthenticatedUsers.

These policies are a starting point that we recommend for most customers and most scenarios, but you might need to modify organization policy constraints to accommodate certain workload types. For example, a workload that uses a Cloud Storage bucket as the backend for Cloud CDN to host public resources is blocked by storage.publicAccessPrevention, or a public-facing Cloud Run app that doesn't require authentication is blocked by iam.allowedPolicyMemberDomains. In these cases, modify the organization policy at the folder or project level to allow a narrow exception. You can also conditionally add constraints to organization policy by defining a tag that grants an exception or enforcement for policy, then applying the tag to projects and folders.
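
As an illustration, the following minimal Terraform sketch enforces one constraint from the table at the organization level and grants a narrow project-level exception. The organization ID, project ID, and choice of provider resources are illustrative rather than the blueprint's exact modules.

# Enforce a boolean constraint for the whole organization.
resource "google_organization_policy" "skip_default_network" {
  org_id     = "123456789012"
  constraint = "compute.skipDefaultNetworkCreation"

  boolean_policy {
    enforced = true
  }
}

# Narrow exception: relax public access prevention only for a project that
# hosts public assets behind Cloud CDN.
resource "google_project_organization_policy" "cdn_public_bucket_exception" {
  project    = "prj-p-cdn-assets"
  constraint = "storage.publicAccessPrevention"

  boolean_policy {
    enforced = false
  }
}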

For additional constraints, see available constraints and custom constraints.

Pre-deployment validation of infrastructure-as-code

The blueprint uses a GitOps approach to manage infrastructure, meaning that all infrastructure changes are implemented through version-controlled infrastructure-as-code (IaC) and can be validated before deploying.

The policies enforced in the blueprint define acceptable resource configurations that can be deployed by your pipeline. If code that is submitted to your Git repository doesn't pass the policy checks, no resources are deployed.

For information on how pipelines are used and how controls are enforced through CI/CD automation, see deployment methodology.

What's next

Deployment methodology

We recommend that you use declarative infrastructure to deploy your foundation in a consistent and controllable manner. This approach helps enable consistent governance by enforcing policy controls about acceptable resource configurations into your pipelines. The blueprint is deployed using a GitOps flow, with Terraform used to define infrastructure as code (IaC), a Git repository for version control and approval of code, and Cloud Build for CI/CD automation in the deployment pipeline. For an introduction to this concept, see managing infrastructure as code with Terraform, Cloud Build, and GitOps.

The following sections describe how the deployment pipeline is used to manage resources in your organization.

Pipeline layers

To separate the teams and technology stack that are responsible for managing different layers of your environment, we recommend a model that uses different pipelines and different personas that are responsible for each layer of the stack.

The following diagram introduces our recommended model for separating a foundation pipeline, infrastructure pipeline, and application pipeline.

Blueprint pipelines.

The diagram introduces the pipeline layers in this model:

  • The foundation pipeline deploys the foundation resources that are used across the platform. We recommend that a single central team is responsible for managing the foundation resources that are consumed by multiple business units and workloads.
  • The infrastructure pipeline deploys projects and infrastructure that are used by workloads, such as VM instances or databases. The blueprint sets up a separate infrastructure pipeline for each business unit, or you might prefer a single infrastructure pipeline used by multiple teams.
  • The application pipeline deploys the artifacts for each workload, such as containers or images. You might have many different application teams with individual application pipelines.

The following sections introduce the usage of each pipeline layer.

The foundation pipeline

The foundation pipeline deploys the foundation resources. It also sets up the infrastructure pipeline that is used to deploy infrastructure used by workloads.

To create the foundation pipeline, you first clone or fork the terraform-example-foundation to your own Git repository. Follow the steps in the 0-bootstrap README file to configure your bootstrap folder and resources.

Stage Description

0-bootstrap

Bootstraps a Google Cloud organization. This step also configures a CI/CD pipeline for the blueprint code in subsequent stages.

  • The CICD project contains the Cloud Build foundation pipeline for deploying resources.
  • The seed project includes the Cloud Storage buckets that contain the Terraform state of the foundation infrastructure, and the highly privileged service accounts that are used by the foundation pipeline to create resources. The Terraform state is protected through Cloud Storage Object Versioning. When the CI/CD pipeline runs, it acts as the service accounts that are managed in the seed project.

After you create the foundation pipeline in the 0-bootstrap stage, the following stages deploy resources on the foundation pipeline. Review the README directions for each stage and implement each stage sequentially.

Stage Description

1-org

Sets up top-level shared folders, projects for shared services, organization-level logging, and baseline security settings through organization policies.

2-environments

Sets up development, non-production, and production environments within the Google Cloud organization that you've created.

3-networks-dual-svpc

or

3-networks-hub-and-spoke

Sets up shared VPCs in your chosen topology and the associated network resources.

The infrastructure pipeline

The infrastructure pipeline deploys the projects and infrastructure (for example, the VM instances and databases) that are used by workloads. The foundation pipeline deploys multiple infrastructure pipelines. This separation between the foundation pipeline and infrastructure pipeline allows for a separation between platform-wide resources and workload-specific resources.

The following diagram describes how the blueprint configures multiple infrastructure pipelines that are intended for use by separate teams.

Multiple infrastructure pipelines

The diagram describes the following key concepts:

  • Each infrastructure pipeline is used to manage infrastructure resources independently of the foundation resources.
  • Each business unit has its own infrastructure pipeline, managed in a dedicated project in the common folder.
  • Each of the infrastructure pipelines has a service account with permission to deploy resources only to the projects that are associated with that business unit. This strategy creates a separation of duties between the privileged service accounts used for the foundation pipeline and those used by each infrastructure pipeline.

This approach with multiple infrastructure pipelines is recommended when you have multiple entities inside your organization that have the skills and appetite to manage their infrastructure separately, particularly if they have different requirements such as the types of pipeline validation policy they want to enforce. Alternatively, you might prefer to have a single infrastructure pipeline managed by a single team with consistent validation policies.

In the terraform-example-foundation, stage 4 configures an infrastructure pipeline, and stage 5 demonstrates an example of using that pipeline to deploy infrastructure resources.

Stage Description

4-projects

Sets up a folder structure, projects, and an infrastructure pipeline.

5-app-infra (optional)

Deploys workload projects with a Compute Engine instance using the infrastructure pipeline as an example.

The application pipeline

The application pipeline is responsible for deploying application artifacts for each individual workload, such as images or Kubernetes containers that run the business logic of your application. These artifacts are deployed to infrastructure resources that were deployed by your infrastructure pipeline.

The enterprise foundation blueprint sets up your foundation pipeline and infrastructure pipeline, but doesn't deploy an application pipeline. For an example application pipeline, see the enterprise application blueprint.

Automating your pipeline with Cloud Build

The blueprint uses Cloud Build to automate CI/CD processes. The following table describes the controls that are built into the foundation pipeline and infrastructure pipeline that are deployed by the terraform-example-foundation repository. If you are developing your own pipelines using other CI/CD automation tools, we recommend that you apply similar controls.

Control Description

Separate build configurations to validate code before deploying

The blueprint uses two Cloud Build build configuration files for the entire pipeline, and each repository that is associated with a stage has two Cloud Build triggers that are associated with those build configuration files. When code is pushed to a repository branch, the triggers first run cloudbuild-tf-plan.yaml, which validates your code with policy checks and terraform plan against that branch. Then cloudbuild-tf-apply.yaml runs terraform apply on the output of that plan.

Terraform policy checks

The blueprint includes a set of Open Policy Agent constraints that are enforced by policy validation in the Google Cloud CLI. These constraints define the acceptable resource configurations that can be deployed by your pipeline. If a build doesn't meet policy in the first build configuration, then the second build configuration doesn't deploy any resources.

The policies enforced in the blueprint are forked from GoogleCloudPlatform/policy-library on GitHub. You can write additional policies for the library to enforce custom policies to meet your requirements.

Principle of least privilege

The foundation pipeline has a different service account for each stage with an allow policy that grants only the minimum IAM roles for that stage. Each Cloud Build trigger runs as the specific service account for that stage. Using different accounts helps mitigate the risk that modifying one repository could impact the resources that are managed by another repository. To understand the particular IAM roles applied to each service account, see the sa.tf Terraform code in the bootstrap stage.

Cloud Build private pools

The blueprint uses Cloud Build private pools. Private pools let you optionally enforce additional controls such as restricting access to public repositories or running Cloud Build inside a VPC Service Controls perimeter.

Cloud Build custom builders

The blueprint creates its own custom builder to run Terraform. For more information, see 0-bootstrap/Dockerfile. This control enforces that the pipeline consistently runs with a known set of libraries at pinned versions.

Deployment approval

Optionally, you can add a manual approval stage to Cloud Build. This approval adds an additional checkpoint after the build is triggered but before it runs so that a privileged user can manually approve the build.
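
For example, a trigger that runs as a stage-specific service account and requires manual approval might look like the following minimal Terraform sketch. The repository, file names, service account, and project are illustrative and don't reflect the exact configuration in terraform-example-foundation.

# Minimal sketch: a Cloud Build trigger that runs as a dedicated service
# account and requires manual approval before the build runs.
resource "google_cloudbuild_trigger" "tf_apply_production" {
  name            = "tf-apply-production"
  project         = "prj-b-cicd"
  filename        = "cloudbuild-tf-apply.yaml"
  service_account = "projects/prj-b-cicd/serviceAccounts/sa-terraform-org@prj-b-cicd.iam.gserviceaccount.com"

  trigger_template {
    repo_name   = "gcp-org"
    branch_name = "^production$"
  }

  approval_config {
    approval_required = true
  }
}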

Branching strategy

We recommend a persistent branch strategy for submitting code to your Git system and deploying resources through the foundation pipeline. The following diagram describes the persistent branch strategy.

The blueprint deployment branching strategy

The diagram describes three persistent branches in Git (development, non-production, and production) that reflect the corresponding Google Cloud environments. There are also multiple ephemeral feature branches that don't correspond to resources that are deployed in your Google Cloud environments.

We recommend that you enforce a pull request (PR) process into your Git system so that any code that is merged to a persistent branch has an approved PR.

To develop code with this persistent branch strategy, follow these high-level steps:

  1. When you're developing new capabilities or working on a bug fix, create a new branch based off of the development branch. Use a naming convention for your branch that includes the type of change, a ticket number or other identifier, and a human-readable description, like feature/123456-org-policies.
  2. When you complete the work in the feature branch, open a PR that targets the development branch.
  3. When you submit the PR, the PR triggers the foundation pipeline to perform terraform plan and terraform validate to stage and verify the changes.
  4. After you validate the changes to the code, merge the feature or bug fix into the development branch.
  5. The merge process triggers the foundation pipeline to run terraform apply to deploy the latest changes in the development branch to the development environment.
  6. Review the changes in the development environment using any manual reviews, functional tests, or end-to-end tests that are relevant to your use case. Then promote changes to the non-production environment by opening a PR that targets the non-production branch and merge your changes.
  7. To deploy resources to the production environment, repeat the same process as step 6: review and validate the deployed resources, open a PR to the production branch, and merge.

What's next

Operations best practices

This section introduces operations that you must consider as you deploy and operate additional workloads in your Google Cloud environment. This section isn't an exhaustive guide to every operation in your cloud environment, but it introduces decisions that are related to the architectural recommendations and resources deployed by the blueprint.

Update foundation resources

Although the blueprint provides an opinionated starting point for your foundation environment, your foundation requirements might grow over time. After your initial deployment, you might adjust configuration settings or build new shared services to be consumed by all workloads.

To modify foundation resources, we recommend that you make all changes through the foundation pipeline. Review the branching strategy for an introduction to the flow of writing code, merging it, and triggering the deployment pipelines.

Decide attributes for new workload projects

When creating new projects through the project factory module of the automation pipeline, you must configure various attributes. Your process to design and create projects for new workloads should include decisions for the following:

  • Which Google Cloud APIs to enable
  • Which Shared VPC to use, or whether to create a new VPC network
  • Which IAM roles to grant to the initial project service account that is created by the pipeline
  • Which project labels to apply
  • The folder that the project is deployed to
  • Which billing account to use
  • Whether to add the project to a VPC Service Controls perimeter
  • Whether to configure a budget and billing alert threshold for the project

For a complete reference of the configurable attributes for each project, see the input variables for the project factory in the automation pipeline.
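
As a hypothetical illustration of these decisions, the following sketch calls the public Cloud Foundation Toolkit project factory module. Variable names differ between module versions and from the blueprint's own project factory, so treat the inputs as examples only.

# Hypothetical project creation with illustrative attribute values.
module "workload_project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0"

  name              = "prj-d-example-workload"
  random_project_id = true
  org_id            = "123456789012"
  folder_id         = "456789012345"        # development folder
  billing_account   = "000000-000000-000000"

  activate_apis = [
    "compute.googleapis.com",
    "storage.googleapis.com",
  ]

  labels = {
    environment = "development"
    billingcode = "1234"
  }
}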

Manage permissions at scale

When you deploy workload projects on top of your foundation, you must consider how you will grant access to the intended developers and consumers of those projects. We recommend that you add users into a group that is managed by your existing identity provider, synchronize the groups with Cloud Identity, and then apply IAM roles to the groups. Always keep in mind the principle of least privilege.
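
For example, granting a role to a group rather than to individual users might look like the following minimal sketch; the group address, role, and project are illustrative.

# Grant a role to a group that is synchronized into Cloud Identity,
# instead of granting roles to individual users.
resource "google_project_iam_member" "app_developers" {
  project = "prj-d-example-workload"
  role    = "roles/container.developer"
  member  = "group:grp-gcp-appteam-developers@example.com"
}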

We also recommend that you use IAM recommender to identify allow policies that grant over-privileged roles. Design a process to periodically review recommendations or automatically apply recommendations into your deployment pipelines.

Coordinate changes between the networking team and the application team

The network topologies that are deployed by the blueprint assume that you have a team responsible for managing network resources, and separate teams responsible for deploying workload infrastructure resources. As the workload teams deploy infrastructure, they must create firewall rules to allow the intended access paths between components of their workload, but they don't have permission to modify the network firewall policies themselves.

Plan how teams will work together to coordinate the changes to the centralized networking resources that are needed to deploy applications. For example, you might design a process where a workload team requests tags for their applications. The networking team then creates the tags, adds rules to the network firewall policy that allow traffic to flow between resources with those tags, and delegates to the workload team the IAM roles that are needed to use the tags.

Optimize your environment with the Active Assist portfolio

In addition to IAM recommender, Google Cloud provides the Active Assist portfolio of services to make recommendations about how to optimize your environment. For example, Firewall Insights and the unattended project recommender provide actionable recommendations that can help you tighten your security posture.

Design a process to periodically review recommendations or automatically apply recommendations into your deployment pipelines. Decide which recommendations should be managed by a central team and which should be the responsibility of workload owners, and apply IAM roles to access the recommendations accordingly.

Grant exceptions to organization policies

The blueprint enforces a set of organization policy constraints that are recommended to most customers in most scenarios, but you might have legitimate use cases that require limited exceptions to the organization policies you enforce broadly.

For example, the blueprint enforces the iam.disableServiceAccountKeyCreation constraint. This constraint is an important security control because a leaked service account key can have a significant negative impact, and most scenarios should use more secure alternatives to service account keys to authenticate. However, there might be use cases that can only authenticate with a service account key, such as an on-premises server that requires access to Google Cloud services and cannot use workload identity federation. In this scenario, you might decide to allow an exception to the policy, so long as additional compensating controls like best practices for managing service account keys are enforced.

Therefore, you should design a process for workloads to request an exception to policies, and ensure that the decision makers who are responsible for granting exceptions have the technical knowledge to validate the use case and consult on whether additional controls must be in place to compensate. When you grant an exception to a workload, modify the organization policy constraint as narrowly as possible. You can also conditionally add constraints to an organization policy by defining a tag that grants an exception or enforcement for policy, then applying the tag to projects and folders.

Protect your resources with VPC Service Controls

The blueprint helps prepare your environment for VPC Service Controls by separating the base and restricted networks. However, by default, the Terraform code doesn't enable VPC Service Controls because this enablement can be a disruptive process.

A perimeter denies access to restricted Google Cloud services from traffic that originates outside the perimeter, which includes the console, developer workstations, and the foundation pipeline used to deploy resources. If you use VPC Service Controls, you must design exceptions to the perimeter that allow the access paths that you intend.

A VPC Service Controls perimeter is intended for exfiltration controls between your Google Cloud organization and external sources. The perimeter isn't intended to replace or duplicate allow policies for granular access control to individual projects or resources. When you design and architect a perimeter, we recommend using a common unified perimeter for lower management overhead.

If you must design multiple perimeters to granularly control service traffic within your Google Cloud organization, we recommend that you clearly define the threats that are addressed by a more complex perimeter structure and the access paths between perimeters that are needed for intended operations.

To adopt VPC Service Controls, evaluate the following:

After the perimeter is enabled, we recommend that you design a process to consistently add new projects to the correct perimeter, and a process to design exceptions when developers have a new use case that is denied by your current perimeter configuration.

Test organization-wide changes in a separate organization

We recommend that you never deploy changes to production without testing. For workload resources, this approach is facilitated by separate environments for development, non-production, and production. However, some resources at the organization level don't have separate environments to facilitate testing.

For changes at the organization level, or for other changes that can affect production environments, such as the federation configuration between your identity provider and Cloud Identity, consider creating a separate organization for test purposes.

Control remote access to virtual machines

Because we recommend that you deploy immutable infrastructure through the foundation pipeline, infrastructure pipeline, and application pipeline, we also recommend that you grant developers direct access to a virtual machine through SSH or RDP only for limited or exceptional use cases.

For scenarios that require remote access, we recommend that you manage user access using OS Login where possible. This approach uses managed Google Cloud services to enforce access control, account lifecycle management, two-step verification, and audit logging. Alternatively, if you must allow access through SSH keys in metadata or RDP credentials, it is your responsibility to manage the credential lifecycle and store credentials securely outside of Google Cloud.

In any scenario, a user with SSH or RDP access to a VM can be a privilege escalation risk, so you should design your access model with this in mind. The user can run code on that VM with the privileges of the associated service account or query the metadata server to view the access token that is used to authenticate API requests. This access can then become a privilege escalation path if you didn't deliberately intend for the user to operate with the privileges of the service account.

Mitigate overspending by planning budget alerts

The blueprint implements best practices introduced in the Google Cloud Architecture Framework: Cost Optimization for managing cost, including the following:

  • Use a single billing account across all projects in the enterprise foundation.

  • Assign each project a billingcode metadata label that is used to allocate cost between cost centers.

  • Set budgets and alert thresholds.

It's your responsibility to plan budgets and configure billing alerts. The blueprint creates budget alerts for workload projects when the forecasted spending is on track to reach 120% of the budget. This approach lets a central team identify and mitigate incidents of significant overspending. Significant unexpected increases in spending without a clear cause can be an indicator of a security incident and should be investigated from the perspectives of both cost control and security.
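
A minimal Terraform sketch of such a budget follows; the billing account, project number, and amount are illustrative.

# Alert when forecasted spend reaches 120% of the budgeted amount.
resource "google_billing_budget" "workload_budget" {
  billing_account = "000000-000000-000000"
  display_name    = "budget-prj-d-example-workload"

  budget_filter {
    projects = ["projects/123456789012"]
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units         = "1000"
    }
  }

  threshold_rules {
    threshold_percent = 1.2
    spend_basis       = "FORECASTED_SPEND"
  }
}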

Depending on your use case, you might set a budget that is based on the cost of an entire environment folder, or all projects related to a certain cost center, instead of setting granular budgets for each project. We also recommend that you delegate budget and alert setting to workload owners who might set more granular alerting thresholds for their day-to-day monitoring.

For guidance on building FinOps capabilities, including forecasting budgets for workloads, see Getting started with FinOps on Google Cloud.

Allocate costs between internal cost centers

The console provides billing reports that let you view and forecast costs across multiple dimensions. In addition to the prebuilt reports, we recommend that you export billing data to a BigQuery dataset in the prj-c-billing-logs project. The exported billing records allow you to allocate cost on custom dimensions, such as your internal cost centers, based on project label metadata like billingcode.

The following sample SQL query summarizes costs for all projects, grouped by the billingcode project label.

#standardSQL
SELECT
   (SELECT value from UNNEST(labels) where key = 'billingcode') AS costcenter,
   service.description AS description,
   SUM(cost) AS charges,
   SUM((SELECT SUM(amount) FROM UNNEST(credits))) AS credits
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
GROUP BY costcenter, description
ORDER BY costcenter ASC, description ASC

To set up this export, see export Cloud Billing data to BigQuery.

If you require internal accounting or chargeback between cost centers, it's your responsibility to incorporate the data that is obtained from this query into your internal processes.

Ingest findings from detective controls into your existing SIEM

Although the foundation resources help you configure aggregated destinations for audit logs and security findings, it is your responsibility to decide how to consume and use these signals.

If you have a requirement to aggregate logs across all cloud and on-premises environments into an existing SIEM, decide how to ingest logs from the prj-c-logging project and findings from Security Command Center into your existing tools and processes. You might create a single export for all logs and findings if a single team is responsible for monitoring security across your entire environment, or you might create multiple exports filtered to the set of logs and findings needed for multiple teams with different responsibilities.

Alternatively, if log volume and cost are prohibitive, you might avoid duplication by retaining Google Cloud logs and findings only in Google Cloud. In this scenario, ensure that your existing teams have the right access and training to work with logs and findings directly in Google Cloud.

  • For audit logs, design log views to grant access to a subset of logs in your centralized logs bucket to individual teams, instead of duplicating logs to multiple buckets, which increases log storage cost.
  • For security findings, grant folder-level and project-level roles for Security Command Center to let teams view and manage security findings just for the projects for which they are responsible, directly in the console.

Continuously develop your controls library

The blueprint starts with a baseline of controls to detect and prevent threats. We recommend that you review these controls and add additional controls based on your requirements. The following table summarizes the mechanisms to enforce governance policies and how to extend these for your additional requirements:

Policy controls enforced by the blueprint Guidance to extend these controls

Security Command Center detects vulnerabilities and threats from multiple security sources.

Define custom modules for Security Health Analytics and custom modules for Event Threat Detection.

The Organization Policy service enforces a recommended set of organization policy constraints on Google Cloud services.

Enforce additional constraints from the premade list of available constraints or create custom constraints.

Open Policy Agent (OPA) policy validates code in the foundation pipeline for acceptable configurations before deployment.

Develop additional constraints based on the guidance at GoogleCloudPlatform/policy-library.

Alerting on log-based metrics and performance metrics configures log-based metrics to alert on changes to IAM policies and configurations of some sensitive resources.

Design additional log-based metrics and alerting policies for log events that you don't expect to occur in your environment.

A custom solution for automated log analysis regularly queries logs for suspicious activity and creates Security Command Center findings.

Write additional queries to create findings for security events that you want to monitor, using security log analytics as a reference.

A custom solution to respond to asset changes creates Security Command Center findings and can automate remediation actions.

Create additional Cloud Asset Inventory feeds to monitor changes for particular asset types and write additional Cloud Functions with custom logic to respond to policy violations.

These controls might evolve as your requirements and maturity on Google Cloud change.

Manage encryption keys with Cloud Key Management Service

Google Cloud provides default encryption at rest for all customer content, and also provides Cloud Key Management Service (Cloud KMS) to give you additional control over your encryption keys for data at rest. We recommend that you evaluate whether the default encryption is sufficient, or whether you have a compliance requirement that you must use Cloud KMS to manage keys yourself. For more information, see decide how to meet compliance requirements for encryption at rest.

The blueprint provides a prj-c-kms project in the common folder and a prj-{env}-kms project in each environment folder for managing encryption keys centrally. This approach lets a central team audit and manage encryption keys that are used by resources in workload projects, in order to meet regulatory and compliance requirements.

Depending on your operational model, you might prefer a single centralized project instance of Cloud KMS under the control of a single team, you might prefer to manage encryption keys separately in each environment, or you might prefer multiple distributed instances so that accountability for encryption keys can be delegated to the appropriate teams. Modify the Terraform code sample as needed to fit your operational model.

Optionally, you can enforce customer-managed encryption keys (CMEK) organization policies to enforce that certain resource types always require a CMEK key and that only CMEK keys from an allowlist of trusted projects can be used.
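
For example, a centrally managed key ring and key with automatic rotation in the production KMS project might look like the following minimal sketch; the names, project, location, and rotation period are illustrative.

# A key ring and a customer-managed key with 90-day automatic rotation.
resource "google_kms_key_ring" "prod_keyring" {
  name     = "kr-p-shared"
  project  = "prj-p-kms"
  location = "us-central1"
}

resource "google_kms_crypto_key" "workload_cmek" {
  name            = "key-p-example-workload"
  key_ring        = google_kms_key_ring.prod_keyring.id
  rotation_period = "7776000s" # 90 days

  # Protect the key from accidental destruction through Terraform.
  lifecycle {
    prevent_destroy = true
  }
}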

Store and audit application credentials with Secret Manager

We recommend that you never commit sensitive secrets (such as API keys, passwords, and private certificates) to source code repositories. Instead, commit the secret to Secret Manager and grant the Secret Manager Secret Accessor IAM role to the user or service account that needs to access the secret. We recommend that you grant the IAM role to an individual secret, not to all secrets in the project.
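
For example, the following minimal sketch stores a secret in a central project and grants access on that individual secret only; the names and service account are illustrative, and the replication syntax assumes Terraform provider v5 or later.

# Store the secret centrally and grant the accessor role on this secret
# only, not on all secrets in the project.
resource "google_secret_manager_secret" "api_key" {
  secret_id = "example-workload-api-key"
  project   = "prj-p-secrets"

  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_iam_member" "api_key_accessor" {
  project   = "prj-p-secrets"
  secret_id = google_secret_manager_secret.api_key.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:sa-example-app@prj-p-example.iam.gserviceaccount.com"
}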

When possible, you should generate production secrets automatically within the CI/CD pipelines and keep them inaccessible to human users except in breakglass situations. In this scenario, ensure that you don't grant IAM roles to view these secrets to any users or groups.

The blueprint provides a single prj-c-secrets project in the common folder and a prj-{env}-secrets project in each environment folder for managing secrets centrally. This approach lets a central team audit and manage secrets used by applications in order to meet regulatory and compliance requirements.

Depending on your operational model, you might prefer a single centralized instance of Secret Manager under the control of a single team, or you might prefer to manage secrets separately in each environment, or you might prefer multiple distributed instances of Secret Manager so that each workload team can manage their own secrets. Modify the Terraform code sample as needed to fit your operational model.

Plan breakglass access to highly privileged accounts

Although we recommend that changes to foundation resources are managed through version-controlled IaC that is deployed by the foundation pipeline, you might have exceptional or emergency scenarios that require privileged access to modify your environment directly. We recommend that you plan for breakglass accounts (sometimes called firecall or emergency accounts) that have highly privileged access to your environment in case of an emergency or when the automation processes break down.

The following table describes some example purposes of breakglass accounts.

Breakglass purpose Description

Super admin

Emergency access to the Super admin role in Cloud Identity, for example, to fix issues that are related to identity federation or multi-factor authentication (MFA).

Organization administrator

Emergency access to the Organization Administrator role, which can then grant access to any other IAM role in the organization.

Foundation pipeline administrator

Emergency access to modify the resources in your CICD project on Google Cloud and external Git repository in case the automation of the foundation pipeline breaks down.

Operations or SRE

An operations or SRE team needs privileged access to respond to outages or incidents. This can include tasks like restarting VMs or restoring data.

Your mechanism to permit breakglass access depends on the existing tools and procedures you have in place, but a few example mechanisms include the following:

  • Use your existing tools for privileged access management to temporarily add a user to a group that is predefined with highly privileged IAM roles, or use the credentials of a highly privileged account.
  • Pre-provision accounts intended only for administrator usage. For example, developer Dana might have an identity dana@example.com for daily use and admin-dana@example.com for breakglass access.
  • Use an application like just-in-time privileged access that allows a developer to self-escalate to more privileged roles.

Regardless of the mechanism you use, consider how you operationally address the following questions:

  • How do you design the scope and granularity of breakglass access? For example, you might design a different breakglass mechanism for different business units to ensure that they cannot disrupt each other.
  • How does your mechanism prevent abuse? Do you require approvals? For example, you might have split operations where one person holds credentials and one person holds the MFA token.
  • How do you audit and alert on breakglass access? For example, you might configure a custom Event Threat Detection module to create a security finding when a predefined breakglass account is used.
  • How do you remove the breakglass access and resume normal operations after the incident is over?

For common privilege escalation tasks and rolling back changes, we recommend designing automated workflows where a user can perform the operation without requiring privilege escalation for their user identity. This approach can help reduce human error and improve security.

For systems that require regular intervention, automating the fix might be the best solution. Google encourages customers to adopt a zero-touch production approach to make all production changes using automation, safe proxies, or audited breakglass. Google provides the SRE books for customers who are looking to adopt Google's SRE approach.

What's next

Deploy the blueprint

This section describes the process that you can use to deploy the blueprint, its naming conventions, and alternatives to blueprint recommendations.

Bringing it all together

To deploy your own enterprise foundation in alignment with the best practices and recommendations from this blueprint, follow the high-level tasks summarized in this section. Deployment requires a combination of prerequisite setup steps, automated deployment through the terraform-example-foundation on GitHub, and additional steps that must be configured manually after the initial foundation deployment is complete.

Process Steps

Prerequisites before deploying the foundation pipeline resources

Complete the following steps before you deploy the foundation pipeline:

To connect to an existing on-premises environment, prepare the following:

Steps to deploy the terraform-example-foundation from GitHub

Follow the README directions for each stage to deploy the terraform-example-foundation from GitHub:

Additional steps after IaC deployment

After you deploy the Terraform code, complete the following:

Additional administrative controls for customers with sensitive workloads

Google Cloud provides additional administrative controls that can help you meet your security and compliance requirements. However, some controls involve additional cost or operational trade-offs that might not be appropriate for every customer. These controls also require customized inputs for your specific requirements that can't be fully automated in the blueprint with a default value for all customers.

This section introduces security controls that you apply centrally to your foundation. It isn't an exhaustive list of all the security controls that you can apply to specific workloads. For more information on Google's security products and solutions, see Google Cloud security best practices center.

Evaluate whether the following controls are appropriate for your foundation based on your compliance requirements, risk appetite, and sensitivity of data.

Control Description

Protect your resources with VPC Service Controls

VPC Service Controls lets you define security policies that prevent access to Google-managed services outside of a trusted perimeter, block access to data from untrusted locations, and mitigate data exfiltration risks. However, VPC Service Controls can cause existing services to break until you define exceptions to allow intended access patterns.

Evaluate whether the value of mitigating exfiltration risks justifies the increased complexity and operational overhead of adopting VPC Service Controls. The blueprint prepares restricted networks and optional variables to configure VPC Service Controls, but the perimeter isn't enabled until you take additional steps to design and enable it.

Restrict resource locations

You might have regulatory requirements that cloud resources must only be deployed in approved geographical locations. The gcp.resourceLocations organization policy constraint enforces that resources can only be deployed in the list of locations that you define.

Enable Assured Workloads

Assured Workloads provides additional compliance controls that help you meet specific regulatory regimes. The blueprint provides optional variables in the deployment pipeline for enablement.

Enable data access logs

You might have a requirement to log all access to certain sensitive data or resources.

Evaluate where your workloads handle sensitive data that requires data access logs, and enable the logs for each service and environment working with sensitive data.

Enable Access Approval

Access Approval ensures that Cloud Customer Care and engineering require your explicit approval whenever they need to access your customer content.

Evaluate the operational process required to review Access Approval requests to mitigate possible delays in resolving support incidents.

Enable Key Access Justifications

Key Access Justifications lets you programmatically control whether Google can access your encryption keys, including for automated operations and for Customer Care to access your customer content.

Evaluate the cost and operational overhead associated with Key Access Justifications as well as its dependency on Cloud External Key Manager (Cloud EKM).

Disable Cloud Shell

Cloud Shell is an online development environment. This shell is hosted on a Google-managed server outside of your environment, and thus it isn't subject to the controls that you might have implemented on your own developer workstations.

If you want to strictly control which workstations a developer can use to access cloud resources, disable Cloud Shell. You might also evaluate Cloud Workstations for a configurable workstation option in your own environment.

Restrict access to the Google Cloud console

Google Cloud lets you restrict access to the Google Cloud console based on access level attributes like group membership, trusted IP address ranges, and device verification. Some attributes require an additional subscription to BeyondCorp Enterprise.

Evaluate the access patterns that you trust for user access to web-based applications such as the console as part of a larger zero trust deployment.

Naming conventions

We recommend that you have a standardized naming convention for your Google Cloud resources. The following table describes recommended conventions for resource names in the blueprint.

Resource Naming convention

Folder

fldr-environment

environment is a description of the folder-level resources within the Google Cloud organization. For example, bootstrap, common, production, nonproduction, development, or network.

For example: fldr-production

Project ID

prj-environmentcode-description-randomid

  • environmentcode is a short form of the environment field (one of b, c, p, n, d, or net). Shared VPC host projects use the environmentcode of the associated environment. Projects for networking resources that are shared across environments, like the interconnect project, use the net environment code.
  • description is additional information about the project. You can use short, human-readable abbreviations.
  • randomid is a randomized suffix to prevent collisions for resource names that must be globally unique and to mitigate against attackers guessing resource names. The blueprint automatically adds a random four-character alphanumeric identifier.

For example: prj-c-logging-a1b2

VPC network

vpc-environmentcode-vpctype-vpcconfig

  • environmentcode is a short form of the environment field (one of b, c, p, n, d, or net).
  • vpctype is one of shared, float, or peer.
  • vpcconfig is either base or restricted to indicate whether the network is intended to be used with VPC Service Controls or not.

For example: vpc-p-shared-base

Subnet

sn-environmentcode-vpctype-vpcconfig-region{-description}

  • environmentcode is a short form of the environment field (one of b, c, p, n, d, or net).
  • vpctype is one of shared, float, or peer.
  • vpcconfig is either base or restricted to indicate whether the network is intended to be used with VPC Service Controls or not.
  • region is any valid Google Cloud region that the resource is located in. We recommend removing hyphens and using an abbreviated form of some regions and directions to avoid reaching character limits. For example, au (Australia), na (North America), sa (South America), eu (Europe), se (southeast), or ne (northeast).
  • description is additional information about the subnet. You can use short, human-readable abbreviations.

For example: sn-p-shared-restricted-uswest1

Firewall policies

fw-firewalltype-scope-environmentcode{-description}

  • firewalltype is hierarchical or network.
  • scope is global or the Google Cloud region that the resource is located in. We recommend removing hyphens and using an abbreviated form of some regions and directions to avoid reaching character limits. For example, au (Australia), na (North America), sa (South America), eu (Europe), se (southeast), or ne (northeast).
  • environmentcode is a short form of the environment field (one of b, c, p, n, d, or net) that owns the policy resource.
  • description is additional information about the hierarchical firewall policy. You can use short, human-readable abbreviations.

For example:

fw-hierarchical-global-c-01

fw-network-uswest1-p-shared-base

Cloud Router

cr-environmentcode-vpctype-vpcconfig-region{-description}

  • environmentcode is a short form of the environment field (one of b, c, p, n, d, or net).
  • vpctype is one of shared, float, or peer.
  • vpcconfig is either base or restricted to indicate whether the network is intended to be used with VPC Service Controls or not.
  • region is any valid Google Cloud region that the resource is located in. We recommend removing hyphens and using an abbreviated form of some regions and directions to avoid reaching character limits. For example, au (Australia), na (North America), sa (South America), eu (Europe), se (southeast), or ne (northeast).
  • description is additional information about the Cloud Router. You can use short, human-readable abbreviations.

For example: cr-p-shared-base-useast1-cr1
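
A hedged Terraform sketch of a Cloud Router that follows this pattern (the project ID, network, and ASN are placeholders):

  resource "google_compute_router" "shared_base_useast1" {
    # Cloud Router name follows cr-environmentcode-vpctype-vpcconfig-region{-description}.
    name    = "cr-p-shared-base-useast1-cr1"
    project = "prj-p-shared-base-a1b2"  # placeholder host project ID
    region  = "us-east1"
    network = "vpc-p-shared-base"
    bgp {
      asn = 64514  # placeholder private ASN
    }
  }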

Cloud Interconnect connection

ic-dc-colo

  • dc is the name of your data center to which a Cloud Interconnect is connected.
  • colo is the colocation facility name that the Cloud Interconnect from the on-premises data center is peered with.

For example: ic-mydatacenter-lgazone1

Cloud Interconnect VLAN attachment

vl-dc-colo-environmentcode-vpctype-vpcconfig-region{-description}

  • dc is the name of your data center to which a Cloud Interconnect is connected.
  • colo is the colocation facility name that the Cloud Interconnect from the on-premises data center is peered with.
  • environmentcode is a short form of the environment field (one of b, c, p, n, d, or net).
  • vpctype is one of shared, float, or peer.
  • vpcconfig is either base or restricted to indicate whether the network is intended to be used with VPC Service Controls or not.
  • region is any valid Google Cloud region that the resource is located in. We recommend removing hyphens and using an abbreviated form of some regions and directions to avoid reaching character limits. For example, au (Australia), na (North America), sa (South America), eu (Europe), se (southeast), or ne (northeast).
  • description is additional information about the VLAN. You can use short, human-readable abbreviations.

For example: vl-mydatacenter-lgazone1-p-shared-base-useast1-cr1

Group

grp-gcp-description@example.com

Where description is additional information about the group. You can use short, human-readable abbreviations.

For example: grp-gcp-billingadmin@example.com

Custom role

rl-description

Where description is additional information about the role. You can use short, human-readable abbreviations.

For example: rl-customcomputeadmin

Service account

sa-description@projectid.iam.gserviceaccount.com

Where:

  • description is additional information about the service account. You can use short, human-readable abbreviations.
  • projectid is the globally unique project identifier.

For example: sa-terraform-net@prj-b-seed-a1b2.iam.gserviceaccount.com
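
For example, a minimal Terraform sketch (illustrative only) that creates a service account following this pattern, using the project ID from the example above:

  resource "google_service_account" "terraform_net" {
    # The resulting email is sa-description@projectid.iam.gserviceaccount.com.
    account_id   = "sa-terraform-net"
    project      = "prj-b-seed-a1b2"
    display_name = "Terraform service account for networking"
  }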

Storage bucket

bkt-projectid-description

Where:

  • projectid is the globally unique project identifier.
  • description is additional information about the storage bucket. You can use short, human-readable abbreviations.

For example: bkt-prj-c-infra-pipeline-a1b2-app-artifacts
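
A minimal Terraform sketch (illustrative only; the location is a placeholder) for a bucket that follows this pattern:

  resource "google_storage_bucket" "app_artifacts" {
    # Bucket name follows bkt-projectid-description; because project IDs are
    # globally unique, the bucket name is also globally unique.
    name                        = "bkt-prj-c-infra-pipeline-a1b2-app-artifacts"
    project                     = "prj-c-infra-pipeline-a1b2"
    location                    = "US"  # placeholder location
    uniform_bucket_level_access = true
  }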

Alternatives to default recommendations

The best practices that are recommended in the blueprint might not work for every customer. You can customize any of the recommendations to meet your specific requirements. The following entries introduce some of the common variations that you might require based on your existing technology stack and ways of working. Each entry names a decision area, summarizes the blueprint default, and describes possible alternatives.

Organization: The blueprint uses a single organization as the root node for all resources.

Decide a resource hierarchy for your Google Cloud landing zone introduces scenarios in which you might prefer multiple organizations, such as the following:

  • Your organization includes sub-companies that are likely to be sold in the future or that run as completely separate entities.
  • You want to experiment in a sandbox environment with no connectivity to your existing organization.

Folder structure: The blueprint has a simple folder structure, with workloads divided into production, nonproduction, and development folders at the top layer.

Decide a resource hierarchy for your Google Cloud landing zone introduces other approaches for structuring folders based on how you want to manage resources and inherit policies, such as:

  • Folders based on application environments
  • Folders based on regional entities or subsidiaries
  • Folders based on an accountability framework

Organization policies: The blueprint enforces all organization policy constraints at the organization node.

You might have different security policies or ways of working for different parts of the business. In this scenario, enforce organization policy constraints at a lower node in the resource hierarchy. Review the complete list of organization policy constraints to identify which constraints meet your requirements.
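
For illustration, the following hedged Terraform sketch enforces a single boolean constraint at a folder instead of at the organization node; the constraint and folder ID are examples, not blueprint defaults:

  resource "google_folder_organization_policy" "disable_serial_access" {
    # Enforce the constraint only for resources under this folder.
    folder     = "folders/456789012345"  # placeholder folder ID
    constraint = "compute.disableSerialPortAccess"

    boolean_policy {
      enforced = true
    }
  }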

Deployment pipeline tooling: The blueprint uses Cloud Build to run the automation pipeline.

You might prefer other products for your deployment pipeline, such as Terraform Enterprise, GitLab Runners, GitHub Actions, or Jenkins. The blueprint includes alternative directions for each product.

Code repository for deployment: The blueprint uses Cloud Source Repositories as the managed private Git repository.

Use your preferred version control system for managing code repositories, such as GitLab, GitHub, or Bitbucket.

If you use a private repository that is hosted in your on-premises environment, configure a private network path from your repository to your Google Cloud environment.

Identity provider: The blueprint assumes an on-premises Active Directory and federates identities to Cloud Identity using Google Cloud Directory Sync.

If you already use Google Workspace, you can use the Google identities that are already managed in Google Workspace.

If you don't have an existing identity provider, you might create and manage user identities directly in Cloud Identity.

If you have an existing identity provider, such as Okta, Ping, or Microsoft Entra ID (formerly Azure AD), you might manage user accounts in your existing identity provider and synchronize them to Cloud Identity.

If you have data sovereignty or compliance requirements that prevent you from using Cloud Identity, and if you don't require managed Google user identities for other Google services such as Google Ads or Google Marketing Platform, then you might prefer workforce identity federation. In this scenario, be aware of limitations with supported services.

Multiple regions: The blueprint deploys regional resources into two different Google Cloud regions to help enable workload design with high availability and disaster recovery requirements in mind.

If you have end users in additional geographical locations, you might configure more Google Cloud regions to create resources closer to those users with less latency.

If you have data sovereignty constraints or your availability needs can be met in a single region, you might configure only one Google Cloud region.

IP address allocation: The blueprint provides a set of IP address ranges.

You might need to change the specific IP address ranges that are used based on the IP address availability in your existing hybrid environment. If you modify the IP address ranges, use the blueprint as guidance for the number and size of ranges required, and review the valid IP address ranges for Google Cloud.

Hybrid networking: The blueprint uses Dedicated Interconnect across multiple physical sites and Google Cloud regions for maximum bandwidth and availability.

Depending on your requirements for cost, bandwidth, and reliability, you might configure Partner Interconnect or Cloud VPN instead.

If you need to start deploying resources with private connectivity before a Dedicated Interconnect can be completed, you might start with Cloud VPN and change to using Dedicated Interconnect later.

If you don't have an existing on-premises environment, you might not need hybrid networking at all.

VPC Service Controls perimeter: The blueprint recommends a single perimeter that includes all the service projects that are associated with a restricted VPC network. Projects that are associated with a base VPC network are not included inside the perimeter.

You might have a use case that requires multiple perimeters for an organization or you might decide not to use VPC Service Controls at all.

For more information, see Decide how to mitigate data exfiltration through Google APIs.

Secret Manager: The blueprint deploys a project for using Secret Manager in the common folder for organization-wide secrets, and a project in each environment folder for environment-specific secrets.

If you have a single team that is responsible for managing and auditing sensitive secrets across the organization, you might prefer to use only a single project for managing access to secrets.

If you let workload teams manage their own secrets, you might not use a centralized project for managing access to secrets, and instead let teams use their own instances of Secret Manager in workload projects.

Cloud KMS: The blueprint deploys a project for using Cloud KMS in the common folder for organization-wide keys, and a project for each environment folder for keys in each environment.

If you have a single team that is responsible for managing and auditing encryption keys across the organization, you might prefer to use only a single project for managing access to keys. A centralized approach can help you meet compliance requirements, such as PCI DSS requirements for key custodians.

If you let workload teams manage their own keys, you might not use a centralized project for managing access to keys, and instead let teams use their own instances of Cloud KMS in workload projects.

Aggregated log sinks: The blueprint configures a set of log sinks at the organization node so that a central security team can review audit logs from across the entire organization.

You might have different teams that are responsible for auditing different parts of the business, and these teams might require different logs to do their jobs. In this scenario, design multiple aggregated sinks at the appropriate folders and projects and create filters so that each team receives only the necessary logs, or design log views for granular access control to a common log bucket.
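
For example, the following hedged Terraform sketch (not part of the blueprint) creates an aggregated sink at a folder rather than at the organization node; the sink name, folder ID, log bucket, and filter are illustrative placeholders:

  resource "google_logging_folder_sink" "prod_audit" {
    # Aggregated sink that captures audit logs from this folder and its children.
    name             = "sk-prod-audit-logs"    # hypothetical sink name
    folder           = "folders/567890123456"  # placeholder folder ID
    include_children = true
    destination      = "logging.googleapis.com/projects/prj-p-logging-a1b2/locations/global/buckets/audit-logs"
    filter           = "logName:\"cloudaudit.googleapis.com\""
  }

The sink's writer identity still needs permission to write to the destination log bucket.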

Monitoring scoping projects: The blueprint configures a single monitoring scoping project for each environment.

You might configure more granular scoping projects that are managed by different teams, scoped to the set of projects that contain the applications that each team manages.

Granularity of infrastructure pipelines: The blueprint uses a model where each business unit has a separate infrastructure pipeline to manage their workload projects.

You might prefer a single infrastructure pipeline that is managed by a central team if that team is responsible for deploying all projects and infrastructure. This central team can accept pull requests from workload teams to review and approve before project creation, or the team can create the pull requests themselves in response to a ticketing system.

You might prefer more granular pipelines if individual workload teams have the ability to customize their own pipelines and you want to design more granular privileged service accounts for the pipelines.

SIEM exports: The blueprint manages all security findings in Security Command Center.

Decide whether you will export security findings from Security Command Center to tools such as Google Security Operations or your existing SIEM, or whether teams will use the console to view and manage security findings. You might configure multiple exports with unique filters for different teams with different scopes and responsibilities.
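
If you choose to export findings, the following hedged Terraform sketch (illustrative only) configures a continuous export of active Security Command Center findings to a Pub/Sub topic that a SIEM could subscribe to; the organization ID, project, and topic are placeholders:

  resource "google_scc_notification_config" "siem_export" {
    # Continuous export of active findings to Pub/Sub.
    config_id    = "scc-siem-export"  # hypothetical export name
    organization = "123456789012"     # placeholder organization ID
    description  = "Export active findings to the SIEM"
    pubsub_topic = "projects/prj-c-scc-a1b2/topics/top-scc-findings"

    streaming_config {
      filter = "state = \"ACTIVE\""
    }
  }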

DNS lookups for Google Cloud services from on-premises: The blueprint configures a unique Private Service Connect endpoint for each Shared VPC, which can help enable designs with multiple VPC Service Controls perimeters.

You might not require routing from an on-premises environment to Private Service Connect endpoints at this level of granularity if you don't require multiple VPC Service Controls perimeters.

Instead of mapping on-premises hosts to Private Service Connect endpoints by environment, you might simplify this design to use a single Private Service Connect endpoint with the appropriate API bundle, or use the generic endpoints for private.googleapis.com and restricted.googleapis.com.

What's next