This guide introduces best practices to help enterprise customers like you on your journey to Google Cloud Platform (GCP). The guide is not an exhaustive list of recommendations. Instead, its goal is to help enterprise architects and technology stakeholders understand the scope of activities and plan accordingly. Each section provides key actions and includes links for further reading.
Before you read this guide, we recommend reviewing the Platform overview to understand the overall GCP landscape.
Define your resource hierarchy
GCP resources are organized hierarchically. This hierarchy allows you to map your enterprise's operational structure to GCP, and to manage access control and permissions for groups of related resources. The following diagram shows an example hierarchy.
The top-level node of the hierarchy is the Organization resource, which represents an organization (for example, a company). The Organization resource provides central visibility and control over all resources further down the hierarchy.
Next in the hierarchy are folders. You can use folders to isolate requirements for different departments and teams in the parent organization. You can similarly use folders to separate production resources from development resources.
At the bottom of the hierarchy are projects. Projects contain the computing, storage, and networking resources that constitute your apps. Projects are discussed in more detail later in this document.
The structure you define is flexible and allows you to adapt to evolving requirements. If you are just beginning your GCP journey, adopt the simplest structure that satisfies your initial requirements. See the Resource Manager overview for full details.
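As a rough illustration of the hierarchy described above, the following sketch models an Organization node, folders, and projects as a simple tree. It is not a GCP API; the `Node` class and all names (`example-org`, `web-prod`, and so on) are hypothetical and exist only to show how a project's position in the hierarchy can be traced back to the Organization node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node in a simplified resource hierarchy (illustrative only)."""
    name: str
    kind: str  # "organization", "folder", or "project"
    # Excluded from repr and comparison to avoid parent/child recursion.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str, kind: str) -> "Node":
        """Create a child node and attach it to this node."""
        child = Node(name=name, kind=kind, parent=self)
        self.children.append(child)
        return child

    def ancestry(self) -> List[str]:
        """Return the path from the top-level node down to this node."""
        path = []
        node: Optional["Node"] = self
        while node is not None:
            path.append(f"{node.kind}:{node.name}")
            node = node.parent
        return list(reversed(path))


# Hypothetical names, chosen only for illustration.
org = Node("example-org", "organization")
prod = org.add("production", "folder")
dev = org.add("development", "folder")
web_prod = prod.add("web-prod", "project")
web_dev = dev.add("web-dev", "project")

print(" > ".join(web_prod.ancestry()))
```

Separating production and development into folders, as in this sketch, is one of the simple starting structures the text suggests; your own hierarchy might look different.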
Create an Organization node
Many of the features supported by GCP require an Organization node. You can create an Organization node that maps to your corporate internet domain. You can migrate your existing GCP projects and billing accounts into the Organization node. For more details, see creating and managing organizations.
If you need help setting up, see the Organization setup wizard.
Specify your project structure
A project is required in order to use GCP. All GCP resources, such as Compute Engine virtual machines and Cloud Storage buckets, belong to a single project. For more information about projects, see the Platform overview.
You control the scope of your projects. A single project might contain multiple separate apps, or conversely a single app might include several projects. Projects can contain resources spread across multiple regions and geographies.
The ideal project structure depends on your individual requirements, and might evolve over time. When designing project structure, determine whether resources need to be billed separately, what degree of isolation is required, and how the teams that manage the resources and apps are organized.
Automate project creation
When you automate the creation and management of your GCP projects and resources, you get benefits such as consistency, reproducibility, and testability. Treating your configuration as code allows you to version and manage the lifecycle of your configuration alongside your software artifacts. Automation allows you to support best practices such as consistent naming conventions and labeling of resources. As your requirements evolve, automation also simplifies the refactoring of your projects.
For GCP projects, use Cloud Deployment Manager, which is the GCP native management tool. With Cloud Deployment Manager, you create a configuration file that describes a set of GCP resources that you want to deploy together. You can define parameterized templates that act as reusable building blocks. Cloud Deployment Manager can also set access control permissions through Cloud IAM such that your developers are granted appropriate access as part of the project creation process.
If you already use tools like Terraform, Ansible, or Puppet, you can use those instead. This lets you take advantage of skills your team already has.
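To make the idea of consistent naming and labeling concrete, here is a minimal sketch of a helper that an automation pipeline might call before creating a project. The naming convention (`team-app-env`), the label keys, and the function name are assumptions made for this example, not a GCP or Cloud Deployment Manager API. The validation pattern reflects the documented project ID format: 6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen.

```python
import re
from typing import Dict

# 6 to 30 characters: starts with a letter, ends with a letter or digit.
PROJECT_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def build_project_spec(team: str, app: str, env: str, cost_center: str) -> Dict[str, object]:
    """Build a project ID and labels from a hypothetical naming convention."""
    project_id = f"{team}-{app}-{env}".lower()
    if not PROJECT_ID_PATTERN.match(project_id):
        raise ValueError(f"invalid project ID: {project_id!r}")
    return {
        "project_id": project_id,
        # Label keys are illustrative; choose keys that match your reporting needs.
        "labels": {
            "team": team.lower(),
            "env": env.lower(),
            "cost-center": cost_center.lower(),
        },
    }


spec = build_project_spec("data", "ingest", "prod", "cc123")
print(spec)
```

Keeping a rule like this in code means every project is checked the same way, which is the consistency benefit the section describes.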
Identity and access management
Manage your Google identities
GCP uses Google accounts for authentication and access management. Your developers and other technical staff must have Google accounts to access GCP. We recommend using fully managed Google accounts tied to your corporate domain name through Cloud Identity. This way, your developers can access GCP using their corporate email IDs, and your admins can see and control the accounts through the Admin Console. Subsequent sections of this doc describe how to integrate your existing identity platform with Cloud Identity.
Cloud Identity is a stand-alone Identity-as-a-Service (IDaaS) solution. It gives Cloud Platform customers access to many of the identity management capabilities provided by G Suite, Google Cloud's set of workplace productivity apps. Cloud Identity does not require a G Suite license. Signing up for Cloud Identity provides a management layer over Google accounts that are associated with your corporate domain name. Through this management layer, you can enable or disable access to Google services, including GCP, for your employees. Signing up for Cloud Identity also creates an Organization node for your domain, which helps map corporate structure and controls to your GCP resources through the Resource Hierarchy.
For more information see Cloud Identity Solutions.
Federate your identity provider with GCP
If your organization uses an on-premises or third-party identity provider, synchronize your user directory with Cloud Identity to let users access GCP with their corporate credentials. This way, your identity platform remains the source of truth while Cloud Identity controls how your employees access Google services.
Migrate unmanaged accounts
If members of your domain have used their corporate email addresses to create a personal Google Account (for example, to sign up for a Google service such as YouTube or Blogger), consider migrating these accounts so that they can also be managed using Cloud Identity. Alternatively, you can force these accounts to be changed to use a different email address.
For additional guidance on how to migrate accounts or force accounts to be renamed, see the Cloud Identity documentation.
Control access to resources
You must authorize your developers and IT staff to consume GCP resources. You can use Cloud Identity and Access Management (IAM) to grant granular access to specific GCP resources and prevent unwanted access to other resources. Specifically, Cloud IAM enables you to control access by defining who (identity) has what access (role) for which resource.
Rather than directly assigning permissions, you assign roles. Cloud IAM roles are collections of permissions. For example, the BigQuery Data Viewer role contains the permissions to list, read, and query BigQuery tables, but does not include permissions to create new tables or modify existing data. Cloud IAM provides many predefined roles to handle a wide range of common use cases. It also enables you to create custom roles.
Use Cloud IAM to apply the security principle of least privilege, so you grant only the necessary access to your resources. Cloud IAM is a fundamental topic for enterprise organizations. For more information about identity and access management, see the Cloud IAM documentation.
Delegate responsibility with groups and service accounts
We recommend collecting users with the same responsibilities into groups and assigning Cloud IAM roles to the groups rather than to individual users. For example, you can create a "data scientist" group and assign appropriate roles to enable interaction with BigQuery and Cloud Storage. When a new data scientist joins your team, you can simply add them to the group and they will inherit the defined permissions. You can create and manage groups through the Admin Console.
A service account is a special type of Google Account that represents a Google Cloud service identity or app rather than an individual user. Like users and groups, service accounts can be assigned IAM roles to grant access to specific resources. Service accounts authenticate with a key rather than a password. Google manages and rotates the service account keys for code running on GCP. We recommend that you use service accounts for server-to-server interactions.
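The "who has what access" model can be sketched with a plain dictionary shaped like a Cloud IAM policy, where each binding pairs one role with a list of members. The helper functions below are hypothetical and operate only on local data; they do not call any GCP API. The email addresses are made up, and the `group:` and `serviceAccount:` prefixes follow the member format that IAM policies use.

```python
from typing import Dict, List

Policy = Dict[str, List[Dict[str, object]]]


def grant(policy: Policy, role: str, member: str) -> Policy:
    """Add a member to the binding for a role, creating the binding if needed."""
    for binding in policy["bindings"]:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    policy["bindings"].append({"role": role, "members": [member]})
    return policy


def roles_for(policy: Policy, member: str) -> List[str]:
    """List the roles bound directly to a member in this policy."""
    return sorted(b["role"] for b in policy["bindings"] if member in b["members"])


policy: Policy = {"bindings": []}
# Grant roles to a group rather than to individual users.
grant(policy, "roles/bigquery.dataViewer", "group:data-scientists@example.com")
grant(policy, "roles/storage.objectViewer", "group:data-scientists@example.com")
# A service account receives roles in the same way as a user or group.
grant(policy, "roles/storage.objectViewer", "serviceAccount:etl@example-project.iam.gserviceaccount.com")

print(roles_for(policy, "group:data-scientists@example.com"))
```

Because the roles are attached to the group, adding a new data scientist becomes a group membership change and the policy itself stays untouched.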
Define an organization policy
Use the Organization Policy Service to get centralized and programmatic control over your organization's cloud resources. Cloud IAM focuses on who, providing the ability to authorize users and groups to take action on specific resources based on permissions. An organization policy focuses on what, providing the ability to set restrictions on specific resources to determine how they can be configured and used. For example, you can define a constraint to restrict virtual machine instances from having an external IP address.
You set policies on resources in the resource hierarchy. All descendants of a resource inherit its policies by default. You can define a base set of constraints that apply to all elements in the hierarchy by attaching a policy to the top-level Organization node. You can then set custom organization policies on child nodes, which overwrite or merge with the inherited policy.
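The inheritance behavior can be approximated with a small model. This sketch is a deliberate simplification and not the Organization Policy Service API: it assumes a single constraint that carries a set of denied values, and the `PolicyNode` class, the `merge_with_parent` flag, and the values `value-a` and `value-b` are all hypothetical.

```python
from typing import Optional, Set


class PolicyNode:
    """A hierarchy node with an optional, simplified list policy."""

    def __init__(self, name: str, parent: Optional["PolicyNode"] = None,
                 denied: Optional[Set[str]] = None, merge_with_parent: bool = True):
        self.name = name
        self.parent = parent
        self.denied = denied  # None means no policy is set on this node
        self.merge_with_parent = merge_with_parent

    def effective_denied(self) -> Set[str]:
        """Compute the denied values that apply at this node."""
        inherited = self.parent.effective_denied() if self.parent else set()
        if self.denied is None:
            return inherited                # inherit unchanged by default
        if self.merge_with_parent:
            return inherited | self.denied  # merge with the inherited policy
        return set(self.denied)             # overwrite the inherited policy


org = PolicyNode("organization", denied={"value-a"})
folder = PolicyNode("dev-folder", parent=org, denied={"value-b"})
project = PolicyNode("sandbox", parent=folder)
override = PolicyNode("exception", parent=folder, denied=set(), merge_with_parent=False)

print(sorted(project.effective_denied()))
```

In this model, `project` sets no policy of its own and so inherits everything above it, while `override` replaces the inherited policy entirely.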
For more information about setting policies, see Policy Design for Enterprise Customers.
Networking and security
Use VPC to define your network
Use VPCs and subnets to map out your network, and to group and isolate related resources. Virtual Private Cloud (VPC) is a virtual version of a physical network. VPC networks provide scalable and flexible networking for your Compute Engine virtual machine (VM) instances, and for the services that leverage VM instances, including Google Kubernetes Engine (GKE), Cloud Dataproc, and Cloud Dataflow, among others.
VPC networks are global resources; a single VPC can span multiple regions without communicating over the public internet. This means you can connect and manage resources distributed across the globe from a single GCP project, and you can create multiple, isolated VPC networks in a single project.
VPC networks themselves do not define IP address ranges. Instead, each VPC network consists of one or more partitions called subnetworks. Each subnet in turn defines one or more IP address ranges. Subnets are regional resources; each subnet is explicitly associated with a single region.
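Because subnets are regional and each defines its own IP range, it can help to plan non-overlapping ranges before you create anything. The following sketch uses Python's standard `ipaddress` module to carve one address block into per-region subnets. The address block, prefix length, and function name are assumptions for illustration; the code only does local arithmetic and does not create any GCP resources.

```python
import ipaddress
from typing import Dict, List


def plan_subnets(block: str, regions: List[str], new_prefix: int) -> Dict[str, ipaddress.IPv4Network]:
    """Assign one equally sized, non-overlapping subnet range to each region."""
    network = ipaddress.ip_network(block)
    candidates = network.subnets(new_prefix=new_prefix)
    plan = {}
    for region in regions:
        try:
            plan[region] = next(candidates)
        except StopIteration:
            raise ValueError(f"{block} is too small for {len(regions)} subnets") from None
    return plan


plan = plan_subnets("10.0.0.0/16", ["us-central1", "europe-west1", "asia-east1"], 20)
for region, subnet in plan.items():
    print(region, subnet)
```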
For more details, see the VPC overview.
Manage traffic with firewall rules
Each VPC network implements a distributed virtual firewall. Configure firewall rules that allow or deny traffic to and from the resources attached to the VPC, including Compute Engine VM instances and GKE clusters. Firewall rules are applied at the virtual networking level, so they help provide effective protection and traffic control regardless of the operating system your instances use. The firewall is stateful, which means that for flows that are permitted, return traffic is automatically allowed.
Firewall rules are specific to a particular VPC network. The rules allow you to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, tags, and service accounts. For example, you can create an ingress rule to allow any VM instance associated with a particular service account to accept TCP traffic on port 80 that originated from a specific source subnet. Each VPC automatically includes default and implied firewall rules.
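The ingress example above can be sketched as a small local evaluator. This is a simplified model, not the VPC firewall implementation: it assumes that rules are checked in priority order (a lower number means a higher priority), that unmatched ingress traffic falls through to an implied deny, and it ignores details such as tie-breaking between rules of equal priority. The rule name, subnet range, and service account are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IngressRule:
    name: str
    priority: int                 # lower number = higher priority
    action: str                   # "allow" or "deny"
    protocol: str
    port: int
    source_range: str             # CIDR range the traffic must originate from
    target_service_account: str   # VMs the rule applies to


def evaluate_ingress(rules: List[IngressRule], protocol: str, port: int,
                     source_ip: str, service_account: str) -> str:
    """Return the action of the first matching rule, or the implied deny."""
    source = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if (rule.protocol == protocol
                and rule.port == port
                and source in ipaddress.ip_network(rule.source_range)
                and rule.target_service_account == service_account):
            return rule.action
    return "deny"  # implied rule: ingress is denied unless explicitly allowed


web_sa = "web@example-project.iam.gserviceaccount.com"
rules = [IngressRule("allow-http-from-frontend", 1000, "allow", "tcp", 80, "10.0.16.0/20", web_sa)]

print(evaluate_ingress(rules, "tcp", 80, "10.0.16.25", web_sa))
print(evaluate_ingress(rules, "tcp", 80, "192.168.1.5", web_sa))
```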
If your app is hosted in GKE, there are different considerations for managing network traffic and configuring firewall rules. For more details, see GKE networking concepts.
Limit external access
When you create a GCP resource that leverages VPC, you choose a network and subnet to place the resource in. The resource is assigned an internal IP address from one of the IP ranges associated with the subnet. Resources in a VPC network can communicate among themselves through internal IP addresses as long as firewall rules permit.
To communicate with the internet, resources must have an external, public IP address or must use Cloud NAT. Similarly, resources must possess an external IP address to connect to other resources outside of the same VPC network, unless the networks are connected in some way—for example, through a VPN. For more details, see the IP addresses documentation.
Limit access to the internet to only those resources that need it. Resources with only a private, internal IP address can still access many Google APIs and services through Private Google Access. This private access enables resources to interact with key Google and GCP services while remaining isolated from the internet.
Centralize network control
Use Shared VPC to connect resources from multiple projects to a common VPC network. Resources in those projects can communicate with each other securely and efficiently across project boundaries using internal IPs. You can manage shared network resources, such as subnets, routes, and firewalls, from a central host project, enabling you to apply and enforce consistent network policies across the projects.
With Shared VPC and IAM controls, you can separate network administration from project administration. This separation helps you implement the principle of least privilege. For example, a centralized network team can administer the network without having any permissions into the participating projects. Similarly, the project admins can manage their project resources without any permissions to manipulate the shared network.
Connect your enterprise network
Many enterprises need to connect existing on-premises infrastructure with their GCP resources. Evaluate your bandwidth, latency, and SLA requirements to choose the best connection option:
If you need low-latency, highly available, enterprise-grade connections that let you reliably transfer data between your on-premises and VPC networks without traversing the public internet, use Cloud Interconnect:
- Dedicated Interconnect provides a direct physical connection between your on-premises network and Google's network.
- Partner Interconnect provides connectivity between your on-premises and GCP VPC networks through a supported service provider.
If you don't require the low latency and high availability of Cloud Interconnect, or you are just starting on your cloud journey, use Cloud VPN to set up encrypted IPsec VPN tunnels between your on-premises network and VPC. Compared to a direct, private connection, an IPsec VPN tunnel has lower overhead and costs.
Secure your apps and data
GCP provides robust security features across its infrastructure and services, from the physical security of data centers and custom security hardware to dedicated teams of researchers. However, securing your GCP resources is a shared responsibility. You must take appropriate measures to help ensure that your apps and data are protected.
In addition to firewall rules and VPC isolation, use these additional tools to help secure and protect your apps:
- Use VPC Service Controls to define a security perimeter around your GCP resources to constrain data within a VPC and help mitigate data exfiltration risks.
- Use a GCP global HTTP(S) load balancer to support high availability and scaling for your internet-facing services.
- Integrate Cloud Armor with the HTTP(S) load balancer to provide DDoS protection and the ability to blacklist and whitelist IP addresses at the network edge.
- Control access to apps by using Cloud Identity-Aware Proxy (Cloud IAP) to verify user identity and the context of the request to determine if a user should be granted access.
GCP helps keep your data secure by applying encryption both in transit and at rest. Data at rest is encrypted by default using encryption keys managed by Google. For sensitive data, you can instead manage your keys in GCP. If you need greater control, you can supply your own encryption keys that are maintained outside of GCP. Because managing or maintaining your own keys introduces overhead, we recommend using that approach only for truly sensitive data. For more details, see encryption at rest.
Logging, monitoring, and operations
Centralize logging and monitoring
Enterprises typically run multiple apps, data pipelines, and other processes, often across different platforms. Ensuring the health of these apps and processes is a key responsibility of developers and operations teams alike. To help ensure health, we recommend using Stackdriver to manage logging, monitoring, debugging, tracing, profiling, and more.
Logs are a primary source of diagnostic information about the health of your apps and processes. Stackdriver Logging is part of the Stackdriver suite and allows you to store, view, search, analyze, and alert on log data and events. Logging integrates natively with many GCP services. The service includes a logging agent and API to support apps running on Amazon EC2 instances and on-premises. Use Logging to centralize logs from all your apps into Stackdriver.
In addition to consuming logs, you typically need to monitor other aspects of your apps and systems to ensure reliable operation. Use Stackdriver Monitoring to get visibility into the performance, uptime, and overall health of your apps and infrastructure. Monitoring ingests events, metrics, and metadata and generates insights through dashboards, charts, and alerts. Monitoring supports metrics from many GCP and third-party sources out of the box. You can also define custom metrics with Monitoring. For example, you can use metrics to define alerting policies such that operations teams are notified of unusual behavior or trends. Monitoring also provides flexible dashboards and rich visualization tools to help identify emergent issues.
Set up an audit trail
In addition to capturing app and process-level logs, you might need to track and maintain details of how your developers and IT teams are interacting with GCP resources. Use Cloud Audit Logging to help answer questions like "who did what, where, and when" in your GCP projects. For a list of GCP services that write audit logs, see services producing audit logs. Use IAM controls to limit who has access to view audit logs.
Cloud Audit Logging captures several types of activity. Admin Activity logs contain log entries for API calls or other administrative actions that modify the configuration or metadata of resources. Admin Activity logs are always enabled. Data Access audit logs record API calls that create, modify, or read user-provided data. Data Access audit logs are disabled by default because they can be quite large. You can configure which GCP services produce data access logs.
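To show how an audit log entry answers "who did what, where, and when", the sketch below extracts those four facts from an entry represented as a dictionary. The field names follow the audit log entry format (`protoPayload.authenticationInfo.principalEmail`, `methodName`, `resourceName`, and the entry `timestamp`), but the sample entry and all of its values are made up, and the helper function is hypothetical.

```python
from typing import Dict


def summarize_audit_entry(entry: Dict) -> Dict[str, str]:
    """Pull the who/what/where/when fields out of an audit log entry."""
    payload = entry.get("protoPayload", {})
    return {
        "who": payload.get("authenticationInfo", {}).get("principalEmail", "unknown"),
        "what": payload.get("methodName", "unknown"),
        "where": payload.get("resourceName", "unknown"),
        "when": entry.get("timestamp", "unknown"),
    }


# Illustrative Admin Activity entry; the values are invented for this example.
sample_entry = {
    "logName": "projects/example-project/logs/cloudaudit.googleapis.com%2Factivity",
    "timestamp": "2019-01-15T10:30:00Z",
    "protoPayload": {
        "authenticationInfo": {"principalEmail": "admin@example.com"},
        "methodName": "v1.compute.instances.insert",
        "resourceName": "projects/example-project/zones/us-central1-a/instances/vm-1",
    },
}

summary = summarize_audit_entry(sample_entry)
print(summary)
```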
For more in-depth information about auditing, see Best practices for working with Cloud Audit Logging.
Export your logs
Logging retains app and audit logs for a limited period of time. You might need to retain logs for longer periods to meet compliance obligations. Alternatively, you might want to keep logs for historical analysis.
You can export logs to Cloud Storage, BigQuery, and Cloud Pub/Sub. Using filters, you can include or exclude resources from the export. For example, you can export all Compute Engine logs but exclude high-volume logs from Cloud Load Balancing.
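The effect of an include or exclude filter can be illustrated with a local predicate over log entries. This sketch does not use the Logging API or its filter syntax; the `should_export` function is hypothetical, and the sample entries are invented. The resource type strings `gce_instance` and `http_load_balancer` are used here as the assumed type names for Compute Engine instances and HTTP(S) load balancers.

```python
from typing import Dict, Iterable, List, Optional, Set


def should_export(entry: Dict, include_types: Optional[Set[str]] = None,
                  exclude_types: Iterable[str] = ()) -> bool:
    """Decide whether an entry passes a simple resource-type filter."""
    resource_type = entry.get("resource", {}).get("type")
    if resource_type in set(exclude_types):
        return False
    # With no include list, everything that is not excluded is exported.
    return include_types is None or resource_type in include_types


entries: List[Dict] = [
    {"resource": {"type": "gce_instance"}, "textPayload": "instance started"},
    {"resource": {"type": "http_load_balancer"}, "textPayload": "GET /"},
    {"resource": {"type": "gcs_bucket"}, "textPayload": "object created"},
]

# Export everything except the high-volume load balancer logs.
exported = [e for e in entries if should_export(e, exclude_types={"http_load_balancer"})]
print([e["resource"]["type"] for e in exported])
```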
Where you export your logs depends on your use case. Many enterprises export to multiple destinations. Broadly speaking, to meet compliance obligations, use Cloud Storage for long-term storage. If you need to analyze logs, use BigQuery, because it supports SQL querying and a large ecosystem of third-party tools.
For more details on logging exports, see Design patterns for exporting from Logging.
Embrace DevOps and explore Site Reliability Engineering
To increase agility and reduce time-to-market for apps and features, you need to break down silos between development, operations, networking, and security teams. Doing so requires processes, culture, and tooling that are together referred to as DevOps.
GCP provides a range of services to help you adopt DevOps practices. Features include integrated source code repositories, continuous-delivery tooling, rich monitoring capabilities through Stackdriver, and strong support for open source tools. For more details, see GCP's DevOps solutions.
Site Reliability Engineering (SRE) is a set of practices closely related to DevOps. These practices evolved from the SRE team that manages Google's production infrastructure. While creating a dedicated SRE function is beyond the scope of many enterprises, we recommend that you study the SRE books to learn practices that can help shape your operations strategy.
Plan your migration
Migrating on-premises apps and infrastructure to the cloud requires careful assessment and planning. You must evaluate the various migration strategies, from lift-and-shift to transform-and-move, on a per-app basis. GCP provides tools to help migrate virtual machines, transfer your data, and modernize your workloads. For more details, see the migration center.
Because of regulatory, technical, or financial constraints, it might not be possible or even desirable to move certain apps to the public cloud. Consequently, you might need to distribute and integrate workloads across your on-premises and GCP infrastructure. This setup is referred to as hybrid cloud. For more details on hybrid workloads, see the hybrid cloud page and the patterns and best practices for hybrid and multi-cloud solutions.
Favor managed services
Key drivers of cloud adoption are reducing IT overhead and increasing efficiencies, enabling you to focus on your core business. In addition to adopting DevOps practices and increasing automation, you should use GCP managed services to help reduce operational burden and total cost of ownership (TCO).
Rather than independently installing, supporting, and operating all parts of an application stack, you can use managed services to consume parts of your application stack as services. For example, rather than installing and self-managing a MySQL database on a VM instance, you can use a MySQL database provided by Cloud SQL. You can then rely on GCP to manage the underlying infrastructure and automate backups, updates, and replication.
We recommend that you evaluate the SLA provided by each managed service.
GCP offers managed services and serverless options for many common app components and use cases, from managed databases to big-data processing tools. Many of these managed services support popular open source frameworks and platforms, so you can realize TCO benefits by lifting and shifting existing apps that leverage these open source platforms into the cloud.
Design for high availability
To help maintain uptime for mission-critical apps, design resilient apps that gracefully handle failures or unexpected changes in load. High availability is the ability of an app to remain responsive and continue to function despite failures of components in the system. Highly available architectures typically involve distribution of compute resources, load-balancing, and replication of data. The extent of your high availability provisions might vary by app. For more information about availability concepts, see the geographies and regions documentation.
At a minimum, you should distribute compute resources, such as Compute Engine VM instances and GKE clusters, across the available zones in a region to protect against failure of a particular zone. To further improve the availability of your compute resources, you can similarly distribute them across multiple geographically dispersed regions to mitigate against loss of an entire region. For guidance on where to create your compute resources, see best practices for Compute Engine region selection.
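One simple way to reason about zonal distribution is a round-robin assignment of instances to zones. The sketch below is only a planning aid under that assumption; the function and instance names are hypothetical, and it does not create any Compute Engine resources. In practice, managed options such as regional instance groups can handle this distribution for you.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List


def spread_across_zones(instance_names: List[str], zones: List[str]) -> Dict[str, str]:
    """Assign instances to zones in round-robin order."""
    if not zones:
        raise ValueError("at least one zone is required")
    return dict(zip(instance_names, cycle(zones)))


zones = ["us-central1-a", "us-central1-b", "us-central1-c"]
placement = spread_across_zones([f"web-{i}" for i in range(5)], zones)
per_zone = Counter(placement.values())
print(per_zone)
```

With five instances and three zones, losing any single zone leaves at least three instances running in this plan.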
GCP offers several variations of load balancing. The HTTP(S) load balancer is often used to expose internet-facing apps. This load balancer provides global balancing, allowing distribution of load across regions in different geographies. If a zone or region becomes unavailable, the load balancer directs traffic to a zone with available capacity. For more details, see application capacity optimizations with global load balancing.
Also consider availability when choosing your data storage. Some GCP data storage services offer the ability to replicate data across zones in a single region. Other services automatically replicate data across multiple regions in a geographical area, but might require a trade-off on either latency or the consistency model. Which data storage service is most appropriate varies by app and availability requirements.
Plan your disaster recovery strategy
In addition to designing for high availability, you should create a plan for maintaining business continuity in the event of a wide-scale outage or natural disaster. Disaster recovery (DR) is the ability to recover from rare but major incidents. When high availability provisions are ineffective or unavailable, you might need to initiate disaster recovery.
Creating an effective DR plan requires planning and testing. We recommend that you address DR plans early. For more details, see the disaster recovery planning guide and related articles.
- Building scalable and resilient web applications on GCP
- Designing and implementing your disaster recovery plan using GCP
- Geography and regions
Billing and management
Know how resources are charged.
Set up billing and permissions.
Analyze and export your bill.
Plan for your capacity requirements.
Implement cost controls.
Purchase a support package.
Get help from the experts.
Build centers of excellence.
Know how resources are charged
GCP operates on a consumption model. You are charged based on how much of a particular resource or product you use over a given payment period. Products measure consumption in different ways, for example:
- As an amount of time (how many seconds a machine was running)
- As a volume (how much data is stored)
- As the number of operations executed
- As variations of those concepts
Make sure you understand how the billing works for the components in your system so that you can accurately gauge your costs. Each product provides detailed pricing information in its documentation. Many products provide a Free Tier where any consumption below a certain threshold does not incur any charges. To consume resources beyond what is offered by the Free Tier, you must enable billing.
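The consumption model with a Free Tier threshold reduces to simple arithmetic: you pay only for usage above the free amount. The following sketch shows that calculation. The unit price and free allowance are invented numbers, not real GCP prices; always check each product's pricing page for actual figures.

```python
def monthly_charge(usage: float, free_tier: float, unit_price: float) -> float:
    """Charge for usage above the free threshold, rounded to cents."""
    billable = max(0.0, usage - free_tier)
    return round(billable * unit_price, 2)


# Hypothetical example: 120 GB stored, 5 GB free, 0.02 per GB per month.
storage_cost = monthly_charge(usage=120, free_tier=5, unit_price=0.02)
# Usage below the free threshold does not incur any charge.
small_cost = monthly_charge(usage=3, free_tier=5, unit_price=0.02)
print(storage_cost, small_cost)
```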
For more details about GCP pricing philosophies, innovations, and discounts, see the Pricing page.
Set up billing controls
All GCP resources, including Compute Engine VMs, Cloud Storage buckets, and BigQuery datasets, must be associated with a GCP project. To consume resources beyond what is offered by the Free Tier, a project must be associated with a billing account. There is a one-to-many relationship between billing accounts and projects; a project can be associated with only one billing account, but a billing account can be associated with many projects.
Use a billing account to define who pays for the resources in a set of projects. The account includes a payment instrument, such as a credit card, to which costs are charged. You can define billing accounts at the organization level, where you link projects under the Organization node to the billing accounts. You can have multiple billing accounts in your organization to reflect different cost centers or departments.
Cloud IAM provides a robust set of controls to limit how different users can administer and interact with billing. These controls help you apply the principle of least privilege and provide a clear separation of roles. For example, you can separate permission to create billing accounts from permission to link projects to a particular billing account.
For a detailed discussion of billing concepts and setup, see the Billing onboarding checklist.
Analyze and export your bill
Users with appropriate permissions can view a detailed breakdown of costs, transaction history, and more in the GCP Console. Information is presented per billing account. The console also contains interactive billing reports that allow you to filter and break down costs by project, product, and time range. The GCP Console functionality is often sufficient for customers with less complicated GCP setups.
You typically require custom analyses and reporting on your cloud expenditure, however. To meet this requirement, enable daily exports of billing charges. You can configure file exports to export a CSV or JSON file to a Cloud Storage bucket. Similarly, you can configure exports to a BigQuery dataset. Exports will include any labels that have been applied to resources.
We recommend that you enable BigQuery exports. These exports provide a finer-grained breakdown of costs compared to the file export. Once the billing data is in BigQuery, finance teams can analyze the data using standard SQL and use tools that integrate with BigQuery.
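As a small illustration of why labels in the export are useful, the sketch below totals costs by a label value. It works on a list of dictionaries that stand in for exported rows; the column names (`project`, `cost`, `labels`) and the data are assumptions for this example and might not match the real export schema. With the BigQuery export, the equivalent analysis would be a SQL aggregation.

```python
from collections import defaultdict
from typing import Dict, List


def cost_by_label(rows: List[Dict], label_key: str) -> Dict[str, float]:
    """Sum the cost of all rows, grouped by the value of one label."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        value = row.get("labels", {}).get(label_key, "unlabeled")
        totals[value] += row["cost"]
    return {key: round(total, 2) for key, total in totals.items()}


# Invented rows standing in for exported billing data.
rows = [
    {"project": "data-ingest-prod", "cost": 10.0, "labels": {"team": "data"}},
    {"project": "web-prod", "cost": 4.0, "labels": {"team": "web"}},
    {"project": "data-ingest-dev", "cost": 2.5, "labels": {"team": "data"}},
    {"project": "sandbox", "cost": 1.0, "labels": {}},
]

totals = cost_by_label(rows, "team")
print(totals)
```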
Plan for your capacity requirements
GCP projects have quotas that limit the consumption of a particular resource or API. Quotas are in place to protect the wider GCP community by preventing unforeseen spikes in usage. For example, quotas ensure that a small number of customers or projects cannot monopolize usage of CPU cores in a particular region or zone.
Plan the capacity requirements of your projects in advance to prevent unexpected limiting of your resource consumption. If the quotas are not sufficient, you can request changes in the Quotas section of the GCP Console. If you require a large capacity, contact your GCP sales team.
Implement cost controls
As cloud services scale up, their costs also go up. GCP provides several methods to limit resource consumption, and to notify interested parties of relevant billing events.
You can define budgets that generate alerts when spending reaches certain thresholds. Alerts take the form of emails and can optionally generate Cloud Pub/Sub messages for programmatic notification. You can apply the budget to the entire billing account or to an individual project that is linked to the billing account. For example, you could create a budget to generate alerts when total monthly spending for a billing account reaches 50, 80, and 100 percent of the specified budget amount. Note that budgets do not themselves limit spending; rather, they are a mechanism for generating alerts. For more details, see the budget alerts documentation. For more best practices, design decisions, and configuration options that help simplify cost management, see the Cloud Billing onboarding checklist.
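The threshold example above (50, 80, and 100 percent) can be expressed as a short calculation. This sketch only illustrates the logic of threshold alerts; it is not the budgets API, and the function name and amounts are hypothetical. As with real budgets, nothing in it stops spending; it only reports which thresholds have been reached.

```python
from typing import List, Sequence


def crossed_thresholds(budget_amount: float, spend: float,
                       thresholds: Sequence[float] = (0.5, 0.8, 1.0)) -> List[float]:
    """Return the alert thresholds that current spending has reached."""
    if budget_amount <= 0:
        raise ValueError("budget_amount must be positive")
    ratio = spend / budget_amount
    return [t for t in sorted(thresholds) if ratio >= t]


# Hypothetical monthly budget of 1000 with 850 spent so far.
alerts = crossed_thresholds(budget_amount=1000, spend=850)
print(alerts)
```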
You can also use quotas to cap the consumption of a particular resource. For example, you can set a maximum "query usage per day" quota over the BigQuery API to ensure that a project does not overspend on BigQuery.
Purchase a support package
GCP provides various ways to get support when you experience problems, from community forums to paid support packages. To protect your business-critical workloads, we recommend purchasing an Enterprise support package. For details, see the GCP support portal.
Depending on the level of support you purchase, your ability to raise support tickets could be limited to certain individuals. So it's a good practice to establish a support clearinghouse or triage desk. This approach helps you avoid ticket duplication and miscommunication, and keeps your communication with Google Cloud Support as clear as possible.
Get help from the experts
The Google Cloud Professional Services organization (PSO) offers consulting services to help you on your GCP journey. Contact PSO consultants, who can provide deep expertise to educate your team on best practices and guiding principles for a successful implementation. Services are delivered in the form of packages to help you plan, deploy, execute, and optimize workloads.
GCP also has a strong ecosystem of Google Cloud partners, from large global systems integrators to partners with a deep specialization in a particular area like machine learning. Partners have demonstrated customer success using GCP and can accelerate your projects and improve business outcomes. We recommend that enterprise customers engage partners to help plan and execute their GCP implementation.
Build centers of excellence
Google continues to invest in GCP, and new features are continually being rolled out. It can be valuable to capture your organization's information, experience, and patterns in an internal knowledge base, such as a wiki, Google site, or intranet site.
Similarly, it's a good practice to nominate GCP experts and champions in your organization. A range of training and certification options are available to help nominated champions grow in their area of expertise. Teams can stay up to date on the latest news, announcements, and customer stories by subscribing to the Google Cloud blog.
- Review the Launch checklist.
- Explore GCP Solutions to optimize your business.
Visit the blog.