This principle in the cost optimization pillar of the Google Cloud Architecture Framework provides recommendations to help you plan and provision resources to match the requirements and consumption patterns of your cloud workloads.
Principle overview
To optimize the cost of your cloud resources, you need to thoroughly understand your workloads' resource requirements and load patterns. This understanding is the basis for a well-defined cost model that lets you forecast the total cost of ownership (TCO) and identify cost drivers throughout your cloud adoption journey. By proactively analyzing and forecasting cloud spending, you can make informed choices about resource provisioning, utilization, and cost optimization. This approach lets you control cloud spending, avoid overprovisioning, and ensure that cloud resources are aligned with the dynamic needs of your workloads and environments.
Recommendations
To effectively optimize cloud resource usage, consider the following recommendations.
Choose environment-specific resources
Each deployment environment has different requirements for availability, reliability, and scalability. For example, developers might prefer an environment that lets them rapidly deploy and run applications for short durations, but might not need high availability. On the other hand, a production environment typically needs high availability. To maximize the utilization of your resources, define environment-specific requirements based on your business needs. The following table lists examples of environment-specific requirements.
| Environment | Requirements |
| --- | --- |
| Production | High availability and reliability, robust security, and performance that meets defined service levels |
| Development and testing | Low-cost, ephemeral resources that can be deployed quickly and run for short durations; high availability is usually not required |
| Other environments (like staging and QA) | Configuration that closely mirrors production, typically at a smaller scale, to support realistic validation before release |
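To put these requirements into practice, you can parameterize resource choices by environment. The following sketch uses the google-cloud-compute Python client to create VMs whose machine types differ by environment. The project ID, zone, machine types, and instance settings are illustrative assumptions, not prescriptions:

```python
from google.cloud import compute_v1

# Hypothetical project and zone; replace with your own values.
PROJECT_ID = "my-project"
ZONE = "us-central1-a"

# Illustrative environment-to-machine-type mapping: a small, low-cost type
# for development and a larger type for production.
MACHINE_TYPES = {
    "development": "e2-small",
    "production": "n2-standard-4",
}


def create_instance(environment: str, name: str) -> None:
    """Creates a VM whose machine type matches the target environment."""
    client = compute_v1.InstancesClient()
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{ZONE}/machineTypes/{MACHINE_TYPES[environment]}",
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    source_image="projects/debian-cloud/global/images/family/debian-12",
                    disk_size_gb=10,
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    operation = client.insert(project=PROJECT_ID, zone=ZONE, instance_resource=instance)
    operation.result()  # Block until the create operation completes.


create_instance("development", "dev-app-vm")
create_instance("production", "prod-app-vm")
```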
Choose workload-specific resources
Each of your cloud workloads might have different requirements for availability, scalability, security, and performance. To optimize costs, you need to align resource choices with the specific requirements of each workload. For example, a stateless application might not require the same level of availability or reliability as a stateful backend. The following table lists more examples of workload-specific requirements.
| Workload type | Workload requirements | Resource options |
| --- | --- | --- |
| Mission-critical | Continuous availability, robust security, and high performance | Premium resources and managed services like Spanner for high availability and global consistency of data |
| Non-critical | Cost-efficient and autoscaling infrastructure | Resources with basic features and ephemeral resources like Spot VMs |
| Event-driven | Dynamic scaling based on the current demand for capacity and performance | Serverless services like Cloud Run and Cloud Run functions |
| Experimental | Low-cost, flexible environment for rapid development, iteration, testing, and innovation | Resources with basic features, ephemeral resources like Spot VMs, and sandbox environments with defined spending limits |
A benefit of the cloud is the opportunity to take advantage of the most appropriate computing power for a given workload. Some workloads are built to exploit specific processor instruction sets, and others aren't designed this way, so benchmark and profile your workloads accordingly. Categorize your workloads and make workload-specific resource choices (for example, choose appropriate machine families for Compute Engine VMs). This practice helps you optimize costs, enable innovation, and maintain the level of availability and performance that your workloads need.
The following are examples of how you can implement this recommendation:
- For mission-critical workloads that serve globally distributed users, consider using Spanner. Spanner removes the need for complex database deployments by ensuring reliability and consistency of data in all regions.
- For workloads with fluctuating load levels, use autoscaling so that you don't pay for idle capacity when load is low, while still maintaining sufficient capacity to meet the current load. You can configure autoscaling for many Google Cloud services, including Compute Engine VMs, Google Kubernetes Engine (GKE) clusters, and Cloud Run. When you set up autoscaling, you can configure maximum scaling limits to ensure that costs remain within specified budgets.
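For example, the following sketch configures an autoscaler for a Compute Engine managed instance group by using the google-cloud-compute Python client. The project, zone, instance group name, and scaling thresholds are hypothetical; the key cost control is the maximum replica count:

```python
from google.cloud import compute_v1

# Hypothetical project, zone, and managed instance group (MIG) name.
PROJECT_ID = "my-project"
ZONE = "us-central1-a"

autoscaler = compute_v1.Autoscaler(
    name="web-mig-autoscaler",
    # The MIG that this autoscaler controls.
    target=f"zones/{ZONE}/instanceGroupManagers/web-mig",
    autoscaling_policy=compute_v1.AutoscalingPolicy(
        min_num_replicas=1,   # Scale in when load is low to avoid idle cost.
        max_num_replicas=10,  # Cap scale-out so costs stay within budget.
        cpu_utilization=compute_v1.AutoscalingPolicyCpuUtilization(
            utilization_target=0.6  # Add replicas above ~60% average CPU.
        ),
    ),
)

client = compute_v1.AutoscalersClient()
client.insert(project=PROJECT_ID, zone=ZONE, autoscaler_resource=autoscaler).result()
```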
Select regions based on cost requirements
For your cloud workloads, carefully evaluate the available Google Cloud regions and choose regions that align with your cost objectives. The region with the lowest cost might not offer optimal latency or meet your sustainability requirements. Make informed decisions about where to deploy your workloads to achieve the desired balance. You can use the Google Cloud Region Picker to understand the trade-offs between cost, sustainability, latency, and other factors.
Use built-in cost optimization options
Google Cloud products provide built-in features to help you optimize resource usage and control costs. The following table lists examples of cost optimization features that you can use in some Google Cloud products:
| Product | Cost optimization feature |
| --- | --- |
| Compute Engine | Automatic sustained use discounts, rightsizing recommendations for VMs, and Spot VMs for fault-tolerant workloads |
| GKE | Cluster autoscaler, node auto-provisioning, and Autopilot mode, which bills for Pod-level resource requests rather than entire nodes |
| Cloud Storage | Nearline, Coldline, and Archive storage classes for infrequently accessed data, Autoclass for automatic storage class transitions, and Object Lifecycle Management |
| BigQuery | A choice of on-demand or capacity-based pricing, and partitioning and clustering to reduce the amount of data that queries scan |
| Google Cloud VMware Engine | Committed use discounts for predictable, long-running workloads |
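As an example of using a built-in feature, the following sketch configures Object Lifecycle Management on a Cloud Storage bucket by using the google-cloud-storage Python client. The bucket name and age thresholds are illustrative assumptions:

```python
from google.cloud import storage

# Hypothetical bucket name; the age thresholds are illustrative.
client = storage.Client()
bucket = client.get_bucket("my-example-bucket")

# Move objects to colder (cheaper) storage classes as they age,
# then delete them after one year.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_delete_rule(age=365)
bucket.patch()  # Persist the updated lifecycle configuration.
```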
Optimize resource sharing
To maximize the utilization of cloud resources, you can deploy multiple applications or services on the same infrastructure, while still meeting the security and other requirements of the applications. For example, in development and testing environments, you can use the same cloud infrastructure to test all the components of an application. For the production environment, you can deploy each component on a separate set of resources to limit the extent of impact in case of incidents.
The following are examples of how you can implement this recommendation:
- Use a single Cloud SQL instance for multiple non-production environments.
- Enable multiple development teams to share a GKE cluster by using the fleet team management feature in GKE Enterprise with appropriate access controls.
- Use GKE Autopilot to take advantage of cost-optimization techniques like bin packing and autoscaling that GKE implements by default.
- For AI and ML workloads, save GPU costs by using GPU-sharing strategies like multi-instance GPUs, time-sharing GPUs, and NVIDIA MPS.
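For example, the following sketch creates a GKE Autopilot cluster by using the google-cloud-container Python client; the project ID, region, and cluster name are hypothetical. In Autopilot mode, GKE manages node provisioning and bin packing for you, so multiple teams' workloads can share capacity efficiently:

```python
from google.cloud import container_v1

# Hypothetical project, region, and cluster name.
PROJECT_ID = "my-project"
REGION = "us-central1"

client = container_v1.ClusterManagerClient()
request = container_v1.CreateClusterRequest(
    parent=f"projects/{PROJECT_ID}/locations/{REGION}",
    cluster=container_v1.Cluster(
        name="shared-dev-cluster",
        # In Autopilot mode, GKE provisions and bin-packs nodes for you,
        # and billing is based on Pod resource requests rather than nodes.
        autopilot=container_v1.Autopilot(enabled=True),
    ),
)
client.create_cluster(request=request)
```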
Develop and maintain reference architectures
Create and maintain a repository of reference architectures that are tailored to meet the requirements of different deployment environments and workload types. To streamline the design and implementation process for individual projects, these blueprints can be centrally managed by a team like a Cloud Center of Excellence (CCoE). Project teams can choose suitable blueprints based on clearly defined criteria to ensure architectural consistency and adoption of best practices. For requirements that are unique to a project, the project team and the central architecture team should collaborate to design new reference architectures. You can share the reference architectures across the organization to foster knowledge sharing and expand the repository of available solutions. This approach ensures consistency, accelerates development, simplifies decision-making, and promotes efficient resource utilization.
Review the reference architectures provided by Google for various use cases and technologies. These reference architectures incorporate best practices for resource selection, sizing, configuration, and deployment. By using these reference architectures, you can accelerate your development process and achieve cost savings from the start.
Enforce cost discipline by using organization policies
Consider using organization policies to limit the available Google Cloud locations and products that team members can use. These policies help to ensure that teams adhere to cost-effective solutions and provision resources in locations that are aligned with your cost optimization goals.
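For example, the following sketch sets the gcp.resourceLocations constraint at the organization level by using the google-cloud-org-policy Python client, so that resources can be created only in US locations. The organization ID is a placeholder, and the allowed value is one example of a location value group:

```python
from google.cloud import orgpolicy_v2

# Hypothetical organization ID.
ORG_ID = "123456789012"

client = orgpolicy_v2.OrgPolicyClient()
policy = orgpolicy_v2.Policy(
    name=f"organizations/{ORG_ID}/policies/gcp.resourceLocations",
    spec=orgpolicy_v2.PolicySpec(
        rules=[
            orgpolicy_v2.PolicySpec.PolicyRule(
                values=orgpolicy_v2.PolicySpec.PolicyRule.StringValues(
                    # Allow resource creation only in US locations.
                    allowed_values=["in:us-locations"],
                ),
            ),
        ],
    ),
)
client.create_policy(parent=f"organizations/{ORG_ID}", policy=policy)
```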
Estimate realistic budgets and set financial boundaries
Develop detailed budgets for each project, workload, and deployment environment. Make sure that the budgets cover all aspects of cloud operations, including infrastructure costs, software licenses, personnel, and anticipated growth. To prevent overspending and ensure alignment with your financial goals, establish clear spending limits or thresholds for projects, services, or specific resources. Monitor cloud spending regularly against these limits. You can use proactive quota alerts to identify potential cost overruns early and take timely corrective action.
In addition to setting budgets, you can use quotas and limits to help enforce cost discipline and prevent unexpected spikes in spending. You can exercise granular control over resource consumption by setting quotas at various levels, including projects, services, and even specific resource types.
The following are examples of how you can implement this recommendation:
- Project-level quotas: Set spending limits or resource quotas at the project level to establish overall financial boundaries and control resource consumption across all the services within the project.
- Service-specific quotas: Configure quotas for specific Google Cloud services like Compute Engine or BigQuery to limit the number of instances, CPUs, or storage capacity that can be provisioned.
- Resource type-specific quotas: Apply quotas to individual resource types like Compute Engine VMs, Cloud Storage buckets, Cloud Run instances, or GKE nodes to restrict their usage and prevent unexpected cost overruns.
- Quota alerts: Get notifications when your quota usage (at the project level) reaches a percentage of the maximum value.
By using quotas and limits in conjunction with budgeting and monitoring, you can create a proactive and multi-layered approach to cost control. This approach helps to ensure that your cloud spending remains within defined boundaries and aligns with your business objectives. Remember, these cost controls are not permanent or rigid. To ensure that the cost controls remain aligned with current industry standards and reflect your evolving business needs, you must review the controls regularly and adjust them to include new technologies and best practices.
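As a starting point, the following sketch creates a monthly budget with threshold alerts by using the google-cloud-billing-budgets Python client. The billing account, project, budget amount, and threshold percentages are illustrative assumptions that you would tune to your own financial boundaries:

```python
from google.cloud import billing_budgets_v1
from google.type import money_pb2

# Hypothetical billing account and project identifiers.
BILLING_ACCOUNT = "billingAccounts/000000-000000-000000"
PROJECT = "projects/my-project"

client = billing_budgets_v1.BudgetServiceClient()
budget = billing_budgets_v1.Budget(
    display_name="my-project monthly budget",
    # Scope the budget to a single project.
    budget_filter=billing_budgets_v1.Filter(projects=[PROJECT]),
    # A fixed monthly limit of 1,000 USD (illustrative).
    amount=billing_budgets_v1.BudgetAmount(
        specified_amount=money_pb2.Money(currency_code="USD", units=1000),
    ),
    # Send alerts at 50%, 90%, and 100% of the budgeted amount.
    threshold_rules=[
        billing_budgets_v1.ThresholdRule(threshold_percent=0.5),
        billing_budgets_v1.ThresholdRule(threshold_percent=0.9),
        billing_budgets_v1.ThresholdRule(threshold_percent=1.0),
    ],
)
client.create_budget(parent=BILLING_ACCOUNT, budget=budget)
```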