This principle in the cost optimization pillar of the Google Cloud Architecture Framework provides recommendations to help you plan and provision resources to match the requirements and consumption patterns of your cloud workloads.
Principle overview
To optimize the cost of your cloud resources, you need to thoroughly understand your workloads' resource requirements and load patterns. This understanding is the basis for a well-defined cost model that lets you forecast the total cost of ownership (TCO) and identify cost drivers throughout your cloud adoption journey. By proactively analyzing and forecasting cloud spending, you can make informed choices about resource provisioning, utilization, and cost optimization. This approach lets you control cloud spending, avoid overprovisioning, and ensure that cloud resources are aligned with the dynamic needs of your workloads and environments.
Recommendations
To effectively optimize cloud resource usage, consider the following recommendations.
Choose environment-specific resources
Each deployment environment has different requirements for availability, reliability, and scalability. For example, developers might prefer an environment that lets them rapidly deploy and run applications for short durations, but might not need high availability. On the other hand, a production environment typically needs high availability. To maximize the utilization of your resources, define environment-specific requirements based on your business needs. The following table lists examples of environment-specific requirements.
| Environment | Requirements |
| --- | --- |
| Production | High availability and reliability, robust security, and performance that meets defined service levels |
| Development and testing | Low-cost, ephemeral resources that can be deployed quickly and run for short durations; high availability is usually not required |
| Other environments (like staging and QA) | Configuration that closely mirrors production, typically at a smaller scale, to support realistic validation before release |
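To put these requirements into practice, you can parameterize resource choices by environment. The following sketch uses the google-cloud-compute Python client to create VMs whose machine types differ by environment. The project ID, zone, machine types, and instance settings are illustrative assumptions, not prescriptions:

```python
from google.cloud import compute_v1

# Hypothetical project and zone; replace with your own values.
PROJECT_ID = "my-project"
ZONE = "us-central1-a"

# Illustrative environment-to-machine-type mapping: a small, low-cost type
# for development and a larger type for production.
MACHINE_TYPES = {
    "development": "e2-small",
    "production": "n2-standard-4",
}


def create_instance(environment: str, name: str) -> None:
    """Creates a VM whose machine type matches the target environment."""
    client = compute_v1.InstancesClient()
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{ZONE}/machineTypes/{MACHINE_TYPES[environment]}",
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    source_image="projects/debian-cloud/global/images/family/debian-12",
                    disk_size_gb=10,
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    operation = client.insert(project=PROJECT_ID, zone=ZONE, instance_resource=instance)
    operation.result()  # Block until the create operation completes.


create_instance("development", "dev-app-vm")
create_instance("production", "prod-app-vm")
```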
Choose workload-specific resources
Each of your cloud workloads might have different requirements for availability, scalability, security, and performance. To optimize costs, you need to align resource choices with the specific requirements of each workload. For example, a stateless application might not require the same level of availability or reliability as a stateful backend. The following table lists more examples of workload-specific requirements.
| Workload type | Workload requirements | Resource options |
| --- | --- | --- |
| Mission-critical | Continuous availability, robust security, and high performance | Premium resources and managed services like Spanner for high availability and global consistency of data |
| Non-critical | Cost-efficient and autoscaling infrastructure | Resources with basic features and ephemeral resources like Spot VMs |
| Event-driven | Dynamic scaling based on the current demand for capacity and performance | Serverless services like Cloud Run and Cloud Run functions |
| Experimental | Low-cost, flexible environment for rapid development, iteration, testing, and innovation | Resources with basic features, ephemeral resources like Spot VMs, and sandbox environments with defined spending limits |
A benefit of the cloud is the opportunity to take advantage of the most appropriate computing power for a given workload. Some workloads are built to exploit specific processor instruction sets, and others aren't designed this way, so benchmark and profile your workloads accordingly. Categorize your workloads and make workload-specific resource choices (for example, choose appropriate machine families for Compute Engine VMs). This practice helps you optimize costs, enable innovation, and maintain the level of availability and performance that your workloads need.
The following are examples of how you can implement this recommendation:
- For mission-critical workloads that serve globally distributed users, consider using Spanner. Spanner removes the need for complex database deployments by ensuring reliability and consistency of data in all regions.
- For workloads with fluctuating load levels, use autoscaling so that you don't pay for idle capacity when load is low, while still maintaining sufficient capacity to meet the current load. You can configure autoscaling for many Google Cloud services, including Compute Engine VMs, Google Kubernetes Engine (GKE) clusters, and Cloud Run. When you set up autoscaling, you can configure maximum scaling limits to ensure that costs remain within specified budgets.
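For example, the following sketch configures an autoscaler for a Compute Engine managed instance group by using the google-cloud-compute Python client. The project, zone, instance group name, and scaling thresholds are hypothetical; the key cost control is the maximum replica count:

```python
from google.cloud import compute_v1

# Hypothetical project, zone, and managed instance group (MIG) name.
PROJECT_ID = "my-project"
ZONE = "us-central1-a"

autoscaler = compute_v1.Autoscaler(
    name="web-mig-autoscaler",
    # The MIG that this autoscaler controls.
    target=f"zones/{ZONE}/instanceGroupManagers/web-mig",
    autoscaling_policy=compute_v1.AutoscalingPolicy(
        min_num_replicas=1,   # Scale in when load is low to avoid idle cost.
        max_num_replicas=10,  # Cap scale-out so costs stay within budget.
        cpu_utilization=compute_v1.AutoscalingPolicyCpuUtilization(
            utilization_target=0.6  # Add replicas above ~60% average CPU.
        ),
    ),
)

client = compute_v1.AutoscalersClient()
client.insert(project=PROJECT_ID, zone=ZONE, autoscaler_resource=autoscaler).result()
```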
Select regions based on cost requirements
For your cloud workloads, carefully evaluate the available Google Cloud regions and choose regions that align with your cost objectives. The region with the lowest cost might not offer optimal latency or meet your sustainability requirements. Make informed decisions about where to deploy your workloads to achieve the desired balance. You can use the Google Cloud Region Picker to understand the trade-offs between cost, sustainability, latency, and other factors.
Use built-in cost optimization options
Google Cloud products provide built-in features to help you optimize resource usage and control costs. The following table lists examples of cost optimization features that you can use in some Google Cloud products:
| Product | Cost optimization feature |
| --- | --- |
| Compute Engine | Automatic sustained use discounts, rightsizing recommendations for VMs, and Spot VMs for fault-tolerant workloads |
| GKE | Cluster autoscaler, node auto-provisioning, and Autopilot mode, which bills for Pod-level resource requests rather than entire nodes |
| Cloud Storage | Nearline, Coldline, and Archive storage classes for infrequently accessed data, Autoclass for automatic storage class transitions, and Object Lifecycle Management |
| BigQuery | A choice of on-demand or capacity-based pricing, and partitioning and clustering to reduce the amount of data that queries scan |
| Google Cloud VMware Engine | Committed use discounts for predictable, long-running workloads |
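As an example of using a built-in feature, the following sketch configures Object Lifecycle Management on a Cloud Storage bucket by using the google-cloud-storage Python client. The bucket name and age thresholds are illustrative assumptions:

```python
from google.cloud import storage

# Hypothetical bucket name; the age thresholds are illustrative.
client = storage.Client()
bucket = client.get_bucket("my-example-bucket")

# Move objects to colder (cheaper) storage classes as they age,
# then delete them after one year.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_delete_rule(age=365)
bucket.patch()  # Persist the updated lifecycle configuration.
```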
Optimize resource sharing
To maximize the utilization of cloud resources, you can deploy multiple applications or services on the same infrastructure, while still meeting the security and other requirements of the applications. For example, in development and testing environments, you can use the same cloud infrastructure to test all the components of an application. For the production environment, you can deploy each component on a separate set of resources to limit the extent of impact in case of incidents.
The following are examples of how you can implement this recommendation:
- Use a single Cloud SQL instance for multiple non-production environments.
- Enable multiple development teams to share a GKE cluster by using the fleet team management feature in GKE Enterprise with appropriate access controls.
- Use GKE Autopilot to take advantage of cost-optimization techniques like bin packing and autoscaling that GKE implements by default.
- For AI and ML workloads, save GPU costs by using GPU-sharing strategies like multi-instance GPUs, time-sharing GPUs, and NVIDIA MPS.
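For example, the following sketch creates a GKE Autopilot cluster by using the google-cloud-container Python client; the project ID, region, and cluster name are hypothetical. In Autopilot mode, GKE manages node provisioning and bin packing for you, so multiple teams' workloads can share capacity efficiently:

```python
from google.cloud import container_v1

# Hypothetical project, region, and cluster name.
PROJECT_ID = "my-project"
REGION = "us-central1"

client = container_v1.ClusterManagerClient()
request = container_v1.CreateClusterRequest(
    parent=f"projects/{PROJECT_ID}/locations/{REGION}",
    cluster=container_v1.Cluster(
        name="shared-dev-cluster",
        # In Autopilot mode, GKE provisions and bin-packs nodes for you,
        # and billing is based on Pod resource requests rather than nodes.
        autopilot=container_v1.Autopilot(enabled=True),
    ),
)
client.create_cluster(request=request)
```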
Develop and maintain reference architectures
Create and maintain a repository of reference architectures that are tailored to meet the requirements of different deployment environments and workload types. To streamline the design and implementation process for individual projects, these blueprints can be centrally managed by a team like a Cloud Center of Excellence (CCoE). Project teams can choose suitable blueprints based on clearly defined criteria to ensure architectural consistency and adoption of best practices. For requirements that are unique to a project, the project team and the central architecture team should collaborate to design new reference architectures. You can share the reference architectures across the organization to foster knowledge sharing and expand the repository of available solutions. This approach ensures consistency, accelerates development, simplifies decision-making, and promotes efficient resource utilization.
Review the reference architectures provided by Google for various use cases and technologies. These reference architectures incorporate best practices for resource selection, sizing, configuration, and deployment. By using these reference architectures, you can accelerate your development process and achieve cost savings from the start.
Enforce cost discipline by using organization policies
Consider using organization policies to limit the available Google Cloud locations and products that team members can use. These policies help to ensure that teams adhere to cost-effective solutions and provision resources in locations that are aligned with your cost optimization goals.
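For example, the following sketch sets the gcp.resourceLocations constraint at the organization level by using the google-cloud-org-policy Python client, so that resources can be created only in US locations. The organization ID is a placeholder, and the allowed value is one example of a location value group:

```python
from google.cloud import orgpolicy_v2

# Hypothetical organization ID.
ORG_ID = "123456789012"

client = orgpolicy_v2.OrgPolicyClient()
policy = orgpolicy_v2.Policy(
    name=f"organizations/{ORG_ID}/policies/gcp.resourceLocations",
    spec=orgpolicy_v2.PolicySpec(
        rules=[
            orgpolicy_v2.PolicySpec.PolicyRule(
                values=orgpolicy_v2.PolicySpec.PolicyRule.StringValues(
                    # Allow resource creation only in US locations.
                    allowed_values=["in:us-locations"],
                ),
            ),
        ],
    ),
)
client.create_policy(parent=f"organizations/{ORG_ID}", policy=policy)
```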
Estimate realistic budgets and set financial boundaries
Develop detailed budgets for each project, workload, and deployment environment. Make sure that the budgets cover all aspects of cloud operations, including infrastructure costs, software licenses, personnel, and anticipated growth. To prevent overspending and ensure alignment with your financial goals, establish clear spending limits or thresholds for projects, services, or specific resources. Monitor cloud spending regularly against these limits. You can use proactive quota alerts to identify potential cost overruns early and take timely corrective action.
In addition to setting budgets, you can use quotas and limits to help enforce cost discipline and prevent unexpected spikes in spending. You can exercise granular control over resource consumption by setting quotas at various levels, including projects, services, and even specific resource types.
The following are examples of how you can implement this recommendation:
- Project-level quotas: Set spending limits or resource quotas at the project level to establish overall financial boundaries and control resource consumption across all the services within the project.
- Service-specific quotas: Configure quotas for specific Google Cloud services like Compute Engine or BigQuery to limit the number of instances, CPUs, or storage capacity that can be provisioned.
- Resource type-specific quotas: Apply quotas to individual resource types like Compute Engine VMs, Cloud Storage buckets, Cloud Run instances, or GKE nodes to restrict their usage and prevent unexpected cost overruns.
- Quota alerts: Get notifications when your quota usage (at the project level) reaches a percentage of the maximum value.
By using quotas and limits in conjunction with budgeting and monitoring, you can create a proactive and multi-layered approach to cost control. This approach helps to ensure that your cloud spending remains within defined boundaries and aligns with your business objectives. Remember, these cost controls are not permanent or rigid. To ensure that the cost controls remain aligned with current industry standards and reflect your evolving business needs, you must review the controls regularly and adjust them to include new technologies and best practices.
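As a starting point, the following sketch creates a monthly budget with threshold alerts by using the google-cloud-billing-budgets Python client. The billing account, project, budget amount, and threshold percentages are illustrative assumptions that you would tune to your own financial boundaries:

```python
from google.cloud import billing_budgets_v1
from google.type import money_pb2

# Hypothetical billing account and project identifiers.
BILLING_ACCOUNT = "billingAccounts/000000-000000-000000"
PROJECT = "projects/my-project"

client = billing_budgets_v1.BudgetServiceClient()
budget = billing_budgets_v1.Budget(
    display_name="my-project monthly budget",
    # Scope the budget to a single project.
    budget_filter=billing_budgets_v1.Filter(projects=[PROJECT]),
    # A fixed monthly limit of 1,000 USD (illustrative).
    amount=billing_budgets_v1.BudgetAmount(
        specified_amount=money_pb2.Money(currency_code="USD", units=1000),
    ),
    # Send alerts at 50%, 90%, and 100% of the budgeted amount.
    threshold_rules=[
        billing_budgets_v1.ThresholdRule(threshold_percent=0.5),
        billing_budgets_v1.ThresholdRule(threshold_percent=0.9),
        billing_budgets_v1.ThresholdRule(threshold_percent=1.0),
    ],
)
client.create_budget(parent=BILLING_ACCOUNT, budget=budget)
```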