Monitor and control cost

Last reviewed 2023-09-08 UTC

This document in the Google Cloud Architecture Framework describes best practices, tools, and techniques to help you track and control the cost of your resources in Google Cloud.

The guidance in this section is intended for users who provision or manage resources in the cloud.

Cost-management focus areas

The cost of your resources in Google Cloud depends on the quantity of resources that you use and the rate at which you're billed for the resources.
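
As a minimal illustration of this relationship, the following Python sketch treats cost as quantity multiplied by rate. The function name, quantities, and rates are hypothetical values chosen for the example, not Google Cloud prices.

```python
def monthly_cost(quantity: float, unit_rate: float) -> float:
    """Return the cost as the quantity consumed multiplied by the billed rate."""
    return quantity * unit_rate


# Hypothetical baseline: 1,000 vCPU-hours billed at 0.04 currency units per hour.
baseline = monthly_cost(1000, 0.04)

# Resource optimization lowers the quantity (here, 20% fewer vCPU-hours).
fewer_resources = monthly_cost(800, 0.04)

# Rate optimization lowers the rate (here, a hypothetical 25% discount).
lower_rate = monthly_cost(1000, 0.03)
```

Both levers reduce the total independently, which is why the focus areas that follow treat resource optimization and rate optimization separately.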

To manage the cost of cloud resources, we recommend that you focus on the following areas:

  • Cost visibility
  • Resource optimization
  • Rate optimization

Cost visibility

Track how much you spend and how your resources and services are billed, so that you can analyze the effect of cost on business outcomes. We recommend that you follow the FinOps operating model, which suggests the following actions to make cost information visible across your organization:

  • Allocate: Assign an owner for every cost item.
  • Report: Make cost data available, consumable, and actionable.
  • Forecast: Estimate and track future spend.

Resource optimization

Align the number and size of your cloud resources to the requirements of your workload. Where feasible, consider using managed services or re-architecting your applications. Typically, individual engineering teams have more context than the central FinOps (financial operations) team on opportunities and techniques to optimize resource deployment. We recommend that the FinOps team work with the individual engineering teams to identify resource-optimization opportunities that can be applied across the organization.

Rate optimization

The FinOps team often makes rate optimization decisions centrally. We recommend that the individual engineering teams work with the central FinOps team to take advantage of deep discounts for reservations, committed usage, Spot VMs, flat-rate pricing, and volume and contract discounting.

Design recommendations

This section suggests approaches that you can use to monitor and control costs.

Consolidate billing and resource management

To manage billing and resources in Google Cloud efficiently, we recommend that you use a single billing account for your organization, and use internal chargeback mechanisms to allocate costs. Use multiple billing accounts for loosely structured conglomerates and organizations with entities that don't affect each other. For example, resellers might need distinct accounts for each customer. Using separate billing accounts might also help you meet country-specific tax regulations.

Another recommended best practice is to move all the projects that you manage into your organization. We recommend using Resource Manager to build a resource hierarchy that helps you achieve the following goals:

  • Establish a hierarchy of resource ownership based on the relationship of each resource to its immediate parent.
  • Control how access policies and cost-allocation tags or labels are attached to and inherited by the resources in your organization.

In addition, we recommend that you allocate the cost of shared services proportionally based on consumption. Review and adjust the cost allocation parameters periodically based on changes in your business goals and priorities.
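
One way to allocate a shared cost proportionally is sketched below in Python. The function name, team names, and consumption figures are hypothetical; the consumption metric (for example, requests served or storage used) is something your organization would need to choose.

```python
def allocate_shared_cost(total_cost: float, consumption: dict[str, float]) -> dict[str, float]:
    """Split total_cost across consumers in proportion to their measured consumption."""
    total_usage = sum(consumption.values())
    if total_usage <= 0:
        raise ValueError("total consumption must be positive")
    return {
        consumer: total_cost * usage / total_usage
        for consumer, usage in consumption.items()
    }


# Hypothetical example: a shared cluster that costs 1,200 per month,
# consumed by three teams in a 600:300:100 ratio.
shares = allocate_shared_cost(
    1200.0, {"team-a": 600, "team-b": 300, "team-c": 100}
)
```

Rerunning the allocation with updated consumption figures is one simple way to apply the periodic review described above.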

Track and allocate cost using tags or labels

Tags and labels are two different methods that you can use to annotate your Google Cloud resources. Tags provide more capabilities than labels. For example, you can implement fine-grained control over resources by creating Identity and Access Management (IAM) policies that are conditional based on whether a tag is attached to a supported resource. In addition, the tags that are associated with a resource are inherited by all the child resources in the hierarchy. For more information about the differences between tags and labels, see Tags overview.

If you're building a new framework for cost allocation and tracking, we recommend using tags.

To categorize cost data at the required granularity, establish a tagging or labeling schema that suits your organization's chargeback mechanism and helps you allocate costs appropriately. You can define tags at the organization or project level. You can assign labels at the project level, and define a set of labels that can be applied by default to all the projects.

Define a process to detect and correct tagging and labeling anomalies and unlabeled projects. For example, from Cloud Asset Inventory, you can download an inventory (.csv file) of all the resources in a project and analyze the inventory to identify resources that aren't assigned any tags or labels.
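
The following Python sketch shows one way such an analysis might look. It assumes that the downloaded inventory has a header row with columns called name and labels; those column names and the sample rows are assumptions made for illustration, so adjust them to match the file that you actually export.

```python
import csv
import io


def find_unlabeled(csv_text: str, label_column: str = "labels") -> list[str]:
    """Return the names of resources whose label column is missing or empty.

    The column names ("name" and the default "labels") are assumptions
    about the exported file, not a documented schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["name"]
        for row in reader
        if not (row.get(label_column) or "").strip()
    ]


# Hypothetical inventory with one resource that has no labels.
SAMPLE = """name,asset_type,labels
vm-frontend,compute.googleapis.com/Instance,team=web
vm-batch,compute.googleapis.com/Instance,
bucket-logs,storage.googleapis.com/Bucket,team=platform
"""

unlabeled = find_unlabeled(SAMPLE)
```

A scheduled job could run a check like this and notify the owning team when the returned list is not empty.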

To track the cost of shared resources and services (for example, common datastores, multi-tenant clusters, and support subscriptions), consider using a special tag or label to identify projects that contain shared resources.

Configure billing access control

To control access to Cloud Billing, we recommend that you assign the Billing Account Administrator role to only those users who manage billing contact information. For example, employees in finance, accounting, and operations might need this role.

To avoid a single point of failure for billing support, assign the Billing Account Administrator role to multiple users or to a group. Only users with the Billing Account Administrator role can contact support. For detailed guidance, see Cloud Billing access control examples and Important Roles.

Make the following configurations to manage access to billing:

  • To associate a billing account with a project, members need the Billing Account User role on the billing account and the Project Billing Manager role on the project.
  • To enable teams to manually associate billing accounts with projects, you can assign the Project Billing Manager role at the organization level and the Billing Account User role on the billing account. You can automate the association of billing accounts during project creation by assigning the Project Billing Manager and Billing Account User roles to a service account. We recommend that you restrict the Billing Account Creator role or remove all assignments of this role.
  • To prevent outages caused by unintentional changes to the billing status of a project, you can lock the link between the project and its billing account. For more information, see Secure the link between a project and its billing account.
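
As a sketch of the first rule in the preceding list, the following Python function checks whether a member holds both roles that are needed to associate a billing account with a project. The bindings are represented as plain dictionaries that map a role ID to a set of members, which is a simplification made for this example rather than a call to the IAM API. The role IDs used here (roles/billing.user for Billing Account User and roles/billing.projectManager for Project Billing Manager) should be verified against the current IAM documentation.

```python
# Role IDs assumed to correspond to the roles named in the text.
BILLING_ACCOUNT_USER = "roles/billing.user"
PROJECT_BILLING_MANAGER = "roles/billing.projectManager"


def can_link_billing(
    member: str,
    billing_account_bindings: dict[str, set[str]],
    project_bindings: dict[str, set[str]],
) -> bool:
    """Return True if member has both roles needed to link the project."""
    on_account = member in billing_account_bindings.get(BILLING_ACCOUNT_USER, set())
    on_project = member in project_bindings.get(PROJECT_BILLING_MANAGER, set())
    return on_account and on_project


# Hypothetical bindings: the service account holds both roles,
# while the user holds only the project-level role.
account_bindings = {
    BILLING_ACCOUNT_USER: {"serviceAccount:project-factory@example.iam.gserviceaccount.com"},
}
project_bindings = {
    PROJECT_BILLING_MANAGER: {
        "serviceAccount:project-factory@example.iam.gserviceaccount.com",
        "user:dev@example.com",
    },
}

factory_ok = can_link_billing(
    "serviceAccount:project-factory@example.iam.gserviceaccount.com",
    account_bindings,
    project_bindings,
)
dev_ok = can_link_billing("user:dev@example.com", account_bindings, project_bindings)
```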

Configure billing reports

Set up billing reports to provide data for the key metrics that you need to track. We recommend that you track the following metrics:

  • Cost trends
  • Largest spenders (by project and by product)
  • Areas of irregular spending
  • Key organization-wide insights as follows:
    • Anomaly detection
    • Trends over time
    • Trends that occur in a set pattern (for example, month-on-month)
    • Cost comparison and benchmark analysis between internal and external workloads
    • Business case tracking and value realization (for example, cloud costs compared with the cost of similar on-premises resources)
    • Validation that Google Cloud bills are as expected and accurate
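
To make the month-on-month and irregular-spending metrics concrete, the following Python sketch computes the relative change between consecutive months and flags months where the change exceeds a threshold. The 25% threshold and the cost figures are hypothetical, and a simple threshold is only one of many possible ways to detect anomalies.

```python
def month_over_month(costs: list[float]) -> list[float]:
    """Return the relative change between each pair of consecutive months."""
    if any(cost <= 0 for cost in costs[:-1]):
        raise ValueError("every month used as a baseline must have a positive cost")
    return [(cur - prev) / prev for prev, cur in zip(costs, costs[1:])]


def flag_irregular(costs: list[float], threshold: float = 0.25) -> list[int]:
    """Return the indexes of months whose cost changed by more than threshold."""
    changes = month_over_month(costs)
    # changes[i] compares month i + 1 with month i.
    return [i + 1 for i, change in enumerate(changes) if abs(change) > threshold]


# Hypothetical monthly totals; the fourth month (index 3) jumps sharply.
monthly_costs = [1000.0, 1050.0, 1100.0, 1600.0, 1580.0]
irregular_months = flag_irregular(monthly_costs)
```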

Customize and analyze cost reports using BigQuery Billing Export, and visualize cost data using Looker Studio. Assess the trend of actual costs and how much you might spend by using the forecasting tool.

Optimize resource usage and cost

This section recommends best practices to help you optimize the usage and cost of your resources across Google Cloud services.

To prevent overspending, consider configuring default budgets and alerts with high thresholds for all your projects. To help keep within budgets, we recommend that you do the following:

  • Configure budgets and alerts for projects where absolute usage limits are necessary (for example, training or sandbox projects).

  • Define budgets based on the financial budgets that you need to track. For example, if a department has an overall cloud budget, set the scope of the Google Cloud budget to include the specific projects that you need to track.

  • To ensure that budgets are maintained, delegate the responsibility for configuring budgets and alerts to the teams that own the workloads.

To help optimize costs, we also recommend that you do the following:

  • Cap API usage in cases where it has minimal or no business impact. Capping can be useful for sandbox or training projects and for projects with fixed budgets (for example, ad-hoc analytics in BigQuery). Capping doesn't remove all the resources and data from the associated projects.
  • Use quotas to set hard limits that throttle resource deployment. Quotas help you control cost and prevent malicious use or misuse of resources. Quotas are applied at the project level, per resource type and location.
  • View and implement the cost-optimization recommendations in the Recommendation Hub.
  • Purchase committed use discounts (CUD) to save money on resources for workloads with predictable resource needs.
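
To illustrate why committed use discounts suit predictable workloads, the following Python sketch compares the cost of different commitment levels against a history of hourly usage. It assumes a simplified model in which you pay the discounted rate for the full commitment every hour and the on-demand rate for any usage above it. The 30% discount, the rate, and the usage history are hypothetical; actual discount terms vary by resource and commitment period.

```python
def cost_with_commitment(
    hourly_usage: list[float],
    commitment: float,
    on_demand_rate: float,
    discount: float,
) -> float:
    """Total cost when `commitment` units are committed for every hour."""
    committed_rate = on_demand_rate * (1 - discount)
    return sum(
        commitment * committed_rate + max(0.0, used - commitment) * on_demand_rate
        for used in hourly_usage
    )


def best_commitment(
    hourly_usage: list[float], on_demand_rate: float, discount: float
) -> float:
    """Return the commitment level with the lowest total cost.

    The cost is piecewise linear in the commitment, so it is enough to
    check zero and each observed usage level.
    """
    candidates = sorted({0.0, *hourly_usage})
    return min(
        candidates,
        key=lambda c: cost_with_commitment(hourly_usage, c, on_demand_rate, discount),
    )


# Hypothetical history: steady usage of 8 vCPUs with one spike to 20.
usage = [8, 8, 8, 8, 8, 8, 20, 8]
commitment = best_commitment(usage, on_demand_rate=1.0, discount=0.3)
```

In this example, committing to the steady baseline is cheaper than committing to the peak, because the unused part of a larger commitment is still billed in every hour.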

Tools and techniques

The on-demand provisioning and pay-per-use characteristics of the cloud help you to optimize your IT spend. This section describes tools that Google Cloud provides and techniques that you can use to track and control the cost of your resources in the cloud. Before you use these tools and techniques, review the basic Cloud Billing concepts.

Billing reports

Google Cloud provides billing reports within the Google Cloud console to help you view your current and forecasted spend. The billing reports enable you to view cost data on a single page, discover and analyze trends, forecast the end-of-period cost, and take corrective action when necessary.

Billing reports provide the following data:

  • The costs and cost trends for a given period, organized as follows:
    • By billing account
    • By project
    • By product (for example, Compute Engine)
    • By SKU (for example, static IP addresses)
  • The potential costs if discounts or promotional credits were excluded
  • The forecasted spend
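
To show what an end-of-period forecast means in its simplest form, the following Python sketch extrapolates the month-to-date spend at a constant daily run rate. This naive linear projection is an assumption made for illustration; it is not a description of how the billing reports calculate their forecast.

```python
import calendar
from datetime import date


def forecast_month_end(spend_to_date: float, today: date) -> float:
    """Project the month-end cost by assuming a constant daily run rate.

    The day of the month is treated as the number of days elapsed,
    including today.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


# Hypothetical example: 450 spent by September 15 in a 30-day month.
projected = forecast_month_end(450.0, date(2023, 9, 15))
```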

Data export to BigQuery

You can export billing reports to BigQuery, and analyze costs using granular and historical views of data, including data that's categorized using labels or tags. You can perform more advanced analyses using BigQuery ML. We recommend that you enable export of billing reports to BigQuery when you create the Cloud Billing account. Your BigQuery dataset contains billing data from the date you set up Cloud Billing export. The dataset doesn't include data for the period before you enabled export.

To visualize cost data, you can create custom dashboards that integrate with BigQuery (example templates: Looker, Looker Studio).

You can use tags and labels as criteria for filtering the exported billing data. The number of labels included in the billing export is limited: up to 1,000 label maps within a one-hour period are preserved. Labels don't appear in the invoice PDF or CSV. Consider annotating resources by using tags or labels that indicate the business unit, internal chargeback unit, and other relevant metadata.
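
As a sketch of label-based analysis, the following Python function totals cost by the value of one label key. It assumes that each exported row has been loaded as a dictionary with a cost field and a labels field that holds a list of key and value pairs; that row shape, the label key, and the sample data are assumptions for this example, so check them against the schema of your own export table.

```python
from collections import defaultdict


def cost_by_label(
    rows: list[dict], label_key: str, missing: str = "(unlabeled)"
) -> dict[str, float]:
    """Sum the cost of rows grouped by the value of label_key."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        labels = {item["key"]: item["value"] for item in row.get("labels", [])}
        totals[labels.get(label_key, missing)] += row["cost"]
    return dict(totals)


# Hypothetical exported rows, one of which has no labels.
rows = [
    {"cost": 12.5, "labels": [{"key": "business-unit", "value": "retail"}]},
    {"cost": 7.5, "labels": [{"key": "business-unit", "value": "retail"}]},
    {"cost": 30.0, "labels": [{"key": "business-unit", "value": "logistics"}]},
    {"cost": 5.0, "labels": []},
]

totals = cost_by_label(rows, "business-unit")
```

Keeping an explicit bucket for unlabeled cost makes gaps in the labeling schema visible in the same report.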

Billing access control

You can control access to Cloud Billing for specific resources by defining Identity and Access Management (IAM) policies for the resources. To grant or limit access to Cloud Billing, you can set an IAM policy at the organization level, the billing account level, or the project level.

Access control for billing and resource management follows the principle of separation of duties. Each user has only the permissions necessary for their business role. The Organization Administrator and Billing Account Administrator roles don't have the same permissions.

You can set billing-related permissions at the billing account level and the organization level. The common roles are Billing Account Administrator, Billing Account User, and Billing Account Viewer.

We recommend that you use invoiced billing, or configure a backup payment method. Maintain contact and notification settings for billing and payment.

Budgets, alerts, and quotas

Budgets help you track actual Google Cloud costs against planned spending. When you create a budget, you can configure alert rules to trigger email notifications when the actual or forecasted spend exceeds a defined threshold. You can also use budgets to automate cost-control responses.

Budgets can trigger alerts to inform you about resource usage and cost trends, and prompt you to take cost-optimization actions. However, budgets don't prevent the use or billing of your services when the actual cost reaches or exceeds the budget or threshold. To automatically control cost, you can use budget notifications to programmatically disable Cloud Billing for a project. You can also limit API usage to stop incurring cost after a defined usage threshold.
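
The decision step of such an automated response might look like the following Python sketch. It assumes that the budget notification arrives as a base64-encoded JSON message that contains costAmount and budgetAmount fields; verify those field names against the budget notification format before relying on them. The sketch only decides whether to act. The call that actually disables Cloud Billing for a project is deliberately left out.

```python
import base64
import json


def should_cap_spend(message_data_b64: str) -> bool:
    """Return True when the reported cost has reached the budget amount.

    The field names costAmount and budgetAmount are assumptions about
    the notification payload.
    """
    payload = json.loads(base64.b64decode(message_data_b64).decode("utf-8"))
    return payload["costAmount"] >= payload["budgetAmount"]


def encode(payload: dict) -> str:
    """Helper for this example: encode a payload the way a message would carry it."""
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


# Hypothetical notifications for a budget of 500.
under_budget = should_cap_spend(encode({"costAmount": 320.0, "budgetAmount": 500.0}))
over_budget = should_cap_spend(encode({"costAmount": 512.4, "budgetAmount": 500.0}))
```

Because disabling billing stops the services in a project, an approach like this is usually better suited to sandbox or training projects than to production workloads.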

You can configure alerts for billing accounts and projects. Configure at least one budget for each billing account.

To prevent provisioning resources beyond a predetermined level or to limit the volume of specific operations, you can set quotas at the resource or API level. The following are examples of how you can use quotas:

  • Control the number of API calls per second.
  • Limit the number of VMs created.
  • Restrict the amount of data queried per day in BigQuery.

Project owners can reduce the amount of quota that can be charged against a quota limit, by using the Service Usage API to apply consumer overrides to specific quota limits. For more information, see Creating a consumer quota override.

Workload efficiency improvement

We recommend the following strategies to help make your workloads in Google Cloud cost-efficient:

  • Optimize resource usage by improving product efficiency.
  • Reduce the rate at which you're billed for resources.
  • Control and limit resource usage and spending.

When selecting cost-reduction techniques and Google Cloud features, consider the effort required and the expected savings, as shown in the following graph:

[Figure: Cost optimization strategies: effort-to-savings map]

The following is a summary of the techniques shown in the preceding graph:

  • The following techniques potentially yield high savings with low effort:
    • Committed use discounts
    • Autoscaling
    • BigQuery slots
  • The following techniques potentially yield high savings with moderate-to-high effort:
    • Spot VMs
    • Re-architecting as serverless or containerized applications
    • Re-platforming to use managed services
  • The following techniques potentially yield moderate savings with moderate effort:
    • Custom machine types
    • Cloud Storage lifecycle management
    • Rightsizing
    • Reclaiming idle resources

The techniques explained in the following sections can help you improve the efficiency of your workloads.

Refactoring or re-architecting

You can achieve substantial cost savings by refactoring or re-architecting your workload to use Google Cloud products. For example, moving to serverless services (like Cloud Storage, Cloud Run, BigQuery, and Cloud Functions) that support scaling to zero can help improve efficiency. To assess and compare the cost of these products, you can use the pricing calculator.

Rightsizing

This technique helps you ensure that the scale of your infrastructure matches the intended usage. This strategy is relevant primarily to infrastructure-as-a-service (IaaS) solutions, where you pay for the underlying infrastructure. For example, suppose that you've deployed 50 VMs, but the VMs aren't fully utilized, and you determine that the workloads could run effectively on fewer (or smaller) VMs. In this case, you can remove or resize some of the VMs. Google Cloud provides rightsizing recommendations to help you detect opportunities to save money, without affecting performance, by provisioning smaller VMs. Rightsizing requires less effort if it's done during the design phase than after resources are deployed to production.
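
The following Python sketch shows the basic idea behind a rightsizing decision: choose the smallest available size that covers observed peak usage plus some headroom. The function name, the 20% headroom, and the list of sizes are hypothetical, and this is not the method that the rightsizing recommendations use.

```python
def pick_vcpus(
    peak_vcpus_used: float, available_sizes: list[int], headroom: float = 0.2
) -> int:
    """Return the smallest size that covers peak usage plus headroom."""
    required = peak_vcpus_used * (1 + headroom)
    for size in sorted(available_sizes):
        if size >= required:
            return size
    raise ValueError("no available size covers the required capacity")


# Hypothetical example: a VM provisioned with 8 vCPUs peaks at 2.6 vCPUs.
recommended = pick_vcpus(2.6, [2, 4, 8, 16])
```

In this example, the workload fits on a 4-vCPU machine, so the 8-vCPU VM is a candidate for resizing.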

Autoscaling

If the products you use support dynamic autoscaling, consider designing the workloads to take advantage of autoscaling to get cost and performance benefits. For example, for compute-intensive workloads, you can use managed instance groups in Compute Engine or containerize the applications and deploy them to a Google Kubernetes Engine cluster.
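
To show how utilization-based autoscaling ties capacity to demand, the following Python sketch applies the proportional scaling rule that the Kubernetes Horizontal Pod Autoscaler documentation describes, clamped to a minimum and maximum size. The parameter names and numbers are hypothetical, and autoscalers in other products (for example, managed instance groups) have their own behavior and settings.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization_pct: int,
    target_utilization_pct: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Scale replicas in proportion to utilization, within the given bounds."""
    raw = math.ceil(
        current_replicas * current_utilization_pct / target_utilization_pct
    )
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical examples with a 60% utilization target.
scale_out = desired_replicas(4, 80, 60, min_replicas=2, max_replicas=12)
scale_in = desired_replicas(4, 15, 60, min_replicas=2, max_replicas=12)
```

The lower bound protects availability when load is low, and the upper bound caps how much an unexpected spike can cost.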

Active Assist recommendations

Active Assist uses data, intelligence, and machine learning to reduce cloud complexity and administrative effort. Active Assist makes it easy to optimize the security, performance, and cost of your cloud topology. It provides intelligent recommendations for optimizing your costs and usage. You can apply these recommendations for immediate cost savings and greater efficiency.

The following are examples of recommendations provided by Active Assist:

  • Compute Engine resource rightsizing: Resize your VM instances to optimize for cost and performance based on usage. Identify and delete or back up idle VMs and persistent disks to optimize your infrastructure cost.
  • Committed use discounts (CUDs): Google Cloud analyzes your historical usage, finds the optimal commitment quantity for your workloads, and provides easy-to-understand, actionable recommendations for cost savings. For more information, see Committed use discount recommender.
  • Unattended projects: Discover unattended projects in your organization, and remove or reclaim them. For more information, see Unattended project recommender.

For a complete list, see Recommenders.

What's next