Reducing BigQuery physical storage cost with new billing model
Schneider Larbi
Sr. Partner Engineer
Jonathan Zhuo
Senior Manager & Chief Architect, Deloitte
Introduction
BigQuery is a scalable petabyte-scale data warehouse, renowned for its efficient Structured Query Language (SQL) capabilities. While BigQuery offers exceptional performance, cost optimization remains a critical aspect for any cloud-based service.
This blog post focuses on reducing BigQuery storage costs through the utilization of the newly introduced Physical Bytes Storage Billing (PBSB) as opposed to Logical Bytes Storage Billing (LBSB). PBSB is generally available as of July 5th, 2023. We work with organizations actively exploring this feature to optimize storage costs.
Drawing from the extensive experience Google Cloud has built with Deloitte in assisting clients, this blog post will share valuable insights and recommendations to help customers to smoothly migrate to the PBSB model when designing storage to support BigQuery implementations.
Design challenges
In today's business landscape, we see large organizations accumulating extensive amounts of data, often measured in petabytes, within BigQuery data warehouses. This data is crucial for performing thorough business analysis and extracting valuable insights. However, the associated storage costs can be significant, sometimes exceeding millions of dollars annually. Consequently, minimizing these storage expenses has emerged as a challenge for many of our clients.
Solution
By default, when creating a dataset in BigQuery, the unit of consumption for storage billing is logical bytes. However, with the introduction of physical bytes storage billing, customers now have the option to choose this billing model. By choosing this billing model, customers can take advantage of the cost savings offered by the compression capabilities of physical bytes without compromising data accessibility or query performance.
To address the challenge of high storage costs on BigQuery data warehouses, we implement a two-step approach that leverages the newly introduced physical bytes storage billing option:
First approach: Conduct a BigQuery cost benefit assessment between PBSB and LBSB Google has provided an example of how to calculate the price difference at the dataset level using this query.
Running the query in the example above results in the following details in the table below;
In this example, the first datasets demonstrate an impressive active and long-term compression ratio, ranging from 16 to 25. As a result, there is a remarkable storage cost reduction of 8 times, leading to a substantial decrease in monthly costs from $70,058 to $8,558.
However, the last dataset in this test, close to 11 TB or 96% of all active physical storage data for time travel is used. This is not suitable for PBSB.
During the assessment, we observed the presence of "_session" and "_scripts'' rows, which may impede CSV files downloads due to the 10 MB limit. The "_session" objects correspond to temporary tables generated within BigQuery sessions, while the "_scripts'' objects pertain to intermediate objects produced as part of stored procedures or multi-statement queries. These objects are billed based on logical bytes and cannot be converted to physical bytes by users. To address this, customers can disregard them by modifying the query using this clause:
where total_logical_bytes > 0 AND total_physical_bytes > 0 AND table_schema not like '_scripts%' AND table_schema not like '_session%'. [2, 7, 8]
Second approach: Switch to physical storage billing for suitable datasets. Customers must remember they will not be able to enroll datasets for physical storage billing until all flat-rate commitments for your organization are no longer active.
The simplest way to update the billing model for a dataset is to use the BigQuery (BQ) update command and set the storage_billing_model flag to PHYSICAL.
For example: bq update -d --storage_billing_model=PHYSICAL PROJECT_ID:DATASET_ID
After changing the billing model for a dataset, it takes 24 hours for the change to take effect. Another factor to consider when optimizing storage cost is Time Traveling.
Time travel allows customers to query updated or deleted data, restore deleted tables, or restore expired tables. The default time travel window covers the past seven days, and you can configure it using the BQ command-line tool to balance storage costs with your data retention needs. Here's an example:
bq update --dataset=true --max_time_travel_hours=48 PROJECT_ID:DATASET_NAME
This command sets the time travel window to 48 hours (2 days) for the specified dataset.
"The --max_time_travel_hours value must be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days)." [5]
Considerations
Switching to the physical storage billing model has some considerations. The table below shows the pricing:
Based on this price list on the table above, customers should consider the following:
- Based on BigQuery storage pricing, the unit price of physical storage billing is twice that of logical storage billing.
- If the compression ratio is less than 2, customers will not benefit from PBSB for their datasets.
- In LBSB, customers are not billed for bytes used for time travel storage, but in PBSB, you are billed for that and the same is true when using the fail-safe.
- To ensure accurate assessment, it is advisable to conduct an evaluation of time travel storage utilization once BigQuery workload has reached a stable state and established a predictable pattern. This is important because the bytes utilized for time travel storage can vary over time.
- Customers have the flexibility to configure the time travel window according to specific data retention requirements while considering storage costs. For instance, customers can customize the time travel window, adjusting it from the default 7 days to a shorter duration such as 2 days.
- A fail-safe period of 7 days will be enforced, during which the time travel setting cannot be modified. However, once this period ends, customers have the flexibility to configure additional time travel days according to their preference. This configurable range extends from 2 to 7 days, allowing customers to establish an effective time travel window spanning between 9 and 14 days. If no action is taken, the default time travel window will remain set at 14 days.
- Switching to physical bytes storage billing is possible, however to do so, you must wait 14 days before you can change the billing model again.
Let's go building
Adopting BigQuery Physical Bytes Storage Billing (PBSB) presents a substantial opportunity for reducing storage costs within BigQuery. The process of assessing and transitioning to this cost-effective billing model is straightforward, allowing customers to maximize the benefits. We have provided comprehensive guidance on conducting assessments and making a seamless transition to the PBSB model. In our upcoming blog post, we will delve into leveraging the newly introduced BigQuery editions to further optimize BigQuery analysis costs from a compute perspective. Wishing you a productive and successful cost optimization journey! And always reach out to us for support in your cloud journey from here.
Special thanks to Dylan Myers (Google), Enlai Wang (Deloitte) for contributing to this article.