Jump to Content
Data Analytics

Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool

June 9, 2023
Celia Ji

Strategic Cloud Engineer, Google Cloud

Megha Bedi

Cloud Data Engineer, Google Cloud

Why do we need a BigQuery anti-pattern recognition tool?

BigQuery is a serverless and cost-effective enterprise data warehouse that works across cloud environments and scales with your data. As with any large scale data-intensive platform, following best practices and avoiding inefficient anti-patterns goes a long way in terms of performance and cost savings. 

Usually SQL optimization requires a significant time investment from engineers, who must read high-complexity queries, devise a variety of approaches to improve performance and efficiency, and test several optimization techniques. The best place to start is to fix anti-patterns, since this only requires easily applicable changes and provides significant performance improvements.

To facilitate the task of identifying and fixing said anti-pattern, Google Professional Services Organization (PSO) and Global Services Delivery (GSD) have developed a BigQuery anti-pattern recognition tool. This tool automates the process of scanning SQL queries, identifying antipatterns, and providing optimization recommendations.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1-tool-arch.max-700x700.jpg

What is the BigQuery anti-pattern recognition tool?

The BigQuery anti-pattern recognition tool let you easily identify performance impacting anti-patterns across a large number of SQL queries in a single go. 

It utilizes ZetaSQL to parse BigQuery SQL queries into abstract syntax trees (AST) and then traverses the tree nodes to detect the presence of anti-patterns. 

The tool takes a BigQuery SQL query as an input, such as:

Loading...

And produces the output as:

Loading...

It examines potential optimizations, including:

  • Selecting only the necessary columns
  • Handling multiple WITH-clause references 
  • Addressing subqueries in filters with aggregations
  • Optimizing ORDER BY queries with LIMIT
  • Enhancing string comparisons
  • Improving JOIN patterns
  • Avoiding subquery aggregation in the WHERE clause

The solution supports reading from various sources, such as:

  • Command line 
  • Local files
  • Cloud Storage files
  • Local folders
  • Cloud Storage folders
  • CSV (with one query per line)
  • INFORMATION_SCHEMA

Additionally, the solution provides flexibility in writing output to different destinations, including:

  • Printing to the terminal
  • Exporting as CSV

Writing to a BigQuery table

Using the BigQuery anti-pattern recognition tool

The BigQuery anti-pattern recognition tool is hosted on GitHub. Below are the Quick Start steps on using the tool via command line for inline queries. You can also leverage Cloud Run to deploy it as a container on cloud. 

Prerequisites

  • Linux OS
  • JDK 11 or above is installed
  • Maven
  • Docker
  • gcloud CLI

Quick start: - steps

1. Clone the repo into your local machine.

Loading...

2. Build the tool image inside the `bigquery-antipattern-recognition` folder.

Loading...

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_BigQuery_anti-pattern_recognition_tool.max-1600x1600.png

3. Run the tool for a simple inline query.

Loading...

4. Below is the output result in the command-line interface:

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_BigQuery_anti-pattern_recognition_tool.max-1000x1000.png

Additionally, the above tool can read queries from Information Schema and load the output recommendations to a BigQuery table.

Below is an example of the BigQuery anti-pattern recognition tool results exported to a BigQuery table.

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_BigQuery_anti-pattern_recognition_tool.max-1900x1900.png

Getting started

Ready to start optimizing your BigQuery queries and cutting costs? Check out the tool here and contribute to the tool via GitHub.

Have questions or feedback?

We’re actively working on new features to make the tool as useful to our customers. Use it and tell us what you think! For product feedback/technical questions, reach out to us at bq-antipattern-eng@google.com. If you’re already a BigQuery customer and would like a briefing on the tool, please reach out, we’d be happy to talk.

Posted in