Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool
Celia Ji
Strategic Cloud Engineer, Google Cloud
Megha Bedi
Cloud Data Engineer, Google Cloud
Why do we need a BigQuery anti-pattern recognition tool?
BigQuery is a serverless and cost-effective enterprise data warehouse that works across cloud environments and scales with your data. As with any large scale data-intensive platform, following best practices and avoiding inefficient anti-patterns goes a long way in terms of performance and cost savings.
Usually SQL optimization requires a significant time investment from engineers, who must read high-complexity queries, devise a variety of approaches to improve performance and efficiency, and test several optimization techniques. The best place to start is to fix anti-patterns, since this only requires easily applicable changes and provides significant performance improvements.
To facilitate the task of identifying and fixing said anti-pattern, Google Professional Services Organization (PSO) and Global Services Delivery (GSD) have developed a BigQuery anti-pattern recognition tool. This tool automates the process of scanning SQL queries, identifying antipatterns, and providing optimization recommendations.
What is the BigQuery anti-pattern recognition tool?
The BigQuery anti-pattern recognition tool let you easily identify performance impacting anti-patterns across a large number of SQL queries in a single go.
It utilizes ZetaSQL to parse BigQuery SQL queries into abstract syntax trees (AST) and then traverses the tree nodes to detect the presence of anti-patterns.
The tool takes a BigQuery SQL query as an input, such as:
And produces the output as:
It examines potential optimizations, including:
- Selecting only the necessary columns
- Handling multiple WITH-clause references
- Addressing subqueries in filters with aggregations
- Optimizing ORDER BY queries with LIMIT
- Enhancing string comparisons
- Improving JOIN patterns
- Avoiding subquery aggregation in the WHERE clause
The solution supports reading from various sources, such as:
- Command line
- Local files
- Cloud Storage files
- Local folders
- Cloud Storage folders
- CSV (with one query per line)
- INFORMATION_SCHEMA
Additionally, the solution provides flexibility in writing output to different destinations, including:
- Printing to the terminal
- Exporting as CSV
Writing to a BigQuery table
Using the BigQuery anti-pattern recognition tool
The BigQuery anti-pattern recognition tool is hosted on GitHub. Below are the Quick Start steps on using the tool via command line for inline queries. You can also leverage Cloud Run to deploy it as a container on cloud.
Prerequisites
- Linux OS
- JDK 11 or above is installed
- Maven
- Docker
- gcloud CLI
Quick start: - steps
1. Clone the repo into your local machine.
2. Build the tool image inside the `bigquery-antipattern-recognition` folder.
3. Run the tool for a simple inline query.
4. Below is the output result in the command-line interface:
Additionally, the above tool can read queries from Information Schema and load the output recommendations to a BigQuery table.
Below is an example of the BigQuery anti-pattern recognition tool results exported to a BigQuery table.
Getting started
Ready to start optimizing your BigQuery queries and cutting costs? Check out the tool here and contribute to the tool via GitHub.
Have questions or feedback?
We’re actively working on new features to make the tool as useful to our customers. Use it and tell us what you think! For product feedback/technical questions, reach out to us at bq-antipattern-eng@google.com. If you’re already a BigQuery customer and would like a briefing on the tool, please reach out, we’d be happy to talk.