What is BI Engine?

BigQuery BI Engine is a fast, in-memory analysis service that accelerates many SQL queries in BigQuery by intelligently caching the data you use most frequently. BI Engine can accelerate SQL queries from any source, including those written by data visualization tools, and can manage cached tables for on-going optimization. This lets you improve query performance without manual tuning or data tiering. You can use clustering and partitioning to further optimize the performance of large tables with BI Engine.

For example, if your dashboard only displays the last quarter's data, then consider partitioning your tables by time so only the latest partitions are loaded into memory. You can also combine the benefits of materialized views and BI Engine. This works particularly well when the materialized views are used to join and flatten data to optimize their structure for BI Engine.

BI Engine provides the following advantages:

  1. BigQuery API: BI Engine directly integrates with the BigQuery API. Any BI solution or custom application that works with the BigQuery API through standard mechanisms such as REST or JDBC and ODBC drivers can use BI Engine without modification.
  2. Vectorized runtime: With the BI Engine SQL interface, BI Engine introduces a more modern technique called vectorized processing. Using vectorized processing in an execution engine makes more efficient use of modern CPU architecture, by operating on batches of data at a time. BI Engine also uses advanced data encodings, specifically, dictionary and run-length encoding, to further compress the data that's stored in the in-memory layer.
  3. Seamless integration: BI Engine works with BigQuery features and metadata, including authorized views, column and row level security, and data masking.
  4. Reservations: BI Engine reservations manage memory allocation at the project location level. BI Engine caches specific columns or partitions that are queried, prioritizing those in tables marked as preferred.

About the BI Engine SQL interface

The BI Engine SQL interface expands BI Engine to integrate with other business intelligence (BI) tools such as Looker, Tableau, Power BI, and custom applications to accelerate data exploration and analysis. This page provides an overview of the BI Engine SQL interface, and the expanded capabilities that it brings to BI Engine.

BI Engine architecture

BI Engine use cases

BI Engine can significantly accelerate many SQL queries, including those used for BI dashboards. Acceleration is most effective if you identify the tables that are essential to your queries and then mark them as preferred tables. To use BI Engine, create a reservation that defines the storage capacity dedicated to BI Engine. You can let BigQuery determine which tables to cache based on the project's usage patterns or you can mark specific tables to prevent other traffic from interfering with accelleration.

BI Engine is useful in the following use cases:

  • You use BI tools to analyze your data: The BI Engine SQL interface can accelerate BigQuery queries regardless of whether they run in the BigQuery console, client library, or through an API or an ODBC or JDBC connector. This can significantly improve the performance of dashboards connected to BigQuery through a built-in connection (API) or connectors.
  • You have certain tables that are queried most frequently: BI Engine lets you designate specific preferred tables to accelerate. This is helpful if you have a subset of tables that are queried more frequently or are used for high-visibility dashboards.

BI Engine might not fit your needs in the following cases:

  • You use wildcards in your queries: Queries referencing wildcard tables are not supported by BI Engine and don't benefit from acceleration.

  • You rely heavily on unsupported BigQuery features: While BI Engine supports most SQL functions and operators when connecting business intelligence (BI) tools to BigQuery, there are unsupported features, including external tables and non-SQL user-defined functions.

Considerations for BI Engine

Consider the following when deciding how to configure BI Engine:

Ensure acceleration for specific queries

You can ensure a particular set of queries always gets accelerated by creating a separate project with a BI Engine reservation. To do so, you should ensure that the BI Engine reservation in that project is large enough to match the size of all tables used in those queries and designate those tables as preferred tables for BI Engine. Only those queries that need to be accelerated should be run in that project.

Minimize your joins

BI Engine works best with pre-joined or pre-aggregated, and with data in a small number of joins. This is particularly true when one side of the join is large and the others are much smaller such as when you query a large fact table joined with a small dimension table. You can combine BI Engine with materialized views that which perform joins to produce a single large, flat table. In this way, the same joins don't have to be performed on every query.

Understand the impact of BI Engine

You can better understand how your workloads benefit from BI Engine by reviewing the usage statistics in Cloud Monitoring or querying the INFORMATION_SCHEMA in BigQuery. Be sure to disable the Use cached results option in BigQuery to get the most accurate comparison. For more information, see Use cached query results.

Quotas and limitations

See BigQuery quotas and limits for quotas and limits that apply to BI Engine.

Pricing

For information on BI Engine pricing, see the BigQuery Pricing page.

Query optimization and acceleration

BigQuery, and by extension BI Engine, breaks down the query plan that's produced for a SQL query into subqueries. A subquery contains a number of operations, such as scanning, filtering, or aggregating data, and is often the unit of execution on a shard.

While all of BigQuery's supported SQL queries are correctly executed by the BI Engine SQL interface, only certain subqueries are optimized. In particular, BI Engine SQL interface is most optimized for leaf-level subqueries that scan the data from storage, and perform operations such as filter, compute, aggregation, order-by, and certain types of joins. Other subqueries that are not yet fully accelerated by BI Engine revert back to BigQuery for execution.

Because of this selective optimization, simpler business intelligence or dashboard-type queries benefit the most from BI Engine (resulting in fewer subqueries) because the majority of the execution time is spent on leaf-level subqueries that process raw data.

What's next