BI Engine SQL interface overview

About the BI Engine SQL interface

BigQuery BI Engine is a fast, in-memory analysis service that lets users analyze data stored in BigQuery with sub-second query response times and with high concurrency.

The BI Engine SQL interface expands BI Engine to integrate with other business intelligence (BI) tools such as Looker, Tableau, Power BI, and custom applications to accelerate data exploration and analysis. This page provides an overview of the BI Engine SQL interface, and the expanded capabilities that it brings to BI Engine.

Increase the reservation size

If you need additional memory reservation beyond the default size of 100 GB, you can request an increase. Reservation increases are available in most regions, and might take from 3 days to one week to process.

BI Engine architecture

The BI Engine SQL interface builds on the existing BI Engine for Google Data Studio architecture. The following diagram shows the updated architecture for BI Engine:

BI Engine architecture

BI Engine supports the following:

  1. BigQuery API: BI Engine directly integrates with the BigQuery API. Any BI solution or custom application that works with the BigQuery API through standard mechanisms such as REST or JDBC and ODBC drivers can use BI Engine without any changes.
  2. Vectorized runtime: With the BI Engine SQL interface, BI Engine introduces a more modern technique called vectorized processing. Using vectorized processing in an execution engine makes more efficient use of modern CPU architecture, by operating on batches of data at a time. BI Engine also uses advanced data encodings, specifically, dictionary run-length encoding, to further compress the data that's stored in the in-memory layer.
  3. Metadata: The metadata stores the table and views definitions, the schema definition, and the fine-grained permissions.
  4. Reservations: BI Engine reservations manage the memory allocation at the project billing level. BI Engine caches only columns and partitions that are queried or scanned. It does not cache the whole table.
  5. Distributed in-memory engine: BI Engine is a distributed in-memory execution engine, letting customers allocate significantly larger memory reservations.

Query optimization and acceleration

BigQuery, and by extension BI Engine, breaks down the query plan that's produced for a SQL query into subqueries. A subquery contains a number of operations, such as scanning, filtering, or aggregating data, and is often the unit of execution on a shard.

While all of BigQuery's supported SQL queries are correctly executed by the BI Engine SQL interface, only certain subqueries are optimized. In particular, BI Engine SQL interface is most optimized for leaf-level subqueries that scan the data from storage, and perform operations such as filter, compute, aggregation, order-by, and certain types of joins. Other subqueries that are not yet fully accelerated by BI Engine revert back to BigQuery for execution.

Because of this selective optimization, simpler business intelligence or dashboard-type queries benefit the most from BI Engine (resulting in fewer subqueries) because the majority of the execution time is spent on leaf-level subqueries that process raw data.

Limitations

The following list explains how and when a query is accelerated when you use BI Engine:

  • Within the leaf-level subqueries, the acceleration is restricted to standard SQL functions and operators.
  • The following limitations apply to the size and number of rows:
    • Simple queries that reference only one table (such as GROUP-BY or FILTER queries) are primarily limited by the reservation size.
    • For joins, the acceleration is restricted to leaf-level subqueries with INNER and LEFT OUTER joins, where a large (fact) table is joined with up to four small and unpartitioned (dimension) tables, each with up to 5 million rows or 5 GB in size for the referenced columns.
  • Queries referencing wildcard tables are not supported.

Monitoring and diagnostics

This section explains how to find statistics about BI Engine, and how BI Engine intregrates with Cloud Monitoring.

Acceleration statistics

Detailed statistics on BI Engine are available through the job statistics API. You can use the bq command-line tool to fetch statistics associated with BI Engine accelerated queries.

With BI Engine acceleration enabled, you can run a query in any of these three modes:

DISABLED
BI Engine disabled the acceleration. biEngineReasons specifies a more detailed reason. The query was run using the BigQuery execution engine.
PARTIAL
Part of the query was accelerated using BI Engine. As described in Query optimization and acceleration, a query plan is generally broken down into multiple subqueries. BI Engine supports the common types of subquery patterns that are typically used in dashboarding. If the query consists of multiple subqueries, only a few of which fall under the supported use cases, then BI Engine executes the remaining subqueries using the normal BigQuery engine. Those remaining subqueries don't receive the BI Engine acceleration. In this situation, BI Engine returns a PARTIAL acceleration code, and uses biEngineReasons to populate the reason for not accelerating other subqueries.
 FULL
 
All of the query was accelerated using BI Engine.

To fetch the statistics associated with BI Engine accelerated queries, run the following bq command-line tool command:

bq show --format=prettyjson -j job_id

If the project is enabled for BI Engine acceleration, then the output produces a new field, biEngineStatistics. Here is a sample job report:

 "statistics": {
    "creationTime": "1602175128902",
    "endTime": "1602175130700",
    "query": {
      "biEngineStatistics": {
        "biEngineMode": "DISABLED",
        "biEngineReasons": [
          {
            "code": "UNSUPPORTED_SQL_TEXT",
            "message": "Detected unsupported join type"
          }
        ]
      },

For more information about the BiEngineStatistics field, see the Job reference.

Acceleration statistics in INFORMATION_SCHEMA

BI Engine acceleration statistics can be queried as part of INFORMATION_SCHEMA.JOBS_BY_* views via the column bi_engine_statistics. For example, this query returns the bi_engine_statistics, for all of your projects' jobs for the last 24 hours:

select creation_time, job_id, bi_engine_statistics
from `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
where creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
  and job_type = "QUERY"

Use the following format to specify regionality for the project-id, region, and views in the INFORMATION_SCHEMA view:

`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW

Cloud Monitoring

BI Engine integrates with Cloud Monitoring to surface key metrics for monitoring and alerting purposes. These are the metrics that are monitored:

Resource type Metric name Description
Project Reservation Total Bytes Total capacity allocated within one Cloud project.
Project Reservation Used Bytes Total capacity used within one Cloud project.

Pricing

BI Engine pricing is described on the BI Engine pricing page.

Supported regions

Like BigQuery, BI Engine is a regional and a multi-regional resource. BI Engine processes your data in the same region where your data is located inside BigQuery. BigQuery BI Engine SQL interface is supported in the same regions as BigQuery. For more information about BigQuery regional locations, see BigQuery dataset locations.