BigQuery BI Engine SQL interface overview

About the BI Engine SQL interface

BigQuery BI Engine is a fast, in-memory analysis service that lets users analyze data stored in BigQuery with sub-second query response times and with high concurrency.

The BI Engine SQL interface expands BI Engine to integrate with other business intelligence (BI) tools such as Looker, Tableau, Power BI, and custom applications to accelerate data exploration and analysis. This page provides an overview of the BI Engine SQL interface, and the expanded capabilities that it brings to this preview version of BI Engine.

Requesting access to the preview

Access to the BI Engine SQL interface in this preview phase is provided through an enrollment process. For access, submit the BI Engine preview enrollment form with your project details. You will be notified by email once your project is enrolled.

Increasing the reservation size during preview

If you need additional memory reservation beyond the default size of 100 GB, you can request an increase. Reservation increases are evaluated on a case-by-case basis, and are only available in some regions.

BI Engine architecture

The BI Engine SQL interface builds on the existing BI Engine for Google Data Studio architecture. The following diagram shows the updated architecture for BI Engine:

BI Engine architecture

This preview version of BI Engine includes the following:

  1. BigQuery API: BI Engine directly integrates with the BigQuery API. Any BI solution or custom application that works with the BigQuery API through standard mechanisms such as REST JDBC and ODBC drivers can use BI Engine without any changes.
  2. Vectorized runtime: With the BI Engine SQL interface, BI Engine introduces a more modern technique called vectorized processing. Using vectorized processing in an execution engine makes more efficient use of modern CPU architecture, by operating on batches of data at a time. BI Engine also uses advanced data encodings, specifically, dictionary run-length encoding, to further compress the data that's stored in the in-memory layer.
  3. Metadata: The metadata stores the table and views definitions, the schema definition, and the fine-grained permissions.
  4. Reservations: BI Engine reservations manage the memory allocation at the project billing level. BI Engine caches only columns and partitions that are queried or scanned. It does not cache the whole table.
  5. Distributed in-memory engine: BI Engine is a distributed in-memory execution engine, letting customers allocate significantly larger memory reservations.

Query optimization and acceleration

BigQuery, and by extension BI Engine, breaks down the query plan that's produced for a SQL query into subqueries. A subquery contains a number of operations, such as scanning, filtering, or aggregating data, and is often the unit of execution on a shard.

While all of BigQuery's supported SQL queries are correctly executed by the BI Engine SQL interface, the preview version of the BI Engine SQL interface only optimizes certain subqueries. In particular, it is most optimized for leaf-level subqueries that scan the data from storage, and perform operations such as filter, compute, aggregation, order-by, and certain types of joins. Other subqueries that are not yet fully accelerated by BI Engine revert back to BigQuery for execution.

Because of this selective optimization, simpler BI or dashboard-type queries (resulting in fewer subqueries) benefit the most from BI Engine because the majority of the execution time is spent on leaf-level subqueries that process raw data.

Limitations

The following list explains how and when a query is accelerated when you use this preview version of BI Engine:

  • Within the leaf-level subqueries, the acceleration is restricted to standard SQL functions and operators.
  • During this preview, users are limited to a maximum of 100 GB of BI Engine capacity for each Google Cloud project. You can request an increase in reservation capacity.
  • BigQuery streaming ingestion is not supported during preview.
  • The following limitations apply to the size and number of rows:
    • Simple queries that reference only one table (such as GROUP-BY or FILTER queries) are primarily limited by the reservation size.
    • For joins, the acceleration is restricted to leaf-level subqueries with INNER and LEFT OUTER joins, where a large (fact) table is joined with up to four small and unpartitioned (dimension) tables, each with up to 5 million rows or 5 GB in size for the referenced columns.
  • Queries referencing wildcard tables are not supported.
  • This preview version of BI Engine is not available in all regions. For more information, see supported regions.

Monitoring and diagnostics

This section explains how to find statistics about BI Engine, and how BI Engine intregrates with Cloud Monitoring.

Acceleration statistics

Detailed statistics on BI Engine are available through the job statistics API. You can use the bq command-line tool to fetch statistics associated with BI Engine accelerated queries.

With BI Engine acceleration enabled, you can run a query in any of these three modes:

DISABLED
BI Engine disabled the acceleration. biEngineReasons specifies a more detailed reason. The query was run using the BigQuery execution engine.
PARTIAL
Part of the query was accelerated using BI Engine. As described in Query optimization and acceleration, a query plan is generally broken down into multiple subqueries. This preview of BI Engine supports the common types of subquery patterns that are typically used in dashboarding. If the query consists of multiple subqueries, only a few of which fall under the supported use cases, then BI Engine executes the remaining subqueries using the normal BigQuery engine. Those remaining subqueries don't receive the BI Engine acceleration. In this situation, BI Engine returns a PARTIAL acceleration code, and uses biEngineReasons to populate the reason for not accelerating other subqueries.
 FULL
 
All of the query was accelerated using BI Engine.

To fetch the statistics associated with BI Engine accelerated queries, run the following bq command-line tool command:

bq show --format=prettyjson -j job_id

If the project is enabled for BI Engine acceleration, then the output produces a new field, biEngineStatistics. Here is a sample job report:

 "statistics": {
    "creationTime": "1602175128902",
    "endTime": "1602175130700",
    "query": {
      "biEngineStatistics": {
        "biEngineMode": "DISABLED",
        "biEngineReasons": [
          {
            "code": "UNSUPPORTED_SQL_TEXT",
            "message": "Detected unsupported join type"
          }
        ]
      },

For more information about the BiEngineStatistics field, see the Job reference.

Cloud Monitoring

BI Engine integrates with Cloud Monitoring to surface key metrics for monitoring and alerting purposes. These are the metrics that are monitored:

Resource type Metric name Description
Project Reservation Total Bytes Total capacity allocated within one Cloud project.
Project Reservation Used Bytes Total capacity used within one Cloud project.

Feedback and updates

Once you're enrolled in the preview, you can post questions and comments in the discussion group. We also use this group to provide updates on ongoing improvements and features during the preview phase.

Pricing

There is no cost to using the BI Engine SQL interface during the first 20 weeks of the preview phase.

Once the preview ends, BI Engine will follow the pricing model that's described on the BI Engine pricing page.

Supported regions

Like BigQuery, BI Engine is a regional and a multi-regional resource. BI Engine processes your data in the same region where your data is located inside BigQuery.

The preview phase of BI Engine SQL interface supports the following locations.

Regional locations

Region description Region name
Americas
Los Angeles us-west2
Northern Virginia us-east4
Oregon us-west1
South Carolina us-east1
Asia Pacific
Singapore asia-southeast1
Tokyo asia-northeast1

Multi-regional locations

Multi-region description Multi-region name
Data centers within member states of the European Union1 EU
Data centers in the United States US

1 Data located in the EU multi-region is not stored in the europe-west2 (London) or europe-west6 (Zürich) data centers.