BI Engine SQL interface overview
About the BI Engine SQL interface
BigQuery BI Engine is a fast, in-memory analysis service that lets users analyze data stored in BigQuery with sub-second query response times and with high concurrency.
The BI Engine SQL interface expands BI Engine to integrate with other business intelligence (BI) tools such as Looker, Tableau, Power BI, and custom applications to accelerate data exploration and analysis. This page provides an overview of the BI Engine SQL interface, and the expanded capabilities that it brings to BI Engine.
Increase the reservation size
If you need additional memory reservation beyond the default size of 100 GB, you can request an increase. Reservation increases are available in most regions, and might take from 3 days to one week to process.
BI Engine architecture
The BI Engine SQL interface builds on the existing BI Engine for Google Data Studio architecture. The following diagram shows the updated architecture for BI Engine:
BI Engine supports the following:
- BigQuery API: BI Engine directly integrates with the BigQuery API. Any BI solution or custom application that works with the BigQuery API through standard mechanisms such as REST or JDBC and ODBC drivers can use BI Engine without any changes.
- Vectorized runtime: With the BI Engine SQL interface, BI Engine introduces a more modern technique called vectorized processing. Using vectorized processing in an execution engine makes more efficient use of modern CPU architecture, by operating on batches of data at a time. BI Engine also uses advanced data encodings, specifically, dictionary run-length encoding, to further compress the data that's stored in the in-memory layer.
- Metadata: The metadata stores the table and views definitions, the schema definition, and the fine-grained permissions.
- Reservations: BI Engine reservations manage the memory allocation at the project billing level. BI Engine caches only columns and partitions that are queried or scanned. It does not cache the whole table.
- Distributed in-memory engine: BI Engine is a distributed in-memory execution engine, letting customers allocate significantly larger memory reservations.
Query optimization and acceleration
BigQuery, and by extension BI Engine, breaks down the query plan that's produced for a SQL query into subqueries. A subquery contains a number of operations, such as scanning, filtering, or aggregating data, and is often the unit of execution on a shard.
While all of BigQuery's supported SQL queries are correctly executed by the BI Engine SQL interface, only certain subqueries are optimized. In particular, BI Engine SQL interface is most optimized for leaf-level subqueries that scan the data from storage, and perform operations such as filter, compute, aggregation, order-by, and certain types of joins. Other subqueries that are not yet fully accelerated by BI Engine revert back to BigQuery for execution.
Because of this selective optimization, simpler business intelligence or dashboard-type queries benefit the most from BI Engine (resulting in fewer subqueries) because the majority of the execution time is spent on leaf-level subqueries that process raw data.
Limitations
The following list explains how and when a query is accelerated when you use BI Engine:
- Within the leaf-level subqueries, the acceleration is restricted to standard SQL functions and operators.
- The following limitations apply to the size and number of rows:
- Simple queries that reference only one table (such as
GROUP-BY
orFILTER
queries) are primarily limited by the reservation size. - For joins, the acceleration is restricted to leaf-level subqueries with
INNER
andLEFT OUTER
joins, where a large (fact) table is joined with up to four small and unpartitioned (dimension) tables, each with up to 5 million rows or 5 GB in size for the referenced columns.
- Simple queries that reference only one table (such as
- Queries referencing wildcard tables are not supported.
Monitoring and diagnostics
This section explains how to find statistics about BI Engine, and how BI Engine intregrates with Cloud Monitoring.
Acceleration statistics
Detailed statistics on BI Engine are available through the job
statistics API. You can use the bq
command-line tool to fetch statistics associated with
BI Engine accelerated queries.
With BI Engine acceleration enabled, you can run a query in any of these three modes:
DISABLED |
BI Engine disabled the acceleration.
biEngineReasons specifies a more detailed reason. The query was
run using the BigQuery execution engine. |
PARTIAL |
Part of the query was accelerated using BI Engine. As
described in Query optimization
and acceleration, a query plan is generally broken down into multiple
subqueries. BI Engine supports the common types
of subquery patterns that are typically used in dashboarding. If the query
consists of multiple subqueries, only a few of which fall under the
supported use cases, then BI Engine executes the remaining
subqueries using the normal BigQuery engine. Those remaining
subqueries don't receive the BI Engine acceleration.
In this situation, BI Engine returns a PARTIAL
acceleration code, and uses biEngineReasons to populate the reason
for not accelerating other subqueries. |
FULL |
All of the query was accelerated using BI Engine. |
To fetch the statistics associated with BI Engine accelerated
queries, run the following bq
command-line tool command:
bq show --format=prettyjson -j job_id
If the project is enabled for BI Engine acceleration, then the
output produces a new field, biEngineStatistics
. Here is a sample job
report:
"statistics": {
"creationTime": "1602175128902",
"endTime": "1602175130700",
"query": {
"biEngineStatistics": {
"biEngineMode": "DISABLED",
"biEngineReasons": [
{
"code": "UNSUPPORTED_SQL_TEXT",
"message": "Detected unsupported join type"
}
]
},
For more information about the BiEngineStatistics
field, see the
Job reference.
Acceleration statistics in INFORMATION_SCHEMA
BI Engine acceleration statistics can be queried as part of INFORMATION_SCHEMA.JOBS_BY_*
views via the column bi_engine_statistics
.
For example, this query returns the bi_engine_statistics, for all of your projects' jobs for the last 24 hours:
select creation_time, job_id, bi_engine_statistics
from `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
where creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
and job_type = "QUERY"
Use the following format to specify regionality for the project-id
, region
, and views
in the INFORMATION_SCHEMA view:
`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW
Cloud Monitoring
BI Engine integrates with Cloud Monitoring to surface key metrics for monitoring and alerting purposes. These are the metrics that are monitored:
Resource type | Metric name | Description |
---|---|---|
Project | Reservation Total Bytes | Total capacity allocated within one Cloud project. |
Project | Reservation Used Bytes | Total capacity used within one Cloud project. |
Pricing
BI Engine pricing is described on the BI Engine pricing page.
Supported regions
Like BigQuery, BI Engine is a regional and a multi-regional resource. BI Engine processes your data in the same region where your data is located inside BigQuery. BigQuery BI Engine SQL interface is supported in the same regions as BigQuery. For more information about BigQuery regional locations, see BigQuery dataset locations.