Data Analytics

Learn how BI Engine enhances BigQuery query performance

BI Engine.jpg

BigQuery BI Engine is a fast, in-memory analysis service that lets users analyze data stored in BigQuery with rapid response times and with high concurrency to accelerate certain BigQuery SQL queries. BI Engine caches data instead of query results, allowing different queries over the same data to be accelerated as you look at different aspects of the data. By using BI Engine with BigQuery streaming, you can perform real-time data analysis over streaming data without sacrificing write speeds or data freshness.

​​BI Engine architecture

The BI Engine SQL interface expands BI Engine support to any business intelligence (BI) tool that works with BigQuery such as Looker, Tableau, Power BI, and custom applications to accelerate data exploration and analysis. With BI Engine, you can build rich, interactive dashboards and reports in BI tool of your choice without compromising performance, scale,security, or data freshness. To learn more about the BI Engine SQL interface, please refer here.

The following diagram shows the updated architecture for BI Engine:

1 BI Engine.jpg

Shown here is one simple example of a Looker dashboard that was created with BI Engine capacity reservation (top) versus the same dashboard without any reservation (bottom).This dashboard is created from the BigQuery public dataset `bigquery-public-data.chicago_taxi_trips.taxi_trips`  to analyze the Sum of total_trip cost and logarithmic average of total trip cost over time.

2 BI Engine.gif
total_trip cost for past 5 years

BI Engine will cache the minimum amount of data possible to resolve a query to maximize the capacity of the reservation. Running business intelligence on big data can be tricky.

Here is a query against the same public dataset, ‘bigquery-public-data.chicago_taxi_trips.taxi_trips,’ to demonstrate BI Engine performance with/without reserved BigQuery slots.

Example Query

  SELECT
 (DATE(trip_end_timestamp , 'America/Chicago')) AS trip_end_timestamp_date,
 (DATE(trip_start_timestamp , 'America/Chicago')) AS trip_start_timestamp_date,
 COALESCE(SUM(CAST(trip_total AS FLOAT64)), 0) AS sum_trip_total,
 CONCAT ('Hour :',(DATETIME_DIFF(trip_end_timestamp,trip_start_timestamp,DAY) * 1440) ,' , ','Day :',(DATETIME_DIFF(trip_end_timestamp,trip_start_timestamp,DAY)) ) AS trip_time,
 CASE WHEN
   ROUND(fare + tips + tolls + extras) = trip_total THEN 'Tallied'
   WHEN ROUND(fare + tips + tolls + extras) < trip_total THEN 'Tallied Less'
   WHEN ROUND(fare + tips + tolls + extras) > trip_total THEN 'Tallied More'
   WHEN (ROUND(fare + tips + tolls + extras) = 0.0 AND trip_total = 0.0) THEN 'Tallied 0'
   ELSE 'N/A' END AS trip_total_tally,
 REGEXP_REPLACE(TRIM(company),'null','N/A') as company,
 CASE WHEN
 TRIM(payment_type) = 'Unknown' THEN 'N/A'
 WHEN payment_type IS NULL THEN 'N/A' ELSE payment_type END AS payment_type
 FROM
  `bigquery-public-data.chicago_taxi_trips.taxi_trips`
 GROUP BY
   1,
   2,
   4,
   5,
   6,
   7
ORDER BY
 1 DESC,
 2 ,
 4 DESC,
 5 ,
 6 ,
 7
LIMIT 5000

The above query was run with the below combinations: 

  • Without any BigQuery slot reservation/BI Engine reservation,  the query observed 7.6X more average slots and 6.3X more job run time compared to the run with reservations (last stats in the result). 

  • Without BI Engine reservation but with BigQuery slot reservation, the query observed 6.9X more average slots and 5.9X more job run time compared to the run with reservations (last stats in the result). 

  • With BI Engine reservation and no BigQuery slot reservation, the query observed 1.5 more average slots and the job completed in sub-seconds (868 ms). 

  • With both BI Engine reservation and BigQuery slot reservation, only 23 average slots were used and the job completed in sub-second as shown in results.This is the most cost effective way in regards to average slots and run time compared to all other options (23.27 avg_slots , 855 ms run time).

3 BI Engine.jpg

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data. You can query the INFORMATION_SCHEMA.JOBS_BY_* view to retrieve real-time metadata about BigQuery jobs. This view contains currently running jobs, and the history of jobs completed in the past 180 days.

Query to determine bi_engine_statistics and number of slots. More schema information can be found here.

  SELECT
 project_id,
 job_id,
 reservation_id,
 job_type,
 TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND) AS job_duration_mseconds,
 CASE
 WHEN job_id = 'bquxjob_54033cc8_18164d54ada' THEN  'YES_BQ_RESERV_NO_BIENGINE'
 WHEN job_id = 'bquxjob_202f17eb_18149bb47c3' THEN 'NO_BQ_RESERV_NO_BIENGINE'
 WHEN job_id = 'bquxjob_404f2321_18164e0f801' THEN 'YES_BQ_RESERV_YES_BIENGINE'
WHEN job_id = 'bquxjob_48c8910d_18164e520ac' THEN 'NO_BQ_RESERV_YES_BIENGINE' ELSE 'NA'  END as query_method,
 bi_engine_statistics,
 -- Average slot utilization per job is calculated by dividing
-- total_slot_ms by the millisecond duration of the job
 SAFE_DIVIDE(total_slot_ms,(TIMESTAMP_DIFF(end_time, start_time, MILLISECOND))) AS avg_slots
FROM
region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT
where  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 80 DAY) AND CURRENT_TIMESTAMP()
AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()
ANd job_id in ('bquxjob_202f17eb_18149bb47c3','bquxjob_54033cc8_18164d54ada','bquxjob_404f2321_18164e0f801','bquxjob_48c8910d_18164e520ac')
ORDER BY avg_slots DESC

From the observation, the most effective way of improving performance  for BI queries is to use BI ENGINE reservation along with BigQuery slot reservation.This will increase query performance, throughput and also utilizes less number of slots. Reserving BI Engine capacity will let you save on slots in your projects.

BigQuery BI Engine optimizes the standard SQL functions and operators when connecting business intelligence (BI) tools to BigQuery. Optimized SQL functions and operators for BI Engine are found here.

Monitor BI Engine with Cloud Monitoring

BigQuery BI Engine integrates with Cloud Monitoring so you can monitor BI Engine metrics and configure alerts.

For information on using Monitoring to create charts for your BI Engine metrics, see Creating charts in the Monitoring documentation.

4 BI Engine.jpg

We ran the same query without BI engine reservation and noticed 15.47 GB were processed.

5 BI Engine.jpg

After BI Engine capacity reservation, in Monitoring under BIE Reservation Used Bytes dashboard we got a compression ratio of ~11.74x (15.47 GB / 1.317 GB). However compression is very data dependent, primarily compression depends on the data cardinality. Customers should run tests on their data to determine their compression rate.

6 BI Engine.jpg

Monitoring metrics ‘Reservation Total Bytes’ gives information about the BI engine capacity reservation whereas ‘Reservation Used Bytes’ gives information about the total used_bytes. Customers can make use of these 2 metrics to come up with the right capacity for reservation. 

When a project has BI engine capacity reserved, queries running in BigQuery will use BI engine to accelerate the compatible subquery performance.​​The degree of acceleration of the query falls into one of the below mentioned modes:

BI Engine Mode FULL - BI Engine compute was used to accelerate leaf stages of the query but the data needed may be in memory or may need to be scanned from a disk. Even when BI Engine compute is utilized, BQ slots may also be used for parts of the query. The more complex the query,the more slots are used.This mode executes all leaf stages in BI Engine (and sometimes all stages).

BI Engine Mode PARTIAL - BI Engine accelerates compatible subqueries and BigQuery processes the subqueries that are not compatible with BI Engine.This mode also provides bi-engine-reason for not using BI Engine mode fully.This mode executes some leaf stages in BI Engine and rest in BigQuery.

BI Engine Mode DISABLED - When BI Engine process subqueries that are not compatible for acceleration, all leaf stages will get processed in BigQuery. This mode also provides bi-engine-reason for not using BI Engine mode fully/partially.

7 BI Engine.jpg

Note that when you purchase a flat rate reservation, BI Engine capacity (GB) will be provided as part of the monthly flat-rate price. You can get up to 100 GB of BI Engine capacity included for free with a 2000-slot annual commitment. As BI Engine reduces the number of slots processed for BI queries, purchasing less slots by topping up little BI Engine capacity along with freely offered capacity might suffice your requirement instead of going in for more slots!