Data Processing in BigQuery Omni

Architecture Overview

BigQuery Omni is a multi-cloud version of BigQuery.

BigQuery Omni is a fully managed multi-tenant system. Customers do not provision their own compute resources.

The BigQuery Omni control plane runs in Google data centers as part of Google Cloud. Customers interact with BigQuery Omni via the BigQuery API (or Google Cloud Console) hosted by the control plane in Google Cloud. The control plane (Google Cloud) and data plane (AWS or Azure) are in the same jurisdiction.

The data plane is a multi-tenant, distributed query execution engine that is fully managed by Google.

BigQuery does not permanently store table data outside of the customer-managed storage service on AWS or Azure. Instead, it accesses customer-owned data in Amazon S3 or Azure Blob Storage through external tables.
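As a minimal sketch of what an external table definition over customer-owned storage might look like, the Python helper below only assembles a GoogleSQL CREATE EXTERNAL TABLE statement as a string; it does not contact BigQuery. The helper name build_external_table_ddl and the dataset, table, connection, and bucket names are hypothetical placeholders, not values taken from this document.

```python
def build_external_table_ddl(dataset, table, connection, uris, data_format="PARQUET"):
    """Return a CREATE EXTERNAL TABLE statement for data in customer-owned storage.

    Every argument is a caller-supplied placeholder; nothing is validated
    against a real BigQuery project.
    """
    # Quote each source URI and join them into a GoogleSQL array literal body.
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = '{data_format}', uris = [{uri_list}]);"
    )


# Hypothetical names: an AWS connection in us-east-1 and an S3 bucket.
ddl = build_external_table_ddl(
    dataset="my_dataset",
    table="orders",
    connection="aws-us-east-1.my_connection",
    uris=["s3://my-bucket/orders/*"],
)
print(ddl)
```

The table data itself stays in the S3 bucket; the statement only registers where it lives and which connection is used to read it.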

Data Flow between Google and AWS or Azure

Queries

This section describes the dataflow for queries (e.g. SELECT * FROM …). It also applies to DDL statements (e.g. CREATE EXTERNAL TABLE).

Dataflow between Google and AWS or Azure for queries.

Step | Location | Action
1 | Google Cloud | The BigQuery control plane receives query jobs from the customer via the Cloud Console or the BigQuery CLI/API.
2 | Google Cloud | The BigQuery control plane sends query jobs to the BigQuery data plane (on AWS or Azure) for processing.
3 | AWS or Azure | The BigQuery data plane receives the query from the control plane through a VPN connection.
4 | AWS or Azure | The BigQuery data plane reads table data from customer-owned storage buckets (AWS S3 or Azure Blob storage).
5 | AWS or Azure | The BigQuery data plane runs the query job on the table data. Processing of table data occurs in the selected AWS or Azure region.
6 | AWS or Azure | The query result (up to 2 MB) is transmitted from the data plane to the control plane via the VPN connection.
7 | Google Cloud | The BigQuery control plane receives the query job results for display to the customer. This data is stored temporarily (up to 24 hours).
8 | Google Cloud | The query result is returned to the user.
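The eight steps above can be modelled as plain data to make the cloud boundary explicit. This is only an illustrative sketch: the Step class, the location labels, and both helper functions are hypothetical, the location assigned to each step reflects one reading of the table's Google Cloud and AWS or Azure columns, and the code assumes "2 MB" means 2 × 1024 × 1024 bytes. The source does not say what happens to results above that size, so the check only reports whether a result fits.

```python
from dataclasses import dataclass

# Assumption: "2 MB" is interpreted as 2 MiB.
MAX_RESULT_BYTES = 2 * 1024 * 1024


@dataclass(frozen=True)
class Step:
    number: int
    location: str  # "google_cloud" or "aws_or_azure"
    action: str


QUERY_FLOW = [
    Step(1, "google_cloud", "control plane receives the query job"),
    Step(2, "google_cloud", "control plane sends the job to the data plane"),
    Step(3, "aws_or_azure", "data plane receives the query over VPN"),
    Step(4, "aws_or_azure", "data plane reads table data from customer storage"),
    Step(5, "aws_or_azure", "data plane runs the query on the table data"),
    Step(6, "aws_or_azure", "data plane transmits the result over VPN"),
    Step(7, "google_cloud", "control plane receives and temporarily stores the result"),
    Step(8, "google_cloud", "result is returned to the user"),
]


def boundary_crossings(flow):
    """Return (from_step, to_step) pairs where consecutive steps run in different clouds."""
    return [
        (a.number, b.number)
        for a, b in zip(flow, flow[1:])
        if a.location != b.location
    ]


def fits_result_limit(result_size_bytes):
    """Report whether a result is within the stated 2 MB limit."""
    return result_size_bytes <= MAX_RESULT_BYTES


print(boundary_crossings(QUERY_FLOW))  # [(2, 3), (6, 7)]
```

In this model the only transfers across the cloud boundary are the query itself (step 2 to step 3) and the size-limited result (step 6 to step 7); reading and processing of table data stay on the AWS or Azure side.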

Export Queries

This section describes the dataflow for export queries (e.g. EXPORT DATA SELECT * FROM …).

Dataflow between Google and AWS or Azure for export queries.

Step | Location | Action
1 | Google Cloud | The BigQuery control plane receives export query jobs from the customer via the Cloud Console or the BigQuery CLI/API. The query contains the destination path for the query result in customer-owned storage buckets (AWS S3 or Azure Blob storage).
2 | Google Cloud | The BigQuery control plane sends export query jobs to the BigQuery data plane (on AWS or Azure) for processing.
3 | AWS or Azure | The BigQuery data plane receives the export query from the control plane through a VPN connection.
4 | AWS or Azure | The BigQuery data plane reads table data from customer-owned storage buckets (AWS S3 or Azure Blob storage).
5 | AWS or Azure | The BigQuery data plane runs the query job on the table data. Processing of table data occurs in the selected AWS or Azure region.
6 | AWS or Azure | BigQuery writes the query result to the specified destination path in customer-owned storage buckets (AWS S3 or Azure Blob storage).
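To illustrate how the destination path travels inside the export query, the sketch below assembles an EXPORT DATA statement as a string without contacting BigQuery. The helper name build_export_statement and the connection, bucket, and table names are hypothetical. The single-wildcard check reflects the usual EXPORT DATA requirement as understood here and should be treated as an assumption.

```python
def build_export_statement(connection, destination_uri, select_sql, data_format="CSV"):
    """Return an EXPORT DATA statement that writes a query result to customer storage.

    Every argument is a caller-supplied placeholder.
    """
    # Assumption: the destination URI must contain exactly one "*" wildcard.
    if destination_uri.count("*") != 1:
        raise ValueError("destination_uri should contain exactly one '*' wildcard")
    # Drop surrounding whitespace and any trailing semicolon so exactly one is appended.
    query = select_sql.strip().rstrip(";")
    return (
        f"EXPORT DATA WITH CONNECTION `{connection}`\n"
        f"OPTIONS (uri = '{destination_uri}', format = '{data_format}')\n"
        f"AS {query};"
    )


# Hypothetical names: an AWS connection, an S3 destination, and a source table.
export_sql = build_export_statement(
    connection="aws-us-east-1.my_connection",
    destination_uri="s3://my-bucket/exports/orders-*",
    select_sql="SELECT * FROM `my_dataset.orders`;",
)
print(export_sql)
```

Because the result is written directly to the destination path by the data plane, this flow has no step that returns result rows to the control plane.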