
What is BigQuery Omni?

BigQuery Omni lets you run BigQuery analytics on data stored in Amazon S3 or Azure Blob Storage.

Many organizations store data in multiple public clouds. This data often ends up siloed, because it's hard to get insights across all of it. You want to analyze the data with a multi-cloud data tool that is inexpensive and fast, and that doesn't add the overhead of decentralized data governance. BigQuery Omni reduces this friction by providing a unified interface.

BigQuery Omni brings the BigQuery analytics engine to your data where it resides, so you can access and analyze data without moving or copying it. When you do need to combine data across clouds, you can move data between clouds by using cross-cloud transfer.

BigQuery Omni offers a cross-cloud analytics solution with the ability to analyze data where it is and the flexibility to replicate data when necessary.

How it works

BigQuery's architecture separates compute from storage, allowing BigQuery to scale out as needed to handle very large workloads. BigQuery Omni extends this architecture by running the BigQuery query engine in other clouds. As a result, you don't have to physically move data into BigQuery storage. Processing happens where that data already sits.

BigQuery Omni architecture

Query results can be returned to Google Cloud over a secure connection, for example to be displayed in the Google Cloud console. Alternatively, you can write the results directly to Amazon S3 or Azure Blob Storage. In that case, the query results don't move across clouds.

BigQuery Omni uses standard AWS IAM roles or Azure Active Directory principals to access the data in your AWS account or Azure subscription. You delegate read or write access to BigQuery Omni, and you can revoke that access at any time.
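On the AWS side, this delegation is commonly set up as an IAM role that a Google identity assumes through web identity federation. The following Python sketch builds the two JSON policy documents such a role would need under that assumption: a trust policy and an S3 permissions policy. The helper names, the identity value, and the bucket name are hypothetical placeholders; take the real identity from your BigQuery connection and check the current BigQuery Omni setup guide before applying any policy.

```python
import json


def trust_policy(google_identity: str) -> dict:
    """Trust policy letting a Google identity assume the role (web identity federation).

    google_identity is a placeholder: use the identity shown on your own
    BigQuery connection.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": "accounts.google.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {"accounts.google.com:sub": google_identity}
                },
            }
        ],
    }


def bucket_access_policy(bucket: str, allow_write: bool = False) -> dict:
    """Permissions policy delegating read (and optionally write) access to one bucket."""
    object_actions = ["s3:GetObject"]
    if allow_write:
        # Write access is only needed if query results are exported to this bucket.
        object_actions.append("s3:PutObject")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(json.dumps(trust_policy("000000000000000000000"), indent=2))
    print(json.dumps(bucket_access_policy("my-omni-bucket", allow_write=True), indent=2))
```

In this model, revoking access amounts to deleting the role (or its trust statement) in AWS IAM; nothing needs to change on the Google Cloud side.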

Data flow between Google Cloud and AWS or Azure

The following image describes the data flow for queries (SELECT statements). The same flow also applies to DDL statements such as CREATE EXTERNAL TABLE.

Dataflow between Google and AWS or Azure for queries.

Step | Runs in | What happens
1 | Google Cloud | The BigQuery control plane receives query jobs from the customer through the Google Cloud console or the BigQuery CLI or API.
2 | Google Cloud | The BigQuery control plane sends query jobs for processing to the BigQuery data plane on AWS or Azure.
3 | AWS or Azure | The BigQuery data plane receives the query from the control plane through a VPN connection.
4 | AWS or Azure | The BigQuery data plane reads table data from customer-owned storage buckets (Amazon S3 or Azure Blob Storage).
5 | AWS or Azure | The BigQuery data plane runs the query job on the table data. Processing of table data occurs in the selected AWS or Azure region.
6 | AWS or Azure | The query result (up to 2 MB) is transmitted from the data plane to the control plane through the VPN connection.
7 | Google Cloud | The BigQuery control plane receives the query job results for display to the customer in response to the query job. This data is stored temporarily (up to 24 hours).
8 | Google Cloud | The query result is returned to the user.
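As a sketch of what a client might submit in step 1, the following Python helpers build a CREATE EXTERNAL TABLE statement over files in Amazon S3 and a SELECT statement against the resulting table. They only assemble SQL text and don't call any API. The helper names and the project, dataset, table, connection, and bucket names are hypothetical, and the statements assume that a dataset and a connection already exist in the aws-us-east-1 region.

```python
from typing import Sequence


def quote_id(*parts: str) -> str:
    """Join identifier parts with dots and wrap them in backticks, e.g. `p.d.t`."""
    for part in parts:
        if "`" in part:
            raise ValueError(f"backtick not allowed in identifier part: {part!r}")
    return "`" + ".".join(parts) + "`"


def create_external_table_sql(
    project: str,
    dataset: str,
    table: str,
    location: str,
    connection: str,
    uris: Sequence[str],
    fmt: str = "PARQUET",
) -> str:
    """DDL for an external table over files in customer-owned storage."""
    for uri in uris:
        if "'" in uri:
            raise ValueError(f"quote not allowed in URI: {uri!r}")
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE {quote_id(project, dataset, table)}\n"
        f"WITH CONNECTION {quote_id(project, location, connection)}\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}])"
    )


def select_sql(project: str, dataset: str, table: str,
               columns: Sequence[str], limit: int = 100) -> str:
    """A plain SELECT; its result travels back to Google Cloud (steps 6 to 8)."""
    return f"SELECT {', '.join(columns)}\nFROM {quote_id(project, dataset, table)}\nLIMIT {limit}"


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    print(create_external_table_sql(
        "my-project", "omni_dataset", "orders", "aws-us-east-1", "s3-read-conn",
        ["s3://my-omni-bucket/orders/*"]))
    print(select_sql("my-project", "omni_dataset", "orders", ["order_id", "amount"]))
```

Both statements follow the same control plane to data plane path; only the SELECT produces a result that returns to Google Cloud.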

The following image describes the data flow for export queries (EXPORT DATA statements).

Dataflow between Google and AWS or Azure for export queries.

Step | Runs in | What happens
1 | Google Cloud | The BigQuery control plane receives export query jobs from the customer through the Google Cloud console or the BigQuery CLI or API. The query contains the destination path for the query result in customer-owned storage buckets (Amazon S3 or Azure Blob Storage).
2 | Google Cloud | The BigQuery control plane sends export query jobs for processing to the BigQuery data plane on AWS or Azure.
3 | AWS or Azure | The BigQuery data plane receives the export query from the control plane through a VPN connection.
4 | AWS or Azure | The BigQuery data plane reads table data from customer-owned storage buckets (Amazon S3 or Azure Blob Storage).
5 | AWS or Azure | The BigQuery data plane runs the query job on the table data. Processing of table data occurs in the selected AWS or Azure region.
6 | AWS or Azure | BigQuery writes the query result to the specified destination path in customer-owned storage buckets (Amazon S3 or Azure Blob Storage).
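The destination path in step 1 is part of the EXPORT DATA statement itself. The following Python sketch assembles such a statement and rejects destinations outside Amazon S3 or Azure Blob Storage. The helper name, connection reference, bucket, storage account, and table names are hypothetical, and the connection is assumed to have write access delegated as described earlier.

```python
def export_data_sql(connection_ref: str, uri: str, query: str,
                    fmt: str = "PARQUET", overwrite: bool = False) -> str:
    """Build an EXPORT DATA statement that writes results to S3 or Azure Blob Storage."""
    # The result stays in the other cloud, so only s3:// or azure:// destinations make sense.
    if not uri.startswith(("s3://", "azure://")):
        raise ValueError(f"destination must be in Amazon S3 or Azure Blob Storage: {uri!r}")
    if "'" in uri or "`" in connection_ref:
        raise ValueError("quote characters are not allowed in the URI or connection")
    return (
        f"EXPORT DATA WITH CONNECTION `{connection_ref}`\n"
        f"OPTIONS (uri = '{uri}', format = '{fmt}', overwrite = {'true' if overwrite else 'false'})\n"
        f"AS {query}"
    )


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    print(export_data_sql(
        "my-project.aws-us-east-1.s3-write-conn",
        "s3://my-omni-bucket/exports/*",
        "SELECT order_id, amount FROM `my-project.omni_dataset.orders`",
    ))
    print(export_data_sql(
        "my-project.azure-eastus2.blob-write-conn",
        "azure://myaccount.blob.core.windows.net/mycontainer/exports/*",
        "SELECT 1 AS x",
        fmt="JSON",
        overwrite=True,
    ))
```

Because step 6 writes straight to your bucket, no part of the result crosses back to Google Cloud.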

Benefits of BigQuery Omni

Performance. You can get insights faster, because data is not copied across clouds, and queries run in the same region where your data resides.

Cost. You save on network egress costs because the data doesn't move. There are no additional charges to your AWS or Azure account related to BigQuery Omni analytics, because the queries run on clusters managed by Google. You are only billed for running the queries, using the BigQuery pricing model.

Security and data governance. You manage the data in your own AWS account or Azure subscription. You don't need to move or copy the raw data out of your public cloud. All computation happens in the BigQuery multi-tenant service, which runs in the same region as your data.

Serverless architecture. Like the rest of BigQuery, BigQuery Omni is a serverless offering. Google deploys and manages the clusters that run BigQuery Omni. You don't need to provision any resources or manage any clusters.

Ease of management. BigQuery Omni provides a unified management interface through Google Cloud. BigQuery Omni can use your existing Google Cloud account and BigQuery projects. You can write a Google Standard SQL query in the Google Cloud console to query data in AWS or Azure, and see the results displayed in the Google Cloud console.

Cross-cloud transfer. You can load data into native BigQuery tables from Amazon S3 buckets and Azure Blob Storage. For more information, see Cross-cloud transfer (AWS) and Cross-cloud transfer (Azure).
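Cross-cloud transfer is expressed in SQL as a LOAD DATA statement that reads files through a connection. The following Python sketch builds one; it is a minimal illustration rather than a complete client, and the helper name, destination table, and connection reference are hypothetical. OVERWRITE replaces the table's existing data, while INTO appends to it.

```python
from typing import Sequence


def load_data_sql(destination: str, connection_ref: str, uris: Sequence[str],
                  fmt: str = "PARQUET", overwrite: bool = False) -> str:
    """Build a LOAD DATA statement that copies files from S3 or Azure into a BigQuery table."""
    for uri in uris:
        if not uri.startswith(("s3://", "azure://")):
            raise ValueError(f"source must be in Amazon S3 or Azure Blob Storage: {uri!r}")
        if "'" in uri:
            raise ValueError(f"quote not allowed in URI: {uri!r}")
    # OVERWRITE replaces the table's data; INTO appends to it.
    verb = "LOAD DATA OVERWRITE" if overwrite else "LOAD DATA INTO"
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"{verb} `{destination}`\n"
        f"FROM FILES (format = '{fmt}', uris = [{uri_list}])\n"
        f"WITH CONNECTION `{connection_ref}`"
    )


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    print(load_data_sql(
        "my-project.analytics.orders_copy",
        "my-project.aws-us-east-1.s3-read-conn",
        ["s3://my-omni-bucket/orders/*"],
    ))
```

Unlike the external-table path, this copies the data into native BigQuery storage, so it trades the "data doesn't move" benefit for the full BigQuery feature set on the copied table.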

Limitations

BigQuery Omni limitations include the following:

  • You can't create standard tables in BigQuery Omni. BigQuery Omni only supports external tables.
  • All limitations for external tables apply to BigQuery Omni external tables.
  • The OBJECT_PRIVILEGES, STREAMING_TIMELINE_BY_*, and TABLE_SNAPSHOTS BigQuery INFORMATION_SCHEMA views are not available for BigQuery Omni tables.
  • Joins with other INFORMATION_SCHEMA tables and other external tables in aws-us-east-1 or azure-eastus2 are not supported.
  • Materialized views for BigQuery Omni external tables are not supported.
  • The following SQL statements are not supported:

    • BigQuery ML statements.
    • Data definition language (DDL) statements that require data managed in BigQuery. For example, CREATE EXTERNAL TABLE, CREATE SCHEMA, and CREATE RESERVATION are supported, but CREATE MATERIALIZED VIEW is not.
    • Data manipulation language (DML) statements.
  • The maximum result size for a query is 10 GB (preview).
  • The quota for total query result size for a project is 1 TB per day (preview).
  • The following limitations apply to querying and reading destination temporary tables (preview):

    • Querying destination temporary tables with the SELECT SQL statement is not supported.
    • Using the BigQuery Storage Read API to read data from destination temporary tables is not supported.
    • When using the ODBC driver, high-throughput reads (the EnableHTAPI option) are not supported.
  • Scheduled queries are supported only through the API or CLI.
  • The destination table option is disabled for queries in BigQuery Omni. Only EXPORT queries are allowed.
  • The BigQuery Storage API is not available in BigQuery Omni regions.
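The size limits and unsupported statement types above can be screened before a job is submitted. The following Python sketch does that. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), which this section doesn't specify, and its keyword list covers only the statement types named here (DML, CREATE MATERIALIZED VIEW, and CREATE MODEL for BigQuery ML), so it is a rough pre-check, not a complete validator. The function names are hypothetical.

```python
from typing import List

GB = 10**9  # assumption: decimal gigabytes; the limit may be defined in GiB
TB = 10**12
MAX_RESULT_BYTES = 10 * GB         # per-query result size limit (preview)
DAILY_RESULT_QUOTA_BYTES = 1 * TB  # per-project total result size per day (preview)

# Leading keywords for statement types listed as unsupported; not exhaustive.
UNSUPPORTED_PREFIXES = (
    "INSERT", "UPDATE", "DELETE", "MERGE",  # DML
    "CREATE MATERIALIZED VIEW",
    "CREATE MODEL",                         # BigQuery ML
)


def result_size_problems(result_bytes: int, used_today_bytes: int) -> List[str]:
    """Return reasons a result of this size would exceed the Omni limits."""
    problems = []
    if result_bytes > MAX_RESULT_BYTES:
        problems.append("result exceeds the 10 GB per-query limit")
    if used_today_bytes + result_bytes > DAILY_RESULT_QUOTA_BYTES:
        problems.append("result would exceed the 1 TB per-day project quota")
    return problems


def is_unsupported_statement(sql: str) -> bool:
    """Rough check based only on the statement's leading keywords."""
    normalized = " ".join(sql.split()).upper()
    # Treat CREATE OR REPLACE X the same as CREATE X.
    if normalized.startswith("CREATE OR REPLACE "):
        normalized = "CREATE " + normalized[len("CREATE OR REPLACE "):]
    return normalized.startswith(UNSUPPORTED_PREFIXES)
```

For example, an 11 GB result fails the per-query check on its own, while a 5 GB result fails only if the project has already returned more than 995 GB that day.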

Pricing

For information about pricing and limited-time offers in BigQuery Omni, see BigQuery Omni pricing.

Location

BigQuery Omni processes queries in the same location as the dataset that contains the tables you're querying. After you create the dataset, the location cannot be changed. Your data resides within your own AWS or Azure account.

Supported regions

Cloud | Region description | Region name
AWS | US East (N. Virginia) | aws-us-east-1
Azure | East US 2 | azure-eastus2
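Because a dataset's location is fixed when the dataset is created and determines where queries run, a provisioning script might check the location name before creating a dataset. The following Python sketch maps the region names from the table above to their clouds. The mapping reflects only this table (more regions may be available), and the function names are hypothetical.

```python
from typing import Optional

# Region names from the supported-regions table; other regions may exist.
OMNI_REGIONS = {
    "aws-us-east-1": ("AWS", "US East (N. Virginia)"),
    "azure-eastus2": ("Azure", "East US 2"),
}


def omni_cloud_for(location: str) -> Optional[str]:
    """Return the cloud hosting a BigQuery Omni location, or None if it isn't listed."""
    entry = OMNI_REGIONS.get(location.strip().lower())
    return entry[0] if entry else None


def describe(location: str) -> str:
    """Human-readable description of a location, for logs or error messages."""
    entry = OMNI_REGIONS.get(location.strip().lower())
    if entry is None:
        return f"{location} is not a BigQuery Omni region listed here"
    cloud, description = entry
    return f"{location}: {cloud} - {description}"
```

A location that returns None here would create an ordinary Google Cloud dataset, which can't be moved to an Omni region later.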

What's next