Bringing multi-cloud analytics to your data with BigQuery Omni
Debanjan Saha
General Manager and Vice President of Engineering, Data Analytics
Editor’s note: BigQuery Omni is now generally available. For the most up-to-date information, please read our BigQuery Omni GA blog here.
Today, we are introducing BigQuery Omni, a flexible, multi-cloud analytics solution that lets you cost-effectively access and securely analyze data across Google Cloud, Amazon Web Services (AWS), and Azure (coming soon), without leaving the familiar BigQuery user interface (UI). Using standard SQL and the same BigQuery APIs our customers love, you will be able to break down data silos and gain critical business insights from a single pane of glass. And because BigQuery Omni is powered by Anthos, you will be able to query data without having to manage the underlying infrastructure.
A recent Gartner research survey on cloud adoption revealed that more than 80% of respondents using the public cloud were using more than one cloud service provider (CSP)¹. While data is a critical component of decision making across organizations, for many, this data is scattered across multiple public clouds. BigQuery Omni is an extension of our continued innovation and commitment to multi-cloud, bringing you the best analytics and data warehouse technology, no matter where your data is stored.
How BigQuery Omni works
The cost of moving data between cloud providers isn’t sustainable for many businesses, and it’s still difficult to seamlessly work across clouds. BigQuery Omni represents a new way of analyzing data stored in multiple public clouds, which is made possible by BigQuery's separation of compute and storage. By decoupling these two, BigQuery provides scalable storage that can reside in Google Cloud or other public clouds, and stateless resilient compute that executes standard SQL queries. Until now, though, in order to use BigQuery, your data had to be stored in Google Cloud.
While competitors will require you to move or copy your data from one public cloud to another, where you might incur egress costs, this is not the case with BigQuery Omni. The same BigQuery interface on Google Cloud will let you query the data that you have stored in Google Cloud, AWS, and Azure (coming soon) without any cross-cloud movement or copies of data. BigQuery Omni’s query engine runs the necessary compute on clusters in the same region where your data resides. For example, you can use BigQuery Omni to query Google Analytics 360 Ads data that’s stored in Google Cloud, and also query log data from your e-commerce platform and applications that is stored in Amazon S3. Then, using Looker, you can build a dashboard that allows you to visualize your audience behavior and purchases alongside your advertising spend.
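To make the "compute goes to the data" idea concrete, here is a minimal toy model in Python. It is not how BigQuery Omni is implemented; the `Table` class, the `plan_query` function, and the table and region names are all hypothetical, chosen only to mirror the Google Analytics 360 and S3 example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    cloud: str   # hypothetical label, e.g. "gcp" or "aws"
    region: str  # region where the stored data lives


def plan_query(table: Table) -> dict:
    """Pick where compute should run for a query over one table.

    Toy rule taken from the text: compute runs in the same cloud and
    region as the data, so no bytes cross a cloud boundary.
    """
    return {
        "table": table.name,
        "compute_cloud": table.cloud,
        "compute_region": table.region,
        "cross_cloud_bytes": 0,
    }


# Hypothetical tables matching the example in the paragraph above.
ads = Table("ga360_ads", cloud="gcp", region="us")
logs = Table("ecommerce_logs", cloud="aws", region="us-east-1")

plans = [plan_query(t) for t in (ads, logs)]
for plan in plans:
    print(plan)
```

In this sketch each query is planned against the location of its own table, which is the property that avoids egress charges: the engine moves, the data does not.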
BigQuery Omni runs on Anthos clusters that are fully managed by Google Cloud, allowing you to securely execute queries on other public clouds. Our Anthos hybrid and multi-cloud application platform allowed us to build, deploy, and manage the BigQuery query engine (Dremel) on multiple clouds. When developing BigQuery Omni, we knew that a consistent and unified operations experience was critical to supporting our customers. Here’s what the architecture looks like:
With BigQuery Omni, you can:
Break down silos and gain insights on data. Power your business across clouds with a flexible, multi-cloud analytics solution. There’s no need to move or copy data from other public clouds into Google Cloud for analysis. Tap into the power of BigQuery to cost-efficiently break down data silos and make analytics work for you.
Get a consistent data experience across clouds. Enjoy a unified analytics experience across your datasets, in Google Cloud, AWS and Azure (coming soon). Use standard SQL and BigQuery’s familiar interface to write queries and build dashboards across your data. Quickly answer questions and share results from a single interface.
Enable flexibility powered by Anthos. Securely run analytics on another public cloud with a fully managed infrastructure, powered by Anthos. This means that you can query data without worrying about the underlying infrastructure. Compute resources run in the same cloud region where the data is stored, allowing you to have a completely seamless data analysis experience.
Getting started with an already familiar interface in BigQuery Omni
Start in the BigQuery UI on Google Cloud, choose the public cloud region where your data is located, and run your query. There’s no need to format or transform your data: BigQuery Omni supports Avro, CSV, JSON, ORC, and Parquet. You don’t need to move or copy your raw data out of the other public cloud, manage clusters, or provision resources. Computation occurs within BigQuery’s multi-tenant service running in the AWS region where the data is currently located.
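As a rough illustration of pointing BigQuery at files that stay in S3, the Python sketch below builds a table definition and checks the file format against the list above. This post does not show any syntax, so the statement shape is an assumption based on BigQuery's CREATE EXTERNAL TABLE ... WITH CONNECTION DDL; the helper `external_table_ddl` and the dataset, connection, and bucket names are hypothetical.

```python
# Formats named in this post as supported by BigQuery Omni.
SUPPORTED_FORMATS = {"AVRO", "CSV", "JSON", "ORC", "PARQUET"}


def external_table_ddl(table: str, connection: str, fmt: str, uri: str) -> str:
    """Build a DDL string for a table over files that remain in S3.

    Assumption: the statement follows BigQuery's external table DDL.
    Nothing is executed here; the function only returns text.
    """
    fmt = fmt.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if not uri.startswith("s3://"):
        raise ValueError("this sketch only handles s3:// URIs")
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = '{fmt}', uris = ['{uri}'])"
    )


ddl = external_table_ddl(
    table="my_dataset.ecommerce_logs",         # hypothetical dataset.table
    connection="aws-us-east-1.my_connection",  # hypothetical connection
    fmt="parquet",
    uri="s3://my-bucket/logs/*",               # hypothetical bucket
)
print(ddl)
```

The resulting string could be pasted into the BigQuery UI or submitted through a client library. The files are referenced by URI only, so the raw data is never copied out of AWS.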
Behind the scenes, BigQuery’s query engine is running on our Anthos clusters within the BigQuery managed service. BigQuery reads the data from storage in your account once you’ve granted it access through an IAM role in the other public cloud. Note that data is moved temporarily within AWS from your data storage to the BigQuery clusters running on Anthos to execute queries.
Choose to have the query results returned to Google Cloud to see them in the BigQuery UI.
Or, you can export the results directly back to your data storage, with no cross-cloud move of results or data.
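The two result destinations described above can be sketched as a small Python helper that either leaves a query untouched or wraps it in an export. The export form is an assumption based on BigQuery's EXPORT DATA WITH CONNECTION statement, which this post does not show; `results_statement` and every dataset, connection, and bucket name are hypothetical.

```python
from typing import Optional


def results_statement(query: str,
                      export_uri: Optional[str] = None,
                      connection: Optional[str] = None,
                      fmt: str = "CSV") -> str:
    """Return the SQL text for one of the two result destinations."""
    if export_uri is None:
        # Option 1: plain query; results are returned to Google Cloud
        # and can be viewed in the BigQuery UI.
        return query
    if connection is None:
        raise ValueError("exporting to S3 needs a connection name")
    # Option 2: results are written back to storage in the other cloud.
    return (
        f"EXPORT DATA WITH CONNECTION `{connection}`\n"
        f"OPTIONS (uri = '{export_uri}', format = '{fmt}')\n"
        f"AS {query}"
    )


query = (
    "SELECT user_id, COUNT(*) AS purchases "
    "FROM my_dataset.ecommerce_logs GROUP BY user_id"
)
to_ui = results_statement(query)
to_s3 = results_statement(
    query,
    export_uri="s3://my-bucket/results/*",     # hypothetical bucket
    connection="aws-us-east-1.my_connection",  # hypothetical connection
)
print(to_s3)
```

Only the first option sends result rows across clouds. The second keeps both the source data and the output in the original cloud.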
BigQuery Omni is currently in private alpha. If you’re interested in trying it out, fill out this form. And check out our Google Cloud Next ’20: OnAir session in August: Analytics in a multi-cloud world.
1. Gartner, The Future of Cloud Data Management is Multicloud, December 2019