Introducing Analytics Hub: secure and scalable sharing for data and analytics
Debanjan Saha
General Manager and Vice President of Engineering, Data Analytics
Brian Welcker
Director, Product Management
Customers tell us that sharing and exchanging data with other organizations is a critical element of their analytics strategy, but that it is hamstrung by unreliable data and processes and is only getting harder as security threats and privacy regulations increase.
Furthermore, traditional data sharing techniques use batch data pipelines that are expensive to run, create late-arriving data, and can break with any changes to the source data. They also create multiple copies of data, which brings unnecessary costs and can bypass data governance processes. Nor do these techniques offer features for data monetization, such as managing subscriptions and entitlements. Altogether, these challenges mean that organizations are unable to realize the full potential of transforming their business with shared data.
To address these limitations, we are introducing Analytics Hub, a new fully managed service that helps you unlock the value of data sharing, leading to new insights and increased business value. It will be available in preview in Q3. With Analytics Hub you get:
A rich data ecosystem by publishing and subscribing to analytics-ready datasets.
Control and monitoring over how your data is being used, because data is shared in one place.
A self-service way to access valuable and trusted data assets, including data provided by Google. For example, a unique dataset from Google Search Trends will be available that you can query and combine with your own data.
An easy way to monetize your data assets without the overhead of building and managing the infrastructure.
Built on a decade of cross-organizational sharing
While Analytics Hub is a new service, it builds on BigQuery, Google’s petabyte-scale, serverless cloud data warehouse. BigQuery’s unique architecture separates compute from storage, so data publishers can share data with as many subscribers as they want without making multiple copies of the data. With BigQuery, there are no servers to deploy or manage, which means that data consumers get immediate value from shared data. Data can be provided and consumed in real time using the streaming capabilities of BigQuery. You can also use BigQuery’s built-in machine learning, geospatial, and natural language capabilities, or take advantage of the native business intelligence support with tools like Looker, Google Sheets, and Data Studio.
BigQuery has had cross-organizational, in-place data sharing capabilities since it was introduced in 2010. We took a look at usage metrics in BigQuery and found that over a 7-day period in April, more than 3,000 different organizations shared over 200 petabytes of data. These numbers don’t include data sharing between departments within the same organization.
As you can see, data sharing in BigQuery is already popular. But we want to make it easier and even more scalable.
Raising the bar on data sharing
To make data sharing easier and more scalable in BigQuery, Analytics Hub introduces the concepts of shared datasets and exchanges. As a data publisher, you create shared datasets that contain the views of data that you want to deliver to your subscribers. Next, you create exchanges, which are used to organize and secure shared datasets. By default, exchanges are completely private, which means that only the users and groups that you give access to can view or subscribe to the data. You can also create internal exchanges or leverage public exchanges provided by Google. Finally, you publish shared datasets into an exchange to make them available to subscribers.
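The publisher workflow described above can be sketched as a small in-memory model. This is only an illustration of the concepts (shared dataset, exchange, private by default, publish); the class names, method names, and principal strings are hypothetical and are not the Analytics Hub API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedDataset:
    """Hypothetical model: a publisher-defined bundle of views for subscribers."""
    name: str
    views: list[str]


@dataclass
class Exchange:
    """Hypothetical model: organizes and secures shared datasets."""
    name: str
    # Users and groups that have been given access; empty means fully private.
    allowed: set[str] = field(default_factory=set)
    listings: dict[str, SharedDataset] = field(default_factory=dict)

    def grant(self, principal: str) -> None:
        self.allowed.add(principal)

    def publish(self, dataset: SharedDataset) -> None:
        self.listings[dataset.name] = dataset

    def visible_to(self, principal: str) -> list[str]:
        # Private by default: a principal without access sees nothing.
        if principal not in self.allowed:
            return []
        return sorted(self.listings)


# Publisher steps: create a shared dataset, create an exchange, publish.
demographics = SharedDataset("customer_demographics", ["v_by_region", "v_by_age_band"])
exchange = Exchange("partner_exchange")
exchange.publish(demographics)
exchange.grant("group:partner-analysts")

print(exchange.visible_to("user:outsider"))           # no access was granted
print(exchange.visible_to("group:partner-analysts"))  # can see the published dataset
```

The point of the model is the ordering of responsibilities: access is attached to the exchange rather than to each dataset, so publishing a new dataset into an existing exchange makes it visible to exactly the principals that already have access.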
Data subscribers search through the datasets that are available across all exchanges to which they have access and subscribe to relevant datasets. Subscribing creates a linked dataset in their project that they can query and join with their own data. Subscribers pay for the queries that they run against the data, while the publisher pays for the storage of the data. Publishers can add new data, new tables, or new columns to the shared dataset, and these will be immediately available to subscribers. In addition, the publisher can track subscribers, disable subscriptions, and see aggregated usage information for the shared data.
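The subscriber side can be illustrated with a similar toy model: a linked dataset is a reference to the publisher's data rather than a copy, storage is attributed to the publisher, and query cost is attributed to the subscriber. All names here are hypothetical, and the billing functions are a deliberate simplification (a query is assumed to scan a whole table); this is not how BigQuery computes charges.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedDataset:
    """Hypothetical model: publisher-owned tables, name -> size in bytes."""
    owner: str
    tables: dict[str, int] = field(default_factory=dict)


@dataclass
class LinkedDataset:
    """Hypothetical model: a subscriber-side reference; no data is copied."""
    subscriber: str
    source: SharedDataset
    enabled: bool = True

    def list_tables(self) -> list[str]:
        if not self.enabled:
            raise PermissionError("subscription disabled by publisher")
        # Reads through to the source, so publisher changes appear immediately.
        return sorted(self.source.tables)


def storage_bytes_billed(dataset: SharedDataset) -> dict[str, int]:
    # Storage is attributed only to the publisher, however many subscribers exist.
    return {dataset.owner: sum(dataset.tables.values())}


def query_bytes_billed(link: LinkedDataset, table: str) -> dict[str, int]:
    # Simplification: the query scans the full table; cost goes to the subscriber.
    if not link.enabled:
        raise PermissionError("subscription disabled by publisher")
    return {link.subscriber: link.source.tables[table]}


shared = SharedDataset("publisher-project", {"trips": 1_000})
link = LinkedDataset("subscriber-project", shared)

shared.tables["stations"] = 200           # publisher adds a new table
print(link.list_tables())                 # subscriber sees it without re-subscribing
print(storage_bytes_billed(shared))       # attributed to the publisher
print(query_bytes_billed(link, "trips"))  # attributed to the subscriber

link.enabled = False                      # publisher disables the subscription
```

Because the linked dataset holds a reference instead of a copy, adding a subscriber does not change the stored bytes, and disabling a subscription cuts off access without touching the underlying data.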
Analytics Hub makes it easy for you to publish, discover, and subscribe to valuable datasets that you can combine with your own data to derive unique insights. Here are some types of data that will be available through Analytics Hub:
Public datasets: Easy access to the existing repository of over 200 public datasets, including data about weather and climate, cryptocurrency, healthcare and life sciences, and transportation.
Google datasets: Unique, freely available datasets from Google. One example of this is the COVID-19 community mobility dataset. Another example is the forthcoming Google Trends dataset, which will provide the top 25 search terms and top 25 rising search terms over a 5-year window in 210 distinct locations in the US. Trends data can be used by everyone in the organization to gain insights into what customers care about.
Commercial (paid for) datasets: We are working with leading commercial data providers to bring their data products to Analytics Hub. If you are interested in delivering your data via Analytics Hub, we’re also introducing Data Gravity, an initiative that provides storage benefits and new distribution paths for data published through Analytics Hub.
Internal datasets: We know that data sharing can be challenging in larger organizations. Analytics Hub can be used for internal data, for example, to share standardized customer demographics with your sales engineering and data science teams.
Customers and partners using Analytics Hub
“Google Search Trends data has always been an important tool for our WPP agency data teams. At WPP we believe that data variety is a superpower which is why we are excited to use the new Trends dataset availability within BigQuery, plus the launch of Analytics Hub. The best creativity in the world is informed by data insights, and influenced by what people search for, so the operational efficiencies we’ll gain via the Analytics Hub and the insights we can drive with Trends data are just phenomenal.”
—Di Mayze, Global Head of Data and AI, WPP
“Equifax Ignite is our shared data analytics environment within our Equifax data fabric. We are excited to partner with Google to leverage Analytics Hub and BigQuery to deliver data to over 400 statisticians and data modelers as well as securely sharing data with our partner financial institutions.”
—Kumar Menon, SVP Data Fabric and Decision Science, Equifax
“The flow of data and insights between our teams at Deloitte and our clients is paramount for building truly transformational data cultures. With its purpose-built architecture for secure data exchanges and sharing analytics resources, Google Cloud’s Analytics Hub can help provide significant operational efficiencies for how Deloitte teams support our clients’ data-driven initiatives within their industry ecosystems. It will also help minimize the worries about scale, privacy and security, or the administrative burden associated with each.”
—Navin Warerkar, Managing Director, Deloitte Consulting LLP, and US Google Cloud Data & Analytics GTM Lead
“Crux Informatics is proud to partner with Google to support the launch of Analytics Hub, removing friction for those who need access to analytics-ready data. With thousands of datasets from over 140 sources, Crux Informatics will accelerate access to data on Analytics Hub and together provide a more efficient and cost effective solution to deliver datasets in Google Cloud’s ecosystem.”
—Will Freiberg, CEO, Crux Informatics
Next steps for Analytics Hub
This is just the beginning for Analytics Hub. As we move toward preview and general availability, we will add more capabilities, including: workflows for publishing and subscribing; publishing of analytics assets (Looker Blocks, Data Studio reports, Connected Google Sheets) along with the shared data; the ability for data publishers to specify query restrictions on the usage of their data; and an easy way for data publishers to create sandbox environments where subscribers can work with their data, even if they are not yet on Google Cloud. We will also provide features in Analytics Hub for monetization of data, including managing subscriptions, data entitlements, and billing.
To sign up for the preview, which is scheduled to be available in the third quarter of 2021, register your interest in Analytics Hub at g.co/cloud/analytics-hub. In the meantime, you can learn more about BigQuery and how to use its built-in data sharing capabilities.