Introduction to Analytics Hub

Analytics Hub is a data exchange platform that enables you to share data and insights at scale across organizational boundaries with a robust security and privacy framework. With Analytics Hub, you can discover and access a data library curated by various data providers. This data library also includes Google-provided datasets.

For example, by using Analytics Hub you can augment your analytics and ML initiatives with third-party and Google datasets.

As an Analytics Hub user, you can perform the following tasks:

  • As an Analytics Hub publisher, you can monetize data by sharing it with your partner network or within your own organization in real time. Listings let you share data without replicating the shared data. You can build a catalog of analytics-ready data sources with granular permissions that let you deliver data to the right audiences. You can also manage subscriptions and view the usage metrics for your listings.

  • As an Analytics Hub subscriber, you can discover the data that you are looking for, combine shared data with your existing data, and leverage the built-in features of BigQuery. When you subscribe to a listing, a linked dataset or linked Pub/Sub subscription is created in your project. You can manage your subscriptions by using the Subscription resource, which stores relevant information about the subscriber and represents the connection between publisher and subscriber.

  • As an Analytics Hub viewer, you can browse through the datasets that you have access to in Analytics Hub and request access to the shared data from the publisher.

  • As an Analytics Hub administrator, you can create data exchanges that enable data sharing, and then give permissions to data publishers and subscribers to access these data exchanges.

For more information about Analytics Hub user roles, see Configure Analytics Hub roles.

Architecture

Analytics Hub is built on a publish and subscribe model of Google Cloud data resources, allowing for zero-copy sharing in place. Analytics Hub supports the following Google Cloud resources:

  • BigQuery datasets
  • Pub/Sub topics

The publisher and subscriber workflows in Analytics Hub are explained in detail in the following sections.

Publisher workflow

The following diagram describes how publishers interact with Analytics Hub:

Figure 1. Analytics Hub Publisher workflow. (Diagram: interaction between Analytics Hub publishers and Analytics Hub.)

In figure 1, the following features are labeled: Shared resources, Data exchange, and Listing.

Shared resources

Shared resources are the unit of sharing by a publisher in Analytics Hub.

Shared datasets
A shared dataset is a BigQuery dataset that is the unit of data sharing in Analytics Hub. The separation of compute and storage in BigQuery's architecture enables data publishers to share datasets with as many subscribers as they want without having to make multiple copies of the data. As a publisher, you create or use an existing BigQuery dataset in your project that contains the supported objects that you want to deliver to your subscribers. Shared datasets support column-level and row-level security.
Shared topics (Preview)
A shared topic is a Pub/Sub topic that is the unit of streaming data sharing in Analytics Hub. As a publisher, you create or use an existing Pub/Sub topic in your project and share it with your subscribers.

Data exchanges

A data exchange is a container that enables self-service data sharing. It contains listings that reference shared resources. With Analytics Hub, publishers and administrators can grant access to subscribers at the exchange and the listing level. This method helps to avoid granting access on the underlying shared resources explicitly. An Analytics Hub subscriber can browse through data exchanges, discover data that they can access, and subscribe to shared resources. A data exchange can be of the following types:
  • Private data exchange. By default, a data exchange is private and only users or groups that have access to that exchange can view or subscribe to its listings.
  • Public data exchange. You can choose to make a data exchange public. Listings in public data exchanges can be discovered and subscribed to by all Google Cloud users (allAuthenticatedUsers). For more information about public data exchanges, see Make a data exchange public.

As an Analytics Hub administrator, you can create multiple data exchanges in Analytics Hub, and manage other Analytics Hub users.

Listings

A listing is a reference to a shared resource that a publisher lists in a data exchange. As a publisher, you can create a listing and specify the resource description, sample queries to run or sample message data, links to any relevant documentation, and any additional information that can help subscribers to use your shared resource. For more information, see Manage listings. A listing can be of the following two types based on the Identity and Access Management (IAM) policy that is set for the listing and the type of data exchange that contains the listing:
  • Public listing. It is shared with all Google Cloud users (allAuthenticatedUsers). Listings in a public data exchange are public listings. These listings can reference a free public resource or a commercial resource. If the listing references a commercial resource, subscribers can request access to the listing and the data provider contacts those subscribers directly.
  • Private listing. It is shared directly with individuals or groups. For example, a private listing can reference a marketing metrics dataset that you share with other internal teams within your organization.
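
The distinction between public and private listings comes down to whether the IAM policy includes allAuthenticatedUsers. The following is a minimal sketch of that check over a policy expressed as a plain dictionary in the usual IAM bindings shape. The policies, the group address, and the role ID roles/analyticshub.subscriber are illustrative assumptions, not values taken from this document; verify role IDs against Configure Analytics Hub roles.

```python
from typing import Any

# Special IAM principal named in this document for public listings.
PUBLIC_MEMBER = "allAuthenticatedUsers"


def is_public(policy: dict[str, Any]) -> bool:
    """Return True if any binding in the policy grants access to allAuthenticatedUsers."""
    return any(
        PUBLIC_MEMBER in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )


# Hypothetical policies for illustration only.
private_policy = {
    "bindings": [
        {
            "role": "roles/analyticshub.subscriber",  # assumed role ID
            "members": ["group:internal-teams@example.com"],
        }
    ]
}
public_policy = {
    "bindings": [
        {
            "role": "roles/analyticshub.subscriber",  # assumed role ID
            "members": [PUBLIC_MEMBER],
        }
    ]
}

print(is_public(private_policy))
print(is_public(public_policy))
```

This check only inspects a policy that you already hold in memory; it does not fetch policies from Analytics Hub.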

Subscriber workflow

The following diagram describes how subscribers interact with Analytics Hub:

Figure 2. Analytics Hub Subscriber workflow. (Diagram: interaction between Analytics Hub subscribers and Analytics Hub.)

In figure 2, the following Analytics Hub features are labeled: Shared resources, Data exchange, Listing, and Linked resources.

Linked resources

Linked resources are created when subscribing to an Analytics Hub listing, connecting a subscriber to the underlying shared resource.

Linked datasets
A linked dataset is a read-only BigQuery dataset that serves as a pointer or reference to a shared dataset. Subscribing to a listing creates a linked dataset in your project and not a copy of the dataset, so subscribers can read the data but cannot add or update objects within it. When you query objects such as tables and views through a linked dataset, the data from the shared dataset is returned. For more information about linked datasets, see View and subscribe to listings. Linked datasets are authorized to access tables and views of a shared dataset. Subscribers with linked datasets access tables and views of a shared dataset without any additional Identity and Access Management authorization. Linked datasets support the following objects:
Linked Pub/Sub subscriptions (Preview)
Subscribing to a listing with a shared topic creates a linked Pub/Sub subscription in the subscriber project. No copies of the shared topic or message data are created. Subscribers of the linked Pub/Sub subscription can access the messages published to the shared topic. Subscribers access message data of a shared topic without any additional Identity and Access Management authorization. Publishers can manage subscriptions either in Pub/Sub directly or through Analytics Hub subscription management. For more information about linked Pub/Sub subscriptions, see Stream sharing with Pub/Sub.
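
Because a linked dataset lives in the subscriber's project, a query addresses tables through the subscriber's project and linked dataset names rather than the publisher's. The following sketch builds such a query string. The project, dataset, and table names are hypothetical, and the identifier check is a deliberately conservative assumption, not the full BigQuery naming rule.

```python
import re

# Conservative identifier check (an assumption for this sketch): letters,
# digits, underscores, and hyphens only.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_\-]+$")


def linked_table_query(project: str, linked_dataset: str, table: str, limit: int = 10) -> str:
    """Build a SELECT statement that reads a table through a linked dataset."""
    for part in (project, linked_dataset, table):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unexpected characters in identifier: {part!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # The path uses the subscriber's project and the linked dataset name.
    return f"SELECT * FROM `{project}.{linked_dataset}.{table}` LIMIT {limit}"


# Hypothetical names for illustration only.
sql = linked_table_query("vendor-project", "demand_forecast_linked", "daily_forecast")
print(sql)
```

The resulting string could then be submitted as a regular BigQuery query job from the subscriber's project, for example through the BigQuery client library.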

Data egress options (BigQuery shared datasets only)

Data egress options let publishers restrict subscribers from exporting data out of BigQuery linked datasets.

Publishers can enable data egress restriction on a listing, the results of a query, or both. When data egress is restricted, the following restrictions are applied:

When you create a listing, you can set the appropriate data egress options.

Limitations

Analytics Hub has the following limitations:

  • A shared dataset can have a maximum of 1,000 linked datasets.

  • A shared topic can have a maximum of 10,000 Pub/Sub subscriptions. This limit includes linked Pub/Sub subscriptions and Pub/Sub subscriptions created outside of Analytics Hub (for example, directly from Pub/Sub).

  • A dataset with unsupported resources cannot be selected as a shared dataset when you create a listing. For more information about the BigQuery objects that Analytics Hub supports, see Shared datasets in this document.

  • You can't set IAM roles or IAM policies on individual tables within a linked dataset. Apply them at the linked dataset level instead.

  • Linked datasets created before July 25, 2023 are not backfilled by the subscription resource. Only subscriptions created after July 25, 2023 work with the API methods.

  • If you are a publisher, the following BigQuery interoperability limitations apply:

    • Subscribers must be given explicit permissions to read the source dataset in order to be able to query views within linked datasets. To grant access to views, as a best practice publishers should create authorized views. Authorized views can grant subscribers access to the view data without giving them access to the underlying source data.

    • The query plan reveals the shared view query and the routine query, including project IDs, and other datasets involved in authorized views. Never include anything such as encryption keys that you consider sensitive in the shared view or routine query.

    • Shared datasets are indexed in Data Catalog. Updates on a shared dataset, such as adding tables or views, are made available to subscribers without any delay. However, in certain scenarios, for example, when there are more than one hundred subscribers or tables in a shared dataset, the updates might take up to 18 hours to get indexed in Data Catalog. Due to the delay in indexing, subscribers cannot search for these updated resources in the Google Cloud console immediately.

    • Shared topics are indexed in Data Catalog, but you cannot filter specifically for its resource type.

    • If you have set up row-level security or data masking policies on the tables that are listed, then subscribers must be Enterprise or Enterprise Plus customers to run query jobs on the linked dataset. For information about editions, see Introduction to BigQuery editions.

  • If you are a subscriber, the following BigQuery interoperability limitations apply:

    • Materialized views that refer to tables in the linked dataset are not supported.

    • Taking snapshots of linked dataset tables is not supported.

    • Queries with linked datasets and JOIN statements that are larger than 1 TB (physical storage) might fail. You can contact support to resolve this issue.

    • You cannot use region qualifiers with INFORMATION_SCHEMA views to view metadata for your linked dataset.

    • When querying for routines in a linked dataset, you can only query for user-defined functions (both SQL and JavaScript UDFs) and table functions. Querying for an unsupported routine type results in an error message of the form: Querying routine type TYPE is not yet supported on linked dataset DATASET, where TYPE and DATASET stand for the routine type and the linked dataset.

  • The following limitations apply for the usage metrics:

    • You can't get the usage metrics for listings that were subscribed before July 20, 2023.

    • External table usage metrics for the num_rows_processed and total_bytes_processed fields might contain inaccurate data.

    • Usage metrics for consumption are supported only for usage via BigQuery jobs. Consumption by using the following resources is not supported:

    • Usage metrics for views are only populated for queries after April 22, 2024.

    • Usage metrics are not captured for linked Pub/Sub subscriptions in Analytics Hub (you can continue to see usage directly in Pub/Sub).

  • The following limitations apply when subscribing to Salesforce Data Cloud data:

    • Data Cloud data is shared as views. As a subscriber, you can't access the underlying tables that the views reference.
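
The two numeric limits at the top of this list can be checked before adding more subscribers. The following is a small sketch that encodes them as constants; the function names and the idea of tracking counts yourself are assumptions made for illustration, and the counts would have to come from your own inventory.

```python
# Limits stated in this document.
MAX_LINKED_DATASETS_PER_SHARED_DATASET = 1_000
MAX_SUBSCRIPTIONS_PER_SHARED_TOPIC = 10_000


def remaining_capacity(current: int, limit: int) -> int:
    """Return how many more resources fit under the limit (never negative)."""
    if current < 0:
        raise ValueError("current count cannot be negative")
    return max(limit - current, 0)


def can_add_linked_dataset(current_linked_datasets: int) -> bool:
    """Check the per-shared-dataset limit on linked datasets."""
    return remaining_capacity(
        current_linked_datasets, MAX_LINKED_DATASETS_PER_SHARED_DATASET
    ) > 0


def can_add_topic_subscription(linked_subscriptions: int, direct_subscriptions: int) -> bool:
    """Check the per-topic limit, which counts linked and direct subscriptions together."""
    total = linked_subscriptions + direct_subscriptions
    return remaining_capacity(total, MAX_SUBSCRIPTIONS_PER_SHARED_TOPIC) > 0


print(can_add_linked_dataset(999))
print(can_add_topic_subscription(6_000, 4_000))
```

Note that the topic check adds both kinds of subscription, because the limit covers subscriptions created through Analytics Hub and those created directly in Pub/Sub.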

Supported regions

Analytics Hub is supported in the following regions and multi-regions.

Regions

The following table lists the regions in the Americas where Analytics Hub is available.

Region description    Region name                Details
Columbus, Ohio        us-east5
Dallas                us-south1                  Low CO2
Iowa                  us-central1                Low CO2
Las Vegas             us-west4
Los Angeles           us-west2
Montréal              northamerica-northeast1    Low CO2
Northern Virginia     us-east4
Oregon                us-west1                   Low CO2
Salt Lake City        us-west3
São Paulo             southamerica-east1         Low CO2
Santiago              southamerica-west1
South Carolina        us-east1
Toronto               northamerica-northeast2

The following table lists the regions in Asia Pacific where Analytics Hub is available.

Region description    Region name                Details
Delhi                 asia-south2
Hong Kong             asia-east2
Jakarta               asia-southeast2
Melbourne             australia-southeast2
Mumbai                asia-south1
Osaka                 asia-northeast2
Seoul                 asia-northeast3
Singapore             asia-southeast1
Sydney                australia-southeast1
Taiwan                asia-east1
Tokyo                 asia-northeast1

The following table lists the regions in Europe where Analytics Hub is available.

Region description    Region name                Details
Belgium               europe-west1               Low CO2
Finland               europe-north1              Low CO2
Frankfurt             europe-west3               Low CO2
London                europe-west2               Low CO2
Netherlands           europe-west4               Low CO2
Warsaw                europe-central2
Zürich                europe-west6               Low CO2

The following table lists the regions in the Middle East where Analytics Hub is available.

Region description    Region name                Details
Dammam                me-central2
Tel Aviv              me-west1

The following table lists the regions in Africa where Analytics Hub is available.

Region description    Region name                Details
Johannesburg          africa-south1

Multi-regions

The following table lists the multi-regions where Analytics Hub is available.

Multi-region description                                    Multi-region name
Data centers within member states of the European Union¹    EU
Data centers in the United States                           US

¹ Data located in the EU multi-region is not stored in the europe-west2 (London) or europe-west6 (Zürich) data centers.

Omni regions

The following table lists the Omni regions where Analytics Hub is available.

Omni region description        Omni region name
AWS
AWS - US East (N. Virginia)    aws-us-east-1
AWS - US West (Oregon)         aws-us-west-2
AWS - Asia Pacific (Seoul)     aws-ap-northeast-2
AWS - Asia Pacific (Sydney)    aws-ap-southeast-2
AWS - Europe (Ireland)         aws-eu-west-1
AWS - Europe (Frankfurt)       aws-eu-central-1
Azure
Azure - East US 2              azure-eastus2

Example use case

This section shows an example of how you can use Analytics Hub.

Suppose you are a retailer and your organization has real-time demand forecasting data in a Google Cloud project named Forecasting. You want to share this demand forecasting data with hundreds of vendors in your supply-chain system. Here's how you can share your data with vendors through Analytics Hub:

Analytics Hub administrators

As the owner of the Forecasting project, you must first enable the Analytics Hub API and then assign the Analytics Hub Admin role to a user who administers the data exchange in the project. Users with the Analytics Hub Admin role are called Analytics Hub administrators.

An Analytics Hub administrator can perform the following tasks:

  • Create, update, delete, and share the data exchange in your organization's Forecasting project.

  • Manage other Analytics Hub administrators.

  • Manage publishers by granting the Analytics Hub Publisher role to your organization's employees. If you want some employees to only be able to update, delete, and share listings but not create them, then you can grant them the Analytics Hub Listing Admin role.

  • Manage subscribers by granting the Analytics Hub Subscriber role to a Google group consisting of all vendors. If you want some vendors to only have view access to the available exchanges and listings, then you can grant them the Analytics Hub Viewer role. These vendors are not able to subscribe to listings.
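
The role assignments described in this list can be written down as IAM policy bindings. The following sketch builds such bindings as plain dictionaries. The role IDs are given from memory and should be treated as assumptions to verify against Configure Analytics Hub roles; the group and user addresses are hypothetical.

```python
# Assumed mapping from the role names used in this document to IAM role IDs.
ROLE_IDS = {
    "admin": "roles/analyticshub.admin",
    "publisher": "roles/analyticshub.publisher",
    "listing_admin": "roles/analyticshub.listingAdmin",
    "subscriber": "roles/analyticshub.subscriber",
    "viewer": "roles/analyticshub.viewer",
}


def build_bindings(assignments: dict[str, list[str]]) -> list[dict]:
    """Turn {role key: members} into IAM-style bindings, one binding per role."""
    bindings = []
    for role_key, members in assignments.items():
        if role_key not in ROLE_IDS:
            raise KeyError(f"unknown role key: {role_key}")
        # De-duplicate and sort members so the output is stable.
        bindings.append({"role": ROLE_IDS[role_key], "members": sorted(set(members))})
    return bindings


# Hypothetical principals for the Forecasting example.
bindings = build_bindings(
    {
        "publisher": ["group:forecast-publishers@example.com"],
        "subscriber": ["group:all-vendors@example.com"],
        "viewer": ["user:auditor@example.com"],
    }
)
for binding in bindings:
    print(binding)
```

Granting the Subscriber role to a single Google group, as in this sketch, keeps vendor membership manageable in one place instead of in the policy itself.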

For more information, see Manage data exchanges.

Analytics Hub publishers

Publishers create the following listings for their datasets in the Forecasting project or in a different project:

  • Listing A: Demand Forecast Dataset 1
  • Listing B: Demand Forecast Dataset 2
  • Listing C: Demand Forecast Dataset 3

As a data provider, you can track the usage metrics for your shared dataset. The usage metrics include the following details:

  • Jobs that run against your shared dataset.
  • The consumption details of your shared dataset by subscribers' projects and organization.
  • The number of rows and bytes processed by the job.
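
To make these details concrete, the following sketch aggregates usage records per subscriber project. It assumes the records have already been exported into Python dictionaries. The num_rows_processed and total_bytes_processed field names come from this document; the subscriber_project key and all values are hypothetical.

```python
from collections import defaultdict


def summarize_usage(records: list[dict]) -> dict[str, dict[str, int]]:
    """Sum job count, rows, and bytes processed per subscriber project."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"jobs": 0, "num_rows_processed": 0, "total_bytes_processed": 0}
    )
    for record in records:
        entry = totals[record["subscriber_project"]]
        entry["jobs"] += 1  # each record represents one job
        entry["num_rows_processed"] += record["num_rows_processed"]
        entry["total_bytes_processed"] += record["total_bytes_processed"]
    return dict(totals)


# Hypothetical usage records for illustration only.
sample_records = [
    {"subscriber_project": "vendor-a", "num_rows_processed": 1000, "total_bytes_processed": 50_000},
    {"subscriber_project": "vendor-a", "num_rows_processed": 500, "total_bytes_processed": 20_000},
    {"subscriber_project": "vendor-b", "num_rows_processed": 200, "total_bytes_processed": 8_000},
]

summary = summarize_usage(sample_records)
for project, stats in summary.items():
    print(project, stats)
```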

For more information, see Manage listings.

Analytics Hub subscribers

Subscribers can browse through listings that they have access to in data exchanges. They can also subscribe to these listings and add these datasets to their projects by creating a linked dataset. Vendors can then run queries on these linked datasets and retrieve results in real time.

For more information, see View and subscribe to listings.

Pricing

There is no additional cost for managing data exchanges or listings.

For BigQuery datasets, Analytics Hub publishers are charged for data storage, whereas subscribers pay for queries that run against the shared data based on either the on-demand or the capacity-based pricing model. For information about pricing, see BigQuery pricing.

For Pub/Sub, topic publishers are charged for the total number of bytes written (publish throughput) to the shared topic and network egress (if applicable). Subscribers are charged for the total number of bytes read (subscribe throughput) from the linked subscription and network egress (if applicable). See Pub/Sub pricing for additional details.

Quotas

For information about Analytics Hub quotas, see Quotas and limits.

VPC Service Controls

You can set the ingress and egress rules needed to let publishers and subscribers access data from projects that have VPC Service Controls perimeters. For more information, see Analytics Hub VPC Service Controls rules.

What's next