INFORMATION_SCHEMA.SHARED_DATASET_USAGE view

The INFORMATION_SCHEMA.SHARED_DATASET_USAGE view contains the near real-time metadata about consumption of your shared dataset tables. To get started with sharing your data across organizations, see Analytics Hub.

Required roles

To get the permission that you need to query the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view, ask your administrator to grant you the BigQuery Data Owner (roles/bigquery.dataOwner) IAM role on your source project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the bigquery.datasets.listSharedDatasetUsage permission, which is required to query the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.

You might also be able to get this permission with custom roles or other predefined roles.

Schema

The underlying data is partitioned by the job_start_time column and clustered by project_id and dataset_id.

The INFORMATION_SCHEMA.SHARED_DATASET_USAGE has the following schema:

Column name Data type Value
project_id STRING (Clustering column) The ID of the project that contains the shared dataset.
dataset_id STRING (Clustering column) The ID of the shared dataset.
table_id STRING The ID of the accessed table.
data_exchange_id STRING The resource path of the data exchange.
listing_id STRING The resource path of the listing.
job_start_time TIMESTAMP (Partitioning column) The start time of this job.
job_end_time TIMESTAMP The end time of this job.
job_id STRING The job ID. For example, bquxjob_1234.
job_project_number INTEGER The number of the project this job belongs to.
job_location STRING The location of the job.
linked_project_number INTEGER The project number of the subscriber's project.
linked_dataset_id STRING The linked dataset ID of the subscriber's dataset.
subscriber_org_number INTEGER The organization number in which the job ran. This is the organization number of the subscriber. This field is empty for projects that don't have an organization.
subscriber_org_display_name STRING A human-readable string that refers to the organization in which the job ran. This is the organization number of the subscriber. This field is empty for projects that don't have an organization.
num_rows_processed INTEGER The number of rows processed from this table by the job.
total_bytes_processed INTEGER The total bytes processed from this table by the job.

Data retention

The INFORMATION_SCHEMA.SHARED_DATASET_USAGE view contains running jobs and the job history of the past 180 days.

Scope and syntax

Queries against this view must include a region qualifier. If you don't specify a regional qualifier, metadata is retrieved from the US region. The following table explains the region scope for this view:

View Name Resource scope Region scope
[PROJECT_ID.]INFORMATION_SCHEMA.SHARED_DATASET_USAGE Project level US region
[PROJECT_ID.]`region-REGION`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE Project level REGION
Replace the following:

  • Optional: PROJECT_ID: the ID of your Google Cloud project. If not specified, the default project is used.

Examples

To run the query against a project other than your default project, add the project ID in the following format:

PROJECT_ID.region-REGION_NAME.INFORMATION_SCHEMA.SHARED_DATASET_USAGE

For example, myproject.region-us.INFORMATION_SCHEMA.SHARED_DATASET_USAGE.

Get the total number of jobs executed on all shared tables

The following example calculates total jobs run by subscribers for a project:

SELECT
  COUNT(DISTINCT job_id) AS num_jobs
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE

The result is similar to the following:

+------------+
| num_jobs   |
+------------+
| 1000       |
+------------+

To check the total jobs run by subscribers, use the WHERE clause:

  • For datasets, use WHERE dataset_id = "...".
  • For tables, use WHERE dataset_id = "..." AND table_id = "...".

Get the most used table based on the number of rows processed

The following query calculates the most used table based on the number of rows processed by subscribers.

SELECT
  dataset_id,
  table_id,
  SUM(num_rows_processed) AS usage_rows
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
GROUP BY
  1,
  2
ORDER BY
  3 DESC
LIMIT
  1

The output is similar to the following:

+---------------+-------------+----------------+
| dataset_id    | table_id      | usage_rows     |
+---------------+-------------+----------------+
| mydataset     | mytable     | 15             |
+---------------+-------------+----------------+

Find the top organizations that consume your tables

The following query calculates the top subscribers based on the number of bytes processed from your tables. You can also use the num_rows_processed column as a metric.

SELECT
  subscriber_org_number,
  ANY_VALUE(subscriber_org_display_name) AS subscriber_org_display_name,
  SUM(total_bytes_processed) AS usage_bytes
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
GROUP BY
  1

The output is similar to the following:

+--------------------------+--------------------------------+----------------+
|subscriber_org_number     | subscriber_org_display_name    | usage_bytes    |
+-----------------------------------------------------------+----------------+
| 12345                    | myorganization                 | 15             |
+--------------------------+--------------------------------+----------------+

For subscribers without an organization, you can use job_project_number instead of subscriber_org_number.

Get usage metrics for your data exchange

If your data exchange and source dataset are in different projects, follow these step to view the usage metrics for your data exchange:

  1. Find all listings that belong to your data exchange.
  2. Retrieve the source dataset attached to the listing.
  3. To view the usage metrics for your data exchange, use the following query:
SELECT
  *
FROM
  source_project_1.`region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
WHERE
  dataset_id='source_dataset_id'
AND data_exchange_id="projects/4/locations/us/dataExchanges/x1"
UNION ALL
SELECT
  *
FROM
  source_project_2.`region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
WHERE
  dataset_id='source_dataset_id'
AND data_exchange_id="projects/4/locations/us/dataExchanges/x1"