Authorized datasets

This document describes how to use authorized datasets in BigQuery. An authorized dataset lets you authorize all of the views in a specified dataset to access the data in a second dataset. With an authorized dataset, you don't need to configure individual authorized views.

You can create an authorized dataset in BigQuery by using the following:

Overview

A view in BigQuery is a virtual table defined by a SQL query. For example, a view's query might return only a subset of the columns of a table, excluding columns that contain personal identifiable information (PII). To query a view, a user needs to have access to the resources that are accessed by the view's query.

If you want to let users query a view, without giving them direct access to the resources referenced by the view, you can use an authorized view. An authorized view lets you, for example, share more limited data in a view with specified groups or users (principals), without giving the principals access to all of the underlying data. You do this by giving the principals access to the view, and by giving the view access to the dataset that contains the underlying data.

If you want to give a collection of views access to a dataset, without having to authorize each individual view, you can group the views together into a dataset, and then give the dataset that contains the views access to the dataset that contains the data. You can then give principals access to the dataset with the group of views, or to individual views in the dataset, as needed. A dataset that has access to another dataset is called an authorized dataset. The dataset that authorizes another dataset to access its data is called the shared dataset.

Required permissions and roles

To authorize a dataset, or to revoke a dataset's authorization, you must have the following Identity and Access Management (IAM) permissions, which let you update the access control list of the dataset you are sharing.

After a dataset is authorized, you need these same permissions if you want to create or update views in the authorized dataset. For more information, see Create or update a view in an authorized dataset.

Permission Resource
bigquery.datasets.get The dataset you are sharing.
bigquery.datasets.update The dataset you are sharing.

The following predefined IAM roles provide the required permissions.

Role Description
bigquery.dataOwner BigQuery Data Owner
bigquery.admin BigQuery Admin

Quotas and limits

Authorized datasets are subject to dataset limits. For more information, see Dataset limits.

Authorize a dataset

You can authorize a dataset's current and future views to access another dataset by adding the dataset you want to authorize to the access list of the dataset you want to share, as follows:

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, expand your project and select the dataset that contains the data you want to share.

  3. Click the Actions icon and select Open.

  4. In the details pane that appears, click Sharing and select the Authorize Datasets option.

    Authorize a dataset

  5. In the Authorized dataset pane that appears, enter the Dataset ID of the dataset that you want to authorize, in the following format:

    PROJECT.AUTHORIZED_DATASET

    For example:

    myProject.myDataset

  6. Click Add Authorization and then click Close.

bq

  1. Open the Cloud Shell:

    Go to Cloud Shell

  2. Write the existing metadata (including the access control list) for the dataset you want to share into a JSON file by using the bq show command.

    bq show --format=prettyjson PROJECT:SHARED_DATASET > FILE_PATH
    
  3. Use a text editor to add the dataset that you want to authorize into the existing access section of the JSON file that was created at FILE_PATH.

    For example:

    "access": [
     ...
     {
       "dataset": {
         "dataset": {
           "project_id": "PROJECT",
           "dataset_id": "AUTHORIZED_DATASET"
         },
         "target_types": "VIEWS"
       }
     }
    ]
    

  4. Update the shared dataset by using the bq update command. For example:

    bq update --source FILE_PATH PROJECT:SHARED_DATASET
    
  5. To verify that the authorized dataset has been added, enter the bq show command again. For example:

    bq show --format=prettyjson PROJECT:SHARED_DATASET
    

API

  1. Get the current metadata for the dataset you want to share by calling the datasets.get method, as follows:

    GET https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/datasets/SHARED_DATASET
    

    The response body returns a Dataset resource that contains JSON metadata for the dataset.

  2. Add the dataset that you want authorize into the access section of the JSON metadata that was returned in the Dataset resource as follows:

    "access": [
     ...
     {
       "dataset": {
         "dataset": {
           "project_id": "PROJECT",
           "dataset_id": "AUTHORIZED_DATASET"
         },
         "target_types": "VIEWS"
       }
     }
    ]
    
  3. Use the datasets.update method to update the dataset with the added authorization:

    PUT https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/datasets/SHARED_DATASET
    

    Include the updated Dataset resource in the request body.

  4. You can verify that the authorized dataset has been added by calling the datasets.get method again.

Revoke a dataset's authorization

When you delete a dataset authorized to access another source dataset, it can take up to 24 hours for the change to fully reflect in the source dataset's access control lists (ACLs). During this time:

  • You won't be able to access the source data through the deleted dataset.
  • The deleted dataset might still appear in the source dataset's ACL and count towards any authorized dataset limits. This could prevent you from creating new authorized datasets until the ACL is updated.

To revoke the access granted to the views in an authorized dataset, remove the authorized dataset from the shared dataset's access list, as follows:

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, expand your project and select the shared dataset.

  3. Click the Actions icon and select Open.

  4. In the details pane that appears, click Sharing and select the Authorize Datasets option.

    Authorize a dataset

  5. In the Authorized dataset pane that appears, find the entry for the authorized dataset in the Currently authorized datasets section.

  6. Click the delete icon next to the authorized dataset you want to remove, and then click Close.

bq

  1. Open the Cloud Shell:

    Go to Cloud Shell

  2. Write the existing metadata (including the access control list) for the shared dataset into a JSON file by using the bq show command.

    bq show --format=prettyjson PROJECT:SHARED_DATASET > FILE_PATH
    
  3. Use a text editor to remove the authorized dataset from the access section of the JSON file that was created at FILE_PATH, as follows:

      {
        "dataset": {
          "dataset": {
            "project_id": "PROJECT",
            "dataset_id": "AUTHORIZED_DATASET"
          },
          "target_types": "VIEWS"
        }
      }
    
  4. Update the shared dataset by using the bq update command. For example:

    bq update --source FILE_PATH PROJECT:SHARED_DATASET
    
  5. To verify that the authorized dataset has been removed, enter the bq show command again. For example:

    bq show --format=prettyjson PROJECT:SHARED_DATASET
    

API

  1. Get the current metadata for the shared dataset by calling the datasets.get method, as follows:

    GET https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/datasets/SHARED_DATASET
    

    The response body returns a Dataset resource that contains JSON metadata for the dataset.

  2. Remove the authorized dataset from the access section of the JSON that was returned in the Dataset resource, for example:

     {
       "dataset": {
         "dataset": {
           "project_id": "PROJECT",
           "dataset_id": "AUTHORIZED_DATASET"
         },
         "target_types": "VIEWS"
       }
     }
    
  3. Use the datasets.update method to update the dataset with the removed authorization:

    PUT https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/datasets/SHARED_DATASET
    

    Include the updated Dataset resource in the request body.

  4. You can verify that the authorized dataset has been removed by calling the datasets.get method again.

Create or update a view in an authorized dataset

To create or update a view that is in an authorized dataset, you must have the permissions for the shared dataset that are listed in Required permissions and roles, in addition to the permissions that are required to create or update a view in a standard dataset.

The following table summarizes the necessary Identity and Access Management (IAM) permissions to create or update a view that is in an authorized dataset:

Permission Resource
bigquery.datasets.get The dataset you are sharing.
bigquery.datasets.update The dataset you are sharing.
bigquery.tables.getData Any tables or views from the shared dataset that are referenced in the new view you are creating or updating.
bigquery.tables.create The authorized dataset in which you are creating a view.
bigquery.tables.update The authorized dataset in which you are updating a view.

You don't need any additional permissions to delete a view from an authorized dataset.

Query a view in an authorized dataset

To query a view in an authorized dataset, a user needs to have access to the view, but access to the shared dataset is not required.

For more information, see Authorized views.

Authorized dataset example

The following example describes how to create and use an authorized dataset.

Assume you have two datasets, named private_dataset and public_dataset. The private_dataset dataset contains a table named private_table. The public_dataset dataset contains a view named private_table_filtered. The private_table_filtered view is based on a query that returns some, but not all, of the fields in the private_table table.

You can give a user access to the data returned by the private_table_filtered view, but not all of the data in the private_table table, as follows:

  1. Grant the bigquery.dataViewer role to the user for the public_dataset dataset. This role includes the bigquery.tables.getData permission, which lets the user query the views in the public_dataset dataset. For information about how to grant a role to a user for a dataset, see Controlling access to datasets.

    The user now has permission to query views in the public_dataset, but they still cannot access the private_table table in private_dataset. If the user tries to query the private_table table directly, or if they try to access the private_table table indirectly by querying the private_table_filtered view, they get an error message similar to the following:

    Access Denied: Table PROJECT:private_dataset.private_table: User does not have permission to query table PROJECT:private_dataset.private_table.

  2. In the BigQuery page of the Google Cloud console, open the private_datasetdataset, click Sharing, and then select Authorize Datasets.

  3. In the Authorized dataset pane that appears, enter PROJECT.public_dataset in the Dataset ID field, and then click Add Authorization.

    The public_dataset dataset is added to the access control list of the private_dataset dataset, authorizing the views in the public_dataset dataset to query the data in the private_dataset dataset.

    The user can now query the private_table_filtered view in the public_dataset dataset, which indirectly accesses the private_dataset dataset, without having any permissions to directly access data in the private_dataset dataset.

Limitations

  • You can create authorized datasets in different regions, but BigQuery doesn't support cross-region queries. Therefore, we recommend that you create datasets in the same region.

What's next

  • For information about authorizing an individual view to access data in a dataset, see Authorized views.

  • For information about authorizing a table function or a user-defined function to access data in a dataset, see Authorized functions.