Creating an Authorized View in BigQuery

BigQuery is a petabyte-scale analytics data warehouse that you can use to run SQL queries over vast amounts of data in near realtime.

Giving a view access to a dataset is also known as creating an authorized view in BigQuery. An authorized view allows you to share query results with particular users and groups without giving them access to the underlying tables. You can also use the view's SQL query to restrict the columns (fields) the users are able to query. In this tutorial, you create an authorized view.

Objectives

In this tutorial you:

  • Create datasets and apply access controls to them
  • Assign access controls to your project
  • Create an authorized view that restricts the data users are able to query

Costs

BigQuery is a paid product and you will incur BigQuery usage costs in this tutorial. BigQuery query pricing provides the first 1 TB per month free of charge. For more information, see BigQuery pricing.

Before you begin

Before you begin this tutorial, use the Google Cloud Platform Console to create or select a project and enable billing.

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the Manage resources page

  3. Make sure that billing is enabled for your project.

    Learn how to enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Introduction

In this tutorial, you create two datasets - one dataset for your source data and a second dataset for your authorized view. You populate the source dataset with data from the GitHub public dataset. You then create a view that queries a table in the source dataset.

Once you create your datasets and your view, you assign access controls to the project, to the dataset containing the view, and to the dataset containing the source data.

Giving a view access to the source dataset is also referred to as creating an authorized view. When you create an authorized view, follow these steps:

  • Create a separate dataset to store the view
  • Create the view in the new dataset
  • Assign access controls to the project
  • Assign access controls to the dataset containing the view
  • Authorize the view to access the source dataset

Create a source dataset

You begin by creating a dataset to store your source data. For this tutorial, you populate a table in your source dataset by querying the GitHub public dataset. The data in your source dataset contains information you do not want your data analysts to see. You restrict access to the data using an authorized view.

To create your source dataset:

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your project name in the navigation, then click Create new dataset.

  3. For Dataset ID type github_source_data.

  4. Leave all of the other default settings in place and click OK.

Command-line

Use the mk command to create your dataset.

bq mk github_source_data

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

from google.cloud import bigquery

client = bigquery.Client()
source_dataset_id = 'github_source_data'

source_dataset = bigquery.Dataset(client.dataset(source_dataset_id))
# Specify the geographic location where the dataset should reside.
source_dataset.location = 'US'
source_dataset = client.create_dataset(source_dataset)  # API request

After creating the source dataset, you populate a table in it using a SQL query. This query retrieves data from the GitHub public dataset.

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the COMPOSE QUERY button.

  3. Copy and paste the following query into the New Query text area.

    #standardSQL
    SELECT
      commit,
      author,
      committer,
      repo_name
    FROM
      `bigquery-public-data.github_repos.commits`
    LIMIT
      1000
    

  4. Click Show Options.

  5. For Destination Table click Select Table.

  6. In the Select Destination Table dialog:

    • For Project verify your project is selected.
    • For Dataset verify github_source_data is selected.
    • For Table ID type: github_contributors.
    • Click Ok.

  7. Click Run Query.

  8. When the query completes, click github_contributors and then click Preview to verify the data was written to the table.

Command-line

Use the query command with the --destination_table flag to write the query results to a table in the github_source_data dataset.

bq query --destination_table=github_source_data.github_contributors --use_legacy_sql=false 'SELECT commit, author, committer, repo_name FROM
`bigquery-public-data.github_repos.commits` LIMIT 1000'

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

source_table_id = 'github_contributors'
job_config = bigquery.QueryJobConfig()
job_config.destination = source_dataset.table(source_table_id)
sql = """
    SELECT commit, author, committer, repo_name
    FROM `bigquery-public-data.github_repos.commits`
    LIMIT 1000
"""
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location='US',
    job_config=job_config)  # API request - starts the query

query_job.result()  # Waits for the query to finish

Create a separate dataset to store your view

After creating your source dataset you create a new dataset to store the view you will share with your data analysts. This view will have access to the data in the source dataset. Your data analysts will have access to the view, not the source data.

When you create the view, it must be created in a dataset separate from the source data queried by the view. Because you can assign access controls only at the dataset level, if the view is created in the same dataset as the source data, your data analysts would have access to both the view and the data.

To create a dataset to store your view:

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your project name in the navigation, then click Create new dataset.

  3. For Dataset ID type shared_views.

  4. Leave all of the other default settings in place and click OK.

Command-line

Use the mk command to create your dataset.

bq mk shared_views

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

shared_dataset_id = 'shared_views'
shared_dataset = bigquery.Dataset(client.dataset(shared_dataset_id))
shared_dataset.location = 'US'
shared_dataset = client.create_dataset(shared_dataset)  # API request

Create the view in the new dataset

In the new dataset, you create the view you intend to authorize. This is the view you share with your data analysts. This view is created using a SQL query that excludes the columns you do not want the data analysts to see.

For this tutorial, your shared view excludes all of the author's information except for the author's name, and all of the committer's information except for the committer's name.

To create the view in the new dataset:

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the COMPOSE QUERY button.

  3. Copy and paste the following query into the New Query text area. Replace [PROJECT_ID] with your project ID.

    #standardSQL
    SELECT
      commit,
      author.name as author,
      committer.name as committer,
      repo_name
    FROM
      `[PROJECT_ID].github_source_data.github_contributors`
    

  4. Click Show Options.

  5. Uncheck Use Legacy SQL.

  6. Click Save View.

  7. In the Save View dialog:

    • For Project verify your project is selected.
    • For Dataset verify shared_views is selected.
    • For Table ID type: github_analyst_view.
    • Click Ok.

Command-line

Use the mk command with the --view flag. Replace [PROJECT_ID] with your project ID.

bq mk --use_legacy_sql=false --view='SELECT commit, author.name as author, committer.name as committer, repo_name FROM `[PROJECT_ID].github_source_data.github_contributors`' shared_views.github_analyst_view

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

shared_view_id = 'github_analyst_view'
view = bigquery.Table(shared_dataset.table(shared_view_id))
sql_template = """
    SELECT
        commit, author.name as author,
        committer.name as committer, repo_name
    FROM
        `{}.{}.{}`
"""
view.view_query = sql_template.format(
    client.project, source_dataset_id, source_table_id)
view = client.create_table(view)  # API request

Assign a project-level IAM role to your data analysts

In order to query the view, your data analysts need permission to run query jobs. The bigquery.user role includes permissions to run jobs, including query jobs, within the project. If you grant a user or group the bigquery.user role at the project level, the user can create datasets and can run query jobs against tables in those datasets. The bigquery.user role does not give users permission to query data, view table data, or view table schema details for datasets the user did not create.

Assigning your data analysts the project-level bigquery.user role does not give them the ability to view or query table data in the dataset containing the tables queried by the view. Most individuals (data scientists, business intelligence analysts, data analysts) in an enterprise should be assigned the project-level bigquery.user role.

For this tutorial, your data analysts are in a group named data_analysts@example.com. This group name is for example purposes only. When you add a group to an IAM role, the email address and domain must be associated with an active Google Account or Google Apps account.

To assign the data analysts group to the bigquery.user role at the project level:

Web UI

  1. Open the IAM page in the Google Cloud Platform Console.

    Open the IAM page

  2. Click Select a project.

  3. Select your project and click Open.

  4. On the IAM page, click Add.

  5. In the Add members dialog:

    • For Members type the group name: data_analysts@example.com.
    • For Roles, click Select a role and choose BigQuery > BigQuery User.
    • Click Add.

Command-line

  1. To add a single binding to the project's IAM policy, type the following command. Replace [PROJECT_ID] with your project ID.

    gcloud projects add-iam-policy-binding [PROJECT_ID] --member group:data_analysts@example.com --role roles/bigquery.user
    
  2. The command outputs the updated policy:

    bindings:
    - members:
      - group:data_analysts@example.com
        role: roles/bigquery.user
    

Assign access controls to the dataset containing the view

In order for your data analysts to query the view, they need READER access to the dataset containing the view. The bigquery.user role gives your data analysts the permissions required to create query jobs, but they cannot successfully query the view unless they also have at least READER access to the dataset containing the view.

To give your data analysts READER access to the dataset:

Web UI

These steps demonstrate assigning access controls to a dataset by using the classic BigQuery web UI. Currently, you cannot assign access controls by using the BigQuery web UI.

  1. Click the drop-down arrow to the right of the shared_views dataset and choose Share Dataset.

  2. In the Share Dataset dialog, for Add People, click the drop-down to the left of the field, and choose Group by e-mail.

  3. Type data_analysts@example.com in the text box.

  4. To the right of the Add People field, verify Can view is selected. "Can View" maps to the bigquery.dataViewer role at the dataset level.

  5. Click Add and then click Save changes.

Command-line

  1. When you apply access controls to a dataset using the command-line tool, the existing controls are overwritten. First, export the existing access controls to a JSON file using the show command.

    bq --format=json show shared_views >shared_views.json
    
  2. Make your changes to the "access" section of the JSON file using the READER role and groupByEmail.

    For example:

    {
     "access": [
      {
       "role": "READER",
       "specialGroup": "projectReaders"
      },
      {
       "role": "WRITER",
       "specialGroup": "projectWriters"
      },
      {
       "role": "OWNER",
       "specialGroup": "projectOwners"
      }
      {
       "role": "READER",
       "specialGroup": "allAuthenticatedUsers"
      }
      {
       "role": "READER",
       "domain": "[DOMAIN_NAME]"
      }
      {
       "role": "WRITER",
       "userByEmail": "[USER_EMAIL]"
      }
      {
       "role": "READER",
       "groupByEmail": "data_analysts@example.com"
      }
     ],
    }
    

  3. When your edits are complete, use the update command and include the JSON file using the --source flag.

    bq update --source=shared_views.json shared_views

  4. To verify your access control changes, use the show command.

    bq show shared_views

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

# analyst_group_email = 'data_analysts@example.com'
access_entries = shared_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry('READER', 'groupByEmail', analyst_group_email)
)
shared_dataset.access_entries = access_entries
shared_dataset = client.update_dataset(
    shared_dataset, ['access_entries'])  # API request

Authorize the view to access the source dataset

Once you create access controls for the dataset containing the view, you add the view as an authorized view in the source dataset. This gives the view access to the source data, not your data analysts group.

To authorize the view to access the source data:

Web UI

  1. Click the drop-down arrow to the right of the github_source_data dataset and choose Share Dataset.

  2. In the Share Dataset dialog, for Add People, click the drop-down to the left of the field, and choose Authorized View.

  3. Click Select View.

  4. In the Select View dialog:

    • For Project, verify your project is selected.
    • For Dataset, choose shared_views.
    • For Table ID, type the view name: github_analyst_view.
    • Click OK.

  5. Click Add and then click Save changes.

Command-line

  1. When you apply access controls to a dataset using the CLI, the existing controls are overwritten. First, export the existing access controls to a JSON file using the show command.

    bq --format=prettyjson show github_source_data >github_source_data.json
    
  2. Add the authorized view to the "access" section of the JSON file. Replace [PROJECT_ID] with your project ID.

    For example:

    {
     "access": [
      {
       "role": "READER",
       "specialGroup": "projectReaders"
      },
      {
       "view":{
       "datasetId": "shared_views",
       "projectId": "[PROJECT_ID]",
       "tableId": "github_analyst_view"
       }
      },
      ...
     ],
     ...
    }
    

    Some text in the JSON file is omitted for brevity.

  3. When your edits are complete, use the update command and include the JSON file using the --source flag.

    bq update --source=github_source_data.json github_source_data

  4. To verify your access control changes, use the show command.

    bq show github_source_data

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

access_entries = source_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry(None, 'view', view.reference.to_api_repr())
)
source_dataset.access_entries = access_entries
source_dataset = client.update_dataset(
    source_dataset, ['access_entries'])  # API request

Verify the configuration

When your configuration is complete, a member of the data_analysts group can verify the configuration by querying the view.

To verify the configuration:

Web UI

  1. Have a member of the data_analysts group go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the COMPOSE QUERY button.

  3. Copy and paste the following query into the New Query text area.

    #standardSQL
    SELECT
      *
    FROM
      `shared_views.github_analyst_view`
    

Command-line

Use the query command to query the view.

bq query --use_legacy_sql=false 'SELECT * FROM `shared_views.github_analyst_view`'

Complete source code

Here is the complete source code for the tutorial for your reference.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

# Create a source dataset
from google.cloud import bigquery

client = bigquery.Client()
source_dataset_id = 'github_source_data'

source_dataset = bigquery.Dataset(client.dataset(source_dataset_id))
# Specify the geographic location where the dataset should reside.
source_dataset.location = 'US'
source_dataset = client.create_dataset(source_dataset)  # API request

# Populate a source table
source_table_id = 'github_contributors'
job_config = bigquery.QueryJobConfig()
job_config.destination = source_dataset.table(source_table_id)
sql = """
    SELECT commit, author, committer, repo_name
    FROM `bigquery-public-data.github_repos.commits`
    LIMIT 1000
"""
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location='US',
    job_config=job_config)  # API request - starts the query

query_job.result()  # Waits for the query to finish

# Create a separate dataset to store your view
shared_dataset_id = 'shared_views'
shared_dataset = bigquery.Dataset(client.dataset(shared_dataset_id))
shared_dataset.location = 'US'
shared_dataset = client.create_dataset(shared_dataset)  # API request

# Create the view in the new dataset
shared_view_id = 'github_analyst_view'
view = bigquery.Table(shared_dataset.table(shared_view_id))
sql_template = """
    SELECT
        commit, author.name as author,
        committer.name as committer, repo_name
    FROM
        `{}.{}.{}`
"""
view.view_query = sql_template.format(
    client.project, source_dataset_id, source_table_id)
view = client.create_table(view)  # API request

# Assign access controls to the dataset containing the view
# analyst_group_email = 'data_analysts@example.com'
access_entries = shared_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry('READER', 'groupByEmail', analyst_group_email)
)
shared_dataset.access_entries = access_entries
shared_dataset = client.update_dataset(
    shared_dataset, ['access_entries'])  # API request

# Authorize the view to access the source dataset
access_entries = source_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry(None, 'view', view.reference.to_api_repr())
)
source_dataset.access_entries = access_entries
source_dataset = client.update_dataset(
    source_dataset, ['access_entries'])  # API request

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  1. In the GCP Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the project you want to delete and click Delete project. After selecting the checkbox next to the project name, click
      Delete project
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Was this page helpful? Let us know how we did:

Send feedback about...