Creating an Authorized View in BigQuery

BigQuery is a petabyte-scale analytics data warehouse that you can use to run SQL queries over vast amounts of data in near realtime.

Giving a view access to a dataset is also known as creating an authorized view in BigQuery. An authorized view allows you to share query results with particular users and groups without giving them access to the underlying tables. In this tutorial, you create an authorized view.

Objectives

In this tutorial you:

  • Create datasets and views
  • Assign access controls to your project and datasets
  • Give an authorized view access to the dataset queried by the view

Costs

BigQuery is a paid product and you will incur BigQuery usage costs in this tutorial. BigQuery query pricing provides the first 1 TB per month free of charge. For more information, see BigQuery pricing.

Before you begin

Before you begin this tutorial, use the Google Cloud Platform Console to create or select a project and enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Projects page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Introduction

In this tutorial, you create two datasets - one dataset for your source data and a second dataset for your shared view. You populate the source dataset with data from the Github public dataset. You then create a view by querying a table in the source dataset.

Once you create your datasets and your view, you assign access controls to the project, to the dataset containing the view, and to the source dataset.

Giving a view access to the source dataset is also referred to as creating an authorized view. When you create an authorized view, follow these steps:

  • Create a separate dataset to store the view
  • Create the view in the new dataset
  • Assign access controls to the project
  • Assign access controls to the dataset containing the view
  • Authorize the view to access the source dataset

Create a source dataset

You begin by creating a dataset to store your source data. For this tutorial, you populate a table in your source dataset by querying the Github public dataset. The data in your source dataset contains information you do not want your data analysts to see. You restrict access to the data using an authorized view.

To create your source dataset:

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your project name in the navigation, then click Create new dataset.

  3. For Dataset ID type github_source_data.

  4. Leave all of the other default settings in place and click OK.

Command-line

Use the mk command to create your dataset.

bq mk github_source_data

After creating the source dataset, you populate a table in it using a SQL query. This query retrieves data from the Github public dataset.

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the COMPOSE QUERY button.

  3. Copy and paste the following query into the New Query text area.

    #standardSQL
    SELECT
      commit,
      author,
      committer,
      repo_name
    FROM
      `bigquery-public-data.github_repos.commits`
    LIMIT
      1000
    

  4. Click Show Options.

  5. For Destination Table click Select Table.

  6. In the Select Destination Table dialog:

    • For Project verify your project is selected.
    • For Dataset verify github_source_data is selected.
    • For Table ID type: github_contributors.
    • Click Ok.

  7. Click Run Query.

  8. When the query completes, click github_contributors and then click Preview to verify the data was written to the table.

Command-line

Use the query command with the --destination_table flag to write the query results to a table in the github_source_data dataset.

bq query --destination_table=github_source_data.github_contributors --use_legacy_sql=false 'SELECT commit, author, committer, repo_name FROM
`bigquery-public-data.github_repos.commits` LIMIT 1000'

Create a separate dataset to store the shared view

After creating your source dataset you create a new dataset to store the shared view. When you create a shared view, it must be created in a dataset separate from the source data queried by the view. To create the dataset:

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the down arrow icon down arrow icon next to your project name in the navigation, then click Create new dataset.

  3. For Dataset ID type shared_views.

  4. Leave all of the other default settings in place and click OK.

Command-line

Use the mk command to create your dataset.

bq mk shared_views

Create the shared view in the new dataset

In the new dataset, you create the view you intend to authorize. This is the view you share with your data analysts. This view is created using a SQL query that excludes the columns you do not want the data analysts to see.

Restricting access in this manner is referred to as assigning row-level permissions to the source data. For information on assigning row-level permissions for multiple users or groups in the same view, see: Assigning row-level permissions to different users or groups using a single view.

For this tutorial, your shared view excludes all of the author's information except for the author's name, and all of the committer's information except for the committer's name.

To create the shared view:

Web UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the COMPOSE QUERY button.

  3. Copy and paste the following query into the New Query text area. Replace [PROJECT_ID] with your project ID.

    #standardSQL
    SELECT
      commit,
      author.name as author,
      committer.name as committer,
      repo_name
    FROM
      `[PROJECT_ID].github_source_data.github_contributors`
    

  4. Click Show Options.

  5. Uncheck Use Legacy SQL.

  6. Click Save View.

  7. In the Save View dialog:

    • For Project verify your project is selected.
    • For Dataset verify shared_views is selected.
    • For Table ID type: github_analyst_view.
    • Click Ok.

Command-line

Use the mk command with the --view flag. Replace [PROJECT_ID] with your project ID.

bq mk --use_legacy_sql=false --view='SELECT commit, author.name as author, committer.name as committer, repo_name FROM `[PROJECT_ID].github_source_data.github_contributors`' shared_views.github_analyst_view

Assign a project-level IAM role to your data analysts

In order to query the view, your data analysts need permission to run query jobs. The bigquery.User role includes permissions to run jobs, including queries, within the project. The bigquery.User role does not give users permissions to view data in the tables queried by the view or table schema details. Most individuals (data scientists, business intelligence analysts, data analysts) in an enterprise should be users.

For this tutorial, your data analysts are in a group named data_analysts@example.com. This group name is for example purposes only. When you add a group to an IAM role, the email address and domain must be associated with an active Google Account or Google Apps account.

To assign the data analysts group to the bigquery.User role at the project level:

Web UI

  1. Open the IAM page in the Google Cloud Platform Console.

    Open the IAM page

  2. Click Select a project.

  3. Select your project and click Open.

  4. On the IAM page, click Add.

  5. In the Add members dialog:

    • For Members type the group name: data_analysts@example.com.
    • For Roles, click Select a role and choose BIgQuery > BigQuery User.
    • Click Add.

Command-line

  1. To add a single binding to the project's IAM policy, type the following command. Replace [PROJECT_ID] with your project ID.

    gcloud projects add-iam-policy-binding [PROJECT_ID] --member group:data_analysts@example.com --role roles/bigquery.user
    
  2. The command outputs the updated policy:

    bindings:
    - members:
      - group:data_analysts@example.com
        role: roles/bigquery.user
    

Assign access controls to the dataset containing the view

In order for your data analysts to query the view, they need READER access to the dataset containing the view. The bigquery.User role gives your data analysts the permissions required to create query jobs, but they cannot successfully query a view or table unless they also have access at the dataset level.

To give your data analysts READER access to the dataset:

Web UI

  1. Click the drop-down arrow to the right of the shared_views dataset and choose Share Dataset.

  2. In the Share Dataset dialog, for Add People, click the drop-down to the left of the field, and choose Group by e-mail.

  3. Type data_analysts@example.com in the text box.

  4. To the right of the Add People field, verify Can view is selected. "Can View" maps to the bigquery.dataViewer role at the dataset level.

  5. Click Add and then click Save changes.

Command-line

  1. When you apply access controls to a dataset using the command-line tool, the existing controls are overwritten. First, export the existing access controls to a JSON file using the show command.

    bq --format=json show shared_views >shared_views.json
    
  2. Make your changes to the "access" section of the JSON file using the READER role and groupByEmail.

    For example:

    {
     "access": [
      {
       "role": "READER",
       "specialGroup": "projectReaders"
      },
      {
       "role": "WRITER",
       "specialGroup": "projectWriters"
      },
      {
       "role": "OWNER",
       "specialGroup": "projectOwners"
      }
      {
       "role": "READER",
       "specialGroup": "allAuthenticatedUsers"
      }
      {
       "role": "READER",
       "domain": "[DOMAIN_NAME]"
      }
      {
       "role": "WRITER",
       "userByEmail": "[USER_EMAIL]"
      }
      {
       "role": "READER",
       "groupByEmail": "data_analysts@example.com"
      }
     ],
    }
    

  3. When your edits are complete, use the update command and include the JSON file using the --source flag.

    bq update --source=shared_views.json shared_views

  4. To verify your access control changes, use the show command.

    bq show shared_views

Authorize the view to access the source dataset

Once you create access controls for the dataset containing the view, you add the view as an authorized view in the source dataset. This gives the view access to the source data, not your data analysts group.

To create the authorized view:

Web UI

  1. Click the drop-down arrow to the right of the github_source_data dataset and choose Share Dataset.

  2. In the Share Dataset dialog, for Add People, click the drop-down to the left of the field, and choose Authorized View.

  3. Click Select View.

  4. In the Select View dialog:

    • For Project, verify your project is selected.
    • For Dataset, choose shared_views.
    • For Table ID, type the view name: github_analyst_view.
    • Click OK.

  5. Click Add and then click Save changes.

Command-line

  1. When you apply access controls to a dataset using the CLI, the existing controls are overwritten. First, export the existing access controls to a JSON file using the show command.

    bq --format=json show github_source_data >github_source_data.json
    
  2. Add the authorized view to the "access" section of the JSON file. Replace [PROJECT_ID] with your project ID.

    For example:

    {
     "access": [
      {
       "role": "READER",
       "specialGroup": "projectReaders"
      },
      {
       "view":{
       "datasetId", "shared_views",
       "projectId": "[PROJECT_ID]",
       "tableId": "github_analyst_view"
      },
     ],
    }
    

  3. When your edits are complete, use the update command and include the JSON file using the --source flag.

    bq update --source=github_source_data.json github_source_data

  4. To verify your access control changes, use the show command.

    bq show github_source_data

Verify the configuration

When your configuration is complete, a member of the data_analysts group can verify the configuration by querying the view.

To verify the configuration:

Web UI

  1. Have a member of the data_analysts group go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click the COMPOSE QUERY button.

  3. Copy and paste the following query into the New Query text area.

    #standardSQL
    SELECT
      *
    FROM
      `shared_views.github_analyst_view`
    

Command-line

Use the query command to query the view.

bq query --use_legacy_sql=false 'SELECT * FROM `shared_views.github_analyst_view`'

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  1. In the Cloud Platform Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the project you want to delete and click Delete project. After selecting the checkbox next to the project name, click
      Delete project
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Send feedback about...