Creating an authorized view

BigQuery is a petabyte-scale analytics data warehouse that you can use to run SQL queries over vast amounts of data in near realtime.

Giving a view access to a dataset is also known as creating an authorized view in BigQuery. An authorized view allows you to share query results with particular users and groups without giving them access to the underlying tables. You can also use the view's SQL query to restrict the columns (fields) the users are able to query. In this tutorial, you create an authorized view.

Objectives

In this tutorial you:

  • Create datasets and apply access controls to them
  • Assign access controls to your project
  • Create an authorized view that restricts the data users are able to query

Costs

BigQuery is a paid product and you will incur BigQuery usage costs in this tutorial. BigQuery query pricing provides the first 1 TB per month free of charge. For more information, see the Pricing page.

Before you begin

Before you begin this tutorial, use the Google Cloud Platform Console to create or select a project.

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the project selector page

  3. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, go to Enable the BigQuery API.

    Enable the API

  4. BigQuery provides a sandbox if you do not want to provide a credit card or enable billing for your project. The steps in this topic work for a project whether or not your project has billing enabled. If you optionally want to enable billing, see Learn how to enable billing.

Introduction

In this tutorial, you create two datasets - one dataset for your source data and a second dataset for your authorized view. You populate the source dataset with data from the GitHub public dataset. You then create a view that queries a table in the source dataset.

Once you create your datasets and your view, you assign access controls to the project, to the dataset containing the view, and to the dataset containing the source data.

Giving a view access to the source dataset is also referred to as creating an authorized view. When you create an authorized view, follow these steps:

  • Create a separate dataset to store the view
  • Create the view in the new dataset
  • Assign access controls to the project
  • Assign access controls to the dataset containing the view
  • Authorize the view to access the source dataset

Create a source dataset

You begin by creating a dataset to store your source data. For this tutorial, you populate a table in your source dataset by querying the GitHub public dataset. The data in your source dataset contains information you do not want your data analysts to see. You restrict access to the data using an authorized view.

To create your source dataset:

Console

  1. Open the BigQuery web UI in the GCP Console.
    Go to the GCP Console

  2. In the navigation panel, in the Resources section, select your project and click Create dataset.

  3. For Dataset ID type github_source_data.

  4. Leave all of the other default settings in place and click Create dataset.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

from google.cloud import bigquery

client = bigquery.Client()
source_dataset_id = 'github_source_data'

source_dataset = bigquery.Dataset(client.dataset(source_dataset_id))
# Specify the geographic location where the dataset should reside.
source_dataset.location = 'US'
source_dataset = client.create_dataset(source_dataset)  # API request

After creating the source dataset, you populate a table in it using a SQL query. This query retrieves data from the GitHub public dataset.

Console

  1. Open the BigQuery web UI in the GCP Console.
    Go to the GCP Console

  2. Click Compose new query.

  3. Copy and paste the following query into the Query editor text area.

    SELECT
      commit,
      author,
      committer,
      repo_name
    FROM
      `bigquery-public-data.github_repos.commits`
    LIMIT
      1000
    
  4. Click More and select Query settings.

  5. For Destination check the box for Set a destination table for query results.

    • For Project name verify your project is selected.
    • For Dataset name verify github_source_data is selected.
    • For Table name type: github_contributors.
    • Click Save.

  6. Click Run.

  7. When the query completes, click github_contributors and then click Preview to verify the data was written to the table.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

source_table_id = 'github_contributors'
job_config = bigquery.QueryJobConfig()
job_config.destination = source_dataset.table(source_table_id)
sql = """
    SELECT commit, author, committer, repo_name
    FROM `bigquery-public-data.github_repos.commits`
    LIMIT 1000
"""
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location='US',
    job_config=job_config)  # API request - starts the query

query_job.result()  # Waits for the query to finish

Create a separate dataset to store your view

After creating your source dataset you create a new dataset to store the view you will share with your data analysts. This view will have access to the data in the source dataset. Your data analysts will have access to the view, not the source data.

When you create the view, it must be created in a dataset separate from the source data queried by the view. Because you can assign access controls only at the dataset level, if the view is created in the same dataset as the source data, your data analysts would have access to both the view and the data.

To create a dataset to store your view:

Console

  1. Open the BigQuery web UI in the GCP Console.
    Go to the GCP Console

  2. In the navigation panel, in the Resources section, select your project and click Create dataset.

  3. For Dataset ID type shared_views.

  4. Leave all of the other default settings in place and click Create dataset.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

shared_dataset_id = 'shared_views'
shared_dataset = bigquery.Dataset(client.dataset(shared_dataset_id))
shared_dataset.location = 'US'
shared_dataset = client.create_dataset(shared_dataset)  # API request

Create the view in the new dataset

In the new dataset, you create the view you intend to authorize. This is the view you share with your data analysts. This view is created using a SQL query that excludes the columns you do not want the data analysts to see.

For this tutorial, your shared view excludes all of the author's information except for the author's name, and all of the committer's information except for the committer's name.

To create the view in the new dataset:

Console

  1. Open the BigQuery web UI in the GCP Console.
    Go to the GCP Console

  2. Click Compose new query.

  3. Copy and paste the following query into the Query editor text area. Replace project_id with your project ID.

    SELECT
      commit,
      author.name as author,
      committer.name as committer,
      repo_name
    FROM
      `project_id.github_source_data.github_contributors`
    
  4. Click More and select Query settings.

  5. Under SQL dialect select Standard. Click Save to update query settings.

  6. Click Save view.

  7. In the Save view dialog:

    • For Project name verify your project is selected.
    • For Dataset name verify shared_views is selected.
    • For Table name type: github_analyst_view.
    • Click Save.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

shared_view_id = 'github_analyst_view'
view = bigquery.Table(shared_dataset.table(shared_view_id))
sql_template = """
    SELECT
        commit, author.name as author,
        committer.name as committer, repo_name
    FROM
        `{}.{}.{}`
"""
view.view_query = sql_template.format(
    client.project, source_dataset_id, source_table_id)
view = client.create_table(view)  # API request

Assign a project-level Cloud IAM role to your data analysts

In order to query the view, your data analysts need permission to run query jobs. The bigquery.user role includes permissions to run jobs, including query jobs, within the project. If you grant a user or group the bigquery.user role at the project level, the user can create datasets and can run query jobs against tables in those datasets. The bigquery.user role does not give users permission to query data, view table data, or view table schema details for datasets the user did not create.

Assigning your data analysts the project-level bigquery.user role does not give them the ability to view or query table data in the dataset containing the tables queried by the view. Most individuals (data scientists, business intelligence analysts, data analysts) in an enterprise should be assigned the project-level bigquery.user role.

For this tutorial, your data analysts are in a group named data_analysts@example.com. This group name is for example purposes only. When you add a group to an IAM role, the email address and domain must be associated with an active Google Account or Google Apps account.

To assign the data analysts group to the bigquery.user role at the project level:

Console

  1. Open the IAM page in the Google Cloud Platform Console.

    Open the IAM page

  2. Click Select a project.

  3. Select your project and click Open.

  4. On the IAM page, click Add.

  5. In the Add members dialog:

    • For Members type the group name: data_analysts@example.com.
    • For Roles, click Select a role and choose BigQuery > BigQuery User.
    • Click Add.

Assign access controls to the dataset containing the view

In order for your data analysts to query the view, they need to be granted the bigquery.dataViewer role on the dataset containing the view. The bigquery.user role gives your data analysts the permissions required to create query jobs, but they cannot successfully query the view unless they also have at least bigquery.dataViewer access to the dataset that contains the view.

To give your data analysts bigquery.dataViewer access to the dataset:

Console

  1. In the Resources section, select the shared_views dataset and then click Share dataset.

  2. In the Dataset permissions panel, click Add members.

  3. Type data_analysts@example.com in the New members text box.

  4. Cick Select role and select BigQuery > BigQuery Data Viewer.

  5. Click Done.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

# analyst_group_email = 'data_analysts@example.com'
access_entries = shared_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry('READER', 'groupByEmail', analyst_group_email)
)
shared_dataset.access_entries = access_entries
shared_dataset = client.update_dataset(
    shared_dataset, ['access_entries'])  # API request

Authorize the view to access the source dataset

Once you create access controls for the dataset containing the view, you add the view as an authorized view in the source dataset. This gives the view access to the source data, not your data analysts group.

To authorize the view to access the source data:

Console

  1. Select the github_source_data dataset from Resources and click Share dataset.

  2. In the Dataset permissions panel, click the Authorized views tab.

  3. Under Share authorized view:

    • For Select project, verify your project is selected.
    • For Select dataset, choose shared_views.
    • For Select view, type the view name: github_analyst_view.
    • Click OK.

  4. Click Add and then click Done.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

access_entries = source_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry(None, 'view', view.reference.to_api_repr())
)
source_dataset.access_entries = access_entries
source_dataset = client.update_dataset(
    source_dataset, ['access_entries'])  # API request

Verify the configuration

When your configuration is complete, a member of the data_analysts group can verify the configuration by querying the view.

To verify the configuration:

Console

  1. Have a member of the data_analysts group go to the BigQuery web UI in the GCP Console.
    Go to the GCP Console

  2. Click Compose new query.

  3. Copy and paste the following query into the Query editor text area.

    SELECT
      *
    FROM
      `shared_views.github_analyst_view`
    

Complete source code

Here is the complete source code for the tutorial for your reference.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

# Create a source dataset
from google.cloud import bigquery

client = bigquery.Client()
source_dataset_id = 'github_source_data'

source_dataset = bigquery.Dataset(client.dataset(source_dataset_id))
# Specify the geographic location where the dataset should reside.
source_dataset.location = 'US'
source_dataset = client.create_dataset(source_dataset)  # API request

# Populate a source table
source_table_id = 'github_contributors'
job_config = bigquery.QueryJobConfig()
job_config.destination = source_dataset.table(source_table_id)
sql = """
    SELECT commit, author, committer, repo_name
    FROM `bigquery-public-data.github_repos.commits`
    LIMIT 1000
"""
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location='US',
    job_config=job_config)  # API request - starts the query

query_job.result()  # Waits for the query to finish

# Create a separate dataset to store your view
shared_dataset_id = 'shared_views'
shared_dataset = bigquery.Dataset(client.dataset(shared_dataset_id))
shared_dataset.location = 'US'
shared_dataset = client.create_dataset(shared_dataset)  # API request

# Create the view in the new dataset
shared_view_id = 'github_analyst_view'
view = bigquery.Table(shared_dataset.table(shared_view_id))
sql_template = """
    SELECT
        commit, author.name as author,
        committer.name as committer, repo_name
    FROM
        `{}.{}.{}`
"""
view.view_query = sql_template.format(
    client.project, source_dataset_id, source_table_id)
view = client.create_table(view)  # API request

# Assign access controls to the dataset containing the view
# analyst_group_email = 'data_analysts@example.com'
access_entries = shared_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry('READER', 'groupByEmail', analyst_group_email)
)
shared_dataset.access_entries = access_entries
shared_dataset = client.update_dataset(
    shared_dataset, ['access_entries'])  # API request

# Authorize the view to access the source dataset
access_entries = source_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry(None, 'view', view.reference.to_api_repr())
)
source_dataset.access_entries = access_entries
source_dataset = client.update_dataset(
    source_dataset, ['access_entries'])  # API request

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  1. In the GCP Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project you want to delete and click Delete .
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Segítségére volt ez az oldal? Tudassa velünk a véleményét:

Visszajelzés küldése a következővel kapcsolatban: