Create an authorized view


BigQuery is a petabyte-scale analytics data warehouse that you can use to run SQL queries over vast amounts of data in near real time.

Giving a view access to a dataset is also known as creating an authorized view in BigQuery. An authorized view lets you share query results with particular users and groups without giving them access to the underlying tables. You can also use the view's SQL query to restrict the columns (fields) the users are able to query. In this tutorial, you create an authorized view.

Objectives

This tutorial shows you how to complete the following tasks:

  • Create datasets and apply access controls to them
  • Assign access controls to your project
  • Create an authorized view that restricts the data users are able to query

Costs

BigQuery is a paid product and you incur BigQuery usage in this tutorial. BigQuery offers some resources free of charge up to a specific limit. For more information, see BigQuery free operations and free tier.

Before you begin

Before you begin this tutorial, use the Google Cloud console to create or select a project.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a preexisting project, go to

    Enable the BigQuery API.

    Enable the API

  5. Optional: Enable billing for the project. If you don't want to enable billing or provide a credit card, the steps in this document still work. BigQuery provides you a sandbox to perform the steps. For more information, see Enable the BigQuery sandbox.

Create a source dataset

You begin by creating a dataset to store your source data. For this tutorial, you populate a table in your source dataset by querying the GitHub public dataset. The data in your source dataset contains information you do not want your data analysts to see. You restrict access to the data using an authorized view.

To create your source dataset:

Console

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, select the project where you want to create the dataset.

  3. Expand the Actions option and click Create dataset.

  4. For Dataset ID, enter github_source_data.

  5. Leave the other default settings in place and click Create dataset.

SQL

Use the CREATE SCHEMA DDL statement:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE SCHEMA github_source_data;
    

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Create a source dataset to store your table.
Dataset sourceDataset = bigquery.create(DatasetInfo.of(sourceDatasetId));

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery

client = bigquery.Client()
source_dataset_id = "github_source_data"
source_dataset_id_full = "{}.{}".format(client.project, source_dataset_id)


source_dataset = bigquery.Dataset(source_dataset_id_full)
# Specify the geographic location where the dataset should reside.
source_dataset.location = "US"
source_dataset = client.create_dataset(source_dataset)  # API request

After creating the source dataset, you populate a table in it using a SQL query. This query retrieves data from the GitHub public dataset.

Console

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. Copy and paste the following query into the Editor pane.

    SELECT
      commit,
      author,
      committer,
      repo_name
    FROM
      `bigquery-public-data.github_repos.commits`
    LIMIT
      1000;
    
  3. Click More and select Query settings.

  4. For Destination, select Set a destination table for query results.

  5. For Dataset, enter PROJECT_ID.github_source_data. Replace PROJECT_ID with your project ID.

  6. For Table Id, enter github_contributors.

  7. Click Save.

  8. Click Run.

  9. When the query completes, click github_contributors and then click Preview to verify the data was written to the table.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Populate a source table
String tableQuery =
    "SELECT commit, author, committer, repo_name"
        + " FROM `bigquery-public-data.github_repos.commits`"
        + " LIMIT 1000";
QueryJobConfiguration queryConfig =
    QueryJobConfiguration.newBuilder(tableQuery)
        .setDestinationTable(TableId.of(sourceDatasetId, sourceTableId))
        .build();
bigquery.query(queryConfig);

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

source_table_id = "github_contributors"
job_config = bigquery.QueryJobConfig()
job_config.destination = source_dataset.table(source_table_id)
sql = """
    SELECT commit, author, committer, repo_name
    FROM `bigquery-public-data.github_repos.commits`
    LIMIT 1000
"""
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
)  # API request - starts the query

query_job.result()  # Waits for the query to finish

Create a dataset where you can store your view

After creating your source dataset, you create a new, separate dataset to store the authorized view that you share with your data analysts. In a later step, you grant the authorized view access to the data in the source dataset. Your data analysts then have access to the authorized view, but not direct access to the source data.

Authorized views should be created in a different dataset from the source data. That way, data owners can give users access to the authorized view without simultaneously granting access to the underlying data. The source data dataset and authorized view dataset must be in the same regional location.

To create a dataset to store your view:

Console

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, select the project where you want to create the dataset.

  3. Expand the Actions option and click Create dataset.

  4. For Dataset ID, enter shared_views.

  5. Leave the other default settings in place and click Create dataset.

SQL

Use the CREATE SCHEMA DDL statement:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE SCHEMA shared_views;
    

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Create a separate dataset to store your view
Dataset sharedDataset = bigquery.create(DatasetInfo.of(sharedDatasetId));

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

shared_dataset_id = "shared_views"
shared_dataset_id_full = "{}.{}".format(client.project, shared_dataset_id)


shared_dataset = bigquery.Dataset(shared_dataset_id_full)
shared_dataset.location = "US"
shared_dataset = client.create_dataset(shared_dataset)  # API request

Create the view in the new dataset

In the new dataset, you create the view you intend to authorize. This is the view you share with your data analysts. This view is created using a SQL query that excludes the columns you do not want the data analysts to see.

For this tutorial, your shared view excludes the author's information except for the author's name, and excludes the committer's information except for the committer's name.

To create the view in the new dataset:

Console

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. Copy and paste the following query into the Editor pane. Replace PROJECT_ID with your project ID.

    SELECT
      commit,
      author.name AS author,
      committer.name AS committer,
      repo_name
    FROM
      `PROJECT_ID.github_source_data.github_contributors`;
    
  3. Click Save > Save view.

  4. In the Save view dialog:

    1. For Project, verify your project is selected.
    2. For Dataset, enter shared_views.
    3. For Table, enter github_analyst_view.
    4. Click Save.

SQL

Use the CREATE VIEW DDL statement:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE VIEW shared_views.github_analyst_view
    AS (
      SELECT
        commit,
        author.name AS author,
        committer.name AS committer,
        repo_name
      FROM
        `PROJECT_ID.github_source_data.github_contributors`
    );
    

    Replace PROJECT_ID with your project ID.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Create the view in the new dataset
String viewQuery =
    String.format(
        "SELECT commit, author.name as author, committer.name as committer, repo_name FROM %s.%s.%s",
        projectId, sourceDatasetId, sourceTableId);

ViewDefinition viewDefinition = ViewDefinition.of(viewQuery);

Table view =
    bigquery.create(TableInfo.of(TableId.of(sharedDatasetId, sharedViewId), viewDefinition));

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

shared_view_id = "github_analyst_view"
view = bigquery.Table(shared_dataset.table(shared_view_id))
sql_template = """
    SELECT
        commit, author.name as author,
        committer.name as committer, repo_name
    FROM
        `{}.{}.{}`
"""
view.view_query = sql_template.format(
    client.project, source_dataset_id, source_table_id
)
view = client.create_table(view)  # API request

Assign a project-level IAM role to your data analysts

To query the view, your data analysts need permission to run query jobs. The bigquery.user role includes permissions to run jobs, including query jobs, within the project. If you grant a user or group the bigquery.user role at the project level, the user can create datasets and can run query jobs against tables in those datasets. The bigquery.user role does not give users permission to query data, view table data, or view table schema details for datasets the user did not create.

Assigning your data analysts the project-level bigquery.user role does not give them the ability to view or query table data in the dataset containing the tables queried by the view. The bigquery.user role also does not grant users the ability to update your views. Most individuals (data scientists, business intelligence analysts, data analysts) in an enterprise should be assigned the project-level bigquery.user role.

When you add a group to an IAM role, the email address and domain must be associated with an active Google Account or Google Apps account.

To assign the data analysts group to the bigquery.user role at the project level:

Console

  1. Open the IAM page in the Google Cloud console.

    Open the IAM page

  2. Ensure your project is selected in the project selector in the top bar.

  3. Click Grant access.

  4. In the Grant access to dialog:

    1. In the New principals box, enter the group that contains your data analysts. For example, data_analysts@example.com.
    2. In the Select a role box, search for the BigQuery User role and select it.
    3. Click Save.

Assign access controls to the dataset containing the view

For your data analysts to query the view, they need to be granted the bigquery.dataViewer role on the dataset containing the view. The bigquery.user role gives your data analysts the permissions required to create query jobs. However, they cannot successfully query the view unless they also have bigquery.dataViewer access to the dataset that contains the view.

To give your data analysts bigquery.dataViewer access to the dataset:

Console

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, select the shared_views dataset.

  3. Click Sharing > Permissions.

  4. In the Dataset permissions pane, click Add principal.

  5. In the New principals box, enter the group that contains your data analysts (for example, data_analysts@example.com).

  6. Click Select a role and select BigQuery > BigQuery Data Viewer.

  7. Click Save.

  8. Click Close.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Assign access controls to the dataset containing the view
List<Acl> viewAcl = new ArrayList<>(sharedDataset.getAcl());
viewAcl.add(Acl.of(new Acl.Group("example-analyst-group@google.com"), Acl.Role.READER));
sharedDataset.toBuilder().setAcl(viewAcl).build().update();

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

# analyst_group_email = 'data_analysts@example.com'
access_entries = shared_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry("READER", "groupByEmail", analyst_group_email)
)
shared_dataset.access_entries = access_entries
shared_dataset = client.update_dataset(
    shared_dataset, ["access_entries"]
)  # API request

Authorize the view to access the source dataset

Once you create access controls for the dataset containing the view, you add the view as an authorized view in the source dataset. This authorization gives the view, but not your data analysts group, access to the source data.

To authorize the view to access the source data:

Console

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, select the github_source_data dataset.

  3. Click Sharing, and then select Authorize views.

  4. In the Authorized views pane that opens, enter github_analyst_view view in the Authorized views field.

  5. Click Add Authorization.

The github_analyst_view view is now authorized to access data in the source dataset.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Authorize the view to access the source dataset
List<Acl> srcAcl = new ArrayList<>(sourceDataset.getAcl());
srcAcl.add(Acl.of(new Acl.View(view.getTableId())));
sourceDataset.toBuilder().setAcl(srcAcl).build().update();

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

access_entries = source_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry(None, "view", view.reference.to_api_repr())
)
source_dataset.access_entries = access_entries
source_dataset = client.update_dataset(
    source_dataset, ["access_entries"]
)  # API request

Verify the configuration

When your configuration is complete, a member of your data analysts group (for example, data_analysts) can verify the configuration by querying the view.

To verify the configuration:

SQL

Have a member of your data analysts group do the following:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    SELECT
      *
    FROM
      `PROJECT_ID.shared_views.github_analyst_view`;
    

    Replace PROJECT_ID with your project ID.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Complete source code

Here is the complete source code for the tutorial for your reference.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Create a source dataset to store your table.
Dataset sourceDataset = bigquery.create(DatasetInfo.of(sourceDatasetId));

// Populate a source table
String tableQuery =
    "SELECT commit, author, committer, repo_name"
        + " FROM `bigquery-public-data.github_repos.commits`"
        + " LIMIT 1000";
QueryJobConfiguration queryConfig =
    QueryJobConfiguration.newBuilder(tableQuery)
        .setDestinationTable(TableId.of(sourceDatasetId, sourceTableId))
        .build();
bigquery.query(queryConfig);

// Create a separate dataset to store your view
Dataset sharedDataset = bigquery.create(DatasetInfo.of(sharedDatasetId));

// Create the view in the new dataset
String viewQuery =
    String.format(
        "SELECT commit, author.name as author, committer.name as committer, repo_name FROM %s.%s.%s",
        projectId, sourceDatasetId, sourceTableId);

ViewDefinition viewDefinition = ViewDefinition.of(viewQuery);

Table view =
    bigquery.create(TableInfo.of(TableId.of(sharedDatasetId, sharedViewId), viewDefinition));

// Assign access controls to the dataset containing the view
List<Acl> viewAcl = new ArrayList<>(sharedDataset.getAcl());
viewAcl.add(Acl.of(new Acl.Group("example-analyst-group@google.com"), Acl.Role.READER));
sharedDataset.toBuilder().setAcl(viewAcl).build().update();

// Authorize the view to access the source dataset
List<Acl> srcAcl = new ArrayList<>(sourceDataset.getAcl());
srcAcl.add(Acl.of(new Acl.View(view.getTableId())));
sourceDataset.toBuilder().setAcl(srcAcl).build().update();

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

# Create a source dataset
from google.cloud import bigquery

client = bigquery.Client()
source_dataset_id = "github_source_data"
source_dataset_id_full = "{}.{}".format(client.project, source_dataset_id)


source_dataset = bigquery.Dataset(source_dataset_id_full)
# Specify the geographic location where the dataset should reside.
source_dataset.location = "US"
source_dataset = client.create_dataset(source_dataset)  # API request

# Populate a source table
source_table_id = "github_contributors"
job_config = bigquery.QueryJobConfig()
job_config.destination = source_dataset.table(source_table_id)
sql = """
    SELECT commit, author, committer, repo_name
    FROM `bigquery-public-data.github_repos.commits`
    LIMIT 1000
"""
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
)  # API request - starts the query

query_job.result()  # Waits for the query to finish

# Create a separate dataset to store your view
shared_dataset_id = "shared_views"
shared_dataset_id_full = "{}.{}".format(client.project, shared_dataset_id)


shared_dataset = bigquery.Dataset(shared_dataset_id_full)
shared_dataset.location = "US"
shared_dataset = client.create_dataset(shared_dataset)  # API request

# Create the view in the new dataset
shared_view_id = "github_analyst_view"
view = bigquery.Table(shared_dataset.table(shared_view_id))
sql_template = """
    SELECT
        commit, author.name as author,
        committer.name as committer, repo_name
    FROM
        `{}.{}.{}`
"""
view.view_query = sql_template.format(
    client.project, source_dataset_id, source_table_id
)
view = client.create_table(view)  # API request

# Assign access controls to the dataset containing the view
# analyst_group_email = 'data_analysts@example.com'
access_entries = shared_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry("READER", "groupByEmail", analyst_group_email)
)
shared_dataset.access_entries = access_entries
shared_dataset = client.update_dataset(
    shared_dataset, ["access_entries"]
)  # API request

# Authorize the view to access the source dataset
access_entries = source_dataset.access_entries
access_entries.append(
    bigquery.AccessEntry(None, "view", view.reference.to_api_repr())
)
source_dataset.access_entries = access_entries
source_dataset = client.update_dataset(
    source_dataset, ["access_entries"]
)  # API request

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next