Creating table snapshots with a scheduled query

This document describes how to create monthly snapshots of a table using a service account that runs a scheduled DDL query. The document steps you through the following example:

  1. In the PROJECT project, create a service account named snapshot-bot.
  2. Grant the snapshot-bot service account the permissions that it needs to take table snapshots of the TABLE table, which is located in the DATASET dataset, and store the table snapshots in the BACKUP dataset.
  3. Write a query that creates monthly snapshots of the TABLE table and places them in the BACKUP dataset. Because you can't overwrite an existing table snapshot, the table snapshots must have unique names. To achieve this, the query appends the current date to the table snapshot names; for example, TABLE_20220521. The table snapshots expire after 40 days.
  4. Schedule the snapshot-bot service account to run the query on the first day of every month.

This document is intended for users who are familiar with BigQuery and BigQuery table snapshots.

Permissions and roles

This section describes the Identity and Access Management (IAM) permissions you need to create a service account and to schedule a query, and the predefined IAM roles that grant those permissions.

Permissions

To create a service account, you need the following permissions:

Permission Resource Resource type
iam.serviceAccounts.* PROJECT Project

To schedule a query, you need the following permissions:

Permission Resource Resource type
bigquery.jobs.create PROJECT Project

Roles

The minimum predefined role that you need to create a service account is the following:

Role Resource Resource type
roles/iam.serviceAccountAdmin PROJECT Project

The minimum predefined role that you need to schedule a query is the following:

Role Resource Resource type
roles/bigquery.user PROJECT Project

Create the snapshot-bot service account

Follow these steps to create the snapshot-bot service account and grant it the permissions that it needs to run queries in the PROJECT project:

Console

  1. In Google Cloud Console, go to the Service accounts page:

    Go to Service accounts

  2. Select the PROJECT project.

  3. Create the snapshot-bot service account:

    1. Click Create service account.

    2. In the Service account name field, enter snapshot-bot.

    3. Click Create and Continue.

  4. Give the service account the permissions that it needs to run BigQuery jobs:

    1. In the Grant this service account access to project section, select the BigQuery User role.

    2. Click Done.

BigQuery creates the service account with the email address snapshot-bot@PROJECT.iam.gserviceaccount.com.

To verify that BigQuery created the service account with the permissions that you specified, follow these steps:

Console

Verify that BigQuery has created the service account:

  1. In Cloud Console, go to the Service accounts page:

    Go to Service Accounts

  2. Select the PROJECT project.

  3. Click snapshot-bot@PROJECT.iam.gserviceaccount.com.

  4. Verify that the Service account status message indicates that your service account is active.

Verify that BigQuery has granted your service account the permission that it needs to run queries:

  1. In Cloud Console, go to to the Manage resources page:

    Go to Manage Resources

  2. Click PROJECT.

  3. Click Show Info Panel.

  4. In the Permissions tab, expand the BigQuery User node.

  5. Verify that your snapshot-bot service account is listed.

Grant permissions to the service account

This section describes how to grant the snapshot-bot service account the permissions it needs to create table snapshots of the DATASET.TABLE table in the BACKUP dataset.

Permission to take snapshots of the base table

To give the snapshot-bot service account the permissions that it needs to take snapshots of the DATASET.TABLE table, follow these steps:

Console

  1. In Cloud Console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, expand the PROJECT project node.

  3. Expand the DATASET dataset node.

  4. Select the TABLE table.

  5. Click Share. The Share pane opens.

  6. Click Add member. The Add members pane opens.

  7. In the New members field, enter the email address of the service account: snapshot-bot@PROJECT.iam.gserviceaccount.com.

  8. From the Select a role dropdown, select the BigQuery Data Editor role.

  9. Click Save.

  10. In the Share pane, expand the BigQuery Data Editor node and verify that the snapshot-bot@PROJECT.iam.gserviceaccount.com service account is listed.

  11. Click Close.

bq

  1. In Google Cloud Console, activate Cloud Shell:

    Activate Cloud Shell

  2. Enter the following bq add-iam-policy-binding command:

    bq add-iam-policy-binding \
    --member=serviceAccount:snapshot-bot@PROJECT.iam.gserviceaccount.com \
    --role=roles/bigquery.dataEditor DATASET.TABLE
    

BigQuery confirms that the new policy binding has been added.

Permission to create tables in the destination dataset

Give the snapshot-bot service account the permissions that it needs to create table snapshots in the BACKUP dataset as follows:

Console

  1. In Cloud Console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, expand the PROJECT project node.

  3. Click the menu for the BACKUP dataset node and select Open.

  4. Click Share Dataset. The Dataset permissions pane opens.

  5. In the Add members field, enter the service account's email address: snapshot-bot@PROJECT.iam.gserviceaccount.com.

  6. From the Select a role dropdown, select the BigQuery Data Editor role.

  7. Click Add.

  8. On the Dataset permissions pane, verify that the snapshot-bot@PROJECT.iam.gserviceaccount.com service account is listed under the BigQuery Data Editor node.

  9. Click Done.

Your snapshot-bot service account now has the following IAM roles for the following resources:

Role Resource Resource type Purpose
BigQuery Data Editor PROJECT:DATASET.TABLE Table Take snapshots of the TABLE table.
BigQuery Data Editor PROJECT:BACKUP Dataset Store table snapshots in the BACKUP dataset.
BigQuery User PROJECT Project Run the scheduled query that creates the table snapshots.

These roles provide the permissions that the snapshot-bot service account needs to run queries that create table snapshots of the DATASET.TABLE table and place the table snapshots in the BACKUP dataset.

Write a script with a DDL statement

This section describes how to write a script that creates a table snapshot of the DATASET.TABLE table. The script contains the CREATE SNAPSHOT TABLE DDL statement.

Console

  1. In Cloud Console, go to the BigQuery page:

    Go to BigQuery

  2. In the Editor pane, enter the following CREATE SNAPSHOT TABLE query:

    DECLARE snapshot_name STRING;
    DECLARE expiration TIMESTAMP;
    DECLARE query STRING;
    SET expiration = DATE_ADD(current_timestamp(), INTERVAL 40 DAY);
    SET snapshot_name = CONCAT("BACKUP.TABLE_",
     FORMAT_DATETIME("%Y%m%d", current_date()));
    SET query = CONCAT("CREATE SNAPSHOT TABLE ", snapshot_name,
       " CLONE PROJECT.DATASET.TABLE OPTIONS(expiration_timestamp=TIMESTAMP '",
       expiration,"');");
    EXECUTE IMMEDIATE query;
    
  3. Click Run to test the query.

  4. In the Explorer pane, open the BACKUP dataset and verify that the TABLE_YYYYMMDD table snapshot has been created, where YYYYMMDD is the current date and time; for example, TABLE_20220521.

  5. After testing, remove the table snapshot by clicking the table snapshot name, and then clicking Delete.

Schedule the monthly query

Schedule your query to run at 5:00 AM on the first day of every month as follows:

bq

  1. In Google Cloud Console, activate Cloud Shell:

    Activate Cloud Shell

  2. Enter the following bq query command:

    bq query --use_legacy_sql=false --display_name="Monthly snapshots of the TABLE table" \
    --location="us" --schedule="1 of month 05:00" \
    --project_id=PROJECT \
    'DECLARE snapshot_name STRING;
    DECLARE expiration TIMESTAMP;
    DECLARE query STRING;
    SET expiration = DATE_ADD(@run_time, INTERVAL 40 DAY);
    SET snapshot_name = CONCAT("BACKUP.TABLE_",
      FORMAT_DATETIME("%Y%m%d", @run_date));
    SET query = CONCAT("CREATE SNAPSHOT TABLE ", snapshot_name,
      " CLONE PROJECT.DATASET.TABLE OPTIONS(expiration_timestamp=TIMESTAMP \"",
      expiration, "\");");
    EXECUTE IMMEDIATE query;'
    
  3. BigQuery schedules the query.

The script in the bq command-line tool command differs from the script you ran in Cloud Console as follows:

  • The bq command-line tool script uses @run_date instead of current_date(). In a scheduled query, the @run_date parameter contains the current date. But in an interactive query, the @run_date parameter is not supported. You can use current_date() instead of @run_date for testing an interactive query before you schedule it.
  • The bq command-line tool script uses @run_time instead of current_timestamp() for a similar reason—the @run_time parameter is not supported in interactive queries, but current_timestamp() can be used instead of @run_time for testing the interactive query.
  • The bq command-line tool script uses a slash and a double quote \" instead of a single quote ' because single quotes are used to enclose the query.

Configure the service account to run the scheduled query

The query is currently scheduled to run using your credentials. Update your scheduled query to run with the snapshot-bot service account credentials as follows:

  1. Run the bq ls command to get the identity of the scheduled query job:

    bq ls --transfer_config=true --transfer_location=us
    

    The output looks similar to the following:

    name displayName dataSourceId state
    projects/12345/locations/us/transferConfigs/12345 Monthly snapshots of the TABLE table scheduled_query RUNNING
  2. Using the identifier in the name field, run the following bq update command:

    bq update --transfer_config --update_credentials \
    --service_account_name=snapshot-bot@PROJECT.iam.gserviceaccount.com \
    projects/12345/locations/us/transferConfigs/12345
    

Cloud Shell confirms that the scheduled query has been successfully updated.

Check your work

This section describes how to verify that your query is scheduled correctly, how to see if there were any errors when your query ran, and how to verify that the monthly snapshots are being created.

View the scheduled query

To verify that BigQuery has scheduled your monthly table snapshots query, follow these steps:

Console

  1. In Cloud Console, go to the Scheduled queries page:

    Go to Scheduled queries

  2. Click Monthly snapshots of the TABLE table.

  3. Click Configuration.

  4. Verify that the Query string contains your query, and that your query is scheduled to run on the first day of every month.

View the scheduled query's run history

After the scheduled query has run, you can see whether it ran successfully as follows:

Console

  1. In Cloud Console, go to the Scheduled queries page:

    Go to Scheduled queries

  2. Click the query description, Monthly snapshots of the TABLE table.

  3. Click Run history.

You can see the date and time that the query ran, whether the run was successful, and if not, what errors occurred. To see more details about a particular run, click its row in the Run history table. The Run details pane displays additional details.

View the table snapshots

To verify that the table snapshots are being created, follow these steps:

Console

  1. In Cloud Console, go to the BigQuery page:

    Go to BigQuery

  2. In the Explorer pane, open the BACKUP dataset and verify that the TABLE_YYYYMMDD snapshots have been created, where YYYYMMDD is the first day of each month.

    For example:

    • TABLE_20220601
    • TABLE_20220701
    • TABLE_20220801

What's next