Create table snapshots with a scheduled query
This document describes how to create monthly snapshots of a table using a service account that runs a scheduled DDL query. The document steps you through the following example:
- In the
PROJECT
project, create a service account namedsnapshot-bot
. - Grant the
snapshot-bot
service account the permissions that it needs to take table snapshots of theTABLE
table, which is located in theDATASET
dataset, and store the table snapshots in theBACKUP
dataset. - Write a query that creates monthly snapshots of
the
TABLE
table and places them in theBACKUP
dataset. Because you can't overwrite an existing table snapshot, the table snapshots must have unique names. To achieve this, the query appends the current date to the table snapshot names; for example,TABLE_20220521
. The table snapshots expire after 40 days. - Schedule the
snapshot-bot
service account to run the query on the first day of every month.
This document is intended for users who are familiar with BigQuery and BigQuery table snapshots.
Permissions and roles
This section describes the Identity and Access Management (IAM) permissions you need to create a service account and to schedule a query, and the predefined IAM roles that grant those permissions.
Permissions
To work with a service account, you need the following permissions:
Permission | Resource | Resource type |
---|---|---|
iam.serviceAccounts.*
|
PROJECT |
Project |
To schedule a query, you need the following permission:
Permission | Resource | Resource type |
---|---|---|
bigquery.jobs.create
|
PROJECT |
Project |
Roles
The predefined roles that provide the permissions that are required to work with a service account are as follows:
Role | Resource | Resource type |
---|---|---|
Any of the following:roles/iam.serviceAccountAdmin roles/editor roles/owner
|
PROJECT |
Project |
The predefined BigQuery roles that provide the permissions that are required to schedule a query are as follows:
Role | Resource | Resource type |
---|---|---|
Any of the following:roles/bigquery.user roles/bigquery.jobuser roles/bigquery.admin `
|
PROJECT |
Project |
Create the snapshot-bot
service account
Follow these steps to create the snapshot-bot
service account
and grant it the
permissions that it needs to run
queries in the PROJECT
project:
Console
In Google Cloud console, go to the Service accounts page:
Select the
PROJECT
project.Create the
snapshot-bot
service account:Click Create service account.
In the Service account name field, enter snapshot-bot.
Click Create and Continue.
Give the service account the permissions that it needs to run BigQuery jobs:
In the Grant this service account access to project section, select the BigQuery User role.
Click Done.
BigQuery creates the service account with the email address
snapshot-bot@PROJECT.iam.gserviceaccount.com
.
To verify that BigQuery created the service account with the permissions that you specified, follow these steps:
Console
Verify that BigQuery has created the service account:
In Google Cloud console, go to the Service accounts page:
Select the
PROJECT
project.Click snapshot-bot@PROJECT.iam.gserviceaccount.com.
Verify that the Service account status message indicates that your service account is active.
Verify that BigQuery has granted your service account the permission that it needs to run queries:
In Google Cloud console, go to to the Manage resources page:
Click
PROJECT
.Click Show Info Panel.
In the Permissions tab, expand the BigQuery User node.
Verify that your snapshot-bot service account is listed.
Grant permissions to the service account
This section describes how to grant the snapshot-bot
service account the
permissions it needs to create table snapshots of the DATASET.TABLE
table
in the BACKUP
dataset.
Permission to take snapshots of the base table
To give the snapshot-bot
service account the permissions that it needs to
take snapshots of the DATASET.TABLE
table, follow these steps:
Console
In Google Cloud console, open the BigQuery page.
In the Explorer pane, expand the
PROJECT
project node.Expand the DATASET dataset node.
Select the TABLE table.
Click Share. The Share pane opens.
Click Add Principal. The Grant access pane opens.
In New principals, enter the email address of the service account: snapshot-bot@PROJECT.iam.gserviceaccount.com.
From the Select a role dropdown, select the BigQuery Data Editor role.
Click Save.
In the Share pane, expand the BigQuery Data Editor node and verify that the snapshot-bot@PROJECT.iam.gserviceaccount.com service account is listed.
Click Close.
bq
In Google Cloud console, activate Cloud Shell:
Enter the following
bq add-iam-policy-binding
command:bq add-iam-policy-binding \ --member=serviceAccount:snapshot-bot@PROJECT.iam.gserviceaccount.com \ --role=roles/bigquery.dataEditor DATASET.TABLE
BigQuery confirms that the new policy binding has been added.
Permission to create tables in the destination dataset
Give the snapshot-bot
service account the permissions that it needs to create
table snapshots in the BACKUP
dataset as follows:
Console
In Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand the
PROJECT
project node.Click the menu for the BACKUP dataset node and select Open.
Click Share Dataset. The Dataset permissions pane opens.
In the Add members field, enter the service account's email address: snapshot-bot@PROJECT.iam.gserviceaccount.com.
From the Select a role dropdown, select the BigQuery Data Owner role.
Click Add.
On the Dataset permissions pane, verify that the snapshot-bot@PROJECT.iam.gserviceaccount.com service account is listed under the BigQuery Data Owner node.
Click Done.
Your snapshot-bot
service account now has the following IAM
roles for the following resources:
Role | Resource | Resource type | Purpose |
---|---|---|---|
BigQuery Data Editor | PROJECT:DATASET.TABLE |
Table | Take snapshots of the TABLE table. |
BigQuery Data Owner | PROJECT:BACKUP |
Dataset | Create and delete table snapshots in the BACKUP dataset. |
BigQuery User | PROJECT |
Project | Run the scheduled query that creates the table snapshots. |
These roles provide the permissions that the snapshot-bot
service account
needs to run queries that create table snapshots of the DATASET.TABLE
table
and place the table snapshots in the BACKUP
dataset.
Write a multi-statement query
This section describes how to write a
multi-statement query
that creates a table snapshot
of the DATASET.TABLE
table by using the
CREATE SNAPSHOT TABLE
DDL statement.
The snapshot is saved in the BACKUP
dataset and it expires after one day.
-- Declare variables DECLARE snapshot_name STRING; DECLARE expiration TIMESTAMP; DECLARE query STRING; -- Set variables SET expiration = DATE_ADD(current_timestamp(), INTERVAL 1 DAY); SET snapshot_name = CONCAT( "BACKUP.TABLE_", FORMAT_DATETIME('%Y%m%d', current_date())); -- Construct the query to create the snapshot SET query = CONCAT( "CREATE SNAPSHOT TABLE ", snapshot_name, " CLONE mydataset.mytable OPTIONS(expiration_timestamp = TIMESTAMP '", expiration, "');"); -- Run the query EXECUTE IMMEDIATE query;
Schedule the monthly query
Schedule your query to run at 5:00 AM on the first day of every month as follows:
bq
In Google Cloud console, activate Cloud Shell:
Enter the following
bq query
command:bq query --use_legacy_sql=false --display_name="Monthly snapshots of the TABLE table" \ --location="us" --schedule="1 of month 05:00" \ --project_id=PROJECT \ 'DECLARE snapshot_name STRING; DECLARE expiration TIMESTAMP; DECLARE query STRING; SET expiration = DATE_ADD(@run_time, INTERVAL 40 DAY); SET snapshot_name = CONCAT("BACKUP.TABLE_", FORMAT_DATETIME("%Y%m%d", @run_date)); SET query = CONCAT("CREATE SNAPSHOT TABLE ", snapshot_name, " CLONE PROJECT.DATASET.TABLE OPTIONS(expiration_timestamp=TIMESTAMP \"", expiration, "\");"); EXECUTE IMMEDIATE query;'
BigQuery schedules the query.
The multi-statement query in the bq command-line tool command differs from the query you ran in Google Cloud console as follows:
- The bq command-line tool query uses
@run_date
instead ofcurrent_date()
. In a scheduled query, the@run_date
parameter contains the current date. But in an interactive query, the@run_date
parameter is not supported. You can usecurrent_date()
instead of@run_date
for testing an interactive query before you schedule it. - The bq command-line tool query uses
@run_time
instead ofcurrent_timestamp()
for a similar reason—the@run_time
parameter is not supported in interactive queries, butcurrent_timestamp()
can be used instead of@run_time
for testing the interactive query. - The bq command-line tool query uses a slash and a double quote
\"
instead of a single quote'
because single quotes are used to enclose the query.
Configure the service account to run the scheduled query
The query is currently scheduled to run using your credentials.
Update your scheduled query to run with the snapshot-bot
service account
credentials as follows:
Run the
bq ls
command to get the identity of the scheduled query job:bq ls --transfer_config=true --transfer_location=us
The output looks similar to the following:
name
displayName
dataSourceId
state
projects/12345/locations/us/transferConfigs/12345
Monthly snapshots of the TABLE table
scheduled_query
RUNNING
Using the identifier in the
name
field, run the followingbq update
command:bq update --transfer_config --update_credentials \ --service_account_name=snapshot-bot@PROJECT.iam.gserviceaccount.com \ projects/12345/locations/us/transferConfigs/12345
Cloud Shell confirms that the scheduled query has been successfully updated.
Check your work
This section describes how to verify that your query is scheduled correctly, how to see if there were any errors when your query ran, and how to verify that the monthly snapshots are being created.
View the scheduled query
To verify that BigQuery has scheduled your monthly table snapshots query, follow these steps:
Console
In Google Cloud console, go to the Scheduled queries page:
Click Monthly snapshots of the TABLE table.
Click Configuration.
Verify that the Query string contains your query, and that your query is scheduled to run on the first day of every month.
View the scheduled query's run history
After the scheduled query has run, you can see whether it ran successfully as follows:
Console
In Google Cloud console, go to the Scheduled queries page:
Click the query description, Monthly snapshots of the TABLE table.
Click Run history.
You can see the date and time that the query ran, whether the run was successful, and if not, what errors occurred. To see more details about a particular run, click its row in the Run history table. The Run details pane displays additional details.
View the table snapshots
To verify that the table snapshots are being created, follow these steps:
Console
In Google Cloud console, go to the BigQuery page:
In the Explorer pane, open the
BACKUP
dataset and verify that theTABLE_YYYYMMDD
snapshots have been created, whereYYYYMMDD
is the first day of each month.For example:
TABLE_20220601
TABLE_20220701
TABLE_20220801
What's next
- For more information about table snapshots, see Working with table snapshots.
- For more information about scheduling queries, see Scheduling queries.
- For more information about Google Cloud service accounts, see Service accounts.