This page describes how to schedule recurring queries in BigQuery.
You can schedule queries to run on a recurring basis. Scheduled queries must be written in standard SQL, which can include Data Definition Language (DDL) and Data Manipulation Language (DML) statements. You can organize query results by date and time by parameterizing the query string and destination table.
Before you begin
Scheduled queries use features of BigQuery Data Transfer Service. Verify that you have completed all actions required in Enabling BigQuery Data Transfer Service.
Required permissions
Before scheduling a query:
Ensure that the person creating the transfer has the following required permissions in BigQuery:
- Either
bigquery.jobs.create
orbigquery.transfers.update
permissions to create the transfer. bigquery.jobs.create
to execute the scheduled query.bigquery.datasets.update
permissions on the target dataset.
The
bigquery.jobUser
predefined IAM role includesbigquery.jobs.create
permissions. For more information on IAM roles in BigQuery, see Predefined roles and permissions.- Either
Before modifying a scheduled query:
- Ensure that the person modifying the scheduled query has the
following required permissions in BigQuery:
bigquery.jobs.create
permissions and the person must be the creator of the schedule.bigquery.transfers.update
permissions.
Configuration options
Query string
The query string must be valid and written in standard SQL. Each run of a scheduled query can receive the following query parameters.
To manually test a query string with @run_time
and @run_date
parameters
before scheduling a query, use the bq
command-line tool.
Available parameters
Parameter | Standard SQL Type | Value |
---|---|---|
@run_time |
TIMESTAMP | Represented in UTC time. For regularly scheduled queries, run_time represents the intended time of execution. For example, if the scheduled query is set to "every 24 hours", the run_time difference between two consecutive queries is exactly 24 hours, even though the actual execution time might slightly vary. |
@run_date |
DATE | Represents a logical calendar date. |
Example
The @run_time
parameter is part of the query string in this example, which
queries a public dataset named hacker_news.stories
.
SELECT @run_time AS time, title, author, text FROM `bigquery-public-data.hacker_news.stories` LIMIT 1000
Destination table
If the destination table for your results doesn't exist when you set up the scheduled query, BigQuery attempts to create the table for you.
If you are using a DDL or DML query, then in the Cloud Console, choose the Processing location or region. Processing location is required for DDL or DML queries that create the destination table.
If the destination table does exist, BigQuery can update the destination
table's schema based on the query
results. To allow this, either add columns using ALLOW_FIELD_ADDITION
or relax
a column's mode from REQUIRED
to NULLABLE
using ALLOW_FIELD_RELAXATION
.
Otherwise, table schema changes between runs cause the scheduled query to fail.
Queries can reference tables from different projects and different datasets. When configuring your scheduled query, you don't need to include the destination dataset in the table name. You specify the destination dataset separately.
Write preference
The write preference you select determines how your query results are written to an existing destination table.
WRITE_TRUNCATE
: If the table exists, BigQuery overwrites the table data.WRITE_APPEND
: If the table exists, BigQuery appends the data to the table.
If you're using a DDL or DML query, you can't use the write preference option.
Creating, truncating, or appending a destination table only happens if BigQuery is able to successfully complete the query. Creation, truncation, or append actions occur as one atomic update upon job completion.
Clustering
Scheduled queries can create clustering on new tables only, when the table is
made with a DDL CREATE TABLE AS SELECT
statement. See
Creating a clustered table from a query result
on the Using Data Definition Language statements
page.
Partitioning options
Scheduled queries can create partitioned or non-partitioned destination tables.
Partitioning is available in the Cloud Console, bq
command-line tool, and API
setup methods. If you're using a DDL or DML query with partitioning, leave the
Destination table partitioning field blank.
There are two types of table partitioning in BigQuery:
- Tables partitioned by ingestion time: Tables partitioned based on the scheduled query's runtime.
- Tables partitioned on a column: Tables partitioned based on a
TIMESTAMP
orDATE
column.
For tables partitioned on a column, in the Cloud Console, specify the column name in the Destination table partitioning field when you set up a scheduled query.
For ingestion-time partitioned tables, leave the Destination table partitioning field blank and indicate the date partitioning in the destination table's name. For more information, see Parameter templating syntax.
Partitioning examples
- Table with no partitioning
- Destination table:
mytable
- Partitioning field: leave blank
- Destination table:
- Ingestion-time partitioned table
- Destination table:
mytable$YYYYMMDD
- Partitioning field: leave blank
- Destination table:
- Column-partitioned table
- Destination table:
mytable
- Partitioning field: name of the
TIMESTAMP
orDATE
column used to partition the table
- Destination table:
Available parameters
When setting up the scheduled query, you can specify how you want to partition the destination table with runtime parameters.
Parameter | Template Type | Value |
---|---|---|
run_time |
Formatted timestamp | In UTC time, per the schedule. For regularly scheduled queries, run_time represents the intended time of execution. For example, if the scheduled query is set to "every 24 hours", the run_time difference between two consecutive queries is exactly 24 hours, even though the actual execution time may vary slightly.See TransferRun.runTime . |
run_date |
Date string | The date of the run_time parameter in the following format: %Y%m%d ; for example, 20180101 . This format is compatible with ingestion-time partitioned tables. |
Templating system
Scheduled queries support runtime parameters in the destination table name with a templating syntax.
Parameter templating syntax
The templating syntax supports basic string templating and time offsetting. Parameters are referenced in the following formats:
{run_date}
{run_time[+\-offset]|"time_format"}
Parameter | Purpose |
---|---|
run_date |
This parameter is replaced by the date in format YYYYMMDD . |
run_time |
This parameter supports the following properties:
|
- No whitespace is allowed between run_time, offset, and time format.
- To include literal curly braces in the string, you can escape them as
‘\{‘ and ‘\}’
. - To include literal quotes or a vertical bar in the time_format, such as
“YYYY|MM|DD”
, you can escape them in the format string as:‘\”’
or‘\|’
.
Parameter templating examples
These examples demonstrate specifying destination table names with different time formats, and offsetting the run time.run_time (UTC) | Templated parameter | Output destination table name |
---|---|---|
2018-02-15 00:00:00 | mytable | mytable |
2018-02-15 00:00:00 | mytable_{run_time|"%Y%m%d"} | mytable_20180215 |
2018-02-15 00:00:00 | mytable_{run_time+25h|"%Y%m%d"} | mytable_20180216 |
2018-02-15 00:00:00 | mytable_{run_time-1h|"%Y%m%d"} | mytable_20180214 |
2018-02-15 00:00:00 | mytable_{run_time+1.5h|"%Y%m%d;%H"}
or mytable_{run_time+90m|"%Y%m%d;%H"} |
mytable_2018021501 |
2018-02-15 00:00:00 | {run_time+97s|"%Y%m%d"}_mytable_{run_time|"%H%M%s"} | 20180215_mytable_000137 |
Using a service account
You can set up a scheduled query to authenticate with a service account. A service account is a Google Account associated with your Google Cloud project. The service account can run jobs, such as scheduled queries or batch processing pipelines, with its own service credentials rather than an end user's credentials.
Read more about authenticating with service accounts in Introduction to authentication.
You can set up the scheduled query with a service account in Setting up a scheduled query, under Advanced options.
You can update an existing scheduled query with the credentials of a service account with the
bq
command-line tool. See Updating the credentials of a scheduled query.Updating a scheduled query to use service account credentials is not currently supported in the Cloud Console.
Setting up a scheduled query
Console
Open the BigQuery page in the Cloud Console.
Run the query that you're interested in. When you are satisfied with your results, click Schedule and Create new scheduled query.
The scheduled query options open in the New scheduled query pane.
On the New scheduled query pane:
- For Name for the scheduled query, enter a name such as
My scheduled query
. The scheduled query name can be any value that you can identify later if you need to modify the query. (Optional) By default, the query runs every day. You can change the default Schedule options as follows:
- To change the frequency, change the
Repeats option from Daily to the desired frequency. To
specify a custom frequency, select Custom, then enter a
Cron-like time specification in the Custom schedule field;
for example
every 3 hours
. The shortest allowed period is 15 minutes. See theschedule
field underTransferConfig
for a description of valid values. - To change the start time, select the Select start time option, enter the desired start date and time, and then save.
- To specify an end time, select the Select end time option, enter the desired end date and time, and then save.
- To save the query without a schedule, so you can run it on demand at a later time, select On Demand as the Repeats option.
- To change the frequency, change the
Repeats option from Daily to the desired frequency. To
specify a custom frequency, select Custom, then enter a
Cron-like time specification in the Custom schedule field;
for example
- For Name for the scheduled query, enter a name such as
For a standard SQL
SELECT
query, provide information about the destination dataset.- For Dataset name, choose the appropriate destination dataset.
- For Table name, enter the name of your destination table.
- For a DDL or DML query, this option is not shown.
For Destination table write preference, choose either
WRITE_TRUNCATE
to overwrite the destination table orWRITE_APPEND
to append data to the table.- For a DDL or DML query, this option is not shown.
(Optional) Advanced options:
(Optional) CMEK: If you use customer-managed encryption keys, you can select Customer-managed key under Advanced options. A list of your available CMEKs appears for you to choose from.
(Optional) Authenticate with a service account: If you have one or more service accounts associated with your Google Cloud project, you can associate a service account with your scheduled query instead of using your user credentials. Under Scheduled query credential, click the menu to see a list of your available service accounts.
Additional configurations:
(Optional) Check Send email notifications to allow email notifications of transfer run failures.
For DDL and DML queries, choose the Processing location or region.
(Optional) For Pub/Sub topic, enter your Pub/Sub topic name, for example:
projects/myproject/topics/mytopic
.
Click the Schedule button.
bq
Option 1: Use the bq query
command.
To create a scheduled query, add the options destination_table
(or
target_dataset
), --schedule
, and --display_name
to your
bq query
command.
bq query \ --display_name=name \ --destination_table=table \ --schedule=interval
Replace the following:
name
. The display name for the scheduled query. The display name can be any value that you can identify later if you need to modify the query.table
. The destination table for the query results.--target_dataset
is an alternative way to name the target dataset for the query results, when used with DDL and DML queries.- Use either
--destination_table
or--target_dataset
, but not both.
interval
. When used withbq query
, makes a query a recurring scheduled query. A schedule for how often the query should run is required. Examples:--schedule='every 24 hours'
--schedule='every 3 hours'
Optional flags:
--project_id
is your project ID. If--project_id
isn't specified, the default project is used.--replace
truncates the destination table and write new results with every run of the scheduled query.--append_table
appends results to the destination table.
For example, the following command creates a scheduled query named
My Scheduled Query
using the simple query SELECT 1 from mydataset.test
.
The destination table is mytable
in the dataset mydataset
. The scheduled
query is created in the default project:
bq query \
--use_legacy_sql=false \
--destination_table=mydataset.mytable \
--display_name='My Scheduled Query' \
--schedule='every 24 hours' \
--replace=true \
'SELECT
1
FROM
mydataset.test'
Option 2: Use the bq mk
command.
Scheduled queries are a kind of transfer. To schedule a query, you can use
the bq
command-line tool to make a transfer configuration.
Queries must be in StandardSQL dialect to be scheduled.
Enter the bq mk
command and supply the transfer creation flag
--transfer_config
. The following flags are also required:
--data_source
--target_dataset
(optional for DDL and DML queries)--display_name
--params
Optional flags:
--project_id
is your project ID. If--project_id
isn't specified, the default project is used.--schedule
is how often you want the query to run. If--schedule
isn't specified, the default is 'every 24 hours' based on creation time.For DDL and DML queries, you can also supply the
--location
flag to specify a particular region for processing. If--location
isn't specified, the global Google Cloud location is used.--service_account_name
is for authenticating your scheduled query with a service account instead of your individual user account.
bq mk \ --transfer_config \ --project_id=project_id \ --target_dataset=dataset \ --display_name=name \ --params='parameters' \ --data_source=data_source
Replace the following:
dataset
. The target dataset for the transfer configuration.- This parameter is optional for DDL and DML queries. It is required for all other queries.
name
. The display name for the transfer configuration. The display name can be any value that you can identify later if you need to modify the query.parameters
. Contains the parameters for the created transfer configuration in JSON format. For example:--params='{"param":"param_value"}'
.- For a scheduled query, you must supply the
query
parameter. - The
destination_table_name_template
parameter is the name of your destination table.- This parameter is optional for DDL and DML queries. It is required for all other queries.
- For the
write_disposition
parameter, you can chooseWRITE_TRUNCATE
to truncate (overwrite) the destination table orWRITE_APPEND
to append the query results to the destination table.- This parameter is optional for DDL and DML queries. It is required for all other queries.
- (Optional) The
destination_table_kms_key
parameter is for customer-managed encryption keys. - (Optional) The
--service_account_name
parameter is for authenticating with a service account instead of an individual user account.
- For a scheduled query, you must supply the
data_source
. The data source:scheduled_query
.
For example, the following command creates a scheduled query transfer
configuration named My Scheduled Query
using the simple query SELECT 1
from mydataset.test
. The destination table mytable
is truncated for every
write, and the target dataset is mydataset
. The scheduled query is created
in the default project, and authenticates with a service account:
bq mk \
--transfer_config \
--target_dataset=mydataset \
--display_name='My Scheduled Query' \
--params='{"query":"SELECT 1 from mydataset.test","destination_table_name_template":"mytable","write_disposition":"WRITE_TRUNCATE"}' \
--data_source=scheduled_query \
--service_account_name=abcdef-test-sa@abcdef-test.iam.gserviceaccount.com
The first time you run the command, you receive a message like the following:
[URL omitted] Please copy and paste the above URL into your web browser and
follow the instructions to retrieve an authentication code.
Follow the instructions in the message and paste the authentication code on the command line.
API
Use the projects.locations.transferConfigs.create
method and supply an instance of the
TransferConfig
resource.
Java
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Java API reference documentation.
Python
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Python API reference documentation.
Setting up a scheduled query with a service account
Java
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Java API reference documentation.
Python
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Python API reference documentation.
Viewing the status of a scheduled query
Console
To view the status of your scheduled queries, click Scheduled queries in the navigation pane. Refresh the page to see the updated status of your scheduled queries. Click a scheduled query to get more details about it.
bq
Scheduled queries are a kind of transfer. To show the details of a
scheduled query, you can first use the bq
command-line tool to list your transfer
configurations.
Enter the bq ls
command and supply the transfer flag
--transfer_config
. The following flags are also required:
--transfer_location
For example:
bq ls \
--transfer_config \
--transfer_location=us \
To show the details of a single scheduled query, enter the bq show
command and supply the transfer_path
for that
scheduled query/transfer config.
For example:
bq show \
--transfer_config \
projects/862514376110/locations/us/transferConfigs/5dd12f26-0000-262f-bc38-089e0820fe38 \
API
Use the projects.locations.transferConfigs.list
method and supply an instance of the
TransferConfig
resource.
Java
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Java API reference documentation.
Python
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Python API reference documentation.
Updating the credentials of a scheduled query
If you're scheduling an existing query, you might need to update the user credentials on the query. Credentials are automatically up to date for new scheduled queries.
Some other situations that could require updating credentials include the following:
- You wish to query Drive data in a scheduled query.
You receive an INVALID_USER error when you attempt to schedule the query:
Error code 5 : Authentication failure: User Id not found. Error code: INVALID_USERID
Console
To refresh the existing credentials on a scheduled query:
Find and view the status of a scheduled query.
Click the MORE button and select Update credentials.
Allow 10 to 20 minutes for the change to take effect. You might need to clear your browser's cache.
bq
Scheduled queries are a kind of transfer. To update the credentials of a
scheduled query, you can use the bq
command-line tool to
update the transfer configuration.
Enter the bq update
command and supply the transfer flag
--transfer_config
. The following flags are also required:
--update_credentials
Optional flag:
--service_account_name
is for authenticating your scheduled query with a service account instead of your individual user account.
For example, the following command updates a scheduled query transfer configuration to authenticate with a service account:
bq update \
--transfer_config \
--update_credentials \
--service_account_name=abcdef-test-sa@abcdef-test.iam.gserviceaccount.com projects/862514376110/locations/us/transferConfigs/5dd12f26-0000-262f-bc38-089e0820fe38 \
Java
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Java API reference documentation.
Python
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Python API reference documentation.
Setting up a manual run on historical dates
In addition to scheduling a query to run in the future, you can also trigger
immediate runs manually. Triggering an immediate run would be necessary if your
query uses the run_date
parameter, and there were issues during a prior run.
For example, every day at 09:00 you query a source table for rows that match
the current date. However, you find that data wasn't added to the source table
for the last three days. In this situation, you can set the query to run on
historical data within a date range that you specify. Your query runs using
combinations of run_date
and run_time
parameters that correspond to the dates you
configured in your scheduled query.
After setting up a scheduled query, here's how you can run the query by using a historical date range:
Console
After clicking Schedule to save your scheduled query, you can click the Scheduled queries button to see the list of currently scheduled queries. Click any display name to see the query schedule's details. At the top right of the page, click Schedule backfill to specify a historical date range.
The chosen runtimes are all within your selected range, including the first date and excluding the last date.
Example 1
Your scheduled query is set to run every day 09:00
Pacific Time. You're
missing data from January 1, January 2, and January 3. Choose the following historic
date range:
Start Time = 1/1/19
End Time = 1/4/19
Your query runs using run_date
and run_time
parameters that correspond
to the following times:
- 1/1/19 09:00 Pacific Time
- 1/2/19 09:00 Pacific Time
- 1/3/19 09:00 Pacific Time
Example 2
Your scheduled query is set to run every day 23:00
Pacific Time. You're
missing data from January 1, January 2, and January 3. Choose the following historic
date ranges (later dates are chosen because UTC has a different date at
23:00 Pacific Time):
Start Time = 1/2/19
End Time = 1/5/19
Your query runs using run_date
and run_time
parameters that correspond
to the following times:
- 1/2/19 09:00 UTC, or 1/1/2019 23:00 Pacific Time
- 1/3/19 09:00 UTC, or 1/2/2019 23:00 Pacific Time
- 1/4/19 09:00 UTC, or 1/3/2019 23:00 Pacific Time
After setting up manual runs, refresh the page to see them in the list of runs.
bq
To manually run the query on a historical date range:
Enter the bq mk
command and supply the transfer run flag
--transfer_run
. The following flags are also required:
--start_time
--end_time
bq mk \ --transfer_run \ --start_time='start_time' \ --end_time='end_time' \ resource_name
Replace the following:
start_time
andend_time
. Timestamps that end in Z or contain a valid time zone offset. Examples:- 2017-08-19T12:11:35.00Z
- 2017-05-25T00:00:00+00:00
resource_name
. The scheduled query's (or transfer's) Resource Name. The Resource Name is also known as the transfer configuration.
For example, the following command schedules a backfill for scheduled query
resource (or transfer configuration):
projects/myproject/locations/us/transferConfigs/1234a123-1234-1a23-1be9-12ab3c456de7
.
bq mk \
--transfer_run \
--start_time 2017-05-25T00:00:00Z \
--end_time 2017-05-25T00:00:00Z \
projects/myproject/locations/us/transferConfigs/1234a123-1234-1a23-1be9-12ab3c456de7
For more information, see bq mk --transfer_run
.
API
Use the projects.locations.transferConfigs.scheduleRun method and supply a path of the TransferConfig resource.
Java
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Java API reference documentation.
Python
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Python API reference documentation.
Deleting a scheduled query
Java
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Java API reference documentation.
Python
To learn how to install and use the client library for BigQuery, see the BigQuery Client Libraries. For more information, see the BigQuery Python API reference documentation.
Quotas
Scheduled queries are executed with the creator's credentials and project, as if you were executing the query yourself.
Although scheduled queries use features of BigQuery Data Transfer Service, they are not transfers and are not subject to the load jobs quota. Scheduled queries are subject to the same BigQuery quotas and limits as manual queries.
Pricing
Scheduled queries are priced the same as manual BigQuery queries.
Supported regions
Scheduled queries are supported in the following locations.
Regional locations
Region description | Region name | |
---|---|---|
Americas | ||
Las Vegas | us-west4 |
|
Los Angeles | us-west2 |
|
Montréal | northamerica-northeast1 |
|
Northern Virginia | us-east4 |
|
Oregon | us-west1 |
|
Salt Lake City | us-west3 |
|
São Paulo | southamerica-east1 |
|
South Carolina | us-east1 |
|
Europe | ||
Belgium | europe-west1 |
|
Finland | europe-north1 |
|
Frankfurt | europe-west3 |
|
London | europe-west2 |
|
Netherlands | europe-west4 |
|
Zürich | europe-west6 |
|
Asia Pacific | ||
Hong Kong | asia-east2 |
|
Jakarta | asia-southeast2 |
|
Mumbai | asia-south1 |
|
Osaka | asia-northeast2 |
|
Seoul | asia-northeast3 |
|
Singapore | asia-southeast1 |
|
Sydney | australia-southeast1 |
|
Taiwan | asia-east1 |
|
Tokyo | asia-northeast1 |
Multi-regional locations
Multi-region description | Multi-region name |
---|---|
Data centers within member states of the European Union1 | EU |
Data centers in the United States | US |
1 Data located in the EU
multi-region is not
stored in the europe-west2
(London) or europe-west6
(Zürich) data
centers.