Scheduling queries

This page describes how to schedule recurring queries in BigQuery.

Overview

You can schedule queries to run on a recurring basis. Scheduled queries must be written in standard SQL, which can include Data Definition Language (DDL) and Data Manipulation Language (DML) statements. The query string and destination table can be parameterized, allowing you to organize query results by date and time.

Before you begin

Before you create a scheduled query:

  • Scheduled queries use features of the BigQuery Data Transfer Service. Verify that you have completed all actions required in Enabling the BigQuery Data Transfer Service.

  • If you are creating the scheduled query by using the classic BigQuery web UI, allow pop-ups in your browser from bigquery.cloud.google.com, so that you can view the permissions window. You must allow the BigQuery Data Transfer Service permission to manage your scheduled query.

Required permissions

Before scheduling a query:

  • Ensure that the person creating the transfer has the following required permissions in BigQuery:

    • bigquery.transfers.update permissions to create the transfer
    • bigquery.datasets.update permissions on the target dataset

    The bigquery.admin predefined Cloud IAM role includes bigquery.transfers.update and bigquery.datasets.update permissions. For more information on Cloud IAM roles in BigQuery, see Access control.

Configuration options

Query string

The query string must be valid and written in standard SQL. Each run of a scheduled query can receive the following query parameters.

To manually test a query string with @run_time and @run_date parameters before scheduling a query, use the command line interface.

Available parameters

Parameter Standard SQL Type Value
@run_time TIMESTAMP Represented in UTC time. For regularly scheduled queries, run_time represents the intended time of execution. For example, if the scheduled query is set to "every 24 hours", the run_time difference between two consecutive queries is exactly 24 hours - even though the actual execution time might slightly vary.
@run_date DATE Represents a logical calendar date.

Example

The @run_time parameter is part of the query string in this example, which queries a public dataset named hacker_news.stories.

SELECT @run_time AS time,
  title,
  author,
  text
FROM `bigquery-public-data.hacker_news.stories`
LIMIT
  1000

Destination table

When you set up the scheduled query, if the destination table for your results doesn't exist, BigQuery attempts to create the table for you.

If you are using a DDL or DML query:

  • In the GCP Console, choose the Processing location or region. Processing location is required for DDL or DML queries that create the destination table.
  • In the classic BigQuery web UI, leave Destination table blank.

If the target table does exist, the destination table's schema might be updated based on the query results, if you add columns to the schema (ALLOW_FIELD_ADDITION) or relax a column's mode from REQUIRED to NULLABLE (ALLOW_FIELD_RELAXATION). In all other cases, table schema changes between runs cause the scheduled query to fail.

Queries can reference tables from different projects and different datasets. When configuring your scheduled query, you don't need to include the destination dataset in the table name. You specify the destination dataset separately.

Write preference

The write preference you select determines how your query results are written to an existing destination table.

  • WRITE_TRUNCATE: If the table exists, BigQuery overwrites the table data.
  • WRITE_APPEND: If the table exists, BigQuery appends the data to the table.

If you are using a DDL or DML query:

  • In the GCP Console, the write preference option will not appear.
  • In the classic BigQuery web UI, leave the Write preference blank.

Creating, truncating, or appending a destination table only happens if BigQuery is able to successfully complete the query. Creation, truncation, or append actions occur as one atomic update upon job completion.

Clustering

Scheduled queries can create clustering on new tables only, when the table is made with a DDL CREATE TABLE AS SELECT statement. See Creating a clustered table from a query result on the Using Data Definition Language statements page.

Partitioning options

Scheduled queries can create partitioned or non-partitioned destination tables. Partitioning is not available in the GCP Console, but is available in the classic BigQuery web UI, CLI and API setup methods. If you are using a DDL or DML query with partitioning, leave the Partitioning field blank.

There are two types of table partitioning in BigQuery:

  • Tables partitioned by ingestion time: Tables partitioned based on the scheduled query's run time.
  • Tables partitioned on a column: Tables that are partitioned based on a TIMESTAMP or DATE column.

For tables partitioned on a column:

  • In the classic BigQuery web UI, if the destination table will be partitioned on a column, you'll specify the column name in the Partitioning field when Setting up a scheduled query. For ingestion-time partitioned tables and non-partitioned tables, leave the Partitioning field blank.

For ingestion-time partitioned tables:

  • Indicate the date partitioning in the destination table's name. See the table name templating syntax, explained below.

Partitioning examples

  • Table with no partitioning
    • Destination table - mytable
    • Partitioning field - leave blank
  • Ingestion-time partitioned table
    • Destination table - mytable$YYYYMMDD
    • Partitioning field - leave blank
  • Column-partitioned table
    • Destination table - mytable
    • Partitioning field - name of the TIMESTAMP or DATE column used to partition the table

Available parameters

When setting up the scheduled query, you can specify how you want to partition the destination table with runtime parameters.

Parameter Template Type Value
run_time Formatted timestamp In UTC time, per the schedule. For regularly scheduled queries, run_time represents the intended time of execution. For example, if the scheduled query is set to "every 24 hours", the run_time difference between two consecutive queries is exactly 24 hours - even though the actual execution time may vary slightly.

See TransferRun.runTime.
run_date Date string The date of the run_time parameter in the following format: %Y%m%d; for example, 20180101. This format is compatible with ingestion-time partitioned tables.

Templating system

Scheduled queries support runtime parameters in the destination table name with a templating syntax.

Parameter templating syntax

The templating syntax supports basic string templating and time offsetting. Parameters are referenced in the following formats:

  • {run_date}
  • {run_time[+\-offset]|"time_format"}
Parameter Purpose
run_date This parameter is replaced by the date in format YYYYMMDD.
run_time This parameter supports the following properties:


offset
Time offset expressed in hours (h), minutes (m), and seconds (s) in that order.
Days (d) are not supported.
Decimals are allowed, for example: 1.5h.

time_format
A formatting string. The most common formatting parameters are years (%Y), months (%m), and days (%d).
For partitioned tables, YYYYMMDD is the required suffix - this is equivalent to "%Y%m%d".

Read more about formatting datetime elements.

Usage notes:
  • No whitespace is allowed between run_time, offset, and time format.
  • To include literal curly braces in the string, you can escape them as ‘\{‘ and ‘\}’.
  • To include literal quotes or a vertical bar in the time_format, such as “YYYY|MM|DD”, you can escape them in the format string as: ‘\”’ or ‘\|’.

Parameter templating examples

These examples demonstrate specifying destination table names with different time formats, and offsetting the run time.
run_time (UTC) Templated parameter Output destination table name
2018-02-15 00:00:00 mytable mytable
2018-02-15 00:00:00 mytable_{run_time|"%Y%m%d"} mytable_20180215
2018-02-15 00:00:00 mytable_{run_time+25h|"%Y%m%d"} mytable_20180216
2018-02-15 00:00:00 mytable_{run_time-1h|"%Y%m%d"} mytable_20180214
2018-02-15 00:00:00 mytable_{run_time+1.5h|"%Y%m%d;%H"}
or
mytable_{run_time+90m|"%Y%m%d;%H"}
mytable_2018021501
2018-02-15 00:00:00 {run_time+97s|"%Y%m%d"}_mytable_{run_time|"%H%M%s"} 20180215_mytable_000137

Setting up a scheduled query

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Run the query that you're interested in. When you are satisfied with your results, click Schedule query and Create new scheduled query.

    Schedule query in BigQuery web UI

  3. The scheduled query options open in the New scheduled query pane.

  4. On the New scheduled query pane:

    • For Name for the scheduled query, enter a name such as My scheduled query. The scheduled query name can be any value that allows you to easily identify the scheduled query if you need to modify it later.
    • (Optional) For Schedule options, you can leave the default value of Daily (every 24 hours, based on creation time), or click Schedule start time to change the time. You can also change the interval to Weekly, Monthly, or Custom. When selecting Custom, a Cron-like time specification is expected, for example every 3 hours. The shortest allowed period is 15 minutes. See the schedule field under TransferConfig for more valid API values.

      New scheduled query top

  5. For DDL/DML queries, you'll choose the Processing location or region.

    New scheduled query DDL/DML

  6. For a standard SQL SELECT query, provide information about the destination dataset.

    • For Dataset name, choose the appropriate destination dataset.
    • For Table name, enter the name of your destination table.
      • For a DDL or DML query, this option is not shown.
    • For Destination table write preference, choose either WRITE_TRUNCATE to overwrite the destination table or WRITE_APPEND to append data to the table.
      • For a DDL or DML query, this option is not shown.
    • (Optional) For Advanced options, if you use customer-managed encryption keys, you can select Customer-managed key here. A list of your available CMEKs will appear for you to choose from.

      New scheduled query bottom

  7. For all queries:

  8. Click Schedule.

  9. To view the status of your scheduled queries, click Scheduled queries in the navigation pane. Refresh the page to see the updated status of your scheduled queries. Click one to get more details about that scheduled query.

    List scheduled queries

Classic UI

  1. Go to the classic BigQuery web UI.

    Go to the classic BigQuery web UI

  2. Run the query that you're interested in.

    Schedule query in classic BigQuery web UI

  3. When you are satisfied with your results, click Schedule Query. The scheduled query options open underneath the query box.

  4. On the New Scheduled Query page:

    • For Destination dataset, choose the appropriate dataset.
    • For Display name, enter a name for the scheduled query such as My scheduled query. The scheduled query name can be any value that allows you to easily identify the scheduled query if you need to modify it later.
    • For Destination table:
      • For a standard SQL query, enter the name of your destination table.
      • For a DDL or DML query, leave this field blank.
    • For Write preference:
      • For a standard SQL query, choose either WRITE_TRUNCATE to overwrite the destination table or WRITE_APPEND to append data to the table.
      • For a DDL or DML query, choose Unspecified.
    • (Optional) For Partitioning field:

      • For a standard SQL query, if the destination table is a column-partitioned table, enter the column name where the table should be partitioned. Leave this field blank for ingestion-time partitioned tables and non-partitioned tables.
      • For a DDL or DML query, leave this field blank.
    • (Optional) For Destination table KMS key, if you use customer-managed encryption keys, you can enter a customer-managed encryption key here.

      New scheduled query

    • (Optional) For Schedule, you can leave the default value of Daily (every 24 hours, based on creation time), or click Edit to change the time. You can also change the interval to Weekly, Monthly, or Custom. When selecting Custom, a Cron-like time specification is expected, for example every 3 hours. The shortest allowed period is fifteen minutes. See the schedule field under TransferConfig for more valid API values.

      Query schedule

    • (Optional) Expand the Advanced section and configure notifications.

      • For Cloud Pub/Sub topic, enter your Cloud Pub/Sub topic name, for example, projects/myproject/topics/mytopic.
      • Check Send email notifications to allow email notifications of transfer run failures.

        Cloud Pub/Sub topic

  5. Click Add.

  6. To view the status of your scheduled queries, click Scheduled queries in the navigation pane. Refresh the page to see the updated status of your scheduled queries. Click one to get more details about that scheduled query.

    List scheduled queries

CLI

Option 1: Use the bq query command.

With this method, you will add the flags destination_table (or target_dataset), --schedule and --display_name options to your bq query command to create a scheduled query.

bq query \
--display_name=name \
--destination_table=table \
--schedule=interval

Where:

  • name is the display name for the scheduled query. The display name can be any value that allows you to easily identify the scheduled query if you need to modify it later.
  • table is the destination table for the query results.
    • --target_dataset is an alternative way to name the target dataset for the query results, when used with DDL/DML queries.
    • Use either --destination_table or --target_dataset, but not both.
  • interval, when used with bq query makes a query a recurring scheduled query. A schedule for how often the query should run is required. Examples:
    • --schedule='every 24 hours'
    • --schedule='every 3 hours'

Optional flags:

  • --project_id is your project ID. If --project_id isn't specified, the default project is used.

  • --replace will truncate the destination table and write new results with every run of the scheduled query.

  • --append_table will append results to the destination table.

For example, the following command creates a scheduled query named My Scheduled Query using the simple query SELECT 1 from mydataset.test. The destination table is mytable in the dataset mydataset. The scheduled query is created in the default project:

    bq query \
    --use_legacy_sql=false \
    --destination_table=mydataset.mytable \
    --display_name='My Scheduled Query' \
    --replace=true \
    'SELECT
      1
    FROM
      mydataset.test'


Option 2: Use the bq mk command.

Scheduled Queries are a kind of transfer. To schedule a query, you can use the BigQuery Data Transfer Service CLI to make a transfer configuration.

Queries must be in StandardSQL dialect to be scheduled.

Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:

  • --data_source
  • --target_dataset (Optional for DDL/DML queries.)
  • --display_name
  • --params

Optional flags:

  • --project_id is your project ID. If --project_id isn't specified, the default project is used.

  • --schedule is how often you want the query to run. If --schedule isn't specified, the default is 'every 24 hours' based on creation time.

  • For DDL/DML queries, you can also supply the --location flag to specify a particular region for processing. If --location isn't specified, the global Google Cloud Platform location is used.

bq mk \
--transfer_config \
--project_id=project_id \
--target_dataset=dataset \
--display_name=name \
--params='parameters' \
--data_source=data_source

Where:

  • dataset is the target dataset for the transfer configuration.
    • This parameter is optional for DDL/DML queries. It is required for all other queries.
  • name is the display name for the transfer configuration. The display name can be any value that allows you to easily identify the scheduled query (transfer) if you need to modify it later.
  • parameters contains the parameters for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'. For a scheduled query, you must supply the query parameter.
    • The destination_table_name_template parameter is the name of your destination table.
    • This parameter is optional for DDL/DML queries. It is required for all other queries.
    • For the write_disposition parameter, you can choose WRITE_TRUNCATE to truncate (overwrite) the destination table or WRITE_APPEND to append the query results to the destination table.
      • This parameter is optional for DDL/DML queries. It is required for all other queries.
    • (Optional) The destination_table_kms_key parameter is for customer-managed encryption keys.
  • data_source is the data source — scheduled_query.

For example, the following command creates a scheduled query transfer configuration named My Scheduled Query using the simple query SELECT 1 from mydataset.test. The destination table mytable is truncated for every write, and the target dataset is mydataset. The scheduled query is created in the default project:

bq mk \
--transfer_config \
--target_dataset=mydataset \
--display_name='My Scheduled Query' \
--params='{"query":"SELECT 1 from mydataset.test","destination_table_name_template":"mytable","write_disposition":"WRITE_TRUNCATE"}' \
--data_source=scheduled_query

The first time you run the command, you receive a message like the following:

[URL omitted] Please copy and paste the above URL into your web browser and follow the instructions to retrieve an authentication code.

Follow the instructions in the message and paste the authentication code on the command line.

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

    from google.cloud import bigquery_datatransfer_v1
    import google.protobuf.json_format

    client = bigquery_datatransfer_v1.DataTransferServiceClient()

    # TODO(developer): Set the project_id to the project that contains the
    #                  destination dataset.
    # project_id = "your-project-id"

    # TODO(developer): Set the destination dataset. The authorized user must
    #                  have owner permissions on the dataset.
    # dataset_id = "your_dataset_id"

    # TODO(developer): The first time you run this sample, set the
    # authorization code to a value from the URL:
    # https://www.gstatic.com/bigquerydatatransfer/oauthz/auth?client_id=433065040935-hav5fqnc9p9cht3rqneus9115ias2kn1.apps.googleusercontent.com&scope=https://www.googleapis.com/auth/bigquery%20https://www.googleapis.com/auth/drive&redirect_uri=urn:ietf:wg:oauth:2.0:oob
    #
    # authorization_code = "_4/ABCD-EFGHIJKLMNOP-QRSTUVWXYZ"
    #
    # You can use an empty string for authorization_code in subsequent runs of
    # this code sample with the same credentials.
    #
    # authorization_code = ""

    # Use standard SQL syntax for the query.
    query_string = """
SELECT
  CURRENT_TIMESTAMP() as current_time,
  @run_time as intended_run_time,
  @run_date as intended_run_date,
  17 as some_integer
"""

    parent = client.project_path(project_id)

    transfer_config = google.protobuf.json_format.ParseDict(
        {
            "destination_dataset_id": dataset_id,
            "display_name": "Your Scheduled Query Name",
            "data_source_id": "scheduled_query",
            "params": {
                "query": query_string,
                "destination_table_name_template": "your_table_{run_date}",
                "write_disposition": "WRITE_TRUNCATE",
                "partitioning_field": "",
            },
            "schedule": "every 24 hours",
        },
        bigquery_datatransfer_v1.types.TransferConfig(),
    )

    response = client.create_transfer_config(
        parent, transfer_config, authorization_code=authorization_code
    )

    print("Created scheduled query '{}'".format(response.name))

Setting up a manual run on historical dates

In addition to scheduling a query to run in the future, you can also trigger immediate runs manually. Triggering an immediate run would be necessary if your query uses the run_date parameter, and there were issues during a prior run.

For example, every day at 09:00 you query a source table for rows that match the current date. However, you find that data wasn't added to the source table for the last three days. In this situation, you can set the query to run on historical data within a date range that you specify. Your query is run using combinations of run_date and run-time that correspond to the dates you configured in your scheduled query.

After setting up a scheduled query, here's how you can run the query by using a historical date range:

Console

After clicking Schedule to save your scheduled query, you can click the Scheduled queries button to see the list of currently scheduled queries. Click any display name to see the query schedule's details. At the top right of the page, click Schedule backfill to specify a historical date range.

schedule backfill button

The run times chosen are all within your selected range, including the first date and excluding the last date.

set historic dates

Example 1

Your scheduled query is set to run every day 09:00 Pacific Time. You're missing data from Jan 1, Jan 2, and Jan 3. Choose the following historic date range:

Start Time = 1/1/19
End Time = 1/4/19

Your query runs using run_date and run_time parameters that correspond to the following times:

  • 1/1/19 09:00 Pacific Time
  • 1/2/19 09:00 Pacific Time
  • 1/3/19 09:00 Pacific Time

Example 2

Your scheduled query is set to run every day 23:00 Pacific Time. You're missing data from Jan 1, Jan 2, and Jan 3. Choose the following historic date ranges (later dates are chosen because UTC has a different date at 23:00 Pacific Time):

Start Time = 1/2/19
End Time = 1/5/19

Your query runs using run_date and run_time parameters that correspond to the following times:

  • 1/2/19 09:00 UTC, or 1/1/2019 23:00 Pacific Time
  • 1/3/19 09:00 UTC, or 1/2/2019 23:00 Pacific Time
  • 1/4/19 09:00 UTC, or 1/3/2019 23:00 Pacific Time

After setting up manual runs, refresh the page to see them in the list of runs.

Classic UI

After clicking Add to save your scheduled query, you'll see the details of your scheduled query displayed. Below the details, click the Start Manual Runs button to specify a historical date range.

start manual runs button

You can further refine the date range to have a start and end time, or leave the time fields as 00:00:00.

set historic dates

Example 1

If your scheduled query is set to run every day 14:00, and you apply the following historic date range:

Start Time = 2/21/2018 00:00:00 AM
End Time = 2/24/2018 00:00:00 AM

Your query runs at the following times:

  • 2/21/2018 14:00:00
  • 2/22/2018 14:00:00
  • 2/23/2018 14:00:00

Example 2

If your scheduled query is set to run every fri at 01:05 and you apply the following historic date range:

Start Time = 2/1/2018 00:00:00(a Thursday)
End Time = 2/24/2018 00:00:00 AM (also a Thursday)

Your query runs at the following times:

  • 2/2/2018 01:05:00
  • 2/9/2018 01:05:00

CLI

To manually run the query on a historical date range:

Enter the bq mk command and supply the transfer run flag --transfer_run. The following flags are also required:

  • --start_time
  • --end_time
bq mk \
--transfer_run \
--start_time='start_time' \
--end_time='end_time' \
resource_name

Where:

  • start_time and end_time are timestamps that end in Z or contain a valid time zone offset. Examples:
    • 2017-08-19T12:11:35.00Z
    • 2017-05-25T00:00:00+00:00
  • resource_name is the scheduled query's (or transfer's) Resource Name. The Resource Name is also known as the transfer configuration.

For example, the following command schedules a backfill for scheduled query resource (or transfer configuration): projects/myproject/locations/us/transferConfigs/1234a123-1234-1a23-1be9-12ab3c456de7.

  bq mk \
  --transfer_run \
  --start_time 2017-05-25T00:00:00Z \
  --end_time 2017-05-25T00:00:00Z \
  projects/myproject/locations/us/transferConfigs/1234a123-1234-1a23-1be9-12ab3c456de7

API

Use the projects.locations.transferConfigs.scheduleRun method and supply a path of the TransferConfig resource.

Quotas

A scheduled query is executed with the creator's credentials and project, as if you were executing the query yourself. A scheduled query is subject to the same BigQuery Quotas and limits as manual queries.

Pricing

Scheduled queries are priced the same as manual BigQuery queries.

Known issues and limitations

Regions

Cross-region queries are not supported, and the destination table for your scheduled query must be in the same region as the data being queried. See Dataset locations for more information about regions and multi-regions.

Google Drive

You can query Google Drive data in a scheduled query. If you're scheduling an existing query, you might need to click "Update Credentials" in the scheduled query details screen. Allow 10—20 minutes for the change to take effect. You might need to clear your browser's cache. Credentials are automatically up to date for new scheduled queries.

Update credentials

Oliko tästä sivusta apua? Kerro mielipiteesi

Palautteen aihe:

Tämä sivu
BigQuery
Tarvitsetko apua? Siirry tukisivullemme.