Using Parameters in Cloud Storage Transfers

The BigQuery Data Transfer Service for Cloud Storage allows you to schedule recurring data loads from Cloud Storage to BigQuery.

By using runtime parameters in a Cloud Storage transfer, you can:

  • Specify how you want to partition the destination table
  • Retrieve files from Cloud Storage that match a particular date

Available runtime parameters

When setting up the Cloud Storage transfer, you can specify how you want to partition the destination table with runtime parameters.

Parameter Template Type Value
run_time Formatted timestamp In UTC time, per the schedule. For regularly scheduled transfers, run_time represents the intended time of execution. For example, if the trasnfer is set to "every 24 hours", the run_time difference between two consecutive queries will be exactly 24 hours — even though the actual execution time might vary slightly.

See TransferRun.runTime
run_date Date string The date of the run_time parameter in the following format: %Y%m%d; for example, "20180101". This format is compatible with ingestion-time partitioned tables.

Templating system

Cloud Storage transfers support runtime parameters in the destination table name via a templating syntax.

Parameter templating syntax

The templating syntax supports basic string templating and time offsetting. Parameters are referenced in the following formats:

  • {run_date}
  • {run_time[+\-offset]|"time_format"}

Parameter Purpose
run_date This parameter is replaced by the date in format YYYYMMDD.
run_time This parameter supports the following properties:


offset
Time offset expressed in hours (h), minutes (m), and seconds (s) in that order.
Days (d) are not supported.
Decimals are allowed, for example: 1.5h.

time_format
A formatting string. The most common formatting parameters are years (%Y), months (%m), and days (%d).
For partitioned tables, YYYYMMDD is the required suffix - this is equivalent to "%Y%m%d".

Read more about formatting datetime elements.

Usage notes:

  • No whitespace is allowed between run_time, offset, and time format.
  • To include literal curly braces in the string, you can escape them as ‘\{‘ and ‘\}’.
  • To include literal quotes or a vertical bar in the time_format, such as “YYYY|MM|DD”, you can escape them in the format string as: ‘\”’ or ‘\|’.

Parameter templating examples

These examples demonstrate specifying destination table names with different time formats, and offsetting the run time.
run_time (UTC) Templated parameter Output destination table name
2018-02-15 00:00:00 mytable mytable
2018-02-15 00:00:00 mytable_{run_time|"%Y%m%d"} mytable_20180215
2018-02-15 00:00:00 mytable_+25h{run_time|"%Y%m%d"} mytable_20180216
2018-02-15 00:00:00 mytable_-1h{run_time|"%Y%m%d"} mytable_20180214
2018-02-15 00:00:00 mytable_+1.5h{run_time|"%Y%m%d;%H"}
or
mytable_+90m{run_time|"%Y%m%d;%H"}
mytable_2018021501
2018-02-15 00:00:00 {run_time+97s|"%Y%m%d"}_mytable_{run_time|"%H%M%s"} 20180215_mytable_000137

Partitioning options

Cloud Storage transfers can write to partitioned or non-partitioned destination tables. There are two types of table partitioning in BigQuery:

  • Tables partitioned by ingestion time: Tables that are partitioned based on the ingestion time. For Cloud Storage transfers, the ingestion time is the transfer's run time.
  • Partitioned tables: Tables that are partitioned based on a TIMESTAMP or DATE column.

If the destination table is partitioned on a column, you identify the partitioning column when you create the destination table and specify its schema. Learn more about creating column-based partitioned tables in Creating and Using Partitioned Tables.

Partitioning examples

Runtime parameter examples

These examples show ways to combine the wildcard character and parameters for common use cases. Assume the run_time is 2018-02-15 00:00:00 (UTC) for all examples.

If you partition your data based on your local timezone, you will need to manually calculate the hour offset from UTC using the offsetting mechanism in the templating syntax.

Transferring Cloud Storage data to a non-partitioned table

This use case applies to loading new files from a Cloud Storage bucket into a non-partitioned table. This example uses a wildcard in the Cloud Storage URI and uses an ad hoc refresh transfer to pick up new files at least one hour old.

Source (Cloud Storage URI) Destination table name
Configured gs://bucket/*.csv mytable
Evaluated gs://bucket/*.csv mytable

Loading a snapshot of all data into an ingestion-time partitioned table

In this case, all data in the specified Cloud Storage URI is transferred to a table partitioned by today's date. In a refresh transfer, this configuration picks up newly added files since the last load and adds them to a particular partition.

Source (Cloud Storage URI) Destination table name
Configured gs://bucket/*.csv mytable${run_time|”%Y%m%d”}
Evaluated gs://bucket/*.csv my_table$20180215

Loading date-specific data into an ingestion-time partitioned table

This use case transfers today's Cloud Storage data to a table partitioned on today's date. This example also applies to a refresh transfer that retrieves newly added files matching a certain date and loads the data into the corresponding partition.

Source (Cloud Storage URI) Destination table name
Configured gs://bucket/events-{run_time|"%Y%m%d"}/*.csv mytable${run_time|”%Y%m%d”}
Evaluated gs://bucket/events-20180215/*.csv my_table$20180215

What's next

¿Te sirvió esta página? Envíanos tu opinión:

Enviar comentarios sobre…

¿Necesitas ayuda? Visita nuestra página de asistencia.