When setting up a Cloud Storage or Amazon S3 transfer, the URI and the
destination table can both be parameterized,
allowing you to load data from buckets organized by date. These parameters are
referred to as _runtime parameters_
to distinguish them from query parameters.
By using runtime parameters in a transfer, you can:
- Specify how you want to partition the destination table
- Retrieve files that match a particular date
## Available runtime parameters

When setting up a Cloud Storage or Amazon S3 transfer, you can specify how you want to partition the destination table by using runtime parameters.
Parameter | Template Type | Value
---|---|---
`run_time` | Formatted timestamp | In UTC time, per the schedule. For regularly scheduled transfers, `run_time` represents the intended time of execution. For example, if the transfer is set to "every 24 hours", the `run_time` difference between two consecutive runs is exactly 24 hours, even though the actual execution time might vary slightly. See `TransferRun.runTime`.
`run_date` | Date string | The date of the `run_time` parameter in the following format: `%Y%m%d`; for example, `20180101`. This format is compatible with ingestion-time partitioned tables.
## Templating system

Cloud Storage and Amazon S3 transfers support runtime parameters in the destination table name via a templating syntax.

### Parameter templating syntax

The templating syntax supports basic string templating and time offsetting. Parameters are referenced in the following formats:

- `{run_date}`
- `{run_time[+\-offset]|"time_format"}`
Parameter | Purpose
---|---
`run_date` | This parameter is replaced by the date in format `YYYYMMDD`.
`run_time` | This parameter supports an optional `offset`, added to or subtracted from `run_time`, and a `time_format` string that controls how the resulting timestamp is rendered.

Note the following rules:

- No whitespace is allowed between `run_time`, the offset, and the time format.
- To include literal curly braces in the string, you can escape them as `\{` and `\}`.
- To include literal quotes or a vertical bar in the `time_format`, such as `"YYYY|MM|DD"`, you can escape them in the format string as `\"` or `\|`.
### Parameter templating examples

These examples demonstrate specifying destination table names with different time formats, and offsetting the run time.

run_time (UTC) | Templated parameter | Output destination table name
---|---|---
2018-02-15 00:00:00 | `mytable` | `mytable`
2018-02-15 00:00:00 | `mytable_{run_time\|"%Y%m%d"}` | `mytable_20180215`
2018-02-15 00:00:00 | `mytable_{run_time+25h\|"%Y%m%d"}` | `mytable_20180216`
2018-02-15 00:00:00 | `mytable_{run_time-1h\|"%Y%m%d"}` | `mytable_20180214`
2018-02-15 00:00:00 | `mytable_{run_time+1.5h\|"%Y%m%d%H"}` or `mytable_{run_time+90m\|"%Y%m%d%H"}` | `mytable_2018021501`
2018-02-15 00:00:00 | `{run_time+97s\|"%Y%m%d"}_mytable_{run_time+97s\|"%H%M%S"}` | `20180215_mytable_000137`
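To see how these templates resolve, here is a minimal Python sketch that mimics the substitution behavior described above. It is an illustration only, not the service's actual implementation; the function name `resolve_template` is invented for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Map the offset unit suffixes (h, m, s) to timedelta keyword arguments.
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}

def resolve_template(template: str, run_time: datetime) -> str:
    """Illustrative sketch: substitute {run_date} and {run_time...} parameters."""
    def substitute(match: re.Match) -> str:
        offset, fmt = match.group("offset"), match.group("fmt")
        t = run_time
        if offset:  # e.g. "+25h", "-1h", "+1.5h", "+97s"
            sign = 1 if offset[0] == "+" else -1
            t = t + sign * timedelta(**{_UNITS[offset[-1]]: float(offset[1:-1])})
        return t.strftime(fmt)

    resolved = re.sub(
        r'\{run_time(?P<offset>[+-][0-9.]+[hms])?\|"(?P<fmt>[^"]+)"\}',
        substitute,
        template,
    )
    return resolved.replace("{run_date}", run_time.strftime("%Y%m%d"))

run_time = datetime(2018, 2, 15, 0, 0, 0, tzinfo=timezone.utc)
print(resolve_template('mytable_{run_time+25h|"%Y%m%d"}', run_time))
# mytable_20180216
```

Running the sketch against the table above reproduces each output, including the +25h case that rolls the date forward and the +97s case that shifts the time component.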
## Partitioning options

Cloud Storage and Amazon S3 transfers can write to partitioned or non-partitioned destination tables. There are two types of table partitioning in BigQuery:

- Tables partitioned by ingestion time: Tables that are partitioned based on the ingestion time. For Cloud Storage and Amazon S3 transfers, the ingestion time is the transfer's run time.
- Partitioned tables: Tables that are partitioned based on a `TIMESTAMP` or `DATE` column.

If the destination table is partitioned on a column, you identify the partitioning column when you create the destination table and specify its schema. Learn more about creating column-based partitioned tables in Creating and using partitioned tables.
### Partitioning examples

- Table with no partitioning
  - Destination table: `mytable`
- Ingestion-time partitioned table
  - Destination table: `mytable$YYYYMMDD`
- Column-partitioned table
  - Destination table: `mytable`
  - Specify the partitioning column as a `TIMESTAMP` or `DATE` column when you create the table's schema.
## Runtime parameter examples

These examples show ways to combine the wildcard character and parameters for common use cases. Assume the `run_time` is `2018-02-15 00:00:00` (UTC) for all examples.

If you partition your data based on your local time zone, you need to manually calculate the hour offset from UTC using the offsetting mechanism in the templating syntax.
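As a sketch of that calculation (the 06:00 UTC schedule and UTC-8 zone below are hypothetical, not from this document): suppose a transfer runs daily at 06:00 UTC, but partitions should follow Pacific Standard Time (UTC-8). A template such as `mytable${run_time-8h|"%Y%m%d"}` shifts `run_time` into local time before formatting:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schedule: transfer runs at 06:00 UTC; partitions should
# follow Pacific Standard Time (UTC-8). The -8h template offset shifts
# run_time into local time before the date is formatted.
run_time = datetime(2018, 2, 15, 6, 0, 0, tzinfo=timezone.utc)
local_time = run_time - timedelta(hours=8)  # what run_time-8h evaluates to
partition = local_time.strftime("%Y%m%d")
print(f"mytable${partition}")
# mytable$20180214
```

Note that the fixed offset does not track daylight saving time changes; you would need to update the template when the local UTC offset changes.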
### Transferring Cloud Storage or Amazon S3 data to a non-partitioned table

This use case applies to loading new files from a Cloud Storage or Amazon S3 bucket into a non-partitioned table. This example uses a wildcard in the URI and an ad hoc refresh transfer to pick up new files that are at least one hour old.

Data source | Source URI | Destination table name
---|---|---
Cloud Storage | `gs://bucket/*.csv` | `mytable`
Amazon S3 | `s3://bucket/*.csv` | `mytable`
### Loading a snapshot of all data into an ingestion-time partitioned table

In this case, all data in the specified URI is transferred to a table partitioned by today's date. In a refresh transfer, this configuration picks up files newly added since the last load and adds them to the corresponding partition.

Data source | Source URI | Parameterized destination table | Evaluated destination table
---|---|---|---
Cloud Storage | `gs://bucket/*.csv` | `mytable${run_time\|"%Y%m%d"}` | `mytable$20180215`
Amazon S3 | `s3://bucket/*.csv` | `mytable${run_time\|"%Y%m%d"}` | `mytable$20180215`
### Loading date-specific data into an ingestion-time partitioned table

This use case transfers today's data into a table partitioned on today's date. This example also applies to a refresh transfer that retrieves newly added files matching a certain date and loads the data into the corresponding partition.

Data source | Parameterized URI | Parameterized destination table | Evaluated URI | Evaluated destination table
---|---|---|---|---
Cloud Storage | `gs://bucket/events-{run_time\|"%Y%m%d"}/*.csv` | `mytable${run_time\|"%Y%m%d"}` | `gs://bucket/events-20180215/*.csv` | `mytable$20180215`
Amazon S3 | `s3://bucket/events-{run_time\|"%Y%m%d"}/*.csv` | `mytable${run_time\|"%Y%m%d"}` | `s3://bucket/events-20180215/*.csv` | `mytable$20180215`
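When you create such a transfer, the parameterized URI and destination table template are supplied as transfer configuration parameters. The sketch below builds the JSON parameter string for the date-specific case above; the parameter names (`data_path_template`, `destination_table_name_template`, `file_format`) are assumptions based on the Cloud Storage transfer configuration, so check the transfer setup guide for your data source.

```python
import json

# Hypothetical transfer configuration for the date-specific example.
# Parameter names are assumptions, not confirmed by this document.
params = {
    "data_path_template": 'gs://bucket/events-{run_time|"%Y%m%d"}/*.csv',
    "destination_table_name_template": 'mytable${run_time|"%Y%m%d"}',
    "file_format": "CSV",
}

# A JSON string in this shape could be passed when creating the transfer,
# for example via a CLI or API field that accepts the params as JSON.
print(json.dumps(params, indent=2))
```

Note that the quotes inside the templates must be escaped when the parameters are written out as a literal JSON string.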
## What's next
- Learn about setting up a Cloud Storage transfer
- Learn about setting up an Amazon S3 transfer
- Learn more about the BigQuery Data Transfer Service