Best Practices for Partitioned Tables

Viewing most recent partitions

To limit the amount of data scanned to a set of partitions, create a view that contains a filter on _PARTITIONTIME. For example, the following query can be used to create a view that includes only the most recent seven days of data from a table named mydataset.partitioned_table:

legacy SQL

#legacySQL
SELECT
  *
FROM
  mydataset.partitioned_table
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP(UTC_USEC_TO_DAY(NOW() - 7 * 60 * 60 * 24 * 1000000))
  AND TIMESTAMP(UTC_USEC_TO_DAY(CURRENT_TIMESTAMP()));

standard SQL

#standardSQL
SELECT
  *
FROM
  mydataset.partitioned_table
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 * 24 HOUR),DAY)
  AND TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(),DAY);

For information about creating views, see Creating Views.

Dealing with timezone issues

The value of _PARTITIONTIME is based on the UTC date when the field is populated, which means that partitions are divided based on 12:00 AM UTC. If you want to query data based on a timezone other than UTC, you should choose one of the following options before you start loading data into your table.

There are two ways to query data in a partitioned table using a custom, non-UTC, timezone. You can either create a separate timestamp column or you can use partition decorators to load data into a specific partition.

If you use a timestamp column, you can use the default UTC-based partitioning and account for timezone differences in your SQL queries. Alternately, if you prefer to have partitions that are grouped by a timezone other than UTC, use partition decorators to load data into partitions based on a different timezone.

Tracking timezones using a timestamp

To adjust for timezones using a timestamp, create a separate column to store a timestamp that enables you to address rows by the hour or minute.

To query for data based on a timezone other than UTC, use both the _PARTITIONTIME pseudo column and your custom timestamp column. Using _PARTITIONTIME limits the table scan to the relevant partitions, and your custom timestamp further limits the results to your timezone. For example, to query data from a partitioned table (mydataset.partitioned_table) with a timestamp field [MY_TIMESTAMP_FIELD] for data added to the table between 2016-05-01 08:00:00 PST and 2016-05-05 14:00:00 PST:

legacy SQL

#legacySQL
SELECT
  field1
FROM
  mydataset.partitioned_table
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP("2016-05-01")
  AND TIMESTAMP("2016-05-06")
  AND DATE_ADD([MY_TIMESTAMP_FIELD], 8, 'HOUR') BETWEEN TIMESTAMP("2016-05-01 12:00:00")
  AND TIMESTAMP("2016-05-05 14:00:00");

standard SQL

#standardSQL
SELECT
  field1
FROM
  mydataset.partitioned_table
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP("2016-05-01")
  AND TIMESTAMP("2016-05-06")
  AND TIMESTAMP_ADD([MY_TIMESTAMP_FIELD], INTERVAL 8 HOUR) BETWEEN TIMESTAMP("2016-05-01 12:00:00")
  AND TIMESTAMP("2016-05-05 14:00:00");

Loading data using partition decorators

Partition decorators enable you to load data into a specific partition. To adjust for timezones, use a partition decorator to load data into a partition based on your preferred timezone. For example, if you are on Pacific Standard Time (PST), load all data that is generated on May 1, 2016 PST into the partition for that date by using the corresponding partition decorator:

[TABLE_NAME]$20160501

Better performance with the pseudo column

To improve query performance, use the _PARTITIONTIME pseudo column by itself on the left side of a comparison. For example, the following queries process the same amount of data, but the second example can provide better performance.

Example 1. The following query can be slower because it combines the pseudo column value with other operations in the WHERE filter.

legacy SQL

#legacySQL
/* Can be slower */
SELECT
  field1
FROM
  mydataset.table1
WHERE
  DATE_ADD(_PARTITIONTIME, 5, "DAY") > TIMESTAMP("2016-04-15")

standard SQL

#standardSQL
/* Can be slower */
SELECT
  field1
FROM
  mydataset.table1
WHERE
  TIMESTAMP_ADD(_PARTITIONTIME, INTERVAL 5 DAY) > TIMESTAMP("2016-04-15")

Example 2. The following query can perform better because it places the pseudo column by itself on the left side of the filter comparison.

legacy SQL

#legacySQL
/* Often performs better */
SELECT
  field1
FROM
  mydataset.table1
WHERE
  _PARTITIONTIME > DATE_ADD(TIMESTAMP('2016-04-15'), -5, "DAY")

standard SQL

#standardSQL
/* Often performs better */
SELECT
  field1
FROM
  mydataset.table1
WHERE
  _PARTITIONTIME > TIMESTAMP_SUB(TIMESTAMP('2016-04-15'), INTERVAL 5 DAY)

Depending on the table size, the second query, which places _PARTITIONTIME by itself on the left side of the > comparison operator, can provide better performance than the first query. Because the queries process the same amount of data, the number of bytes billed is the same in both cases.

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...