Querying externally partitioned data

BigQuery supports querying externally partitioned data in Avro, Parquet, ORC, JSON and CSV formats that is stored on Cloud Storage using a default hive partitioning layout. Hive partitioning support is enabled by setting the appropriate options in the table definition file. Table creation and modification is currently supported using the Cloud Console, bq command-line tool, and REST API.

For instructions on querying managed partitioned tables, see working with partitioned tables.


  • Hive partitioning support is built assuming a common source URI prefix for all URIs that ends immediately before partition encoding, which looks like gs://<bucket>/<path_to_table>/.
  • The directory structure of a hive partitioned table is assumed to have the same partitioning keys appear in the same order, with a maximum of ten partition keys per table.
  • The data must follow a default hive partitioning layout.
  • The hive partitioning keys and the columns in the underlying files cannot overlap.
  • Table creation and modification is currently limited to the Cloud Console, bq command-line tool, and REST API.
  • All limitations for querying external data sources stored on Cloud Storage apply.
  • Support is for Standard SQL only.

Supported data layouts

The data must follow a default hive partitioned layout. For instance, the following files follow the default layout, with the key/value pairs laid out as directories with an = sign as a separator, and the partition keys are always in the same order.


The common source URI prefix in this example is gs://bucket/table.

Unsupported data layouts

If the partition key names are not encoded in the directory path, partition schema detection will fail. For example, consider the following path, which does not encode the partition key names:


Files where the schema is not in a consistent order also fail detection. For example, consider the following two files with inverted partition key encodings:


Partition schema detection modes

Hive partition keys appear as normal columns when quering data from Cloud Storage. We support three modes of hive partition schema detection:

  • AUTO: key names and types are auto detected. The following types can be detected: STRING, INTEGER, DATE and TIMESTAMP.
  • STRINGS: key names are automatically inferred with STRING type.
  • CUSTOM: partition key schema is encoded in the source URI prefix.

Providing a custom partition key schema

CUSTOM detection requires encoding the schema in the source URI prefix field. Using CUSTOM allows for user-specified types for each partition key. The values must validly parse as the specified type or the query will fail.

For example, setting source_uri_prefix to: gs://my_bucket/my_table/{dt:DATE}/{val:STRING} treats val as a STRING while treating dt as a DATE, and extracting gs://my_bucket/my_table as the source URI prefix for the matched files.

See detailed instructions for the Cloud Console, bq command-line tool, and REST API.

Partition pruning

BigQuery prunes partitions when possible using query predicates on the partition keys, which allows it to avoid reading unnecessary files. This improves performance.

Requiring predicate filters on partition keys in queries

When you create an externally partitioned table, you can require the use of predicate filters on partition keys by enabling the requirePartitionFilter option under HivePartitioningOptions.

When this option is enabled, attempts to query the externally partitioned table without specifying a WHERE clause produce the following error: Cannot query over table <table_name> without a filter over column(s) <partition key names> that can be used for partition elimination.

For more information about adding the requirePartitionFilter option when you create a partitioned table, see the instructions for the bq command-line tool and REST API.

Setting HivePartitioningOptions using the Cloud Console

To set hive partitioning options in the Cloud Console, do the following steps:

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. In the Resources pane, select your dataset.

  3. Click the Create table button to create a new table.

  4. Select Cloud Storage from the Create table from drop-down.

Hive Partitioning Options

  1. In the Select file from Cloud Storage bucket field, enter your dataset's file path, then select its file type from the drop-down.
  2. Check the Source data partitioning box, then enter the source URI prefix in the provided field.
  3. Under Partition inference mode, choose "Automatically infer types" for AUTO or "All columns are strings" for STRINGS.
  4. Choose "External table" from the Table type drop-down.

Setting HivePartitioningOptions using the bq command-line tool

The HivePartitioningOptions are set on the ExternalDataConfiguration object while creating the table definition file.

Requesting automatic partition key type detection

bq mkdef --source_format=ORC --hive_partitioning_mode=AUTO \
--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix \
gcs_uris >/tmp/table_def_file

Requesting string-typed partition key detection

bq mkdef --source_format=PARQUET --hive_partitioning_mode=STRINGS \
--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix \
gcs_uris >/tmp/table_def_file

Providing a custom partition key schema encoded via the source_uri_prefix field

bq mkdef --source_format=NEWLINE_DELIMITED_JSON --hive_partitioning_mode=CUSTOM \
--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix/custom_schema_encoding \
gcs_uris file_schema >/tmp/table_def_file

The partition key schema is encoded immediately following the source URI prefix. Use the following format to specify --hive_partitioning_source_uri_prefix:


Requiring a partition filter

To enforce the use of a predicate filter at query time, add --require_hive_partition_filter=True to the bq mkdef commands above. This example uses AUTO partition key type detection:

bq mkdef --source_format=ORC --hive_partitioning_mode=AUTO \
--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix \
--require_hive_partition_filter=True \
gcs_uris >/tmp/table_def_file

Setting HivePartitioningOptions using the REST API

Support for hive partitioning exists by setting the HivePartitioningOptions on the ExternalDataConfiguration when creating a table definition file.

Note that when hivePartitioningOptions.mode is set to CUSTOM, you must encode the partition key schema within the sourceUriPrefix by setting hivePartitioningOptions.sourceUriPrefix to gs://bucket/path_to_table/{key1:TYPE1}/{key2:TYPE2}/{key3:TYPE3}.

To enforce the use of a predicate filter at query time, set hivePartitioningOptions.requirePartitionFilter to true.