
Data definition language (DDL) statements in GoogleSQL

Data definition language (DDL) statements let you create and modify BigQuery resources using GoogleSQL query syntax. You can use DDL commands to create, alter, and delete resources, such as tables, table clones, table snapshots, views, user-defined functions (UDFs), and row-level access policies.

Required permissions

To create a job that runs a DDL statement, you must have the bigquery.jobs.create permission for the project where you are running the job. Each DDL statement also requires specific permissions on the affected resources, which are documented under each statement.

IAM roles

The predefined IAM roles bigquery.user, bigquery.jobUser, and bigquery.admin include the required bigquery.jobs.create permission.

For more information about IAM roles in BigQuery, see Predefined roles and permissions or the IAM permissions reference.

Running DDL statements

You can run DDL statements by using the Google Cloud console, by using the bq command-line tool, by calling the jobs.query REST API, or programmatically using the BigQuery API client libraries.

Console

  1. Go to the BigQuery page in the Google Cloud console.

    Go to BigQuery

  2. Click Compose new query.

  3. Enter the DDL statement into the Query editor text area. For example:

     CREATE TABLE mydataset.newtable ( x INT64 )
     

  4. Click Run.

bq

Enter the bq query command and supply the DDL statement as the query parameter. Set the use_legacy_sql flag to false.

bq query --use_legacy_sql=false \
  'CREATE TABLE mydataset.newtable ( x INT64 )'

API

Call the jobs.query method and supply the DDL statement in the request body's query property.

DDL functionality extends the information returned by a Jobs resource. statistics.query.statementType includes the following additional values for DDL support:

  • CREATE_TABLE
  • CREATE_TABLE_AS_SELECT
  • DROP_TABLE
  • CREATE_VIEW
  • DROP_VIEW

statistics.query has 2 additional fields:

  • ddlOperationPerformed: The DDL operation performed, possibly dependent on the existence of the DDL target. Current values include:
    • CREATE: The query created the DDL target.
    • SKIP: No-op. For example, CREATE TABLE IF NOT EXISTS was submitted and the table already exists, or DROP TABLE IF EXISTS was submitted and the table does not exist.
    • REPLACE: The query replaced the DDL target. For example, CREATE OR REPLACE TABLE was submitted and the table already exists.
    • DROP: The query deleted the DDL target.
  • ddlTargetTable: When you submit a CREATE TABLE/VIEW statement or a DROP TABLE/VIEW statement, the target table is returned as an object with 3 fields:
    • "projectId": string
    • "datasetId": string
    • "tableId": string

Java

Call the BigQuery.create() method to start a query job. Call the Job.waitFor() method to wait for the DDL query to finish.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;

// Sample to create a view using DDL
public class DDLCreateView {

  public static void runDDLCreateView() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetId = "MY_DATASET_ID";
    String tableId = "MY_VIEW_ID";
    String ddl =
        "CREATE VIEW "
            + "`"
            + projectId
            + "."
            + datasetId
            + "."
            + tableId
            + "`"
            + " OPTIONS("
            + " expiration_timestamp=TIMESTAMP_ADD("
            + " CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),"
            + " friendly_name=\"new_view\","
            + " description=\"a view that expires in 2 days\","
            + " labels=[(\"org_unit\", \"development\")]"
            + " )"
            + " AS SELECT name, state, year, number"
            + " FROM `bigquery-public-data.usa_names.usa_1910_current`"
            + " WHERE state LIKE 'W%'`";
    ddlCreateView(ddl);
  }

  public static void ddlCreateView(String ddl) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration config = QueryJobConfiguration.newBuilder(ddl).build();

      // Create the view via a DDL query and wait for the job to complete.
      Job job = bigquery.create(JobInfo.of(config));
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("View created successfully");
      } else {
        System.out.println("View was not created");
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("View was not created. \n" + e.toString());
    }
  }
}

Node.js

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function ddlCreateView() {
  // Creates a view via a DDL query

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const projectId = "my_project"
  // const datasetId = "my_dataset"
  // const tableId = "my_new_view"

  const query = `
  CREATE VIEW \`${projectId}.${datasetId}.${tableId}\`
  OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(
          CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
      friendly_name="new_view",
      description="a view that expires in 2 days",
      labels=[("org_unit", "development")]
  )
  AS SELECT name, state, year, number
      FROM \`bigquery-public-data.usa_names.usa_1910_current\`
      WHERE state LIKE 'W%'`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);

  job.on('complete', metadata => {
    console.log(`Created new view ${tableId} via job ${metadata.id}`);
  });
}

Python

Call the Client.query() method to start a query job. Call the QueryJob.result() method to wait for the DDL query to finish.

# from google.cloud import bigquery
# project = 'my-project'
# dataset_id = 'my_dataset'
# table_id = 'new_view'
# client = bigquery.Client(project=project)

sql = """
CREATE VIEW `{}.{}.{}`
OPTIONS(
    expiration_timestamp=TIMESTAMP_ADD(
        CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
    friendly_name="new_view",
    description="a view that expires in 2 days",
    labels=[("org_unit", "development")]
)
AS SELECT name, state, year, number
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state LIKE 'W%'
""".format(
    project, dataset_id, table_id
)

job = client.query(sql)  # API request.
job.result()  # Waits for the query to finish.

print(
    'Created new view "{}.{}.{}".'.format(
        job.destination.project,
        job.destination.dataset_id,
        job.destination.table_id,
    )
)

CREATE SCHEMA statement

Creates a new dataset.

Syntax

CREATE SCHEMA [ IF NOT EXISTS ]
[project_name.]dataset_name
[DEFAULT COLLATE collate_specification]
[OPTIONS(schema_option_list)]

Arguments

  • IF NOT EXISTS: If any dataset exists with the same name, the CREATE statement has no effect.

  • DEFAULT COLLATE collate_specification: When a new table is created in the dataset, the table inherits a default collation specification unless a collation specification is explicitly specified for a column.

    If you remove or change this collation specification later with the ALTER SCHEMA statement, this will not change existing collation specifications in this dataset. If you want to update an existing collation specification in a dataset, you must alter the column that contains the specification.

  • project_name: The name of the project where you are creating the dataset. Defaults to the project that runs this DDL statement.

  • dataset_name: The name of the dataset to create.

  • schema_option_list: A list of options for creating the dataset.

Details

The dataset is created in the location that you specify in the query settings. For more information, see Specifying your location.

For more information about creating a dataset, see Creating datasets. For information about quotas, see Dataset limits.

schema_option_list

The option list specifies options for the dataset. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

  • default_kms_key_name (STRING): Specifies the default Cloud KMS key for encrypting table data in this dataset. You can override this value when you create a table.

  • default_partition_expiration_days (FLOAT64): Specifies the default expiration time, in days, for table partitions in this dataset. You can override this value when you create a table.

  • default_table_expiration_days (FLOAT64): Specifies the default expiration time, in days, for tables in this dataset. You can override this value when you create a table.

  • description (STRING): The description of the dataset.

  • friendly_name (STRING): A descriptive name for the dataset.

  • is_case_insensitive (BOOL): TRUE if the dataset and its table names are case-insensitive, otherwise FALSE. By default, this is FALSE, which means the dataset and its table names are case-sensitive.

    • Datasets: mydataset and MyDataset can coexist in the same project, unless one of them has case sensitivity turned off.
    • Tables: mytable and MyTable can coexist in the same dataset if case sensitivity for the dataset is turned on.

  • labels (ARRAY<STRUCT<STRING, STRING>>): An array of labels for the dataset, expressed as key-value pairs.

  • location (STRING): The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails.

  • max_time_travel_hours (SMALLINT): In preview. Specifies the duration in hours of the time travel window for the dataset. The value must be an integer between 48 (2 days) and 168 (7 days). If this option isn't specified, the default is 168 hours. For more information on the time travel window, see Configuring the time travel window.

  • storage_billing_model (STRING): In preview. Sets the storage billing model for the dataset to use physical bytes instead of logical bytes when calculating storage charges. To do so, set the storage_billing_model value to physical. When you change a dataset's billing model, it takes 24 hours for the change to take effect. After you change a dataset's storage billing model to use physical bytes, you can't change it back to using logical bytes.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.datasets.create on the project where you create the dataset.

Examples

Creating a new dataset

The following example creates a dataset with a default table expiration and a set of labels.

CREATE SCHEMA mydataset
OPTIONS(
  location="us",
  default_table_expiration_days=3.75,
  labels=[("label1","value1"),("label2","value2")]
  )

Creating a case-insensitive dataset

The following example creates a case-insensitive dataset. Both the dataset name and table names inside the dataset are case-insensitive.

CREATE SCHEMA mydataset
OPTIONS(
  is_case_insensitive=TRUE
)

Creating a dataset with collation support

The following example creates a dataset with a collation specification.

CREATE SCHEMA mydataset
DEFAULT COLLATE 'und:ci'
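
Tables created in this dataset then inherit the collation specification for their STRING columns unless a column declares its own. As a minimal sketch (the table and column names are illustrative):

CREATE TABLE mydataset.newtable ( name STRING )
-- name is compared and sorted using the 'und:ci' (case-insensitive) collation.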

CREATE TABLE statement

Creates a new table.

Syntax

CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ]
table_name
[(
  column | constraint_definition[, ...]
)]
[DEFAULT COLLATE collate_specification]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(table_option_list)]
[AS query_statement]

column:=
column_definition

constraint_definition:=
[primary_key]
| [[CONSTRAINT constraint_name] foreign_key, ...]

primary_key :=
PRIMARY KEY (column_name[, ...]) NOT ENFORCED

foreign_key :=
FOREIGN KEY (column_name[, ...]) foreign_reference

foreign_reference :=
REFERENCES primary_key_table(column_name[, ...]) NOT ENFORCED

Arguments

  • OR REPLACE: Replaces any table with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • TEMP | TEMPORARY: Creates a temporary table.

  • IF NOT EXISTS: If any table exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • table_name: The name of the table to create. See Table path syntax. For temporary tables, do not include the project name or dataset name.

  • column: The table's schema information.

  • constraint_definition: An expression that defines a table constraint.

  • DEFAULT COLLATE collate_specification: When a new column is added to the table without an explicit collation specification, the column inherits this collation specification for STRING types.

    If you remove or change this collation specification later with the ALTER TABLE statement, this will not change existing collation specifications in this table. If you want to update an existing collation specification in a table, you must alter the column that contains the specification.

    If the table is part of a dataset, the default collation specification for this table overrides the default collation specification for the dataset.

  • partition_expression: An expression that determines how to partition the table.

  • clustering_column_list: A comma-separated list of column references that determine how to cluster the table. You cannot have collation on columns in this list.

  • table_option_list: A list of options for creating the table.

  • query_statement: The query from which the table should be created. For the query syntax, see SQL syntax reference. If a collation specification is used on this table, collation passes through this query statement.

  • primary_key: An expression that defines a primary key table constraint.

  • foreign_key: An expression that defines a foreign key table constraint.
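
As a minimal sketch of the constraint syntax (the table and column names are illustrative, and mydataset.customers is assumed to exist with a customer_id primary key; both constraints must be declared NOT ENFORCED):

CREATE TABLE mydataset.orders (
  order_id INT64 PRIMARY KEY NOT ENFORCED,
  customer_id INT64,
  FOREIGN KEY (customer_id) REFERENCES mydataset.customers(customer_id) NOT ENFORCED
)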

Details

CREATE TABLE statements must comply with the following rules:

  • Only one CREATE statement is allowed.
  • Either the column list, the AS query_statement clause, or both must be present.
  • When both the column list and the AS query_statement clause are present, BigQuery ignores the names in the AS query_statement clause and matches the columns with the column list by position (see the example after this list).
  • When the AS query_statement clause is present and the column list is absent, BigQuery determines the column names and types from the AS query_statement clause.
  • Column names must be specified either through the column list, the AS query_statement clause, or the schema of the table in the LIKE clause.
  • Duplicate column names are not allowed.
  • When both the LIKE and the AS query_statement clause are present, the column list in the query statement must match the columns of the table referenced by the LIKE clause.
  • Table names are case-sensitive unless the dataset they belong to is not. To create a case-insensitive dataset, see Creating a case-insensitive dataset. To alter a dataset to make it case-insensitive, see Turning on case insensitivity for a dataset.
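
For example, in the following sketch the new table gets the column names a and b from the column list, while the values and types come from the query by position (mydataset.sourcetable is a hypothetical table whose first column is INT64 and second is STRING):

CREATE TABLE mydataset.newtable ( a INT64, b STRING )
AS SELECT x, y FROM mydataset.sourcetable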

Limitations:

  • It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
  • It is not possible to use the OR REPLACE modifier to replace a table with a different kind of partitioning. Instead, DROP the table, and then use a CREATE TABLE ... AS SELECT ... statement to recreate it.

This statement supports the following variants, which have the same limitations:

  • CREATE TABLE LIKE: Creates a table with the same schema as a source table.
  • CREATE TABLE COPY: Creates a table with the same schema and data as a source table.

column

(column_name column_schema[, ...]) contains the table's schema information in a comma-separated list.

column :=
  column_name column_schema

column_schema :=
   {
     simple_type
     | STRUCT<field_list>
     | ARRAY<array_element_schema>
   }
   [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED]
   [DEFAULT default_expression]
   [NOT NULL]
   [OPTIONS(column_option_list)]

simple_type :=
  { data_type | STRING COLLATE collate_specification }

field_list :=
  field_name column_schema [, ...]

array_element_schema :=
  { simple_type | STRUCT<field_list> }
  [NOT NULL]

  • column_name is the name of the column. A column name:

    • Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
    • Must start with a letter or underscore
    • Can be up to 300 characters
  • column_schema: Similar to a data type, but supports an optional NOT NULL constraint for types other than ARRAY. column_schema also supports options on top-level columns and STRUCT fields.

    column_schema can be used only in the column definition list of CREATE TABLE statements. It cannot be used as a type in expressions.

  • simple_type: Any supported data type aside from STRUCT and ARRAY.

    If simple_type is a STRING, it supports an additional clause for collation, which defines how a resulting STRING can be compared and sorted. The syntax looks like this:

    STRING COLLATE collate_specification
    

    If you have DEFAULT COLLATE collate_specification assigned to the table, the collation specification for a column overrides the specification for the table.

  • default_expression: The default value assigned to the column.

  • field_list: Represents the fields in a struct.

  • field_name: The name of the struct field. Struct field names have the same restrictions as column names.

  • NOT NULL: When the NOT NULL constraint is present for a column or field, the column or field is created with REQUIRED mode. Conversely, when the NOT NULL constraint is absent, the column or field is created with NULLABLE mode.

    Columns and fields of ARRAY type do not support the NOT NULL modifier. For example, a column_schema of ARRAY<INT64> NOT NULL is invalid, since ARRAY columns have REPEATED mode and can be empty but cannot be NULL. An array element in a table can never be NULL, regardless of whether the NOT NULL constraint is specified. For example, ARRAY<INT64> is equivalent to ARRAY<INT64 NOT NULL>.

    The NOT NULL attribute of a table's column_schema does not propagate through queries over the table. If table T contains a column declared as x INT64 NOT NULL, for example, CREATE TABLE dataset.newtable AS SELECT x FROM T creates a table named dataset.newtable in which x is NULLABLE.
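
Putting these pieces together, a minimal sketch of a column list that combines a NOT NULL constraint, a collated STRING, a default value, a STRUCT, and an ARRAY (all names are illustrative):

CREATE TABLE mydataset.newtable (
  id INT64 NOT NULL OPTIONS(description="a non-null identifier"),
  name STRING COLLATE 'und:ci' DEFAULT 'unknown',
  address STRUCT<street STRING, city STRING>,
  scores ARRAY<INT64>
)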

partition_expression

PARTITION BY is an optional clause that controls table partitioning. partition_expression is an expression that determines how to partition the table. The partition expression can contain the following values:

  • _PARTITIONDATE. Partition by ingestion time with daily partitions. This syntax cannot be used with the AS query_statement clause.
  • DATE(_PARTITIONTIME). Equivalent to _PARTITIONDATE. This syntax cannot be used with the AS query_statement clause.
  • <date_column>. Partition by a DATE column with daily partitions.
  • DATE({ <timestamp_column> | <datetime_column> }). Partition by a TIMESTAMP or DATETIME column with daily partitions.
  • DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a DATETIME column with the specified partitioning type.
  • TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a TIMESTAMP column with the specified partitioning type.
  • TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR }). Partition by ingestion time with the specified partitioning type. This syntax cannot be used with the AS query_statement clause.
  • DATE_TRUNC(<date_column>, { MONTH | YEAR }). Partition by a DATE column with the specified partitioning type.
  • RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>])). Partition by an integer column with the specified range, where:

    • start is the start of range partitioning, inclusive.
    • end is the end of range partitioning, exclusive.
    • interval is the width of each range within the partition. Defaults to 1.
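
For example, the following sketch creates an integer-range partitioned table with partitions of width 10 covering the values 0 to 99 (the column names are illustrative; customer_id must be INT64):

CREATE TABLE mydataset.newtable ( customer_id INT64, amount NUMERIC )
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))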

clustering_column_list

CLUSTER BY is an optional clause that controls table clustering. clustering_column_list is a comma-separated list that determines how to cluster the table. The clustering column list can contain a list of up to four clustering columns.
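
For example, a minimal sketch that partitions by a DATE column and clusters by another column (the names are illustrative):

CREATE TABLE mydataset.newtable ( customer_id STRING, transaction_date DATE, amount NUMERIC )
PARTITION BY transaction_date
CLUSTER BY customer_id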

table_option_list

The option list allows you to set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

  • expiration_timestamp (TIMESTAMP): Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC". This property is equivalent to the expirationTime table resource property.

  • partition_expiration_days (FLOAT64): Example: partition_expiration_days=7. Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions do not expire. This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours. This property can only be set if the table is partitioned.

  • require_partition_filter (BOOL): Example: require_partition_filter=true. Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false. This property is equivalent to the timePartitioning.requirePartitionFilter table resource property. This property can only be set if the table is partitioned.

  • kms_key_name (STRING): Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key". This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property. For more details, see Protecting data with Cloud KMS keys.

  • friendly_name (STRING): Example: friendly_name="my_table". This property is equivalent to the friendlyName table resource property.

  • description (STRING): Example: description="a table that expires in 2025". This property is equivalent to the description table resource property.

  • labels (ARRAY<STRUCT<STRING, STRING>>): Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

If VALUE evaluates to NULL, the corresponding option NAME in the CREATE TABLE statement is ignored.
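
For example, a minimal sketch that combines several table options (the values are illustrative):

CREATE TABLE mydataset.newtable ( x INT64 )
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  friendly_name="my_table",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)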

column_option_list

Specify a column option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

  • description (STRING): Example: description="a unique id". This property is equivalent to the schema.fields[].description table resource property.

  • rounding_mode (STRING): In preview. Example: rounding_mode = "ROUND_HALF_EVEN". Specifies the rounding mode that's used for values written to a parameterized NUMERIC or BIGNUMERIC type column or STRUCT field. The following values are supported:

    • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.5 is rounded to 3.0, and -2.5 is rounded to -3.0.
    • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.5 is rounded to 2.0, and -2.5 is rounded to -2.0.

    This property is equivalent to the roundingMode table resource property.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY
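
For example, a minimal sketch that sets column options (the column names and values are illustrative):

CREATE TABLE mydataset.newtable (
  id STRING OPTIONS(description="a unique id"),
  price NUMERIC(5, 2) OPTIONS(rounding_mode="ROUND_HALF_EVEN")
)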