Data definition language (DDL) statements in GoogleSQL
Data definition language (DDL) statements let you create and modify BigQuery resources using GoogleSQL query syntax. You can use DDL commands to create, alter, and delete resources, such as tables, table clones, table snapshots, views, user-defined functions (UDFs), and row-level access policies.
Required permissions
To create a job that runs a DDL statement, you must have the
bigquery.jobs.create
permission for the project where you are running the job.
Each DDL statement also requires specific permissions on the affected resources,
which are documented under each statement.
IAM roles
The predefined IAM roles bigquery.user
,
bigquery.jobUser
, and bigquery.admin
include the required
bigquery.jobs.create
permission.
For more information about IAM roles in BigQuery, see Predefined roles and permissions or the IAM permissions reference.
Running DDL statements
You can run DDL statements by using the Google Cloud console, by using the
bq
command-line tool, by calling the
jobs.query
REST API, or
programmatically using the
BigQuery API client libraries.
Console
Go to the BigQuery page in the Google Cloud console.
Click Compose new query.
Enter the DDL statement into the Query editor text area. For example:
CREATE TABLE mydataset.newtable ( x INT64 )
Click Run.
bq
Enter the
bq query
command
and supply the DDL statement as the query parameter. Set the
use_legacy_sql
flag to false
.
bq query --use_legacy_sql=false \ 'CREATE TABLE mydataset.newtable ( x INT64 )'
API
Call the jobs.query
method
and supply the DDL statement in the request body's query
property.
DDL functionality extends the information returned by a
Jobs resource.
statistics.query.statementType
includes the following additional values for DDL
support:
CREATE_TABLE
CREATE_TABLE_AS_SELECT
DROP_TABLE
CREATE_VIEW
DROP_VIEW
statistics.query
has 2 additional fields:
ddlOperationPerformed
: The DDL operation performed, possibly dependent on the existence of the DDL target. Current values include:CREATE
: The query created the DDL target.SKIP
: No-op. Examples —CREATE TABLE IF NOT EXISTS
was submitted, and the table exists. OrDROP TABLE IF EXISTS
was submitted, and the table does not exist.REPLACE
: The query replaced the DDL target. Example —CREATE OR REPLACE TABLE
was submitted, and the table already exists.DROP
: The query deleted the DDL target.
ddlTargetTable
: When you submit aCREATE TABLE/VIEW
statement or aDROP TABLE/VIEW
statement, the target table is returned as an object with 3 fields:- "projectId": string
- "datasetId": string
- "tableId": string
Java
Call the
BigQuery.create()
method to start a query job. Call the
Job.waitFor()
method to wait for the DDL query to finish.
Node.js
Python
Call the
Client.query()
method to start a query job. Call the
QueryJob.result()
method to wait for the DDL query to finish.
CREATE SCHEMA
statement
Creates a new dataset.
Syntax
CREATE SCHEMA [ IF NOT EXISTS ] [project_name.]dataset_name [DEFAULT COLLATE collate_specification] [OPTIONS(schema_option_list)]
Arguments
IF NOT EXISTS
: If any dataset exists with the same name, theCREATE
statement has no effect. Cannot appear withOR REPLACE
.DEFAULT COLLATE collate_specification
: When a new table is created in the dataset, the table inherits a default collation specification unless a collation specification is explicitly specified for a column.If you remove or change this collation specification later with the
ALTER SCHEMA
statement, this will not change existing collation specifications in this dataset. If you want to update an existing collation specification in a dataset, you must alter the column that contains the specification.project_name
: The name of the project where you are creating the dataset. Defaults to the project that runs this DDL statement.dataset_name
: The name of the dataset to create.schema_option_list
: A list of options for creating the dataset.
Details
The dataset is created in the location that you specify in the query settings. For more information, see Specifying your location.
For more information about creating a dataset, see Creating datasets. For information about quotas, see Dataset limits.
schema_option_list
The option list specifies options for the dataset. Specify the options in the
following format: NAME=VALUE, ...
The following options are supported:
NAME |
VALUE |
Details |
---|---|---|
default_kms_key_name |
STRING |
Specifies the default Cloud KMS key for encrypting table data in this dataset. You can override this value when you create a table. |
default_partition_expiration_days |
FLOAT64 |
Specifies the default expiration time, in days, for table partitions in this dataset. You can override this value when you create a table. |
default_table_expiration_days |
FLOAT64 |
Specifies the default expiration time, in days, for tables in this dataset. You can override this value when you create a table. |
description |
STRING |
The description of the dataset. |
friendly_name |
STRING |
A descriptive name for the dataset. |
is_case_insensitive |
BOOL |
|
labels |
<ARRAY<STRUCT<STRING, STRING>>> |
An array of labels for the dataset, expressed as key-value pairs. |
location |
STRING |
The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails. |
max_time_travel_hours |
SMALLINT |
In preview. Specifies the duration in hours of the time travel window for the
dataset. The For more information on the time travel window, see Configuring the time travel window. |
storage_billing_model |
STRING |
In preview. Alters the
storage billing model
for the dataset to use physical bytes instead of logical bytes
when calculating storage changes. To make this change,
set the When you change a dataset's billing model, it takes 24 hours for the change to take effect. If you change a dataset's storage billing model to use physical bytes, you can't change it back to using logical bytes. |
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.datasets.create |
The project where you create the dataset. |
Examples
Creating a new dataset
The following example creates a dataset with a default table expiration and a set of labels.
CREATE SCHEMA mydataset OPTIONS( location="us", default_table_expiration_days=3.75, labels=[("label1","value1"),("label2","value2")] )
Creating a case-insensitive dataset
The following example creates a case-insensitive dataset. Both the dataset name and table names inside the dataset are case-insensitive.
CREATE SCHEMA mydataset OPTIONS( is_case_insensitive=TRUE )
Creating a dataset with collation support
The following example creates a dataset with a collation specification.
CREATE SCHEMA mydataset DEFAULT COLLATE 'und:ci'
CREATE TABLE
statement
Creates a new table.
Syntax
CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ] table_name [( column | constraint_definition[, ...] )] [DEFAULT COLLATE collate_specification] [PARTITION BY partition_expression] [CLUSTER BY clustering_column_list] [OPTIONS(table_option_list)] [AS query_statement] column:= column_definition constraint_definition:= [primary_key] | [[CONSTRAINT constraint_name] foreign_key, ...] primary_key := PRIMARY KEY (column_name[, ...]) NOT ENFORCED foreign_key := FOREIGN KEY (column_name[, ...]) foreign_reference foreign_reference := REFERENCES primary_key_table(column_name[, ...]) NOT ENFORCED
Arguments
OR REPLACE
: Replaces any table with the same name if it exists. Cannot appear withIF NOT EXISTS
.TEMP | TEMPORARY
: Creates a temporary table.IF NOT EXISTS
: If any table exists with the same name, theCREATE
statement has no effect. Cannot appear withOR REPLACE
.table_name
: The name of the table to create. See Table path syntax. For temporary tables, do not include the project name or dataset name.column
: The table's schema information.constraint_definition
: An expression that defines a table constraint.collation_specification
: When a new column is added to the table without an explicit collation specification, the column inherits this collation specification forSTRING
types.If you remove or change this collation specification later with the
ALTER TABLE
statement, this will not change existing collation specifications in this table. If you want to update an existing collation specification in a table, you must alter the column that contains the specification.If the table is part of a dataset, the default collation specification for this table overrides the default collation specification for the dataset.
partition_expression
: An expression that determines how to partition the table.clustering_column_list
: A comma-separated list of column references that determine how to cluster the table. You cannot have collation on columns in this list.table_option_list
: A list of options for creating the table.query_statement
: The query from which the table should be created. For the query syntax, see SQL syntax reference. If a collation specification is used on this table, collation passes through this query statement.primary_key
: An expression that defines a primary key table constraint.foreign_key
: An expression that defines a foreign key table constraint.
Details
CREATE TABLE
statements must comply with the following rules:
- Only one
CREATE
statement is allowed. - Either the column list, the
as query_statement
clause, or both must be present. - When both the column list and the
as query_statement
clause are present, BigQuery ignores the names in theas query_statement
clause and matches the columns with the column list by position. - When the
as query_statement
clause is present and the column list is absent, BigQuery determines the column names and types from theas query_statement
clause. - Column names must be specified either through the column list,
the
as query_statement
clause or schema of the table in theLIKE
clause. - Duplicate column names are not allowed.
- When both the
LIKE
and theas query_statement
clause are present, the column list in the query statement must match the columns of the table referenced by theLIKE
clause. - Table names are case-sensitive unless the dataset they belong to is not. To create a case-insensitive dataset, see Creating a case-insensitive dataset. To alter a dataset to make it case-insensitive dataset, see Turning on case insensitivity for a dataset.
Limitations:
- It is not possible to create an
ingestion-time partitioned table
from the result of a query. Instead, use a
CREATE TABLE
DDL statement to create the table, and then use anINSERT
DML statement to insert data into it. - It is not possible to use the
OR REPLACE
modifier to replace a table with a different kind of partitioning. Instead,DROP
the table, and then use aCREATE TABLE ... AS SELECT ...
statement to recreate it.
This statement supports the following variants, which have the same limitations:
CREATE TABLE LIKE
: Create a table with the same schema as an existing table.CREATE TABLE COPY
: Create a table by copying schema and data from an existing table.
column
(column_name column_schema[, ...])
contains the table's
schema information in a comma-separated list.
column := column_name column_schema column_schema := { simple_type | STRUCT<field_list> | ARRAY<array_element_schema> } [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED] [DEFAULT default_expression] [NOT NULL] [OPTIONS(column_option_list)] simple_type := { data_type | STRING COLLATE collate_specification } field_list := field_name column_schema [, ...] array_element_schema := { simple_type | STRUCT<field_list> } [NOT NULL]
column_name
is the name of the column. A column name:- Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
- Must start with a letter or underscore
- Can be up to 300 characters
column_schema
: Similar to a data type, but supports an optionalNOT NULL
constraint for types other thanARRAY
.column_schema
also supports options on top-level columns andSTRUCT
fields.column_schema
can be used only in the column definition list ofCREATE TABLE
statements. It cannot be used as a type in expressions.simple_type
: Any supported data type aside fromSTRUCT
andARRAY
.If
simple_type
is aSTRING
, it supports an additional clause for collation, which defines how a resultingSTRING
can be compared and sorted. The syntax looks like this:STRING COLLATE collate_specification
If you have
DEFAULT COLLATE collate_specification
assigned to the table, the collation specification for a column overrides the specification for the table.default_expression
: The default value assigned to the column.field_list
: Represents the fields in a struct.field_name
: The name of the struct field. Struct field names have the same restrictions as column names.NOT NULL
: When theNOT NULL
constraint is present for a column or field, the column or field is created withREQUIRED
mode. Conversely, when theNOT NULL
constraint is absent, the column or field is created withNULLABLE
mode.Columns and fields of
ARRAY
type do not support theNOT NULL
modifier. For example, acolumn_schema
ofARRAY<INT64> NOT NULL
is invalid, sinceARRAY
columns haveREPEATED
mode and can be empty but cannot beNULL
. An array element in a table can never beNULL
, regardless of whether theNOT NULL
constraint is specified. For example,ARRAY<INT64>
is equivalent toARRAY<INT64 NOT NULL>
.The
NOT NULL
attribute of a table'scolumn_schema
does not propagate through queries over the table. If tableT
contains a column declared asx INT64 NOT NULL
, for example,CREATE TABLE dataset.newtable AS SELECT x FROM T
creates a table nameddataset.newtable
in whichx
isNULLABLE
.
partition_expression
PARTITION BY
is an optional clause that controls
table partitioning. partition_expression
is an expression that determines how to partition the table. The partition
expression can contain the following values:
_PARTITIONDATE
. Partition by ingestion time with daily partitions. This syntax cannot be used with theAS query_statement
clause.DATE(_PARTITIONTIME)
. Equivalent to_PARTITIONDATE
. This syntax cannot be used with theAS query_statement
clause.<date_column>
. Partition by aDATE
column with daily partitions.DATE({ <timestamp_column> | <datetime_column> })
. Partition by aTIMESTAMP
orDATETIME
column with daily partitions.DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR })
. Partition by aDATETIME
column with the specified partitioning type.TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR })
. Partition by aTIMESTAMP
column with the specified partitioning type.TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR })
. Partition by ingestion time with the specified partitioning type. This syntax cannot be used with theAS query_statement
clause.DATE_TRUNC(<date_column>, { MONTH | YEAR })
. Partition by aDATE
column with the specified partitioning type.RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>]))
. Partition by an integer column with the specified range, where:start
is the start of range partitioning, inclusive.end
is the end of range partitioning, exclusive.interval
is the width of each range within the partition. Defaults to 1.
clustering_column_list
CLUSTER BY
is an optional clause that controls table clustering.
clustering_column_list
is a comma-separated list that determines how to
cluster the table. The clustering column list can contain a list of up to four
clustering columns.
table_option_list
The option list allows you to set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME |
VALUE |
Details |
---|---|---|
expiration_timestamp |
TIMESTAMP |
Example: This property is equivalent to the expirationTime table resource property. |
partition_expiration_days |
|
Example: Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions do not expire. This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours. This property can only be set if the table is partitioned. |
require_partition_filter |
|
Example: Specifies whether queries on this table must include a a predicate
filter that filters on the partitioning column. For more information,
see
Set partition filter requirements. The default value is
This property is equivalent to the timePartitioning.requirePartitionFilter table resource property. This property can only be set if the table is partitioned. |
kms_key_name |
|
Example: This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property. See more details about Protecting data with Cloud KMS keys. |
friendly_name |
|
Example: This property is equivalent to the friendlyName table resource property. |
description |
|
Example: This property is equivalent to the description table resource property. |
labels |
|
Example: This property is equivalent to the labels table resource property. |
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as
SELECT
,CREATE
, orUPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
If VALUE
evaluates to NULL
, the corresponding option NAME
in the
CREATE TABLE
statement is ignored.
column_option_list
Specify a column option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME |
VALUE |
Details |
---|---|---|
description |
|
Example: This property is equivalent to the schema.fields[].description table resource property. |
rounding_mode |
|
In preview. Example: This specifies the rounding mode
that's used for values written to a parameterized
This property is equivalent to the
|
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as
SELECT
,CREATE
, orUPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERA