Using data definition language statements

Data definition language (DDL) statements let you create and modify BigQuery resources using standard SQL query syntax. Currently, you can use DDL commands in BigQuery to:

Running DDL statements

You can run DDL statements by using the Cloud Console, by using the bq command-line tool, by calling the jobs.query REST API, or programmatically using the BigQuery API client libraries.

Console

  1. Go to the BigQuery page in the Cloud Console.

    Go to BigQuery

  2. Click Compose new query.

    Compose new query.

  3. Enter the DDL statement into the Query editor text area. For example:

     CREATE TABLE mydataset.newtable ( x INT64 )
     

  4. Click Run.

bq

Enter the bq query command and supply the DDL statement as the query parameter. Set the use_legacy_sql flag to false.

bq query --use_legacy_sql=false \
  'CREATE TABLE mydataset.newtable ( x INT64 )'

API

Call the jobs.query method and supply the DDL statement in the request body's query property.

DDL functionality extends the information returned by a Jobs resource. statistics.query.statementType includes the following additional values for DDL support:

  • CREATE_TABLE
  • CREATE_TABLE_AS_SELECT
  • DROP_TABLE
  • CREATE_VIEW
  • DROP_VIEW

statistics.query has 2 additional fields:

  • ddlOperationPerformed: The DDL operation performed, possibly dependent on the existence of the DDL target. Current values include:
    • CREATE: The query created the DDL target.
    • SKIP: No-op. Examples — CREATE TABLE IF NOT EXISTS was submitted, and the table exists. Or DROP TABLE IF EXISTS was submitted, and the table does not exist.
    • REPLACE: The query replaced the DDL target. Example — CREATE OR REPLACE TABLE was submitted, and the table already exists.
    • DROP: The query deleted the DDL target.
  • ddlTargetTable: When you submit a CREATE TABLE/VIEW statement or a DROP TABLE/VIEW statement, the target table is returned as an object with 3 fields:
    • "projectId": string
    • "datasetId": string
    • "tableId": string

Java

Call the BigQuery.create() method to start a query job. Call the Job.waitFor() method to wait for the DDL query to finish.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;

// Sample to create a view using DDL
public class DDLCreateView {

  public static void runDDLCreateView() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetId = "MY_DATASET_ID";
    String tableId = "MY_VIEW_ID";
    String ddl =
        "CREATE VIEW "
            + "`"
            + projectId
            + "."
            + datasetId
            + "."
            + tableId
            + "`"
            + " OPTIONS("
            + " expiration_timestamp=TIMESTAMP_ADD("
            + " CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),"
            + " friendly_name=\"new_view\","
            + " description=\"a view that expires in 2 days\","
            + " labels=[(\"org_unit\", \"development\")]"
            + " )"
            + " AS SELECT name, state, year, number"
            + " FROM `bigquery-public-data.usa_names.usa_1910_current`"
            + " WHERE state LIKE 'W%'`";
    ddlCreateView(ddl);
  }

  public static void ddlCreateView(String ddl) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration config = QueryJobConfiguration.newBuilder(ddl).build();

      // create a view using query and it will wait to complete job.
      Job job = bigquery.create(JobInfo.of(config));
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("View created successfully");
      } else {
        System.out.println("View was not created");
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("View was not created. \n" + e.toString());
    }
  }
}

Node.js

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function ddlCreateView() {
  // Creates a view via a DDL query

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const projectId = "my_project"
  // const datasetId = "my_dataset"
  // const tableId = "my_new_view"

  const query = `
  CREATE VIEW \`${projectId}.${datasetId}.${tableId}\`
  OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(
          CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
      friendly_name="new_view",
      description="a view that expires in 2 days",
      labels=[("org_unit", "development")]
  )
  AS SELECT name, state, year, number
      FROM \`bigquery-public-data.usa_names.usa_1910_current\`
      WHERE state LIKE 'W%'`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);

  job.on('complete', metadata => {
    console.log(`Created new view ${tableId} via job ${metadata.id}`);
  });
}

Python

Call the Client.query() method to start a query job. Call the QueryJob.result() method to wait for the DDL query to finish.

# from google.cloud import bigquery
# project = 'my-project'
# dataset_id = 'my_dataset'
# table_id = 'new_view'
# client = bigquery.Client(project=project)

sql = """
CREATE VIEW `{}.{}.{}`
OPTIONS(
    expiration_timestamp=TIMESTAMP_ADD(
        CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
    friendly_name="new_view",
    description="a view that expires in 2 days",
    labels=[("org_unit", "development")]
)
AS SELECT name, state, year, number
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state LIKE 'W%'
""".format(
    project, dataset_id, table_id
)

job = client.query(sql)  # API request.
job.result()  # Waits for the query to finish.

print(
    'Created new view "{}.{}.{}".'.format(
        job.destination.project,
        job.destination.dataset_id,
        job.destination.table_id,
    )
)

CREATE TABLE statement

To create a table in BigQuery, use the CREATE TABLE DDL statement.

{CREATE TABLE | CREATE TABLE IF NOT EXISTS | CREATE OR REPLACE TABLE}
[[project_name.]dataset_name.]table_name
[(
  column_name column_schema[, ...]
)]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(table_option_list)]
[AS query_statement]

Where:

{CREATE TABLE | CREATE TABLE IF NOT EXISTS | CREATE OR REPLACE TABLE} is one of the following statements:

  • CREATE TABLE: Creates a new table.
  • CREATE TABLE IF NOT EXISTS: Creates a new table only if the table does not currently exist in the specified dataset.
  • CREATE OR REPLACE TABLE: Creates a table and replaces an existing table with the same name in the specified dataset.

CREATE TABLE statements must comply with the following rules:

  • Only one CREATE statement is allowed.
  • Either the column list, the as query_statement clause, or both must be present.
  • When both the column list and the as query_statement clause are present, BigQuery ignores the names in the as query_statement clause and matches the columns with the column list by position.
  • When the as query_statement clause is present and the column list is absent, BigQuery determines the column names and types from the as query_statement clause.
  • Column names must be specified either through the column list or as query_statement clause.
  • Duplicate column names are not allowed.

Table path

project_name is the name of the project where you are creating the table. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset where you are creating the table. Defaults to the defaultDataset in the request.

table_name is the name of the table you're creating.

When you create a table in BigQuery, the table name must be unique per dataset. The table name can:

  • Contain up to 1,024 characters.
  • Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), Zs (space). For more information, see General Category.

For example, the following are all valid table names: table-01, ग्राहक, 00_お客様, étudiant.

column_name and column_schema

(column_name column_schema[, ...]) contains the table's schema information in a comma-separated list:

  • column_name is the name of the column. A column name:
    • Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
    • Must start with a letter or underscore
    • Can be up to 128 characters
  • column_schema is similar to a data type, but supports an optional NOT NULL constraint for types other than ARRAY. column_schema also supports options on top-level columns and STRUCT fields.
column_schema :=
   {simple_type [NOT NULL] |
    STRUCT<field_list> [NOT NULL] |
    ARRAY<array_element_schema>}
   [OPTIONS(column_option_list)]

field_list := field_name column_schema [, ...]

array_element_schema := {simple_type | STRUCT<field_list>} [NOT NULL]

simple_type is any supported data type aside from STRUCT and ARRAY.

field_name is the name of the struct field. Struct field names have the same restrictions as column names.

When the NOT NULL constraint is present for a column or field, the column or field is created with REQUIRED mode. Conversely, when the NOT NULL constraint is absent, the column or field is created with NULLABLE mode.

Columns and fields of ARRAY type do not support the NOT NULL modifier. For example, a column_schema of ARRAY<INT64> NOT NULL is invalid, since ARRAY columns have REPEATED mode and can be empty but cannot be NULL. An array element in a table can never be NULL, regardless of whether the NOT NULL constraint is specified. For example, ARRAY<INT64> is equivalent to ARRAY<INT64 NOT NULL>.

The NOT NULL attribute of a table's column_schema does not propagate through queries over the table. If table T contains a column declared as x INT64 NOT NULL, for example, CREATE TABLE dataset.newtable AS SELECT x FROM T creates a table named dataset.newtable in which x is NULLABLE.

column_schema can be used only in the column definition list of CREATE TABLE statements. It cannot be used as a type in expressions. For example, CAST(1 AS INT64 NOT NULL) is not valid.

partition_expression

PARTITION BY is an optional clause that controls table partitioning. partition_expression is an expression that determines how to partition the table. The partition expression can contain the following values:

  • PARTITION BY DATE(_PARTITIONTIME): Partitions the table using the date-based timestamp in the _PARTITIONTIME pseudo column. This syntax is only supported with CREATE TABLE without the AS query_statement clause.
  • PARTITION BY _PARTITIONDATE: Partitions the table using the _PARTITIONDATE pseudo column. This syntax is only supported with CREATE TABLE without the AS query_statement clause and is equivalent to using PARTITION BY DATE(_PARTITIONTIME).
  • PARTITION BY DATE(<timestamp_column>): Partitions the table using the date of the TIMESTAMP column
  • PARTITION BY RANGE_BUCKET(<integer_column>, GENERATE_ARRAY(start, end[, interval])): Partitions the table using the specified INTEGER column range. start is the start of the range partitioning (inclusive), end is the end of the range partitioning (exclusive), and interval is the width of each range within the partition. Defaults to 1 if not set.
  • PARTITION BY TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR }): Partitions the table using the date/hour/month/year of the TIMESTAMP column
  • PARTITION BY DATETIME_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR }): Partitions the table using the date/hour/month/year of the DATETIME column
  • PARTITION BY <date_column>: Partitions the table using the DATE column

clustering_column_list

CLUSTER BY is an optional clause that controls table clustering. clustering_column_list is a comma-separated list that determines how to cluster the table. The clustering column list can contain a list of up to four clustering columns.

table_option_list

The option list allows you to set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

partition_expiration_days

FLOAT64

Example: partition_expiration_days=7

This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours.

This property can only be set if the table is partitioned.

require_partition_filter

BOOL

Example: require_partition_filter=true

This property is equivalent to the timePartitioning.requirePartitionFilter table resource property.

This property can only be set if the table is partitioned.

kms_key_name

STRING

Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

See more details about Protecting data with Cloud KMS keys.

friendly_name

STRING

Example: friendly_name="my_table"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a table that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

VALUE is a constant expression containing only literals, query parameters, and scalar functions. If the constant expression evaluates to null, the corresponding option NAME is ignored.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, and UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

column_option_list

The column_option_list in column_schema lets you specify optional column or field options. Column options have the same syntax and requirements as table options but with a different list of NAMEs and VALUEs:

NAME VALUE Details
description

STRING

Example: description="a unique id"

This property is equivalent to the schema.fields[].description table resource property.

query_statement

The AS query_statement clause specifies the query from which the table should be created. See the SQL syntax reference for the supported form of query_statement.

Known limitations:

  • It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
  • It is not possible to use the OR REPLACE modifier to replace a table with a different kind of partitioning. Instead, DROP the table, and then use a CREATE TABLE ... AS SELECT ... statement to recreate it.

Temporary tables

To create a temporary table, use the TEMP or TEMPORARY keyword when you use the CREATE TABLE statement.

Syntax

{ CREATE {TEMP|TEMPORARY} TABLE |
  CREATE {TEMP|TEMPORARY} TABLE IF NOT EXISTS |
  CREATE OR REPLACE {TEMP|TEMPORARY} TABLE } ...

Other than the use of TEMP or TEMPORARY, the syntax is identical to the CREATE TABLE syntax.

Don't qualify temporary table names; that is, don't use a project or dataset qualifier. Temporary tables are automatically created in a special dataset.

You can refer to a temporary table by name for the duration of the current script; for more information, see Scripting in standard SQL. This includes tables created by a procedure within the script. You cannot query a table once the script it is created in completes.

After a script finishes, the temporary table exists for up to 24 hours. It is not saved using the name you gave it, but is assigned a random name instead. To view table structure and data, go to the BigQuery console, click Query history, and choose the query that created the temporary table. Then, in the Destination table row, click Temporary table.

You cannot share temporary tables, and they are not visible using any of the standard list or other table manipulation methods. You are not charged for storing temporary tables.

To create a temporary table:

CREATE TEMP TABLE Example
(
  x INT64,
  y STRING
);

INSERT INTO Example
VALUES (5, 'foo');

INSERT INTO Example
VALUES (6, 'bar');

SELECT *
FROM Example;

This script returns the following output:

+-----+---+-----+
| Row | x | y   |
+-----+---|-----+
| 1   | 5 | foo |
| 2   | 6 | bar |
+-----+---|-----+

You can delete a temporary table explicitly before the script completes, by executing a DROP TABLE statement:

CREATE TEMP TABLE foo(x INT64);
SELECT * FROM foo;  -- Succeeds
DROP TABLE foo;
SELECT * FROM foo;  -- Results in an error

When temporary tables are used together with a default dataset, unqualified table names refer to:

  • a temporary table, if one exists
  • otherwise, a table in the default dataset

The exception is for CREATE TABLE statements, where the target table is considered a temporary table if and only if the TEMP or TEMPORARY keyword is present.

For example, consider the following script:

-- Create table t1 in the default dataset
CREATE TABLE t1 (x INT64);

-- Create temporary table t1
CREATE TEMP TABLE t1 (x INT64);

-- This statement will select from the temporary table
SELECT * FROM t1;

-- Drop the temporary table
DROP TABLE t1;

-- Now that the temporary table is dropped, this statement will select from
-- the table in the default dataset
SELECT * FROM t1;

You can explicitly indicate that you are referring to a temporary table by qualifying the table name with _SESSION:

-- Create a temp table
CREATE TEMP TABLE t1 (x INT64);

-- Create a temp table using the `_SESSION` qualifier
CREATE TEMP TABLE _SESSION.t2 (x INT64);

-- Select from a temporary table using the `_SESSION` qualifier
SELECT * FROM _SESSION.t1;

If you use the _SESSION qualifier for a query of a temporary table that does not exist, you will received an error indicating the table does not exist. For example, if there is no temporary table named t3, you will receive the error even if a table named t3 exists in the default dataset.

You cannot use _SESSION to create a non-temporary table:

CREATE TABLE _SESSION.t4 (x INT64);  -- Fails

Examples

Creating a new table

The following example creates a partitioned table named newtable in mydataset:

CREATE TABLE mydataset.newtable
(
  x INT64 OPTIONS(description="An optional INTEGER field"),
  y STRUCT<
    a ARRAY<STRING> OPTIONS(description="A repeated STRING field"),
    b BOOL
  >
)
PARTITION BY _PARTITIONDATE
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  partition_expiration_days=1,
  description="a table that expires in 2025, with each partition living for 24 hours",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table uses the following partition_expression to partition the table: PARTITION BY _PARTITIONDATE. This expression partitions the table using the date in the _PARTITIONDATE pseudo column.

The table schema contains two columns:

  • x: An integer, with description "An optional INTEGER field"
  • y: A STRUCT containing two columns:

    • a: An array of strings, with description "A repeated STRING field"
    • b: A boolean

The table option list specifies the:

  • Table expiration time: January 1, 2025 at 00:00:00 UTC
  • Partition expiration time: 1 day
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating a new table from an existing table

The following example creates a table named top_words in mydataset from a query:

CREATE TABLE mydataset.top_words
OPTIONS(
  description="Top ten words per Shakespeare corpus"
) AS
SELECT
  corpus,
  ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words
FROM bigquery-public-data.samples.shakespeare
GROUP BY corpus;

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.top_words, your table qualifier might be `myproject.mydataset.top_words`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table schema contains 2 columns:

  • corpus: Name of a Shakespeare corpus
  • top_words: An ARRAY of STRUCTs containing 2 fields: word (a STRING) and word_count (an INT64 with the word count)

The table option list specifies the:

  • Description: Top ten words per Shakespeare corpus

Creating a table only if the table doesn't exist

The following example creates a table named newtable in mydataset only if no table named newtable exists in mydataset. If the table name exists in the dataset, no error is returned, and no action is taken.

CREATE TABLE IF NOT EXISTS mydataset.newtable (x INT64, y STRUCT<a ARRAY<STRING>, b BOOL>)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • x: An integer
  • y: A STRUCT containing a (an array of strings) and b (a boolean)

The table option list specifies the:

  • Expiration time: January 1, 2025 at 00:00:00 UTC
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating or replacing a table

The following example creates a table named newtable in mydataset, and if newtable exists in mydataset, it is overwritten with an empty table.

CREATE OR REPLACE TABLE mydataset.newtable (x INT64, y STRUCT<a ARRAY<STRING>, b BOOL>)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • x: An integer
  • y: A STRUCT containing a (an array of strings) and b (a boolean)

The table option list specifies the:

  • Expiration time: January 1, 2025 at 00:00:00 UTC
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating a table with REQUIRED columns

The following example creates a table named newtable in mydataset. The NOT NULL modifier in the column definition list of a CREATE TABLE statement specifies that a column or field is created in REQUIRED mode.

CREATE TABLE my_dataset.new_table (
  x INT64 NOT NULL,
  y STRUCT<
    a ARRAY<STRING>,
    b BOOL NOT NULL,
    c FLOAT64
  > NOT NULL,
  z STRING
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of my_dataset.new_table, your table qualifier might be `myproject.my_dataset.new_table`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table schema contains 3 columns:

  • x: A REQUIRED integer
  • y: A REQUIRED STRUCT containing a (an array of strings), b (a REQUIRED boolean), and c (a NULLABLE float)
  • z: A NULLABLE string

Creating a partitioned table

The following example creates a partitioned table named newtable in mydataset using a DATE column:

CREATE TABLE mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY transaction_date
OPTIONS(
  partition_expiration_days=3,
  description="a table partitioned by transaction_date"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • transaction_id: An integer
  • transaction_date: A date

The table option list specifies the:

  • Partition expiration: Three days
  • Description: A table partitioned by transaction_date

Creating a partitioned table from the result of a query

The following example creates a partitioned table named days_with_rain in mydataset using a DATE column:

CREATE TABLE mydataset.days_with_rain
PARTITION BY date
OPTIONS (
  partition_expiration_days=365,
  description="weather stations with precipitation, partitioned by day"
) AS
SELECT
  DATE(CAST(year AS INT64), CAST(mo AS INT64), CAST(da AS INT64)) AS date,
  (SELECT ANY_VALUE(name) FROM `bigquery-public-data.noaa_gsod.stations` AS stations
   WHERE stations.usaf = stn) AS station_name,  -- Stations can have multiple names
  prcp
FROM `bigquery-public-data.noaa_gsod.gsod2017` AS weather
WHERE prcp != 99.9  -- Filter unknown values
  AND prcp > 0      -- Filter stations/days with no precipitation

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.days_with_rain, your table qualifier might be `myproject.mydataset.days_with_rain`.

The table schema contains 2 columns:

  • date: The DATE of data collection
  • station_name: The name of the weather station as a STRING
  • prcp: The amount of precipitation in inches as a FLOAT64

The table option list specifies the:

  • Partition expiration: One year
  • Description: Weather stations with precipitation, partitioned by day

Creating a clustered table

Example 1

The following example creates a clustered table named myclusteredtable in mydataset. The table is a partitioned table, partitioned by a TIMESTAMP column and clustered by a STRING column named customer_id.

CREATE TABLE mydataset.myclusteredtable
(
  timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(timestamp)
CLUSTER BY customer_id
OPTIONS (
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 3 columns:

  • timestamp: The time of data collection as a TIMESTAMP
  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Partition expiration: 3 days
  • Description: "A table clustered by customer_id"
Example 2

The following example creates a clustered table named myclusteredtable in mydataset. The table is an ingestion-time partitioned table.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY
  customer_id
OPTIONS (
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Partition expiration: 3 days
  • Description: "A table clustered by customer_id"
Example 3

The following example creates a clustered table named myclusteredtable in mydataset. The table is not partitioned.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY
  customer_id
OPTIONS (
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Description: "A table clustered by customer_id"

Creating a clustered table from the result of a query

Example 1

The following example creates a clustered table named myclusteredtable in mydataset using the result of a query. The table is a partitioned table, partitioned by a TIMESTAMP column.

CREATE TABLE mydataset.myclusteredtable
(
  timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(timestamp)
CLUSTER BY
  customer_id
OPTIONS (
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)
AS SELECT * FROM mydataset.myothertable

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 3 columns:

  • timestamp: The time of data collection as a TIMESTAMP
  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Partition expiration: 3 days
  • Description: "A table clustered by customer_id"
Example 2

The following example creates a clustered table named myclusteredtable in mydataset using the result of a query. The table is not partitioned.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY
  customer_id
OPTIONS (
  description="a table clustered by customer_id"
)
AS SELECT * FROM mydataset.myothertable

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Description: "A table clustered by customer_id"

CREATE VIEW statement

To create a view in BigQuery, use the CREATE VIEW DDL statement.

{CREATE VIEW | CREATE VIEW IF NOT EXISTS | CREATE OR REPLACE VIEW}
[[project_name.]dataset_name.]view_name
[OPTIONS(view_option_list)]
AS query_expression

Where:

{CREATE VIEW | CREATE VIEW IF NOT EXISTS | CREATE OR REPLACE VIEW} is one of the following statements:

  • CREATE VIEW: Creates a new view.
  • CREATE VIEW IF NOT EXISTS: Creates a new view only if the view does not currently exist in the specified dataset.
  • CREATE OR REPLACE VIEW: Creates a view and replaces an existing view with the same name in the specified dataset.

project_name is the name of the project where you are creating the view. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset where you are creating the view. Defaults to the defaultDataset in the request.

view_name is the name of the view you're creating. The view name must be unique per dataset. The view name can:

  • Contain up to 1,024 characters
  • Contain letters (upper or lower case), numbers, and underscores

view_option_list lets you specify additional view creation options such as a label and an expiration time.

CREATE VIEW statements must comply with the following rules:

  • Only one CREATE statement is allowed.

query_expression is the standard SQL query expression used to define the view.

view_option_list

The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

friendly_name

STRING

Example: friendly_name="my_view"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a view that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

VALUE is a constant expression containing only literals, query parameters, and scalar functions. If the constant expression evaluates to null, the corresponding option NAME is ignored.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, and UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

Default project in view body

If the view is created in the same project used to run the CREATE VIEW statement, the view body query_expression can reference entities without specifying the project; the default project is the project which owns the view. Consider the sample query below.

CREATE VIEW myProject.myDataset.myView AS SELECT * FROM anotherDataset.myTable;

After running the above CREATE VIEW query in the project myProject, you can run the query SELECT * FROM myProject.myDataset.myView. Regardless of the project you choose to run this SELECT query, the referenced table anotherDataset.myTable is always resolved against project myProject.

If the view is not created in the same project used to run the CREATE VIEW statement, then all references in the view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE VIEW query is invalid if it runs in a project different from myProject.

Examples

Creating a new view

The following example creates a view named newview in mydataset:

CREATE VIEW `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

If the view name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The view is defined using the following standard SQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view option list specifies the:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

Creating a view only if the view doesn't exist

The following example creates a view named newview in mydataset only if no view named newview exists in mydataset. If the view name exists in the dataset, no error is returned, and no action is taken.

CREATE VIEW IF NOT EXISTS `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view is defined using the following standard SQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view option list specifies the:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

Creating or replacing a view

The following example creates a view named newview in mydataset, and if newview exists in mydataset, it is overwritten using the specified query expression.

CREATE OR REPLACE VIEW `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view is defined using the following standard SQL query:

SELECT column_1, column_2, column_3 FROM myproject.mydataset.mytable

The view option list specifies the:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

CREATE MATERIALIZED VIEW statement

To create a materialized view in BigQuery, use the CREATE MATERIALIZED VIEW DDL statement.

{CREATE MATERIALIZED VIEW | CREATE MATERIALIZED VIEW IF NOT EXISTS }
[[project_name.]dataset_name.]materialized_view_name
[OPTIONS(materialized_view_option_list)]
AS query_expression

Where:

{CREATE MATERIALIZED VIEW | CREATE MATERIALIZED VIEW IF NOT EXISTS } is one of the following statements:

  • CREATE MATERIALIZED VIEW: Creates a new materialized view.

  • CREATE MATERIALIZED VIEW IF NOT EXISTS: Creates a new materialized view only if the materialized view does not currently exist in the specified dataset.

project_name is the name of the project where you are creating the materialized view. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

If the project_name is omitted or it is the same as the project that runs this DDL query, then the latter is also used as the default project of the references to tables, functions, etc., in query_expression (note, the default project of the references is fixed and does not depend on the future queries that invoke the new materialized view). Otherwise, all references in query_expression must be qualified with projects.

dataset_name is the name of the dataset where you are creating the materialized view. Defaults to the defaultDataset in the request.

materialized_view_name is the name of the materialized view you're creating. The materialized view name must be unique per dataset. The materialized view name can:

  • Contain up to 1,024 characters
  • Contain letters (upper or lower case), numbers, and underscores

materialized_view_option_list allows you to specify additional materialized view options such as a whether refresh is enabled, the refresh interval, a label, and an expiration time.

CREATE MATERIALIZED VIEW statements must comply with the following rules:

  • Only one CREATE statement is allowed.

query_expression is the standard SQL query expression used to define the materialized view.

materialized_view_option_list

The option list allows you to set materialized view options such as a whether refresh is enabled. the refresh interval, a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a materialized view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
enable_refresh BOOLEAN

Example: enable_refresh=false

refresh_interval_minutes FLOAT64

Example: refresh_interval_minutes=20

expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

friendly_name

STRING

Example: friendly_name="my_mv"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a materialized view that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

Default project in materialized view body

If the materialized view is created in the same project used to run the CREATE MATERIALIZED VIEW statement, the materialized view body query_expression can reference entities without specifying the project; the default project is the project which owns the materialized view. Consider the sample query below.

CREATE MATERIALIZED VIEW myProject.myDataset.myView AS SELECT * FROM anotherDataset.myTable;

After running the above CREATE MATERIALIZED VIEW query in the project myProject, you can run the query SELECT * FROM myProject.myDataset.myView. Regardless of the project you choose to run this SELECT query, the referenced table anotherDataset.myTable is always resolved against project myProject.

If the materialized view is not created in the same project used to run the CREATE VIEW statement, then all references in the materialized view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE MATERIALIZED VIEW query is invalid if it runs in a project different from myProject.

Examples

Creating a new materialized view

The following example creates a materialized view named new_mv in mydataset:

CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="new_mv",
  description="a materialized view that expires in 2 days",
  labels=[("org_unit", "development")].
  enable_refresh=true,
  refresh_interval_minutes=20
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

If the materialized view name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.materialized_view

When you use a DDL statement to create a materialized view, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.new_mv`.

The materialized view is defined using the following standard SQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The materialized view option list specifies the:

  • Expiration time: 48 hours from the time the materialized view is created
  • Friendly name: new_mv
  • Description: A materialized view that expires in 2 days
  • Label: org_unit = development
  • Refresh enabled: true
  • Refresh interval: 20 minutes

Creating a materialized view only if the materialized view doesn't exist

The following example creates a materialized view named new_mv in mydataset only if no materialized view named new_mv exists in mydataset. If the materialized view name exists in the dataset, no error is returned, and no action is taken.

CREATE MATERIALIZED VIEW IF NOT EXISTS `myproject.mydataset.new_mv`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="new_mv",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")],
  enable_refresh=false
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The materialized view is defined using the following standard SQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The materialized view option list specifies the:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: new_mv
  • Description: A view that expires in 2 days
  • Label: org_unit = development
  • Refresh enabled: false

CREATE EXTERNAL TABLE statement

The CREATE EXTERNAL TABLE statement creates an external table. External tables let BigQuery query data that is stored outside of BigQuery storage. For more information about external tables, see Introduction to external data sources.

CREATE [OR REPLACE] EXTERNAL TABLE [IF NOT EXISTS] [[project_name.]dataset_name.]table_name
[(
  column_name column_schema,
  ...
)]
[WITH PARTITION COLUMNS
  [(
      partition_column_name partition_column_type,
      ...
  )]
]
OPTIONS (
  external_table_option_list,
  ...
);

Where:

  • project_name is the name of the project where you are creating the table. Defaults to the project that runs this DDL query.

  • dataset_name is the name of the dataset where you are creating the table.

  • table_name is the name of the external table.

  • column_name is the name of a column in the table.

  • column_schema specifies the schema of the column. It uses the same syntax as the column_schema definition in the CREATE TABLE statement. If you don't include this clause, BigQuery detects the schema automatically.

  • partition_column_name is the name of a partition column. Include this field if your external data uses a hive-partitioned layout. For more information, see: Supported data layouts.

  • partition_column_type is the partition column type.

  • external_table_option_list specifies a list of options for creating the external table.

external_table_option_list

The option list specifies options for creating the external table. The format and uris options are required. Specify the option list in the following format: NAME=VALUE, ...

Options
allow_jagged_rows

BOOL

If true, allow rows that are missing trailing optional columns.

Applies to CSV data.

allow_quoted_newlines

BOOL

If true, allow quoted data sections that contain newline characters in the file.

Applies to CSV data.

compression

STRING

The compression type of the data source. Supported values include: GZIP. If not specified, the data source is uncompressed.

Applies to CSV and JSON data.

description

STRING

A description of this table.

enable_logical_types

BOOL

If true, convert Avro logical types into their corresponding SQL types. For more information, see Logical types.

Applies to Avro data.

encoding

STRING

The character encoding of the data. Supported values include: UTF8 (or UTF-8), ISO_8859_1 (or ISO-8859-1).

Applies to CSV data.

expiration_timestamp

TIMESTAMP

The time when this table expires. If not specified, the table does not expire.

Example: "2025-01-01 00:00:00 UTC".

field_delimiter

STRING

The separator for fields in a CSV file.

Applies to CSV data.

format

STRING

The format of the external data. Supported values include: AVRO, CSV, DATASTORE_BACKUP, GOOGLE_SHEETS, NEWLINE_DELIMITED_JSON (or JSON), ORC, PARQUET.

The value JSON is equivalent to NEWLINE_DELIMITED_JSON.

hive_partition_uri_prefix

STRING

A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables.

Applies to Avro, CSV, JSON, Parquet, and ORC data.

Example: "gs://bucket/path".

ignore_unknown_values

BOOL

If true, ignore extra values that are not represented in the table schema, without returning an error.

Applies to CSV and JSON data.

max_bad_records

INT64

The maximum number of bad records to ignore when reading the data.

Applies to: CSV, JSON, and Sheets data.

null_marker

STRING

The string that represents NULL values in a CSV file.

Applies to CSV data.

projection_fields

STRING

A list of entity properties to load.

Applies to Datastore data.

quote

STRING

The string used to quote data sections in a CSV file. If your data contains quoted newline characters, also set the allow_quoted_newlines property to true.

Applies to CSV data.

require_hive_partition_filter

BOOL

If true, all queries over this table require a partition filter that can be used to eliminate partitions when reading data. Applies only to hive-partitioned external tables.

Applies to Avro, CSV, JSON, Parquet, and ORC data.

sheet_range

STRING

Range of a Sheets spreadsheet to query from.

Applies to Sheets data.

Example: “sheet1!A1:B20”,

skip_leading_rows

INT64

The number of rows at the top of a file to skip when reading the data.

Applies to CSV and Sheets data.

uris

ARRAY<STRING>

An array of fully qualified URIs for the external data locations.

Example: ["gs://bucket/path/*"].

The CREATE EXTERNAL TABLE statement does not support creating temporary external tables.

To create an externally partitioned table, use the WITH PARTITION COLUMNS clause to specify the partition schema details. BigQuery validates the column definitions against the external data location. The schema declaration must strictly follow the ordering of the fields in the external path. For more information about external partitioning, see Querying externally partitioned data.

Examples

The following example creates an external table from multiple URIs. The data format is CSV. This example uses schema auto-detection.

CREATE EXTERNAL TABLE dataset.CsvTable OPTIONS (
  format = 'CSV',
  uris = ['gs://bucket/path1.csv', 'gs://bucket/path2.csv']
);

The following example creates an external table from a CSV file and explicitly specifies the schema. It also specifies the field delimeter ('|') and sets the maximum number of bad records allowed.

CREATE OR REPLACE EXTERNAL TABLE dataset.CsvTable
(
  x INT64,
  y STRING
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://bucket/path1.csv'],
  field_delimiter = '|',
  max_bad_records = 5
);

The following example creates an externally partitioned table. It uses schema auto-detection to detect both the file schema and the hive partitioning layout.

For example, if the external path is gs://bucket/path/field_1=first/field_2=1/data.csv, the partition columns would be field_1 (STRING) and field_2 (INT64).

CREATE EXTERNAL TABLE dataset.AutoHivePartitionedTable
WITH PARTITION COLUMNS
OPTIONS (
  uris=['gs://bucket/path/*'],
  format=csv,
  hive_partition_uri_prefix='gs://bucket/path'
);

The following example creates an externally partitioned table by explicitly specifying the partition columns. This example assumes that the external file path has the pattern gs://bucket/path/field_1=first/field_2=1/data.csv.

CREATE EXTERNAL TABLE dataset.CustomHivePartitionedTable
WITH PARTITION COLUMNS (
  field_1 STRING, -- column order must match the external path
  field_2 INT64
)
OPTIONS (
  uris=['gs://bucket/path/*'],
  format=csv,
  hive_partition_uri_prefix='gs://bucket/path'
);

CREATE FUNCTION statement

BigQuery supports user-defined functions (UDFs). A UDF enables you to create a function using a SQL expression or JavaScript. These functions accept columns of input and perform actions, returning the result of those actions as a value.

UDFs can either be persistent or temporary. You can reuse persistent UDFs across multiple queries, whereas you can only use temporary UDFs in a single query. For more information on UDFs, see user-defined functions.

UDF syntax

To create a persistent UDF, use the following syntax:

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
    [[project_name.]dataset_name.]function_name
    ([named_parameter[, ...]])
  [RETURNS data_type]
  { sql_function_definition | javascript_function_definition }

To create a temporary UDF, use the following syntax:

CREATE [OR REPLACE] {TEMPORARY | TEMP} FUNCTION [IF NOT EXISTS]
    function_name
    ([named_parameter[, ...]])
  [RETURNS data_type]
  { sql_function_definition | javascript_function_definition }

named_parameter:
  param_name param_type

sql_function_definition:
  AS (sql_expression)

javascript_function_definition:
  [determinism_specifier]
  LANGUAGE js
  [OPTIONS (library = library_array)]
  AS javascript_code

determinism_specifier:
  { DETERMINISTIC | NOT DETERMINISTIC }

This syntax consists of the following components:

  • CREATE { FUNCTION | OR REPLACE FUNCTION | FUNCTION IF NOT EXISTS }. Creates or updates a function. To replace any existing function with the same name, use the OR REPLACE keyword. To treat the query as successful and take no action if a function with the same name already exists, use the IF NOT EXISTS clause.

  • project_name is the name of the project where you are creating the function. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

  • dataset_name is the name of the dataset where you are creating the function. Defaults to the defaultDataset in the request.

  • named_parameter. Consists of a comma-separated param_name and param_type pair. The value of param_type is a BigQuery data type. For a SQL UDF, the value of param_type can also be ANY TYPE.

  • determinism_specifier. Applies only to JavaScript user-defined functions. Provides a hint to BigQuery as to whether the query result can be cached. Can be one of the following values:

    • DETERMINISTIC: The function always returns the same result when passed the same arguments. The query result is potentially cacheable. For example, if the function add_one(i) always returns i + 1, the function is deterministic.

    • NOT DETERMINISTIC: The function does not always return the same result when passed the same arguments, and therefore is not cacheable. For example, if add_random(i) returns i + rand(), the function is not deterministic and BigQuery will not use cached results.

      If all of the invoked functions are DETERMINISTIC, BigQuery will try to cache the result, unless the results can't be cached for other reasons. For more information, see Using cached query results.

  • [RETURNS data_type]. Specifies the data type that the function returns.

    • If the function is defined in SQL, then the RETURNS clause is optional. If the RETURNS clause is omitted, then BigQuery infers the result type of the function from the SQL function body when a query calls the function.
    • If the function is defined in JavaScript, then the RETURNS clause is required. For more information about allowed values for data_type, see Supported JavaScript UDF data types.
  • AS (sql_expression). Specifies the SQL expression that defines the function.

  • [OPTIONS (library = library_array)]. For a JavaScript UDF, specifies an array of JavaScript libraries to include in the function definition.

  • AS javascript_code. Specifies the definition of a JavaScript function. javascript_code is a string literal.

SQL UDF structure

Create SQL UDFs using the following syntax:

CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
    [[`project_name`.]dataset_name.]function_name
    ([named_parameter[, ...]])
  [RETURNS data_type]
  AS (sql_expression)

named_parameter:
  param_name param_type

Templated SQL UDF parameters

A templated parameter with param_type = ANY TYPE can match more than one argument type when the function is called.

  • If more than one paramater has type ANY TYPE, BigQuery does not enforce any type relationship between these arguments.
  • The function return type cannot be ANY TYPE. It must be either omitted, which means to be automatically determined based on sql_expression, or an explicit type.
  • Passing the function arguments of types that are incompatible with the function definition will result in an error at call time.

Default project in SQL UDF body

If the SQL UDF is created in the same project used to run the CREATE FUNCTION statement, the UDF body sql_expression can reference entities without specifying the project; the default project is the project which owns the UDF. Consider the sample query below.

CREATE FUNCTION myProject.myDataset.myFunction() AS (anotherDataset.anotherFunction());

After running the above CREATE FUNCTION query in the project myProject, you can run the query SELECT myProject.myDataset.myFunction(). Regardless of the project you choose to run this SELECT query, the referenced function anotherDataset.anotherFunction is always resolved against project myProject.

If the UDF is not created in the same project used to run the CREATE FUNCTION statement, then all references in the UDF body sql_expression must be qualified with project IDs. For instance, the preceding sample CREATE FUNCTION query is invalid if it runs in a project different from myProject.

SQL UDF examples

The following example creates a persistent SQL UDF. It assumes that a dataset named mydataset exists in the active project. If a dataset with this name does not exist, refer to the documentation on creating datasets.

CREATE FUNCTION mydataset.multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
AS (x * y);

After executing the CREATE FUNCTION statement, you can use the new, persistent user-defined function in a separate query. Replace the query editor with the following contents, and then run the query:

WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x, y, mydataset.multiplyInputs(x, y) as product
FROM numbers;

The example above produces the following output:

+-----+-----+--------------+
| x   | y   | product      |
+-----+-----+--------------+
| 1   | 5   | 5            |
| 2   | 10  | 20           |
| 3   | 15  | 45           |
+-----+-----+--------------+

The following example shows a persistent SQL UDF that uses a templated parameter. The resulting function accepts arguments of various types.

CREATE FUNCTION mydataset.addFourAndDivideAny(x ANY TYPE, y ANY TYPE) AS (
  (x + 4) / y
);

After executing the CREATE FUNCTION statement, you can use the new, persistent user-defined function in a separate query:

SELECT addFourAndDivideAny(3, 4) AS integer_output,
       addFourAndDivideAny(1.59, 3.14) AS floating_point_output;

This query returns the following output:

+----------------+-----------------------+
| integer_output | floating_point_output |
+----------------+-----------------------+
| 1.75           | 1.7802547770700636    |
+----------------+-----------------------+

The following example shows a SQL UDF that uses a templated parameter to return the last element of an array of any type.

CREATE FUNCTION mydataset.lastArrayElement(arr ANY TYPE) AS (
  arr[ORDINAL(ARRAY_LENGTH(arr))]
);

After executing the CREATE FUNCTION statement, you can use the new, persistent user-defined function in a separate query:

SELECT
  names[OFFSET(0)] AS first_name,
  lastArrayElement(names) AS last_name
FROM (
  SELECT ['Fred', 'McFeely', 'Rogers'] AS names UNION ALL
  SELECT ['Marie', 'Skłodowska', 'Curie']
);

The above query returns the following output:

+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Fred       | Rogers    |
| Marie      | Curie     |
+------------+-----------+

JavaScript UDF structure

Create persistent JavaScript UDFs using the following syntax.

CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
    [[`project_name`.]dataset_name.]function_name
    ([named_parameter[, ...]])
  RETURNS data_type
  [DETERMINISTIC | NOT DETERMINISTIC]
  LANGUAGE js
  [OPTIONS (library = library_array)]
  AS javascript_code

See Supported JavaScript UDF data types for more information about allowed values for data_type and parameter types.

JavaScript UDF examples

CREATE TEMP FUNCTION mydataset.multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS """
  return x*y;
""";

After executing the CREATE FUNCTION statement, you can use the new, persistent JavaScript UDF in a separate query:

WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x, y, multiplyInputs(x, y) as product
FROM numbers;

The above example returns the following output:

+-----+-----+--------------+
| x   | y   | product      |
+-----+-----+--------------+
| 1   | 5   | 5            |
| 2   | 10  | 20           |
| 3   | 15  | 45           |
+-----+-----+--------------+

You can pass the result of a UDF as input to another UDF. For example, create a persistent UDF with the following query:

CREATE FUNCTION mydataset.multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS """
  return x*y;
""";

Then run another query to create a second persistent UDF:

CREATE FUNCTION mydataset.divideByTwo(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS """
  return x/2;
""";

Now run the following query to use both persistent UDFs in the same query:

WITH numbers AS
  (SELECT 1 AS x, 5 as y
  UNION ALL
  SELECT 2 AS x, 10 as y
  UNION ALL
  SELECT 3 as x, 15 as y)
SELECT x,
  y,
  mydataset.multiplyInputs(
    mydataset.divideByTwo(x), mydataset.divideByTwo(y)) as half_product
FROM numbers;

The above example returns the following output:

+-----+-----+--------------+
| x   | y   | half_product |
+-----+-----+--------------+
| 1   | 5   | 1.25         |
| 2   | 10  | 5            |
| 3   | 15  | 11.25        |
+-----+-----+--------------+

The following example sums the values of all fields named "foo" in the given JSON string.

CREATE FUNCTION mydataset.SumFieldsNamedFoo(json_row STRING)
  RETURNS FLOAT64
  LANGUAGE js
  AS """
  function SumFoo(obj) {
    var sum = 0;
    for (var field in obj) {
      if (obj.hasOwnProperty(field) && obj[field] != null) {
        if (typeof obj[field] == "object") {
          sum += SumFoo(obj[field]);
        } else if (field == "foo") {
          sum += obj[field];
        }
      }
    }
    return sum;
  }
  var row = JSON.parse(json_row);
  return SumFoo(row);
  """;

After executing the CREATE FUNCTION statement, you can use the new, persistent user-defined function in a separate query:

WITH Input AS (
  SELECT STRUCT(1 AS foo, 2 AS bar, STRUCT('foo' AS x, 3.14 AS foo) AS baz) AS s, 10 AS foo UNION ALL
  SELECT NULL, 4 AS foo UNION ALL
  SELECT STRUCT(NULL, 2 AS bar, STRUCT('fizz' AS x, 1.59 AS foo) AS baz) AS s, NULL AS foo
)
SELECT
  TO_JSON_STRING(t) AS json_row,
  mydataset.SumFieldsNamedFoo(TO_JSON_STRING(t)) AS foo_sum
FROM Input AS t;

The above example returns the following output:

+---------------------------------------------------------------------+---------+
| json_row                                                            | foo_sum |
+---------------------------------------------------------------------+---------+
| {"s":{"foo":1,"bar":2,"baz":{"x":"foo","foo":3.14}},"foo":10}       | 14.14   |
| {"s":null,"foo":4}                                                  | 4       |
| {"s":{"foo":null,"bar":2,"baz":{"x":"fizz","foo":1.59}},"foo":null} | 1.59    |
+---------------------------------------------------------------------+---------+

Quoting rules

You must enclose JavaScript code in quotes. For simple, one line code snippets, you can use a standard quoted string:

CREATE FUNCTION mydataset.plusOne(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS "return x+1;";

In cases where the snippet contains quotes, or consists of multiple lines, use triple-quoted blocks:

CREATE FUNCTION mydataset.customGreeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
  var d = new Date();
  if (d.getHours() &lt; 12) {
    return 'Good Morning, ' + a + '!';
  } else {
    return 'Good Evening, ' + a + '!';
  }
  """;

Including JavaScript libraries

You can extend your JavaScript UDFs using the OPTIONS section. This section lets you specify JavaScript code libraries for the UDF.

CREATE FUNCTION mydataset.myFunc(a FLOAT64, b STRING)
  RETURNS STRING
  LANGUAGE js
  OPTIONS (
    library=["gs://my-bucket/path/to/lib1.js", "gs://my-bucket/path/to/lib2.js"]
  )
  AS
  """
      // Assumes 'doInterestingStuff' is defined in one of the library files.
      return doInterestingStuff(a, b);
  """;

SELECT mydataset.myFunc(3.14, 'foo');

In the preceding example, code in lib1.js and lib2.js is available to any code in the javascript_code section of the UDF. Notice that you can specify library files using single-element or array syntax.

UDFs and the Cloud Console

You can use the Cloud Console to create persistent user-defined functions.

Running a query to create a persistent UDF

  1. Go to the BigQuery page in the Cloud Console.

    Go to BigQuery

  2. Click Compose new query.

    Compose new query.

  3. Enter the UDF statement into the Query editor text area. For example:

      CREATE FUNCTION mydataset.timesTwo(x FLOAT64)
      RETURNS FLOAT64
        LANGUAGE js AS """
        return x*2;
      """;
    
  4. Click Run.

  5. After creating the persistent user-defined function, replace the editor contents with a new query that uses it:

      SELECT mydataset.timesTwo(numbers) AS doubles
      FROM UNNEST([1, 2, 3, 4, 5]) AS numbers;
    
  6. Click Run.

UDFs and the bq command-line tool

You can use the bq Command-Line Tool from the Cloud SDK to create persistent UDFs.

Use the following syntax to run a query to create a persistent UDF:

bq query --use_legacy_sql=false '
  CREATE FUNCTION mydataset.AddTwo(x INT64) AS (x + 2);
'

CREATE PROCEDURE statement

Creates a procedure, which is a block of statements that can be called from other queries.

CREATE [OR REPLACE] PROCEDURE [IF NOT EXISTS]
[[project_name.]dataset_name.]procedure_name (procedure_argument[, ...] )
[OPTIONS(procedure_option_list)]
BEGIN
statement_list
END;

procedure_argument: [procedure_argument_mode] argument_name argument_type

procedure_argument_mode: IN | OUT | INOUT

Description

project_name is the name of the project where you are creating the procedure. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset where you are creating the procedure. Defaults to the defaultDataset in the request.

statement_list is a BigQuery statement list. A statement list is a series of statements that each end with a semicolon.

argument_type is any valid BigQuery type.

procedure_argument_mode specifies whether an argument is an input, an output, or both.

Procedures can call themselves recursively.

procedure_option_list

The procedure_option_list lets you specify procedure options. Procedure options have the same syntax and requirements as table options but with a different list of NAMEs and VALUEs:

NAME VALUE Details
strict_mode

BOOL

Example: strict_mode=FALSE

If strict_mode is TRUE, the procedure body will undergo additional checks for errors such as non-existent tables or columns. The CREATE PROCEDURE statement will fail if the body fails any of these checks.

While strict_mode is useful for catching many common types of errors, it is not exhaustive, and successful creation of a procedure with strict_mode does not guarantee that the procedure will successfully execute at runtime.

If strict_mode is FALSE, the procedure body is checked only for syntax. Procedures which invoke themselves recursively should be created with strict_mode=FALSE to avoid errors caused by the procedure not yet existing while it is being validated.

Default value is TRUE.

Argument mode

IN indicates that the argument is only an input to the procedure. You can specify either a variable or a value expression for IN arguments.

OUT indicates that the argument is an output of the procedure. An OUT argument is initialized to NULL when the procedure starts. You must specify a variable for OUT arguments.

INOUT indicates that the argument is both an input to and an output from the procedure. You must specify a variable for INOUT arguments. An INOUT argument can be referenced in the body of a procedure as a variable and assigned new values.

If neither IN, OUT, nor INOUT is specified, the argument is treated as an IN argument.

Variable scope

If a variable is declared outside a procedure, passed as an INOUT or OUT argument to a procedure, and the procedure assigns a new value to that variable, that new value is visible outside of the procedure.

Variables declared in a procedure are not visible outside of the procedure, and vice versa.

An OUT or INOUT argument can be assigned a value using SET, in which case the modified value is visible outside of the procedure. If the procedure exits successfully, then the value of the OUT or INOUT argument is the final value assigned to that INOUT variable.

Temporary tables exist for the duration of the script, so if a procedure creates a temporary table, the caller of the procedure will be able to reference the temporary table as well.

Default project in procedure body

Procedure bodies can reference entities without specifying the project; the default project is the project which owns the procedure, not necessarily the project used to run the CREATE PROCEDURE statement. Consider the sample query below.

CREATE PROCEDURE myProject.myDataset.QueryTable()
BEGIN
  SELECT * FROM anotherDataset.myTable;
END;

After creating the above procedure, you can run the query CALL myProject.myDataset.QueryTable(). Regardless of the project you choose to run this CALL query, the referenced table anotherDataset.myTable is always resolved against project myProject.

Examples

The following example creates a procedure that both takes x as an input argument and returns x as output; because no argument mode is present for the argument delta, it is an input argument. The procedure consists of a block containing a single statement, which assigns the sum of the two input arguments to x.

CREATE PROCEDURE mydataset.AddDelta(INOUT x INT64, delta INT64)
BEGIN
  SET x = x + delta;
END;

The following example calls the AddDelta procedure from the example above, passing it the variable accumulator both times; because the changes to x within AddDelta are visible outside of AddDelta, these procedure calls increment accumulator by a total of 8.

DECLARE accumulator INT64 DEFAULT 0;
CALL mydataset.AddDelta(accumulator, 5);
CALL mydataset.AddDelta(accumulator, 3);
SELECT accumulator;

This returns the following:

+-------------+
| accumulator |
+-------------+
|           8 |
+-------------+

The following example creates the procedure SelectFromTablesAndAppend, which takes target_date as an input argument and returns rows_added as an output. The procedure creates a temporary table DataForTargetDate from a query; then, it calculates the number of rows in DataForTargetDate and assigns the result to rows_added. Next, it inserts a new row into TargetTable, passing the value of target_date as one of the column names. Finally, it drops the table DataForTargetDate and returns rows_added.

CREATE PROCEDURE mydataset.SelectFromTablesAndAppend(
  target_date DATE, OUT rows_added INT64)
BEGIN
  CREATE TEMP TABLE DataForTargetDate AS
  SELECT t1.id, t1.x, t2.y
  FROM dataset.partitioned_table1 AS t1
  JOIN dataset.partitioned_table2 AS t2
  ON t1.id = t2.id
  WHERE t1.date = target_date
    AND t2.date = target_date;

  SET rows_added = (SELECT COUNT(*) FROM DataForTargetDate);

  SELECT id, x, y, target_date  -- note that target_date is a parameter
  FROM DataForTargetDate;

  DROP TABLE DataForTargetDate;
END;

The following example declares a variable rows_added, then passes it as an argument to the SelectFromTablesAndAppend procedure from the previous example, along with the value of CURRENT_DATE; then it returns a message stating how many rows were added.

DECLARE rows_added INT64;
CALL mydataset.SelectFromTablesAndAppend(CURRENT_DATE(), rows_added);
SELECT FORMAT('Added %d rows', rows_added);

ALTER TABLE SET OPTIONS statement

To set the options on a table in BigQuery, use the ALTER TABLE SET OPTIONS DDL statement.

ALTER TABLE [IF EXISTS] [[project_name.]dataset_name.]table_name
SET OPTIONS(table_set_options_list)

Where:

IF EXISTS: If present, the query succeeds when the specified table does not exist. If absent, the query fails when the specified table does not exist.

project_name is the name of the project containing the table you are altering. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset containing the table you are altering. Defaults to the defaultDataset in the request.

table_name is the name of the table you're altering.

table_set_options_list

The option list allows you to set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

partition_expiration_days

FLOAT64

Example: partition_expiration_days=7

This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours.

This property can only be set if the table is partitioned.

require_partition_filter

BOOL

Example: require_partition_filter=true

This property is equivalent to the timePartitioning.requirePartitionFilter table resource property.

This property can only be set if the table is partitioned.

kms_key_name

STRING

Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

See more details about Protecting data with Cloud KMS keys.

friendly_name

STRING

Example: friendly_name="my_table"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a table that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

VALUE is a constant expression containing only literals, query parameters, and scalar functions. If the constant expression evaluates to null, the corresponding option NAME is ignored.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, and UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

Setting the VALUE replaces the existing value of that option for the table, if there was one. Setting the VALUE to NULL clears the table's value for that option.

Examples

Setting the expiration timestamp and description on a table

The following example sets the expiration timestamp on a table to seven days from the execution time of the ALTER TABLE statement, and sets the description as well:

ALTER TABLE mydataset.mytable
SET OPTIONS (
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description="Table that expires seven days from now"
)

Setting the require partition filter attribute on a partitioned table

The following example sets the timePartitioning.requirePartitionFilter attribute on a partitioned table:

ALTER TABLE mydataset.mypartitionedtable
SET OPTIONS (require_partition_filter=true)

Queries that reference this table must use a filter on the partitioning column, or else BigQuery returns an error. Setting this option to true can help prevent mistakes in querying more data than intended.

Clearing the expiration timestamp on a table

The following example clears the expiration timestamp on a table so that it will not expire:

ALTER TABLE mydataset.mytable
SET OPTIONS (expiration_timestamp=NULL)

ALTER TABLE ADD COLUMN statement

The ALTER TABLE ADD COLUMN statement adds one or more new columns to an existing table schema. For more information about schema modifications in BigQuery, see Modifying table schemas.

ALTER TABLE [[project_name.]dataset_name.]table_name
ADD COLUMN [IF NOT EXISTS] column_name column_schema [, ...]

Where:

  • project_name is the name of the project containing the table. Defaults to the project that runs this DDL query.

  • dataset_name is the name of the dataset containing the table.

  • table_name is the name of the table to alter. The table must already exist and have a schema.

  • column_name is the name of the column to add.

  • column_schema is the schema of the column. This schema uses the same syntax as the column schema for the CREATE TABLE statement.

You cannot use this statement to create:

  • Partitioned columns.
  • Clustered columns.
  • Nested columns inside existing RECORD fields.

You cannot add a REQUIRED column to an existing table schema. However, you can create a nested REQUIRED column as part of a new RECORD field.

Without the IF NOT EXISTS clause, if the table already contains a column with that name, the statement returns an error. If the IF NOT EXISTS clause is included and the column name already exists, no error is returned, and no action is taken.

The value of the new column for existing rows is set to one of the following:

  • NULL if the new column was added with NULLABLE mode. This is the default mode.
  • An empty ARRAY if the new column was added with REPEATED mode.

Examples

Adding columns

The following example adds the following columns to an existing table named mytable:

  • Column A of type STRING.
  • Column B of type GEOGRAPHY.
  • Column C of type NUMERIC with REPEATED mode.
  • Column D of type DATE with a description.
ALTER TABLE mydataset.mytable
  ADD COLUMN A STRING,
  ADD COLUMN IF NOT EXISTS B GEOGRAPHY,
  ADD COLUMN C ARRAY<NUMERIC>,
  ADD COLUMN D DATE OPTIONS(description="my description")

If any of the columns named A, C, or D already exist, the statement fails. If column B already exists, the statement succeeds because of the IF NOT EXISTS clause.

Adding a RECORD column

The following example adds a column named A of type STRUCT that contains the following nested columns:

  • Column B of type GEOGRAPHY.
  • Column C of type INT64 with REPEATED mode.
  • Column D of type INT64 with REQUIRED mode.
  • Column E of type TIMESTAMP with a description.
ALTER TABLE mydataset.mytable
   ADD COLUMN A STRUCT<
       B GEOGRAPHY,
       C ARRAY<INT64>,
       D INT64 NOT NULL,
       E TIMESTAMP OPTIONS(description="creation time")
       >

The query fails if the table already has a column named A, even if that column does not contain any of the nested columns that are specified.

The new STRUCT named A is nullable, but the nested column D within A is required for any STRUCT values of A.

ALTER VIEW SET OPTIONS statement

To set the options on a view in BigQuery, use the ALTER VIEW SET OPTIONS DDL statement.

ALTER VIEW [IF EXISTS] [[project_name.]dataset_name.]view_name
SET OPTIONS(view_set_options_list)

Where:

IF EXISTS: If present, the query succeeds when the specified view does not exist. If absent, the query fails when the specified view does not exist.

project_name is the name of the project containing the view you are altering. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset containing the view you are altering. Defaults to the defaultDataset in the request.

view_name is the name of the view you're altering.

view_set_options_list

The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

friendly_name

STRING

Example: friendly_name="my_view"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a view that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

VALUE is a constant expression containing only literals, query parameters, and scalar functions. If the constant expression evaluates to null, the corresponding option NAME is ignored.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, and UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

Setting the VALUE replaces the existing value of that option for the view, if there was one. Setting the VALUE to NULL clears the view's value for that option.

Examples

Setting the expiration timestamp and description on a view

The following example sets the expiration timestamp on a view to seven days from the execution time of the ALTER VIEW statement, and sets the description as well:

ALTER VIEW mydataset.myview
SET OPTIONS (
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description="View that expires seven days from now"
)

ALTER MATERIALIZED VIEW SET OPTIONS statement

To set the options on a materialized view in BigQuery, use the ALTER MATERIALIZED VIEW SET OPTIONS DDL statement.

ALTER MATERIALIZED VIEW [IF EXISTS] [[project_name.]dataset_name.]materialized_view_name
SET OPTIONS(materialized_view_set_options_list)

Where:

IF EXISTS: If present, the query succeeds when the specified view does not exist. If absent, the query fails when the specified view does not exist.

project_name is the name of the project containing the materialized view you are altering. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset containing the materialized view you are altering. Defaults to the defaultDataset in the request.

materialized_view_name is the name of the materialized view you're altering.

materialized_view_set_options_list

The option list allows you to set materialized view options such as a whether refresh is enabled. the refresh interval, a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a materialized view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
enable_refresh BOOLEAN

Example: enable_refresh=false

refresh_interval_minutes FLOAT64

Example: refresh_interval_minutes=20

expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

friendly_name

STRING

Example: friendly_name="my_mv"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a materialized view that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

Setting the VALUE replaces the existing value of that option for the materialized view, if there was one. Setting the VALUE to NULL clears the materialized view's value for that option.

Examples

Setting the enable refresh state and refresh interval on a materialized view

The following example enables refresh and sets the refresh interval to 20 minutes on a materialized view:

ALTER MATERIALIZED VIEW mydataset.my_mv
SET OPTIONS (
  enable_refresh=true,
  refresh_interval_minutes=20
)

DROP TABLE statement

To delete a table in BigQuery, use the DROP TABLE DDL statement.

DROP TABLE [IF EXISTS] [[project_name.]dataset_name.]table_name

Where:

IF EXISTS: If present, the query succeeds when the specified table does not exist. If absent, the query fails when the specified table does not exist.

project_name is the name of the project containing the table to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset containing the table to delete. Defaults to the defaultDataset in the request.

table_name: The name of the table to delete.

Examples

Deleting a table

The following example deletes a table named mytable in the mydataset:

DROP TABLE mydataset.mytable

If the table name does not exist in the dataset, the following error is returned:

Error: Not found: Table myproject:mydataset.mytable

Deleting a table only if the table exists

The following example deletes a table named mytable in mydataset only if the table exists. If the table name does not exist in the dataset, no error is returned, and no action is taken.

DROP TABLE IF EXISTS mydataset.mytable

DROP EXTERNAL TABLE statement

The DROP EXTERNAL TABLE statement deletes an external table.

DROP EXTERNAL TABLE [IF EXISTS] [[project_name.]dataset_name.]table_name

Where:

  • project_name is the name of the project containing the table. Defaults to the project that runs this DDL query.

  • dataset_name is the name of the dataset containing the table.

  • table_name is the name of the table to delete.

Without the IF EXISTS clause, if the external table does not exist, the statement returns an error. If the IF EXISTS clause is included and the table does not exist, no error is returned, and no action is taken.

If table_name exists but is not an external table, the statement returns the following error:

Cannot drop table_name which has type TYPE. An external table was expected.

The DROP EXTERNAL statement only removes the external table definition from BigQuery. The data stored in the external location is not affected.

Examples

The following example drops the external table named external_table from the dataset mydataset. It returns an error if the external table does not exist.

DROP EXTERNAL TABLE mydataset.external_table

The following example drops the external table named external_table from the dataset mydataset. If the external table does not exist, no error is returned.

DROP EXTERNAL TABLE IF EXISTS mydataset.external_table

DROP VIEW statement

To delete a view in BigQuery, use the DROP VIEW DDL statement.

DROP VIEW [IF EXISTS] [[project_name.]dataset_name.]view_name

Where:

IF EXISTS: If present, the query succeeds when the specified view does not exist. If absent, the query fails when the specified view does not exist.

project_name is the name of the project containing the view to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset containing the view to delete. Defaults to the defaultDataset in the request.

view_name is the name of the view you're deleting.

Examples

Deleting a view

The following example deletes a view named myview in mydataset:

DROP VIEW mydataset.myview

If the view name does not exist in the dataset, the following error is returned:

Error: Not found: Table myproject:mydataset.myview

Deleting a view only if the view exists

The following example deletes a view named myview in mydataset only if the view exists. If the view name does not exist in the dataset, no error is returned, and no action is taken.

DROP VIEW IF EXISTS mydataset.myview

DROP MATERIALIZED VIEW statement

To delete a materialized view in BigQuery, use the DROP MATERIALIZED VIEW DDL statement.

DROP MATERIALIZED VIEW [IF EXISTS] [[project_name.]dataset_name.]mv_name

Where:

IF EXISTS: If present, the query succeeds when the specified materialized view does not exist. If absent, the query fails when the specified materialized view does not exist.

project_name is the name of the project containing the materialized view to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset containing the materialized view to delete. Defaults to the defaultDataset in the request.

mv_name is the name of the materialized view you're deleting.

Examples

Deleting a materialized view

The following example deletes a materialized view named my_mv in mydataset:

DROP MATERIALIZED VIEW mydataset.my_mv

If the materialized view name does not exist in the dataset, the following error is returned:

Error: Not found: Table myproject:mydataset.my_mv

If you are deleting a materialized view in another project, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.my_mv`.

Deleting a materialized view only if it exists

The following example deletes a materialized view named my_mv in mydataset only if the materialized view exists. If the materialized view name does not exist in the dataset, no error is returned, and no action is taken.

DROP MATERIALIZED VIEW IF EXISTS mydataset.my_mv

If you are deleting a materialized view in another project, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view`, (including the backticks if project_id contains special characters); for example, `myproject.mydataset.my_mv`.

DROP FUNCTION statement

DROP FUNCTION [IF EXISTS] [[project_name.]dataset_name.]function_name

Where:

IF EXISTS: If present, the query succeeds when the specified function does not exist. If absent, the query fails when the specified function does not exist.

project_name is the name of the project containing the function to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset containing the function to delete. Defaults to the defaultDataset in the request.

function_name is the name of the function you're deleting.

Examples

The following example statement deletes the function parseJsonAsStruct contained in the dataset mydataset.

DROP FUNCTION mydataset.parseJsonAsStruct;

The following example statement deletes the function parseJsonAsStruct from the dataset sample_dataset in the project other_project.

DROP FUNCTION `other_project`.sample_dataset.parseJsonAsStruct;

DROP PROCEDURE statement

DROP PROCEDURE [IF EXISTS] [[project_name.]dataset_name.]procedure_name

Where:

IF EXISTS: If present, the query succeeds when the specified procedure does not exist. If absent, the query fails when the specified procedure does not exist.

project_name is the name of the project containing the procedure to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

dataset_name is the name of the dataset containing the procedure to delete. Defaults to the defaultDataset in the request.

procedure_name is the name of the procedure you're deleting.

Examples

The following example statement deletes the procedure myprocedure contained in the dataset mydataset.

DROP PROCEDURE mydataset.myProcedure;

The following example statement deletes the procedure myProcedure from the dataset sample_dataset in the project other_project.

DROP PROCEDURE `other-project`.sample_dataset.myprocedure;