This document describes how to manually change the schema definition for existing BigQuery tables. Many schema changes are not natively supported in BigQuery and require manual workarounds. These unsupported schema modifications include the following:
- Changing a column's name.
- Changing a column's data type.
- Changing a column's mode (aside from relaxing `REQUIRED` columns to `NULLABLE`).
- Deleting a column.
For more information about supported schema changes in BigQuery, see Modifying table schemas.
Changing a column's name
Renaming a column is not supported by the Cloud Console, the `bq` command-line tool, or the API. If you attempt to update a table schema using a renamed column, the following error is returned:

BigQuery error in update operation: Provided Schema does not match Table project_id:dataset.table.
There are two ways to manually rename a column:
- Using a SQL query: choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
- Recreating the table: choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.
Option 1: Using a query
To change a column's name using a SQL query, select all the columns in the table and alias the column you need to rename. You can use the query result to overwrite the existing table or to create a new destination table. When you alias the column with a new name, it must adhere to BigQuery's rules for column names.
Pros
- Using a query to write the data to a new destination table preserves your original data.
- If you use the query job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.
Cons
- Renaming a column using a query requires that you scan the entire table. The query charges can be significant if the table is very large.
- If you write the query results to a new destination table, you incur storage costs for both the old table and the new one (unless you delete the old one).
Option 2: Exporting your data and loading it into a new table
You can also rename a column by exporting your table data to Cloud Storage, and then loading the data into a new table with a schema definition containing the correct column name. You can also use the load job to overwrite the existing table.
Pros
- You are not charged for the export job or the load job. Currently, BigQuery load and export jobs are free.
- If you use the load job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.
Cons
- If you load the data into a new table, you incur storage costs for the original table and the new table (unless you delete the old one).
- You incur costs for storing the exported data in Cloud Storage.
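As a sketch of this workaround, the two `jobs.insert` request bodies below export the table to Cloud Storage and load it back under a schema that carries the new column names. This is an illustration, not a drop-in script: the project, dataset, table, and bucket names are placeholders, and it assumes CSV export so that the load job maps fields by position and the new schema can supply the new names.

```python
# Illustrative jobs.insert request bodies for the export-then-load rename
# workaround. All project, dataset, table, and bucket names are placeholders.

def build_rename_jobs(project, dataset, table, bucket, columns, renames):
    """Build extract and load job configurations that rename columns.

    `columns` is the ordered list of (name, type) pairs in the source table;
    `renames` maps old column names to new ones. CSV export is assumed so the
    load job maps fields by position, letting the new schema rename them.
    """
    uri = f"gs://{bucket}/{table}-*.csv"
    extract_job = {
        "configuration": {
            "extract": {
                "sourceTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "destinationUris": [uri],
                "destinationFormat": "CSV",
            }
        }
    }
    load_job = {
        "configuration": {
            "load": {
                "sourceUris": [uri],
                "sourceFormat": "CSV",
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # Overwrite the original table with the renamed schema.
                "writeDisposition": "WRITE_TRUNCATE",
                "schema": {
                    "fields": [
                        {"name": renames.get(name, name), "type": col_type}
                        for name, col_type in columns
                    ]
                },
            }
        }
    }
    return extract_job, load_job
```

Loading into a table with a different `tableId` instead of overwriting preserves the original data, at the cost of storing both tables.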
Column alias example
The following example shows a standard SQL query that selects all the data in `mytable` except for two columns that require renaming. An alias is used to generate new names for the two columns: `column_one` is renamed `newcolumn_one`, and `column_two` is renamed `newcolumn_two`. The query result is used to overwrite the existing table.
Console
In the Cloud Console, select the Query editor.
In the Query editor, enter the following query to select all of the data in `mydataset.mytable` except for the two columns that need to be renamed. `mydataset.mytable` is in your default project. The query uses an alias to change the name of `column_one` to `newcolumn_one` and to change the name of `column_two` to `newcolumn_two`.

SELECT
  * EXCEPT(column_one, column_two),
  column_one AS newcolumn_one,
  column_two AS newcolumn_two
FROM
  mydataset.mytable
Click More and select Query settings.
In the Destination section, check Set a destination table for query results.
In the fields below:
For Project name, leave the value set to your default project. This is the project that contains `mydataset.mytable`.
For Dataset name, choose `mydataset`.
In the Table name field, enter `mytable`.
Click OK.
In the Destination table write preference section, for Write Preference, choose Overwrite table. This overwrites `mytable` with the query results.
(Optional) For Processing location, choose your data's location.
Click Save to update the settings, then in the Query editor click Run. When the query job completes, the columns in `mytable` have new names.
bq
Enter the following `bq query` command to select all of the data in `mydataset.mytable` except for the two columns that need to be renamed. `mydataset.mytable` is in your default project. The query uses an alias to change the name of `column_one` to `newcolumn_one` and to change the name of `column_two` to `newcolumn_two`.

Write the query results to `mydataset.mytable` using the `--destination_table` flag, and specify the `--replace` flag to overwrite `mytable`. Specify the `--use_legacy_sql=false` flag to use standard SQL syntax.

(Optional) Supply the `--location` flag and set the value to your location.
bq query \
--destination_table mydataset.mytable \
--replace \
--use_legacy_sql=false \
'SELECT
* EXCEPT(column_one,
column_two),
column_one AS newcolumn_one,
column_two AS newcolumn_two
FROM
mydataset.mytable'
API
To change the name of `column_one` to `newcolumn_one` and to change the name of `column_two` to `newcolumn_two`, call the `jobs.insert` method and configure a query job. (Optional) Specify your location in the `location` property in the `jobReference` section.

The SQL query used in the query job would be: SELECT * EXCEPT(column_one, column_two), column_one AS newcolumn_one, column_two AS newcolumn_two FROM mydataset.mytable. This query selects all the data in `mytable` except for the two columns that require renaming. An alias is used to generate new names for the two columns.

To overwrite `mytable` with the query results, include `mydataset.mytable` in the `configuration.query.destinationTable` property, and specify `WRITE_TRUNCATE` in the `configuration.query.writeDisposition` property. To specify a new destination table, enter the table name in the `configuration.query.destinationTable` property.
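A minimal sketch of that `jobs.insert` request body, built as a plain Python dict. The project ID and location values are placeholders; the payload mirrors the `jobReference` and `configuration.query` properties named above.

```python
# Build the jobs.insert request body for the column-rename query job.
# Project and location values are placeholders.

def rename_query_job(project, dataset, table, location="US"):
    """Return a query job configuration that renames two columns and
    overwrites the source table with the result."""
    query = (
        "SELECT * EXCEPT(column_one, column_two), "
        "column_one AS newcolumn_one, column_two AS newcolumn_two "
        f"FROM {dataset}.{table}"
    )
    return {
        # Optional: pin the job to the data's location.
        "jobReference": {"projectId": project, "location": location},
        "configuration": {
            "query": {
                "query": query,
                "useLegacySql": False,
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # WRITE_TRUNCATE overwrites the table with the query results;
                # point destinationTable at a different table to keep the
                # original data instead.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        },
    }
```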
Changing a column's data type
Changing a column's data type isn't supported by the Cloud Console, the `bq` command-line tool, or the API. If you attempt to update a table by applying a schema that specifies a new data type for a column, the following error is returned:

BigQuery error in update operation: Provided Schema does not match Table project_id:dataset.table.
There are two ways to manually change a column's data type:
- Using a SQL query: choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
- Recreating the table: choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.
Option 1: Using a query
Use a SQL query to select all the table data and to cast the relevant column as a different data type. You can use the query results to overwrite the table or to create a new destination table.
Pros
- Using a query to write the data to a new destination table preserves your original data.
- If you use the query job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.
Cons
- Changing a column's data type using a query requires you to scan the entire table. The query charges can be significant if the table is very large.
- If you write the query results to a new destination table, you incur storage costs for both the old table and the new one (unless you delete the old one).
Option 2: Exporting your data and loading it into a new table
You can also change a column's data type by exporting your table data to Cloud Storage, and then loading the data into a new table with a schema definition that specifies the correct data type for the column. You can also use the load job to overwrite the existing table.
Pros
- You are not charged for the export job or the load job. Currently, BigQuery load and export jobs are free.
- If you use the load job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.
Cons
- If you load the data into a new table, you incur storage costs for the original table and the new table (unless you delete the old one).
- You incur costs for storing the exported data in Cloud Storage.
CAST example
The following example shows a standard SQL query that selects all the data from `column_two` and `column_three` in `mydataset.mytable` and casts `column_one` from `DATE` to `STRING`. The query result is used to overwrite the existing table. The overwritten table stores `column_one` as a `STRING` data type.

When using `CAST`, a query can fail if BigQuery is unable to perform the cast. For details on casting rules in standard SQL, see Casting in the functions and operators reference documentation.
Console
In the Cloud Console, click Compose new query.
In the Query editor, enter the following query to select all of the data from `column_two` and `column_three` in `mydataset.mytable` and to cast `column_one` from `DATE` to `STRING`. The query uses an alias to cast `column_one` with the same name. `mydataset.mytable` is in your default project.

SELECT
  column_two,
  column_three,
  CAST(column_one AS STRING) AS column_one
FROM
  mydataset.mytable
Click More and select Query settings.
In the Destination section, check Set a destination table for query results.
In the fields below:
For Project name, leave the value set to your default project. This is the project that contains `mydataset.mytable`.
For Dataset name, choose `mydataset`.
In the Table name field, enter `mytable`.
Click OK.
In the Destination table write preference section, for Write Preference, choose Overwrite table. This overwrites `mytable` with the query results.
(Optional) For Processing location, click Auto-select and choose your data's location.
Click Save to update the settings, then in the Query editor click Run. When the query job completes, the data type of `column_one` is `STRING`.
bq
Enter the following `bq query` command to select all of the data from `column_two` and `column_three` in `mydataset.mytable` and to cast `column_one` from `DATE` to `STRING`. The query uses an alias to cast `column_one` with the same name. `mydataset.mytable` is in your default project.

The query results are written to `mydataset.mytable` using the `--destination_table` flag, and the `--replace` flag is used to overwrite `mytable`. Specify the `--use_legacy_sql=false` flag to use standard SQL syntax.

(Optional) Supply the `--location` flag and set the value to your location.
bq query \
--destination_table mydataset.mytable \
--replace \
--use_legacy_sql=false \
'SELECT
column_two,
column_three,
CAST(column_one AS STRING) AS column_one
FROM
mydataset.mytable'
API
To select all of the data from `column_two` and `column_three` in `mydataset.mytable` and to cast `column_one` from `DATE` to `STRING`, call the `jobs.insert` method and configure a query job. (Optional) Specify your location in the `location` property in the `jobReference` section.

The SQL query used in the query job would be: SELECT column_two, column_three, CAST(column_one AS STRING) AS column_one FROM mydataset.mytable. The query uses an alias to cast `column_one` with the same name.

To overwrite `mytable` with the query results, include `mydataset.mytable` in the `configuration.query.destinationTable` property, and specify `WRITE_TRUNCATE` in the `configuration.query.writeDisposition` property.
Changing a column's mode
Currently, the only supported modification you can make to a column's mode is changing it from `REQUIRED` to `NULLABLE`. Changing a column's mode from `REQUIRED` to `NULLABLE` is also called column relaxation. For information on relaxing `REQUIRED` columns to `NULLABLE`, see Relaxing a column's mode.

If you attempt to apply an unsupported change to a column's mode, an error like the following is returned. In this example, an attempt was made to change a column's mode from `NULLABLE` to `REPEATED`:

BigQuery error in update operation: Provided Schema does not match Table project_id:dataset.table. Field field has changed mode from NULLABLE to REPEATED.
Exporting your data and loading it into a new table
You can manually change a column's mode by exporting your table data to Cloud Storage, and then loading the data into a new table with a schema definition that specifies the correct mode for the column. You can also use the load job to overwrite the existing table.
Pros
- You are not charged for the export job or the load job. Currently, BigQuery load and export jobs are free.
- If you use the load job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.
Cons
- If you load the data into a new table, you incur storage costs for the original table and the new table (unless you delete the old one).
- You incur costs for storing the exported data in Cloud Storage.
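The schema definition for the load job is the original schema with the relevant column's mode changed. A small helper along these lines can produce it; the field layout follows BigQuery's schema JSON format (`name`, `type`, `mode`, and nested `fields` for `RECORD` columns), and the column names in the usage below are illustrative.

```python
# Rewrite one column's mode in a BigQuery schema definition. Each field is a
# dict in the schema JSON layout: {"name", "type", "mode", optional "fields"}.

def set_column_mode(fields, column, mode):
    """Return a copy of the schema with `column` set to the given mode."""
    updated = []
    for field in fields:
        field = dict(field)  # copy, so the original schema is untouched
        if field["name"] == column:
            field["mode"] = mode
        # Recurse into nested RECORD fields so nested columns are found too.
        if "fields" in field:
            field["fields"] = set_column_mode(field["fields"], column, mode)
        updated.append(field)
    return updated
```

Note that changing only the schema is not always enough: for example, a `NULLABLE` to `REPEATED` change also requires rewriting the exported data so each value becomes an array.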
Deleting a column from a table schema
Deleting a column from an existing table's schema isn't supported by the Cloud Console, the `bq` command-line tool, or the API. If you attempt to update a table by applying a schema that removes a column, the following error is returned:

BigQuery error in update operation: Provided Schema does not match Table project_id:dataset.table.
There are two ways to manually delete a column:
- Using a SQL query: choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
- Recreating the table: choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.
Option 1: Using a query
Use a `SELECT * EXCEPT` query that excludes the column (or columns) you want to remove, and use the query result to overwrite the table or to create a new destination table.
Pros
- Using a query to write the data to a new destination table preserves your original data.
- If you use the query job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.
Cons
- Deleting a column using a query requires you to scan the data in all columns except the one you're removing. The query charges can be significant if the table is very large.
- If you write the query results to a new destination table, you incur storage costs for both the old table and the new one (unless you delete the old one).
Option 2: Exporting your data and loading it into a new table
You can also remove a column by exporting your table data to Cloud Storage, deleting the data corresponding to the column (or columns) you want to remove, and then loading the data into a new table with a schema definition that does not include the removed column(s). You can also use the load job to overwrite the existing table.
Pros
- You are not charged for the export job or the load job. Currently, BigQuery load and export jobs are free.
- If you use the load job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.
Cons
- If you load the data into a new table, you incur storage costs for the original table and the new table (unless you delete the old one).
- You incur costs for storing the exported data in Cloud Storage.
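The "delete the data corresponding to the column" step can be done with a small filter over the exported files. The sketch below assumes the export used newline-delimited JSON; the column names in the usage are illustrative.

```python
import json

# Remove one column from rows exported as newline-delimited JSON, so the
# filtered file matches a load schema that omits that column.

def drop_column(lines, column):
    """Yield each JSON row with `column` removed."""
    for line in lines:
        row = json.loads(line)
        row.pop(column, None)  # tolerate rows where the column is absent
        yield json.dumps(row)
```

For example, streaming an exported file through `drop_column(f, "column_two")` and writing the result back to Cloud Storage produces data that loads cleanly under the reduced schema.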
SELECT * EXCEPT example
The following example shows a standard SQL query that selects all the data from `mydataset.mytable` except `column_two`. The query result is used to overwrite the existing table.
Console
In the Cloud Console, click Compose new query.
In the Query editor, enter the following query to select all the data from `mydataset.mytable` except `column_two`. `mydataset.mytable` is in your default project.

SELECT * EXCEPT(column_two) FROM mydataset.mytable
Click More and select Query settings.
In the Destination section, check Set a destination table for query results.
In the fields below:
For Project name, leave the value set to your default project. This is the project that contains `mydataset.mytable`.
For Dataset name, choose `mydataset`.
In the Table name field, enter `mytable`.
Click OK.
In the Destination table write preference section, for Write Preference, choose Overwrite table. This overwrites `mytable` with the query results.
(Optional) For Processing location, click Auto-select and choose your data's location.
Click Save to update the settings, then in the Query editor click Run. When the query job completes, the table includes all columns except `column_two`.
bq
Enter the following `bq query` command to select all the data from `mydataset.mytable` except `column_two`. `mydataset.mytable` is in your default project. The query results are written to `mydataset.mytable` using the `--destination_table` flag, and the `--replace` flag is used to overwrite `mytable`. Specify the `--use_legacy_sql=false` flag to use standard SQL syntax.

(Optional) Supply the `--location` flag and set the value to your location.
bq query \
--destination_table mydataset.mytable \
--replace \
--use_legacy_sql=false \
'SELECT
* EXCEPT(column_two)
FROM
mydataset.mytable'
API
To select all the data from `mydataset.mytable` except `column_two`, call the `jobs.insert` method and configure a query job. (Optional) Specify your location in the `location` property in the `jobReference` section.

The SQL query used in the query job would be: SELECT * EXCEPT(column_two) FROM mydataset.mytable.

To overwrite `mytable` with the query results, include `mydataset.mytable` in the `configuration.query.destinationTable` property, and specify `WRITE_TRUNCATE` in the `configuration.query.writeDisposition` property.