Manually Changing Table Schemas

This document describes how to manually change the schema definition for existing BigQuery tables. Many schema changes are not natively supported in BigQuery and require manual workarounds. These unsupported schema modifications include:

  • Changing a column's name
  • Changing a column's data type
  • Changing a column's mode (aside from relaxing REQUIRED columns to NULLABLE)
  • Deleting a column

For information on supported schema changes in BigQuery, see Modifying Table Schemas.

Changing a column's name

Renaming a column is not supported by the BigQuery web UI, the command-line tool, or the API. If you attempt to update a table schema using a renamed column, the following error is returned: BigQuery error in update operation: Provided Schema does not match Table [PROJECT_ID]:[DATASET].[TABLE].

There are two ways to manually rename a column:

  • Using a SQL query — Choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
  • Recreating the table — Choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.

Option 1: Using a query

To change a column's name using a SQL query, select all the columns in the table and alias the column you need to rename. You can use the query result to overwrite the existing table or to create a new destination table. When you alias the column with a new name, it must adhere to BigQuery's rules for column names.

Pros

  • Using a query to write the data to a new destination table preserves your original data.
  • If you use the query job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.

Cons

  • Renaming a column using a query requires you to scan the entire table. The query charges can be significant if the table is very large.
  • If you write the query results to a new destination table, you incur storage costs for both the old table and the new one (unless you delete the old one).

Option 2: Exporting your data and loading it into a new table

You can also rename a column by exporting your table data to Google Cloud Storage, and then loading the data into a new table with a schema definition containing the correct column name. You can also use the load job to overwrite the existing table.
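The export-and-reload workflow can be sketched with bq commands. This is only a sketch: the bucket path is a placeholder, the column types in the schema are assumptions, and the commands require the bq tool and a Cloud Storage bucket you can write to.

```shell
# Export the table data to Cloud Storage (CSV shown here; JSON and Avro also work).
bq extract mydataset.mytable gs://[BUCKET]/mytable_export.csv

# Load the data into a new table, supplying a schema that uses the new column names.
# --skip_leading_rows=1 skips the CSV header row written by the export.
# The STRING types here are placeholders; match your table's actual types.
bq load --source_format=CSV --skip_leading_rows=1 \
    mydataset.newtable gs://[BUCKET]/mytable_export.csv \
    newcolumn_one:STRING,newcolumn_two:STRING

# To overwrite the original table instead, add --replace and load into mydataset.mytable.
```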

Pros

  • You are not charged for the export job or the load job. Currently, BigQuery load and export jobs are free.
  • If you use the load job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.

Cons

  • If you load the data into a new table, you incur storage costs for the original table and the new table (unless you delete the old one).
  • You incur costs for storing the exported data in Cloud Storage.

Column alias example

The following example shows a standard SQL query that selects all the data in mytable except for two columns that require renaming. An alias generates a new name for each of the two columns: column_one is renamed newcolumn_one, and column_two is renamed newcolumn_two. The query result is used to overwrite the existing table.

Web UI

  1. In the BigQuery web UI, click Compose Query.

  2. In the New Query box, enter the following query to select all of the data in mydataset.mytable except for the two columns that need to be renamed. mydataset.mytable is in your default project. The query uses an alias to change the name of column_one to newcolumn_one and to change the name of column_two to newcolumn_two.

       #standardSQL
       SELECT
         * EXCEPT(column_one, column_two),
         column_one AS newcolumn_one, column_two AS newcolumn_two
       FROM
         mydataset.mytable
       

  3. Click Show Options.

  4. In the Destination Table section, click Select Table.

  5. In the Select Destination Table dialog:

    1. For Project, leave the value set to your default project. This is the project that contains mydataset.mytable.

    2. For Dataset, choose mydataset.

    3. In the Table ID field, enter mytable.

    4. Click OK.

  6. In the Destination Table section, for Write Preference, choose Overwrite table. This will overwrite mytable using the query results.

  7. For Processing Location, click Unspecified and choose your data's location. You can leave the processing location set to Unspecified if your data is in the US or EU multi-region; in those locations, BigQuery automatically detects the processing location.

  8. Click Run query. When the query job completes, the columns in mytable have new names.

CLI

Enter the following bq query command to select all of the data in mydataset.mytable except for the two columns that need to be renamed. mydataset.mytable is in your default project. The query uses an alias to change the name of column_one to newcolumn_one and to change the name of column_two to newcolumn_two.

Write the query results to mydataset.mytable using the --destination_table flag, and specify the --replace flag to overwrite mytable. Specify the --use_legacy_sql=false flag to use standard SQL syntax.

Supply the --location flag and set the value to your location.

bq --location=[LOCATION] query --destination_table mydataset.mytable --replace --use_legacy_sql=false 'SELECT * EXCEPT(column_one, column_two), column_one AS newcolumn_one, column_two AS newcolumn_two FROM mydataset.mytable'

API

To change the name of column_one to newcolumn_one and to change the name of column_two to newcolumn_two, call the jobs.insert method and configure a query job. Specify your region in the location property in the jobReference section.

The SQL query used in the query job would be: SELECT * EXCEPT(column_one, column_two), column_one AS newcolumn_one, column_two AS newcolumn_two FROM mydataset.mytable. This query selects all the data in mytable except for the two columns that require renaming. An alias generates a new name for each of the two columns.

To overwrite mytable with the query results, include mydataset.mytable in the configuration.query.destinationTable property, and specify WRITE_TRUNCATE in the configuration.query.writeDisposition property. To specify a new destination table, enter the table name in the configuration.query.destinationTable property.
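As a sketch, the jobs.insert request body for this query job can be assembled as plain JSON. Only the property names (jobReference.location, configuration.query.destinationTable, configuration.query.writeDisposition) come from the API described above; the project ID and location values below are placeholders.

```python
# Build the request body for a jobs.insert call that runs the rename query
# and overwrites the original table. "my-project" and "US" are placeholders.
def build_rename_job(project_id: str, location: str) -> dict:
    query = (
        "SELECT * EXCEPT(column_one, column_two), "
        "column_one AS newcolumn_one, column_two AS newcolumn_two "
        "FROM mydataset.mytable"
    )
    return {
        "jobReference": {"projectId": project_id, "location": location},
        "configuration": {
            "query": {
                "query": query,
                "useLegacySql": False,  # the query uses standard SQL
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": "mydataset",
                    "tableId": "mytable",
                },
                # WRITE_TRUNCATE overwrites mytable with the query results.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        },
    }

body = build_rename_job("my-project", "US")
```

To write to a new destination table instead, change tableId and use a write disposition that does not truncate.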

Changing a column's data type

Changing a column's data type is not supported by the BigQuery web UI, the command-line tool, or the API. If you attempt to update a table by applying a schema that specifies a new data type for a column, the following error is returned: BigQuery error in update operation: Provided Schema does not match Table [PROJECT_ID]:[DATASET].[TABLE].

There are two ways to manually change a column's data type:

  • Using a SQL query — Choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
  • Recreating the table — Choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.

Option 1: Using a query

Use a SQL query to select all the table data and to cast the relevant column as a different data type. You can use the query results to overwrite the table or to create a new destination table.

Pros

  • Using a query to write the data to a new destination table preserves your original data.
  • If you use the query job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.

Cons

  • Changing a column's data type using a query requires you to scan the entire table. The query charges can be significant if the table is very large.
  • If you write the query results to a new destination table, you incur storage costs for both the old table and the new one (unless you delete the old one).

Option 2: Exporting your data and loading it into a new table

You can also change a column's data type by exporting your table data to Google Cloud Storage, and then loading the data into a new table with a schema definition that specifies the correct data type for the column. You can also use the load job to overwrite the existing table.
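If you export and reload, the schema file you supply to the load job declares the new type. For example, a schema that stores column_one as STRING (instead of its original DATE type) might look like the following; the other columns' names, types, and modes are placeholders:

```json
[
  {"name": "column_one", "type": "STRING", "mode": "NULLABLE"},
  {"name": "column_two", "type": "STRING", "mode": "NULLABLE"},
  {"name": "column_three", "type": "STRING", "mode": "NULLABLE"}
]
```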

Pros

  • You are not charged for the export job or the load job. Currently, BigQuery load and export jobs are free.
  • If you use the load job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.

Cons

  • If you load the data into a new table, you incur storage costs for the original table and the new table (unless you delete the old one).
  • You incur costs for storing the exported data in Cloud Storage.

CAST example

The following example shows a standard SQL query that selects all the data from column_two and column_three in mydataset.mytable and casts column_one from DATE to STRING. The query result is used to overwrite the existing table. The overwritten table will store column_one as a STRING data type.

When using CAST, a query can fail if BigQuery is unable to perform the cast. For details on casting rules in standard SQL, see Casting in the functions and operators reference documentation.

Web UI

  1. In the BigQuery web UI, click Compose Query.

  2. In the New Query box, enter the following query to select all of the data from column_two and column_three in mydataset.mytable and to cast column_one from DATE to STRING. The query uses an alias to cast column_one with the same name. mydataset.mytable is in your default project.

       #standardSQL
       SELECT
         column_two, column_three, CAST(column_one AS STRING) AS column_one
       FROM
         mydataset.mytable
       

  3. Click Show Options.

  4. In the Destination Table section, click Select Table.

  5. In the Select Destination Table dialog:

    1. For Project, leave the value set to your default project. This is the project that contains mydataset.mytable.

    2. For Dataset, choose mydataset.

    3. In the Table ID field, enter mytable.

    4. Click OK.

  6. In the Destination Table section, for Write Preference, choose Overwrite table. This will overwrite mytable using the query results.

  7. For Processing Location, click Unspecified and choose your data's location. You can leave the processing location set to Unspecified if your data is in the US or EU multi-region; in those locations, BigQuery automatically detects the processing location.

  8. Click Run query. When the query job completes, the data type of column_one is STRING.

CLI

Enter the following bq query command to select all of the data from column_two and column_three in mydataset.mytable and to cast column_one from DATE to STRING. The query uses an alias to cast column_one with the same name. mydataset.mytable is in your default project.

The query results are written to mydataset.mytable using the --destination_table flag, and the --replace flag is used to overwrite mytable. Specify the --use_legacy_sql=false flag to use standard SQL syntax.

Supply the --location flag and set the value to your location.

bq --location=[LOCATION] query --destination_table mydataset.mytable --replace --use_legacy_sql=false 'SELECT column_two, column_three, CAST(column_one AS STRING) AS column_one FROM mydataset.mytable'

API

To select all of the data from column_two and column_three in mydataset.mytable and to cast column_one from DATE to STRING, call the jobs.insert method and configure a query job. Specify your region in the location property in the jobReference section.

The SQL query used in the query job would be: SELECT column_two, column_three, CAST(column_one AS STRING) AS column_one FROM mydataset.mytable. The query uses an alias to cast column_one with the same name.

To overwrite mytable with the query results, include mydataset.mytable in the configuration.query.destinationTable property, and specify WRITE_TRUNCATE in the configuration.query.writeDisposition property.

Changing a column's mode

Currently, the only supported modification you can make to a column's mode is changing it from REQUIRED to NULLABLE. Changing a column's mode from REQUIRED to NULLABLE is also called column relaxation. For information on relaxing REQUIRED columns to NULLABLE, see Relaxing a column's mode.

If you attempt to apply an unsupported change to a column mode, an error like the following is returned. In this example, an attempt was made to change a column mode from NULLABLE to REPEATED: BigQuery error in update operation: Provided Schema does not match Table [PROJECT_ID]:[DATASET].[TABLE]. Field [FIELD] has changed mode from NULLABLE to REPEATED.

Exporting your data and loading it into a new table

You can manually change a column's mode by exporting your table data to Google Cloud Storage, and then loading the data into a new table with a schema definition that specifies the correct mode for the column. You can also use the load job to overwrite the existing table.
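For example, a schema file for the load job that declares column_one with REPEATED mode might look like the following; the column names and types are placeholders, and the exported data must be compatible with the new mode:

```json
[
  {"name": "column_one", "type": "STRING", "mode": "REPEATED"},
  {"name": "column_two", "type": "STRING", "mode": "NULLABLE"}
]
```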

Pros

  • You are not charged for the export job or the load job. Currently, BigQuery load and export jobs are free.
  • If you use the load job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.

Cons

  • If you load the data into a new table, you incur storage costs for the original table and the new table (unless you delete the old one).
  • You incur costs for storing the exported data in Cloud Storage.

Deleting a column from a table schema

Deleting a column from an existing table's schema is not supported by the BigQuery web UI, the command-line tool, or the API. If you attempt to update a table by applying a schema that removes a column, the following error is returned: BigQuery error in update operation: Provided Schema does not match Table [PROJECT_ID]:[DATASET].[TABLE].

There are two ways to manually delete a column:

  • Using a SQL query — Choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
  • Recreating the table — Choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.

Option 1: Using a query

Use a SELECT * EXCEPT query that excludes the column (or columns) you want to remove, and use the query result to overwrite the table or to create a new destination table.

Pros

  • Using a query to write the data to a new destination table preserves your original data.
  • If you use the query job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.

Cons

  • Deleting a column using a query requires you to scan the data in all columns except the one you're removing. The query charges can be significant if the table is very large.
  • If you write the query results to a new destination table, you incur storage costs for both the old table and the new one (unless you delete the old one).

Option 2: Exporting your data and loading it into a new table

You can also remove a column by exporting your table data to Google Cloud Storage, deleting the data corresponding to the column (or columns) you want to remove, and then loading the data into a new table with a schema definition that does not include the removed column(s). You can also use the load job to overwrite the existing table.
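If you export to newline-delimited JSON, removing the column's data before reloading can be a simple transformation. The following sketch is not part of BigQuery itself; it simply drops column_two from each exported record before you load the result into the new table:

```python
import json

def drop_column(lines, column):
    """Remove `column` from each newline-delimited JSON record."""
    out = []
    for line in lines:
        record = json.loads(line)
        record.pop(column, None)  # ignore records that lack the column
        out.append(json.dumps(record, sort_keys=True))
    return out

# Example records as they might appear in an exported JSON file.
exported = [
    '{"column_one": "a", "column_two": 1}',
    '{"column_one": "b", "column_two": 2}',
]
cleaned = drop_column(exported, "column_two")
```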

Pros

  • You are not charged for the export job or the load job. Currently, BigQuery load and export jobs are free.
  • If you use the load job to overwrite the original table, you incur storage costs for one table instead of two, but you lose the original data.

Cons

  • If you load the data into a new table, you incur storage costs for the original table and the new table (unless you delete the old one).
  • You incur costs for storing the exported data in Cloud Storage.

SELECT * EXCEPT example

The following example shows a standard SQL query that selects all the data from mydataset.mytable except column_two. The query result is used to overwrite the existing table.

Web UI

  1. In the BigQuery web UI, click Compose Query.

  2. In the New Query box, enter the following query to select all the data from mydataset.mytable except column_two. mydataset.mytable is in your default project.

       #standardSQL
       SELECT
         * EXCEPT(column_two)
       FROM
         mydataset.mytable
       

  3. Click Show Options.

  4. In the Destination Table section, click Select Table.

  5. In the Select Destination Table dialog:

    1. For Project, leave the value set to your default project. This is the project that contains mydataset.mytable.

    2. For Dataset, choose mydataset.

    3. In the Table ID field, enter mytable.

    4. Click OK.

  6. In the Destination Table section, for Write Preference, choose Overwrite table. This will overwrite mytable using the query results.

  7. For Processing Location, click Unspecified and choose your data's location. You can leave the processing location set to Unspecified if your data is in the US or EU multi-region; in those locations, BigQuery automatically detects the processing location.

  8. Click Run query. When the query job completes, the table now includes all columns except column_two.

CLI

Enter the following bq query command to select all the data from mydataset.mytable except column_two. mydataset.mytable is in your default project. The query results are written to mydataset.mytable using the --destination_table flag, and the --replace flag is used to overwrite mytable. Specify the --use_legacy_sql=false flag to use standard SQL syntax.

Supply the --location flag and set the value to your location.

bq --location=[LOCATION] query --destination_table mydataset.mytable --replace --use_legacy_sql=false 'SELECT * EXCEPT(column_two) FROM mydataset.mytable'

API

To select all the data from mydataset.mytable except column_two, call the jobs.insert method and configure a query job. Specify your region in the location property in the jobReference section.

The SQL query used in the query job would be: SELECT * EXCEPT(column_two) FROM mydataset.mytable.

To overwrite mytable with the query results, include mydataset.mytable in the configuration.query.destinationTable property, and specify WRITE_TRUNCATE in the configuration.query.writeDisposition property.
