Updating table data using the Data Manipulation Language

This page introduces you to updating and deleting data in BigQuery tables using DML. This page does not cover adding rows to existing tables using DML. For information on adding rows using DML, see INSERT statement in the DML syntax reference.

Note that DML in BigQuery has some limitations. DML also has its own pricing.

Updating data

Follow along with the instructions below by using this sample file, which represents a table with an IP address column that you'd like to mask for anonymization:

The following steps load the sample data into a table and update the values in the ip_address column:

Step 1. Load the JSON file into a UserSessions table.

Step 2. To mask the last octet in every row's ip_address column, run the following DML query:

UPDATE sample_db.UserSessions
SET ip_address = REGEXP_REPLACE(ip_address, r"(\.[0-9]+)$", ".0")
WHERE TRUE

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableDataWriteChannel;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableResult;
import com.google.cloud.bigquery.WriteChannelConfiguration;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Sample to update data in BigQuery tables using DML query
public class UpdateTableDML {

  public static void runUpdateTableDML() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    updateTableDML(datasetName, tableName);
  }

  public static void updateTableDML(String datasetName, String tableName)
      throws IOException, InterruptedException {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Load JSON file into UserSessions table
      TableId tableId = TableId.of(datasetName, tableName);

      WriteChannelConfiguration writeChannelConfiguration =
          WriteChannelConfiguration.newBuilder(tableId)
              .setFormatOptions(FormatOptions.json())
              .build();

      // Imports a local JSON file into a table.
      Path jsonPath =
          FileSystems.getDefault().getPath("src/test/resources", "userSessionsData.json");

      // The location and JobName must be specified; other fields can be auto-detected.
      String jobName = "jobId_" + UUID.randomUUID().toString();
      JobId jobId = JobId.newBuilder().setLocation("us").setJob(jobName).build();

      try (TableDataWriteChannel writer = bigquery.writer(jobId, writeChannelConfiguration);
          OutputStream stream = Channels.newOutputStream(writer)) {
        Files.copy(jsonPath, stream);
      }

      // Get the Job created by the TableDataWriteChannel and wait for it to complete.
      Job job = bigquery.getJob(jobId);
      Job completedJob = job.waitFor();
      if (completedJob == null) {
        System.out.println("Job not executed since it no longer exists.");
        return;
      } else if (completedJob.getStatus().getError() != null) {
        System.out.println(
            "BigQuery was unable to load local file to the table due to an error: \n"
                + job.getStatus().getError());
        return;
      }

      System.out.println(
          job.getStatistics().toString() + " userSessionsData json uploaded successfully");

      // Write a DML query to modify UserSessions table
      // To create DML query job to mask the last octet in every row's ip_address column
      String dmlQuery =
          String.format(
              "UPDATE `%s.%s` \n"
                  + "SET ip_address = REGEXP_REPLACE(ip_address, r\"(\\.[0-9]+)$\", \".0\")\n"
                  + "WHERE TRUE",
              datasetName, tableName);

      QueryJobConfiguration dmlQueryConfig = QueryJobConfiguration.newBuilder(dmlQuery).build();

      // Execute the query.
      TableResult result = bigquery.query(dmlQueryConfig);

      // Print the results.
      result.iterateAll().forEach(rows -> rows.forEach(row -> System.out.println(row.getValue())));

      System.out.println("Table updated successfully using DML");
    } catch (BigQueryException e) {
      System.out.println("Table update failed \n" + e.toString());
    }
  }
}

Deleting data

Follow along with the instructions below by using the sample files, which represent a dataset with multiple tables of analytics for user sessions and a table of users to be deleted.

The following steps load the data into three tables, and then delete the users listed in the DeletedUsers table.

Step 1. Load the JSON files into the DeletedUsers, Users, and UserSessions tables, respectively.

Console

  1. Open the Cloud Console.
  2. Select your dataset from the Resources list.
  3. Click Create table.

    Create table

  4. For Create table from select Upload.

  5. For Select file, browse for and select the file you downloaded.

    Browse files

  6. Select JSON (newline delimited) for the File format.

  7. Choose an appropriate Table name.

  8. Under Schema click Add field and enter a Name for each column in the table and select the appropriate Type.

    • Click Add field and repeat until you have entered all the columns in your table.
  9. Click Create table.

The schemas for the sample tables are:

  • DeletedUsers
    • Name id, Type INTEGER
  • Users
    • Name id, Type INTEGER
    • Name date_joined, Type TIMESTAMP
  • UserSessions
    • Name id, Type STRING
    • Name user_id, Type INTEGER
    • Name login_time, Type TIMESTAMP
    • Name logout_time, Type TIMESTAMP
    • Name ip_address, Type STRING

bq

To use the bq command-line tool to create the tables, use the bq load command. Supply the --location flag and set the value to your location. The --location flag is optional. For example, if you are using BigQuery in the asia-northeast1 (Tokyo) region, the load command would look like the following:

bq --location=asia-northeast1 load ...

To create the DeleteUsers table

bq --location=asia-northeast1 load \
--source_format=NEWLINE_DELIMITED_JSON \
sample_db.DeletedUsers \
deletedUsersData.json \
id:integer

To create the Users table

bq --location=asia-northeast1 load \
--source_format=NEWLINE_DELIMITED_JSON \
sample_db.Users \
usersData.json \
id:integer,date_joined:timestamp

To create the UserSessions table

bq --location=asia-northeast1 load \
--source_format=NEWLINE_DELIMITED_JSON \
sample_db.UserSessions \
userSessionsData.json \
id:string,user_id:integer,login_time:timestamp,logout_time:timestamp,ip_address:string

Step 2. To delete the information about the users in the DeletedUsers table. Run the following DML queries:

  • Delete from UsersSessions

    DELETE FROM sample_db.UserSessions
    WHERE user_id in (SELECT id from sample_db.DeletedUsers)
    
  • Delete from Users

    DELETE FROM sample_db.Users
    WHERE id in (SELECT id from sample_db.DeletedUsers)
    

What's next