Updating Table Data Using the Data Manipulation Language

This page introduces you to updating and deleting data in BigQuery tables using DML. This page does not cover adding rows to existing tables using DML. For information on adding rows using DML, see INSERT statement in the DML syntax reference.

Note that DML in BigQuery has some limitations and known issues. DML also has its own quotas and pricing.

Updating data

Follow along with the instructions below by using this sample file, which represents a table with an IP address column that you'd like to mask for anonymization:

The following steps load the sample data into a table and update the values in the ip_address column:

Step 1. Load the JSON file into a UserSessions table.

Step 2. To mask the last octet in every row's ip_address column, run the following DML query:

#standardSQL
UPDATE sample_db.UserSessions
SET ip_address = REGEXP_REPLACE(ip_address, r"(\.[0-9]+)$", ".0")
WHERE TRUE

Deleting data

Follow along with the instructions below by using the sample files, which represent a dataset with multiple tables of analytics for user sessions and a table of users to be deleted.

The following steps load the data into three tables, and then delete the users listed in the DeletedUsers table.

Step 1. Load the JSON files into the DeletedUsers, Users, and UserSessions tables, respectively.

Web UI

  1. Open the Google BigQuery web UI.
  2. Hover over your dataset ID.
  3. Click the arrow next to your dataset ID. Arrow next to dataset
    name
  4. Click Create new table.
  5. Select File upload for the Location.
  6. Select JSON (newline delimited) for the File format.
  7. Choose the Table name.
  8. Under Schema enter a Name for each column in the table and select the appropriate Type.

    • Click Add field and repeat until you have entered all the columns in your table.
  9. Click Create table.

The schemas for the sample tables are:

  • DeletedUsers
    • Name id, Type INTEGER
  • Users
    • Name id, Type INTEGER
    • Name date_joined, Type TIMESTAMP
  • UserSessions
    • Name id, Type STRING
    • Name user_id, Type INTEGER
    • Name login_time, Type TIMESTAMP
    • Name logout_time, Type TIMESTAMP
    • Name ip_address, Type STRING

Command-line

To use the bq command-line tool to create the tables, use the bq load command. Supply the --location flag and set the value to your location. The --location flag is optional if your data is in the US or the EU multi-region location. For example, if you are using BigQuery in the asia-northeast1 (Tokyo) region, the load command would look like the following:

bq --location=asia-northeast1 load ...

To create the DeleteUsers table

bq --location=asia-northeast1 load --source_format=NEWLINE_DELIMITED_JSON \
    sample_db.DeletedUsers \
    deletedUsersData.json \
    id:integer

To create the Users table

bq --location=asia-northeast1 load --source_format=NEWLINE_DELIMITED_JSON \
    sample_db.Users \
    usersData.json \
    id:integer,date_joined:timestamp

To create the UserSessions table

bq --location=asia-northeast1 load --source_format=NEWLINE_DELIMITED_JSON \
    sample_db.UserSessions \
    userSessionsData.json \
    id:string,user_id:integer,login_time:timestamp,logout_time:timestamp,ip_address:string

Step 2. To delete the information about the users in the DeletedUsers table. Run the following DML queries:

  • Delete from UsersSessions

    #standardSQL
    DELETE FROM sample_db.UserSessions
    WHERE user_id in (SELECT id from sample_db.DeletedUsers)
    

  • Delete from Users

    #standardSQL
    DELETE FROM sample_db.Users
    WHERE id in (SELECT id from sample_db.DeletedUsers)
    

What's next

Envoyer des commentaires concernant…