This page introduces you to updating and deleting data in BigQuery
tables using data manipulation language (DML). This page does not cover adding
rows to existing tables using DML. For information on adding rows using DML, see
INSERT
statement
in the DML syntax reference.
Note that DML in BigQuery has some limitations. DML also has its own pricing.
Updating data
Follow along with the instructions below by using this sample file, which represents a table with an IP address column that you'd like to mask for anonymization:
The following steps load the sample data into a table and update the values
in the ip_address
column:
Step 1. Load the JSON file
into a UserSessions
table.
Step 2. To mask the last octet in every row's ip_address
column, run the
following DML query:
UPDATE sample_db.UserSessions SET ip_address = REGEXP_REPLACE(ip_address, r"(\.[0-9]+)$", ".0") WHERE TRUE
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Deleting data
Follow along with the instructions below by using the sample files, which represent a dataset with multiple tables of analytics for user sessions and a table of users to be deleted.
The following steps load the data into three tables, and then delete the users listed in the DeletedUsers table.
Step 1. Load the JSON files into the DeletedUsers, Users, and UserSessions tables, respectively.
Console
In the Explorer panel, expand your project and select a dataset.
In the details panel, click Create table.
For Create table from, select Upload.
For Select file, browse for and select the file you downloaded.
Select JSON (newline delimited) for the File format.
Choose an appropriate Table name.
Under Schema click Add field and enter a Name for each column in the table and select the appropriate Type.
- Click Add field and repeat until you have entered all the columns in your table.
Click Create table.
The schemas for the sample tables are:
- DeletedUsers
- Name
id
, TypeINTEGER
- Name
- Users
- Name
id
, TypeINTEGER
- Name
date_joined
, TypeTIMESTAMP
- Name
- UserSessions
- Name
id
, TypeSTRING
- Name
user_id
, TypeINTEGER
- Name
login_time
, TypeTIMESTAMP
- Name
logout_time
, TypeTIMESTAMP
- Name
ip_address
, TypeSTRING
- Name
bq
To use the bq
command-line tool to create
the tables, use the bq load
command. Supply the --location
flag
and set the value to your location. The
--location
flag is optional. For example, if you are using
BigQuery in the asia-northeast1
(Tokyo) region, the load
command would look like the following:
bq --location=asia-northeast1 load ...
To create the DeleteUsers
table
bq --location=asia-northeast1 load \
--source_format=NEWLINE_DELIMITED_JSON \
sample_db.DeletedUsers \
deletedUsersData.json \
id:integer
To create the Users
table
bq --location=asia-northeast1 load \
--source_format=NEWLINE_DELIMITED_JSON \
sample_db.Users \
usersData.json \
id:integer,date_joined:timestamp
To create the UserSessions
table
bq --location=asia-northeast1 load \
--source_format=NEWLINE_DELIMITED_JSON \
sample_db.UserSessions \
userSessionsData.json \
id:string,user_id:integer,login_time:timestamp,logout_time:timestamp,ip_address:string
Step 2. To delete the information about the users in the DeletedUsers table. Run the following DML queries:
Delete from
UsersSessions
DELETE FROM sample_db.UserSessions WHERE user_id in (SELECT id from sample_db.DeletedUsers)
Delete from
Users
DELETE FROM sample_db.Users WHERE id in (SELECT id from sample_db.DeletedUsers)
What's next
- See the DML reference page.
- See DML syntax and samples on the DML syntax page.