Loading Data with SQL DML

Google BigQuery supports loading data with the INSERT statement from the SQL Data Manipulation Language (DML).

For large databases, load data from a file instead. The number of INSERT statements that can be processed is limited to 1,000 per day per table, and DML statements that modify partitioned tables are not yet supported.

Before you begin

To load data into BigQuery, you must set up billing and ensure that you have write access to the table.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform Console project.

  3. Enable billing for your project.

  4. Ensure that you have write access to your table. If you are the owner of the dataset that contains your table, you probably have write access.

    To set write access to a dataset in BigQuery:

    1. Go to the BigQuery web UI.
    2. In the navigation, hover over a dataset ID, click the down arrow icon next to the ID, and then click Share dataset.
    3. Add a person and give that person edit access, then click Save changes.

This how-to guide uses a MySQL database. If you want to try the steps here on a test database, you can generate one using the populate_db.py script in the sample code.
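The populate_db.py script itself isn't reproduced in this guide, but a minimal stand-in might look like the following sketch. It assumes the mysql-connector-python package, a MySQL server on localhost, the root credentials used in the export command below, and the same table layout as the BigQuery schema listed later in this guide; adapt the details to your environment.

import mysql.connector  # pip install mysql-connector-python

# Hypothetical stand-in for populate_db.py: create a sample_db database
# with the Users and UserSessions tables used throughout this guide.
conn = mysql.connector.connect(host='127.0.0.1', user='root',
                               password='db-password')
cursor = conn.cursor()
cursor.execute('CREATE DATABASE IF NOT EXISTS sample_db')
cursor.execute('USE sample_db')
cursor.execute(
    'CREATE TABLE Users ('
    '  id INT PRIMARY KEY,'
    '  date_joined TIMESTAMP NULL)')
cursor.execute(
    'CREATE TABLE UserSessions ('
    '  id VARCHAR(36) PRIMARY KEY,'
    '  user_id INT,'
    '  login_time TIMESTAMP NULL,'
    '  logout_time TIMESTAMP NULL,'
    '  ip_address VARCHAR(45))')
cursor.execute(
    'INSERT INTO Users (id, date_joined) VALUES (1, NOW())')
cursor.execute(
    "INSERT INTO UserSessions"
    " (id, user_id, login_time, logout_time, ip_address)"
    " VALUES ('session-0001', 1, NOW(), NOW(), '10.0.0.1')")
conn.commit()
cursor.close()
conn.close()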

Creating a SQL file export

To export a MySQL database, use the mysqldump command.

Note that the mysqldump command supports exporting to CSV files using the --tab=dir_name, --fields-terminated-by=,, and --fields-enclosed-by='"' options. You can then load these CSV files directly into BigQuery rather than following this guide.
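For example, a CSV export of the sample database might look like the following (the output directory is illustrative; note that the MySQL server process, not the client, writes the data files to the --tab directory, so the server must have write access to it):

mysqldump \
    --user=root \
    --password='db-password' \
    --tab=/tmp/sample_db_csv \
    --fields-terminated-by=, \
    --fields-enclosed-by='"' \
    sample_db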

The following command exports the sample database into a SQL format that the sample code can send to BigQuery.

mysqldump \
    --user=root \
    --password='db-password' \
    --host= \
    --no-create-info \
    --complete-insert \
    sample_db \
    --skip-add-locks > sample_db_export.sql

The --no-create-info flag tells mysqldump not to include the CREATE TABLE statements. BigQuery doesn't support the SQL data definition language (DDL), so you need to create the schemas using the BigQuery web UI or the bq command-line tool.

The --complete-insert flag includes the column names in the INSERT statements. Use the same column names in BigQuery as were used in MySQL. The exported SQL file should look something like this:

-- MySQL dump 10.13

/*!40000 ALTER TABLE `UserSessions` DISABLE KEYS */;
INSERT INTO `UserSessions` (`id`, ...) VALUES (1,...,'2016-08-16 06:24:20',...),(2,...

Each INSERT statement appears on its own line.
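This one-statement-per-line layout matters: the loading scripts later in this guide select the statements to run by matching lines that begin with INSERT.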

Creating a BigQuery dataset

First, make a dataset to hold the tables.

Web UI

  1. Open the BigQuery web UI.
  2. Click the arrow next to your project name.
  3. Click Create new dataset.
  4. Enter a Dataset ID (Example: sample_db).
  5. Click OK.


Command-line

To use the bq command-line tool to create a dataset, use the bq mk command.

bq mk sample_db
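You can confirm that the dataset exists by listing the datasets in your project:

bq ls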

Next, make tables within the dataset you just created.

Web UI

  1. Open the BigQuery web UI.
  2. Hover over your dataset ID.
  3. Click the arrow next to your dataset ID.
  4. Click Create new table.
  5. Select None (create empty table) for the Location.
  6. For the Table name, use the same name as the table in your exported SQL database.
  7. Under Schema enter a Name for each column in the table and select the appropriate Type.

    • Click Add field and repeat until you have entered all the columns in your table.
  8. Click Create table.

The database created by the sample code has the following tables and columns:

  • Users
    • Name id, Type INTEGER
    • Name date_joined, Type TIMESTAMP
  • UserSessions
    • Name id, Type STRING
    • Name user_id, Type INTEGER
    • Name login_time, Type TIMESTAMP
    • Name logout_time, Type TIMESTAMP
    • Name ip_address, Type STRING


Command-line

To use the bq command-line tool to create an empty table, use the bq mk command.

To create the Users table for the database created by the sample code:

bq mk --schema id:integer,date_joined:timestamp -t sample_db.Users

Create an empty UserSessions table with the command:

bq mk \
    --schema id:string,user_id:integer,login_time:timestamp,logout_time:timestamp,ip_address:string \
    -t sample_db.UserSessions
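The --schema flag also accepts the path to a JSON schema file, which can be easier to maintain for wide tables. For example, a file named user_sessions_schema.json (the filename is illustrative) could contain:

[
  {"name": "id", "type": "STRING"},
  {"name": "user_id", "type": "INTEGER"},
  {"name": "login_time", "type": "TIMESTAMP"},
  {"name": "logout_time", "type": "TIMESTAMP"},
  {"name": "ip_address", "type": "STRING"}
]

and be passed as bq mk --schema user_sessions_schema.json -t sample_db.UserSessions. You can inspect either table afterward with bq show sample_db.UserSessions.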

Loading data using SQL DML

Now that you have an exported database, you can import the rows into BigQuery. To use DML in a query, you must enable standard SQL syntax. The following samples run a standard SQL query for each INSERT statement in the SQL file.

Web UI

The BigQuery web UI does not provide a way to load data from a SQL file directly, but you can run individual DML queries by enabling standard SQL in the query options.
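For example, with standard SQL enabled in the query options, you could run a single INSERT statement such as the following (the values are illustrative):

INSERT INTO sample_db.Users (id, date_joined)
VALUES (1, CURRENT_TIMESTAMP());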

See the DML reference page for details.


Command-line

To run a DML query using the bq command-line tool, use the bq query command and specify --nouse_legacy_sql to use standard SQL syntax, which is required for DML.

Since the INSERT queries may not contain the dataset name in the table identifier, use the --dataset_id parameter to specify the default target dataset.

The following Bash shell commands read all the lines beginning with INSERT and run the queries using the bq command-line tool.

while read -r q; do
    bq query --project_id=my-proj --dataset_id=sample_db --nouse_legacy_sql "$q"
done < <(grep '^INSERT' sample_db_export.sql)
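Each matching line runs as its own query job, so a large export can quickly run into the per-table limit on INSERT statements mentioned at the beginning of this guide.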

These PowerShell commands also read lines beginning with INSERT and run the queries using the bq command-line tool.

Select-String -pattern '^INSERT' ./sample_db_export.sql |
    %{ bq query --project_id=my-proj --dataset_id=sample_db --nouse_legacy_sql $_.Line }


Python

import argparse

from google.cloud import bigquery


def insert_sql(project, default_dataset, sql_path):
    """Run all the SQL statements in a SQL file."""
    client = bigquery.Client(project=project)

    with open(sql_path) as f:
        for line in f:
            line = line.strip()

            # Skip everything that isn't an INSERT statement, such as
            # comments and session settings in the mysqldump output.
            if not line.startswith('INSERT'):
                continue

            print('Running query: {}{}'.format(
                line[:60], '...' if len(line) > 60 else ''))
            query = client.run_sync_query(line)

            # Set use_legacy_sql to False to enable standard SQL syntax.
            # This is required to use the Data Manipulation Language features.
            # For more information about enabling standard SQL, see:
            # https://cloud.google.com/bigquery/sql-reference/enabling-standard-sql
            query.use_legacy_sql = False
            query.default_dataset = client.dataset(default_dataset)
            query.run()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Load data into BigQuery using SQL DML.')
    parser.add_argument('project', help='Google Cloud project name')
    parser.add_argument(
        'default_dataset', help='Default BigQuery dataset name')
    parser.add_argument('sql_path', help='Path to SQL file')

    args = parser.parse_args()

    insert_sql(args.project, args.default_dataset, args.sql_path)
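Assuming you save the sample as insert_sql.py (the filename is illustrative), run it with your project ID, the default dataset, and the path to the export file:

python insert_sql.py my-proj sample_db sample_db_export.sql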
