Running interactive and batch query jobs

This document describes how to run interactive (on-demand) and batch query jobs.

Required permissions

At a minimum, to run a query job, you must be granted bigquery.jobs.create permissions. In order for the query job to complete successfully, you must also be granted access to the tables or views referenced by the query. Access to the tables or views can be granted at the following levels, listed in order of range of resources allowed (largest to smallest):

The following predefined IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to query tables and views in the dataset.

For more information on IAM roles in BigQuery, see Predefined roles and permissions.

Running interactive queries

By default, BigQuery runs interactive (on-demand) query jobs, which means that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily limit.

Query results are always saved to either a temporary or permanent table. You can choose whether to append or overwrite data in an existing table or whether to create a new table if none exists with the same name.

To run an interactive query that writes to a temporary table:

Console

  1. Open the BigQuery page in the Cloud Console.
    Go to the BigQuery page

  2. Click Compose new query.

    Compose new query

  3. Enter a valid BigQuery SQL query in the Query editor text area.

  4. (Optional) To change the data processing location, click More, then Query settings. Under Processing location, click Auto-select and choose your data's location. Finally, click Save to update the query settings.

  5. Click Run.

This creates a query job that writes the output to a temporary table.

bq

Enter the bq query command and include your query text.

(Optional) Supply the --location flag and set the value to your location.

You can specify the following optional flags. This list includes some of the most common flags. For a complete list of query command flags, see bq query in the bq command-line tool reference.

Specify the:

  • --destination_table flag to create a permanent table based on the query results. To write the query results to a table that is not in your default project, add the project ID to the dataset name in the following format: project_id:dataset. If --destination_table is unspecified, a query job is generated that writes the output to a temporary (cache) table.
  • --append_table flag to append the query results to a destination table.
  • --destination_kms_key flag to use a Cloud Key Management Service key to encrypt the destination table data.
  • --use_legacy_sql=false flag to use standard SQL syntax. You can set a default syntax for the bq command-line tool using the .bigqueryrc file.
  • --label flag to apply a label to the query job in the form key:value. Repeat this flag to specify multiple labels.
  • --max_rows or -n flag to specify the number of rows to return in the query results.
  • --maximum_bytes_billed flag to limit the bytes billed for the query. If the query goes beyond the limit, it fails (without incurring a charge). If not specified, the bytes billed is set to the project default.
  • --udf_resource flag to load and evaluate a code file to be used as a user-defined function resource. You can specify a Cloud Storage URI or the path to a local code file. Repeat this flag to specify multiple files.

Enter the following command to run an interactive query using standard SQL syntax:

bq --location=location query \
--use_legacy_sql=false \
'query'

Where:

  • location is the name of the location where the query is processed. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • query is a query in standard SQL syntax.

Examples:

Enter the following command to write interactive query results to a destination table named mytable in mydataset. The dataset is in your default project. The query retrieves data from a the USA Name Data public dataset.

bq query \
--destination_table mydataset.mytable \
--use_legacy_sql=false \
'SELECT
  name,
  number
FROM
  `bigquery-public-data.usa_names.usa_1910_current`
WHERE
  gender = "M"
ORDER BY
  number DESC'

Enter the following command to write interactive query results to a destination table named mytable in mydataset. The dataset is in myotherproject, not your default project. The query retrieves data from a non-partitioned table — the USA Name Data public dataset.

bq query \
--destination_table myotherproject:mydataset.mytable \
--use_legacy_sql=false \
'SELECT
  name,
  number
FROM
  `bigquery-public-data.usa_names.usa_1910_current`
WHERE
  gender = "M"
ORDER BY
  number DESC'

API

To run a query using the API, insert a new job and populate the jobs#configuration.query property. Specify your location in the location property in the jobReference section of the job resource.

Poll for results by calling getQueryResults. Poll until jobComplete equals true. Check for errors and warnings in the errors list.

C#

Before trying this sample, follow the C# setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery C# API reference documentation.


using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryQuery
{
    public void Query(
        string projectId = "your-project-id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        string query = @"
            SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013`
            WHERE state = 'TX'
            LIMIT 100";
        BigQueryJob job = client.CreateQueryJob(
            sql: query,
            parameters: null,
            options: new QueryOptions { UseQueryCache = false });
        // Wait for the job to complete.
        job.PollUntilCompleted();
        // Display the results
        foreach (BigQueryRow row in client.GetQueryResults(job.Reference))
        {
            Console.WriteLine($"{row["name"]}");
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

// queryBasic demonstrates issuing a query and reading results.
func queryBasic(w io.Writer, projectID string) error {
	// projectID := "my-project-id"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	q := client.Query(
		"SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` " +
			"WHERE state = \"TX\" " +
			"LIMIT 100")
	// Location must match that of the dataset(s) referenced in the query.
	q.Location = "US"
	// Run the query and print results when the query job is completed.
	job, err := q.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	if err := status.Err(); err != nil {
		return err
	}
	it, err := job.Read(ctx)
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, row)
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class SimpleQuery {

  public static void runSimpleQuery() {
    // TODO(developer): Replace this query before running the sample.
    String query = "SELECT corpus FROM `bigquery-public-data.samples.shakespeare` GROUP BY corpus;";
    simpleQuery(query);
  }

  public static void simpleQuery(String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Create the query job.
      QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();

      // Execute the query.
      TableResult result = bigquery.query(queryConfig);

      // Print the results.
      result.iterateAll().forEach(rows -> rows.forEach(row -> System.out.println(row.getValue())));

      System.out.println("Query ran successfully");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Query did not run \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library using default credentials
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();
async function query() {
  // Queries the U.S. given names dataset for the state of Texas.

  const query = `SELECT name
    FROM \`bigquery-public-data.usa_names.usa_1910_2013\`
    WHERE state = 'TX'
    LIMIT 100`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
    // Location must match that of the dataset(s) referenced in the query.
    location: 'US',
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);
  console.log(`Job ${job.id} started.`);

  // Wait for the query to finish
  const [rows] = await job.getQueryResults();

  // Print the results
  console.log('Rows:');
  rows.forEach(row => console.log(row));
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery PHP API reference documentation.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $query = 'SELECT id, view_count FROM `bigquery-public-data.stackoverflow.posts_questions`';

$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$jobConfig = $bigQuery->query($query);
$job = $bigQuery->startQuery($jobConfig);

$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});
$queryResults = $job->queryResults();

$i = 0;
foreach ($queryResults as $row) {
    printf('--- Row %s ---' . PHP_EOL, ++$i);
    foreach ($row as $column => $value) {
        printf('%s: %s' . PHP_EOL, $column, json_encode($value));
    }
}
printf('Found %s row(s)' . PHP_EOL, $i);

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.


from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Ruby API reference documentation.

require "google/cloud/bigquery"

def query
  bigquery = Google::Cloud::Bigquery.new
  sql = "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` " \
        "WHERE state = 'TX' " \
        "LIMIT 100"

  # Location must match that of the dataset(s) referenced in the query.
  results = bigquery.query sql do |config|
    config.location = "US"
  end

  results.each do |row|
    puts row.inspect
  end
end

Running batch queries

BigQuery also offers batch queries. BigQuery queues each batch query on your behalf, and starts the query as soon as idle resources are available in the BigQuery shared resource pool. This usually occurs within a few minutes. If BigQuery hasn't started the query within 24 hours, BigQuery changes the job priority to interactive.

Batch queries don't count towards your concurrent rate limit, which can make it easier to start many queries at once. Batch queries use the same resources as interactive (on-demand) queries. If you use flat-rate pricing batch queries and interactive queries share your allocated slots.

To run a batch query:

Console

  1. Open the BigQuery page in the Cloud Console.
    Go to the BigQuery page

  2. Click the Compose new query button.

    Compose new query

  3. Enter a valid SQL query in the Query editor text area.

  4. Click the More button, then Query settings.

    Query settings

  5. Select the Batch option in the Job priority section.

    Batch execution

  6. (Optional) For Processing location, click Unspecified and choose your data's location.

  7. Click Save to update the query settings.

  8. Click Run.

bq

Enter the bq query command and include your query text. Specify the -- batch flag to run a batch query.

(Optional) Supply the --location flag and set the value to your location.

You can specify the following optional flags. This list includes some of the most common flags. For a complete list of query command flags, see bq query in the bq command-line tool reference.

Specify the:

  • --destination_table flag to create a permanent table based on the query results. To write the query results to a table that is not in your default project, add the project ID to the dataset name in the following format: project_id:dataset. If --destination_table is unspecified, a query job is generated that writes the output to a temporary (cache) table.
  • --append_table flag to append the query results to a destination table.
  • --destination_kms_key flag to use a Cloud Key Management Service key to encrypt the destination table data.
  • --use_legacy_sql=false flag to use standard SQL syntax. You can set a default syntax for the bq command-line tool using the .bigqueryrc file.
  • --label flag to apply a label to the query job in the form key:value. Repeat this flag to specify multiple labels.
  • --max_rows or -n flag to specify the number of rows to return in the query results.
  • --maximum_bytes_billed flag to limit the bytes billed for the query. If the query goes beyond the limit, it fails (without incurring a charge). If not specified, the bytes billed is set to the project default.
  • --udf_resource flag to load and evaluate a code file to be used as a user-defined function resource. You can specify a Cloud Storage URI or the path to a local code file. Repeat this flag to specify multiple files.

Enter the following command to run a batch query using standard SQL syntax:

bq --location=location query \
--batch \
--use_legacy_sql=false \
'query'

Where:

  • location is the name of the location where the query is processed. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • query is a query in standard SQL syntax.

Examples:

Enter the following command to write batch query results to a destination table named mytable in mydataset. The dataset is in your default project. The query retrieves data from a the USA Name Data public dataset.

bq query \
--batch \
--destination_table mydataset.mytable \
--use_legacy_sql=false \
'SELECT
  name,
  number
FROM
  `bigquery-public-data.usa_names.usa_1910_current`
WHERE
  gender = "M"
ORDER BY
  number DESC'

Enter the following command to write batch query results to a destination table named mytable in mydataset. The dataset is in myotherproject, not your default project. The query retrieves data from a non-partitioned table — the USA Name Data public dataset.

bq query \
--batch \
--destination_table myotherproject:mydataset.mytable \
--use_legacy_sql=false \
'SELECT
  name,
  number
FROM
  `bigquery-public-data.usa_names.usa_1910_current`
WHERE
  gender = "M"
ORDER BY
  number DESC'

API

To run a query using the API, insert a new job and populate the query job configuration property. (Optional) Specify your location in the location property in the jobReference section of the job resource.

When you populate the query job properties, include the configuration.query.priority property, with the value set to BATCH.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"io"
	"time"

	"cloud.google.com/go/bigquery"
)

// queryBatch demonstrates issuing a query job using batch priority.
func queryBatch(w io.Writer, projectID, dstDatasetID, dstTableID string) error {
	// projectID := "my-project-id"
	// dstDatasetID := "mydataset"
	// dstTableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	// Build an aggregate table.
	q := client.Query(`
		SELECT
  			corpus,
  			SUM(word_count) as total_words,
  			COUNT(1) as unique_words
		FROM ` + "`bigquery-public-data.samples.shakespeare`" + `
		GROUP BY corpus;`)
	q.Priority = bigquery.BatchPriority
	q.QueryConfig.Dst = client.Dataset(dstDatasetID).Table(dstTableID)

	// Start the job.
	job, err := q.Run(ctx)
	if err != nil {
		return err
	}
	// Job is started and will progress without interaction.
	// To simulate other work being done, sleep a few seconds.
	time.Sleep(5 * time.Second)
	status, err := job.Status(ctx)
	if err != nil {
		return err
	}

	state := "Unknown"
	switch status.State {
	case bigquery.Pending:
		state = "Pending"
	case bigquery.Running:
		state = "Running"
	case bigquery.Done:
		state = "Done"
	}
	// You can continue to monitor job progress until it reaches
	// the Done state by polling periodically.  In this example,
	// we print the latest status.
	fmt.Fprintf(w, "Job %s in Location %s currently in state: %s\n", job.ID(), job.Location(), state)

	return nil

}

Java

To run a batch query, set the query priority to QueryJobConfiguration.Priority.BATCH when creating a QueryJobConfiguration.

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

// Sample to query batch in a table
public class QueryBatch {

  public static void runQueryBatch() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String query =
        "SELECT corpus"
            + " FROM `"
            + projectId
            + "."
            + datasetName
            + "."
            + tableName
            + " GROUP BY corpus;";
    queryBatch(query);
  }

  public static void queryBatch(String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration queryConfig =
          QueryJobConfiguration.newBuilder(query)
              // Run at batch priority, which won't count toward concurrent rate limit.
              .setPriority(QueryJobConfiguration.Priority.BATCH)
              .build();

      TableResult results = bigquery.query(queryConfig);

      results
          .iterateAll()
          .forEach(row -> row.forEach(val -> System.out.printf("%s,", val.toString())));

      System.out.println("Query batch performed successfully.");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Query batch not performed \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function queryBatch() {
  // Runs a query at batch priority.

  // Create query job configuration. For all options, see
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationquery
  const queryJobConfig = {
    query: `SELECT corpus
            FROM \`bigquery-public-data.samples.shakespeare\` 
            LIMIT 10`,
    useLegacySql: false,
    priority: 'BATCH',
  };

  // Create job configuration. For all options, see
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfiguration
  const jobConfig = {
    // Specify a job configuration to set optional job resource properties.
    configuration: {
      query: queryJobConfig,
    },
  };

  // Make API request.
  const [job] = await bigquery.createJob(jobConfig);

  const jobId = job.metadata.id;
  const state = job.metadata.status.state;
  console.log(`Job ${jobId} is currently in state ${state}`);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    # Run at batch priority, which won't count toward concurrent rate limit.
    priority=bigquery.QueryPriority.BATCH
)

sql = """
    SELECT corpus
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus;
"""

# Start the query, passing in the extra configuration.
query_job = client.query(sql, job_config=job_config)  # Make an API request.

# Check on the progress by getting the job's updated state. Once the state
# is `DONE`, the results are ready.
query_job = client.get_job(
    query_job.job_id, location=query_job.location
)  # Make an API request.

print("Job {} is currently in state {}".format(query_job.job_id, query_job.state))