Stay organized with collections Save and categorize content based on your preferences.

Run interactive and batch query jobs

This document shows you how to run two types of query jobs in BigQuery:

  • Interactive query jobs, which are jobs that BigQuery runs on demand.
  • Batch query jobs, which are jobs that BigQuery waits to run until idle compute resources are available.

Interactive versus batch queries

By default, BigQuery runs your queries as interactive query jobs, which are run as soon as possible. Interactive queries count toward your concurrent limit.

With batch query jobs, BigQuery queues your queries and starts them when idle resources are available in the BigQuery shared resource pool. Typically, queries are queued for only a few minutes.

Batch queries don't count toward your concurrent limit. You can run a maximum of 10 concurrent batch queries in your project. Batch queries use the same resources as interactive queries. If you use flat-rate pricing, batch queries and interactive queries share your allocated slots.

BigQuery saves query results to either a temporary table (default) or permanent table. When you specify a permanent table as the destination table for the results, you can choose whether to append or overwrite an existing table, or create a new table with a unique name.

Required roles

To get the permissions that you need to run a query job, ask your administrator to grant you the following IAM roles:

  • BigQuery Job User (roles/bigquery.jobUser) on the project.
  • BigQuery Data Viewer (roles/bigquery.dataViewer) on all tables and views that your query references. To query views, you also need this role on all underlying tables and views. If you're using authorized views or authorized datasets, you don't need access to the underlying source data.

For more information about granting roles, see Manage access.

These predefined roles contain the permissions required to run a query job. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

  • bigquery.jobs.create on the project.
  • bigquery.tables.getData on all tables and views that your query references. To query views, you also need this permission on all underlying tables and views. If you're using authorized views or authorized datasets, you don't need access to the underlying source data.

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about BigQuery permissions, see Access control with IAM.

Run an interactive query

To run an interactive query, select one of the following options:

Console

  1. Go to the BigQuery page.

    Go to BigQuery

  2. Click Compose new query.

  3. In the query editor, enter a valid Google Standard SQL query.

    For example, query the BigQuery public dataset usa_names to determine the most common names in the United States between the years 1910 and 2013:

    SELECT
      name, gender,
      SUM(number) AS total
    FROM
      `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY
      name, gender
    ORDER BY
      total DESC
    LIMIT
      10;
    
  4. Optional: Specify the destination table and location for the query results:

    1. In the query editor, click More, and then click Query settings.
    2. In the Destination section, select Set a destination table for query results.
    3. For Dataset, enter the name of an existing dataset for the destination table—for example, myProject.myDataset.
    4. For Table Id, enter a name for the destination table—for example, myTable.
    5. If the destination table is an existing table, then for Destination table write preference, select whether to append or overwrite the table with the query results.

      If the destination table is a new table, then BigQuery creates the table when you run your query.

    6. In the Additional settings section, click the Data location menu, and then select an option.

      In this example, the usa_names dataset is stored in the US multi-region location. If you specify a destination table for this query, the dataset that contains the destination table must also be in the US multi-region. You cannot query a dataset in one location and write the results to a table in another location.

    7. Click Save.

  5. Click Run.

    If you don't specify a destination table, the query job writes the output to a temporary (cache) table.

bq

Use the bq query command. In the following example, the --use_legacy_sql=false flag lets you use Google Standard SQL syntax.

bq query \
    --use_legacy_sql=false \
    'QUERY'

Replace QUERY with a valid Google Standard SQL query. For example, query the BigQuery public dataset usa_names to determine the most common names in the United States between the years 1910 and 2013:

bq query \
    --use_legacy_sql=false \
    'SELECT
      name, gender,
      SUM(number) AS total
    FROM
      `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY
      name, gender
    ORDER BY
      total DESC
    LIMIT
      10;'

The query job writes the output to a temporary (cache) table.

Optionally, you can specify the destination table and location for the query results. To write the results to an existing table, include the appropriate flag to append (--append_table=true) or overwrite (--replace=true) the table.

bq query \
    --location=LOCATION \
    --destination_table=TABLE \
    --use_legacy_sql=false \
    'QUERY'

Replace the following:

  • LOCATION: the region or multi-region for the destination table—for example, US

    In this example, the usa_names dataset is stored in the US multi-region location. If you specify a destination table for this query, the dataset that contains the destination table must also be in the US multi-region. You cannot query a dataset in one location and write the results to a table in another location.

    You can set a default value for the location using the .bigqueryrc file.

  • TABLE: a name for the destination table—for example, myDataset.myTable

    If the destination table is a new table, then BigQuery creates the table when you run your query. However, you must specify an existing dataset.

    If the table isn't in your current project, then add the Google Cloud project ID using the format PROJECT_ID:DATASET.TABLE—for example, myProject:myDataset.myTable. If --destination_table is unspecified, a query job is generated that writes the output to a temporary table.

API

To run a query using the API, insert a new job and populate the query job configuration property. Optionally specify your location in the location property in the jobReference section of the job resource.

Poll for results by calling getQueryResults. Poll until jobComplete equals true. Check for errors and warnings in the errors list.

C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.


using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryQuery
{
    public void Query(
        string projectId = "your-project-id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        string query = @"
            SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013`
            WHERE state = 'TX'
            LIMIT 100";
        BigQueryJob job = client.CreateQueryJob(
            sql: query,
            parameters: null,
            options: new QueryOptions { UseQueryCache = false });
        // Wait for the job to complete.
        job.PollUntilCompleted();
        // Display the results
        foreach (BigQueryRow row in client.GetQueryResults(job.Reference))
        {
            Console.WriteLine($"{row["name"]}");
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

// queryBasic demonstrates issuing a query and reading results.
func queryBasic(w io.Writer, projectID string) error {
	// projectID := "my-project-id"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	q := client.Query(
		"SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` " +
			"WHERE state = \"TX\" " +
			"LIMIT 100")
	// Location must match that of the dataset(s) referenced in the query.
	q.Location = "US"
	// Run the query and print results when the query job is completed.
	job, err := q.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	if err := status.Err(); err != nil {
		return err
	}
	it, err := job.Read(ctx)
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, row)
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class SimpleQuery {

  public static void runSimpleQuery() {
    // TODO(developer): Replace this query before running the sample.
    String query = "SELECT corpus FROM `bigquery-public-data.samples.shakespeare` GROUP BY corpus;";
    simpleQuery(query);
  }

  public static void simpleQuery(String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Create the query job.
      QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();

      // Execute the query.
      TableResult result = bigquery.query(queryConfig);

      // Print the results.
      result.iterateAll().forEach(rows -> rows.forEach(row -> System.out.println(row.getValue())));

      System.out.println("Query ran successfully");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Query did not run \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library using default credentials
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();
async function query() {
  // Queries the U.S. given names dataset for the state of Texas.

  const query = `SELECT name
    FROM \`bigquery-public-data.usa_names.usa_1910_2013\`
    WHERE state = 'TX'
    LIMIT 100`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
    // Location must match that of the dataset(s) referenced in the query.
    location: 'US',
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);
  console.log(`Job ${job.id} started.`);

  // Wait for the query to finish
  const [rows] = await job.getQueryResults();

  // Print the results
  console.log('Rows:');
  rows.forEach(row => console.log(row));
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $query = 'SELECT id, view_count FROM `bigquery-public-data.stackoverflow.posts_questions`';

$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$jobConfig = $bigQuery->query($query);
$job = $bigQuery->startQuery($jobConfig);

$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});
$queryResults = $job->queryResults();

$i = 0;
foreach ($queryResults as $row) {
    printf('--- Row %s ---' . PHP_EOL, ++$i);
    foreach ($row as $column => $value) {
        printf('%s: %s' . PHP_EOL, $column, json_encode($value));
    }
}
printf('Found %s row(s)' . PHP_EOL, $i);

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.


from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

require "google/cloud/bigquery"

def query
  bigquery = Google::Cloud::Bigquery.new
  sql = "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` " \
        "WHERE state = 'TX' " \
        "LIMIT 100"

  # Location must match that of the dataset(s) referenced in the query.
  results = bigquery.query sql do |config|
    config.location = "US"
  end

  results.each do |row|
    puts row.inspect
  end
end

For more information, see Interactive versus batch queries.

Run a batch query

To run a batch query, select one of the following options:

Console

  1. Go to the BigQuery page.

    Go to BigQuery

  2. Click Compose new query.

  3. In the query editor, enter a valid Google Standard SQL query.

    For example, query the BigQuery public dataset usa_names to determine the most common names in the United States between the years 1910 and 2013:

    SELECT
      name, gender,
      SUM(number) AS total
    FROM
      `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY
      name, gender
    ORDER BY
      total DESC
    LIMIT
      10;
    
  4. Click More, and then click Query settings.

  5. In the Resource management section, select Batch.

  6. Optional: Specify the destination table and location for the query results:

    1. In the Destination section, select Set a destination table for query results.
    2. For Dataset, enter the name of an existing dataset for the destination table—for example, myProject.myDataset.
    3. For Table Id, enter a name for the destination table—for example, myTable.
    4. If the destination table is an existing table, then for Destination table write preference, select whether to append or overwrite the table with the query results.

      If the destination table is a new table, then BigQuery creates the table when you run your query.

    5. In the Additional settings section, click the Data location menu, and then select an option.

      In this example, the usa_names dataset is stored in the US multi-region location. If you specify a destination table for this query, the dataset that contains the destination table must also be in the US multi-region. You cannot query a dataset in one location and write the results to a table in another location.

  7. Click Save.

  8. Click Run.

    If you don't specify a destination table, the query job writes the output to a temporary (cache) table.

bq

Use the bq query command and specify the --batch flag. In the following example, the --use_legacy_sql=false flag lets you use Google Standard SQL syntax.

bq query \
    --batch \
    --use_legacy_sql=false \
    'QUERY'

Replace QUERY with a valid Google Standard SQL query. For example, query the BigQuery public dataset usa_names to determine the most common names in the United States between the years 1910 and 2013:

bq query \
    --batch \
    --use_legacy_sql=false \
    'SELECT
      name, gender,
      SUM(number) AS total
    FROM
      `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY
      name, gender
    ORDER BY
      total DESC
    LIMIT
      10;'

The query job writes the output to a temporary (cache) table.

Optionally, you can specify the destination table and location for the query results. To write the results to an existing table, include the appropriate flag to append (--append_table=true) or overwrite (--replace=true) the table.

bq query \
    --batch \
    --location=LOCATION \
    --destination_table=TABLE \
    --use_legacy_sql=false \
    'QUERY'

Replace the following:

  • LOCATION: the region or multi-region for the destination table—for example, US

    In this example, the usa_names dataset is stored in the US multi-region location. If you specify a destination table for this query, the dataset that contains the destination table must also be in the US multi-region. You cannot query a dataset in one location and write the results to a table in another location.

    You can set a default value for the location using the .bigqueryrc file.

  • TABLE: a name for the destination table—for example, myDataset.myTable

    If the destination table is a new table, then BigQuery creates the table when you run your query. However, you must specify an existing dataset.

    If the table isn't in your current project, then add the Google Cloud project ID using the format PROJECT_ID:DATASET.TABLE—for example, myProject:myDataset.myTable. If --destination_table is unspecified, a query job is generated that writes the output to a temporary table.

API

To run a query using the API, insert a new job and populate the query job configuration property. Optionally specify your location in the location property in the jobReference section of the job resource.

When you populate the query job properties, include the configuration.query.priority property and set the value to BATCH.

Poll for results by calling getQueryResults. Poll until jobComplete equals true. Check for errors and warnings in the errors list.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"io"
	"time"

	"cloud.google.com/go/bigquery"
)

// queryBatch demonstrates issuing a query job using batch priority.
func queryBatch(w io.Writer, projectID, dstDatasetID, dstTableID string) error {
	// projectID := "my-project-id"
	// dstDatasetID := "mydataset"
	// dstTableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	// Build an aggregate table.
	q := client.Query(`
		SELECT
  			corpus,
  			SUM(word_count) as total_words,
  			COUNT(1) as unique_words
		FROM ` + "`bigquery-public-data.samples.shakespeare`" + `
		GROUP BY corpus;`)
	q.Priority = bigquery.BatchPriority
	q.QueryConfig.Dst = client.Dataset(dstDatasetID).Table(dstTableID)

	// Start the job.
	job, err := q.Run(ctx)
	if err != nil {
		return err
	}
	// Job is started and will progress without interaction.
	// To simulate other work being done, sleep a few seconds.
	time.Sleep(5 * time.Second)
	status, err := job.Status(ctx)
	if err != nil {
		return err
	}

	state := "Unknown"
	switch status.State {
	case bigquery.Pending:
		state = "Pending"
	case bigquery.Running:
		state = "Running"
	case bigquery.Done:
		state = "Done"
	}
	// You can continue to monitor job progress until it reaches
	// the Done state by polling periodically.  In this example,
	// we print the latest status.
	fmt.Fprintf(w, "Job %s in Location %s currently in state: %s\n", job.ID(), job.Location(), state)

	return nil

}

Java

To run a batch query, set the query priority to QueryJobConfiguration.Priority.BATCH when creating a QueryJobConfiguration.

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

// Sample to query batch in a table
public class QueryBatch {

  public static void runQueryBatch() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String query =
        "SELECT corpus"
            + " FROM `"
            + projectId
            + "."
            + datasetName
            + "."
            + tableName
            + " GROUP BY corpus;";
    queryBatch(query);
  }

  public static void queryBatch(String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration queryConfig =
          QueryJobConfiguration.newBuilder(query)
              // Run at batch priority, which won't count toward concurrent rate limit.
              .setPriority(QueryJobConfiguration.Priority.BATCH)
              .build();

      TableResult results = bigquery.query(queryConfig);

      results
          .iterateAll()
          .forEach(row -> row.forEach(val -> System.out.printf("%s,", val.toString())));

      System.out.println("Query batch performed successfully.");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Query batch not performed \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function queryBatch() {
  // Runs a query at batch priority.

  // Create query job configuration. For all options, see
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationquery
  const queryJobConfig = {
    query: `SELECT corpus
            FROM \`bigquery-public-data.samples.shakespeare\` 
            LIMIT 10`,
    useLegacySql: false,
    priority: 'BATCH',
  };

  // Create job configuration. For all options, see
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfiguration
  const jobConfig = {
    // Specify a job configuration to set optional job resource properties.
    configuration: {
      query: queryJobConfig,
    },
  };

  // Make API request.
  const [job] = await bigquery.createJob(jobConfig);

  const jobId = job.metadata.id;
  const state = job.metadata.status.state;
  console.log(`Job ${jobId} is currently in state ${state}`);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    # Run at batch priority, which won't count toward concurrent rate limit.
    priority=bigquery.QueryPriority.BATCH
)

sql = """
    SELECT corpus
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus;
"""

# Start the query, passing in the extra configuration.
query_job = client.query(sql, job_config=job_config)  # Make an API request.

# Check on the progress by getting the job's updated state. Once the state
# is `DONE`, the results are ready.
query_job = client.get_job(
    query_job.job_id, location=query_job.location
)  # Make an API request.

print("Job {} is currently in state {}".format(query_job.job_id, query_job.state))

For more information, see Interactive versus batch queries.

What's next