Paging Through Query Results

This document describes how to page through query results using the BigQuery REST API.

Paging through results using the API

All collection.list methods return paginated results under certain circumstances. The number of results per page is controlled by the maxResults property.

Method Pagination criteria Default maxResults value Maximum maxResults value
Tabledata.list Returns paginated results if the response size is more than 10 MB of serialized JSON or more than maxResults rows. 100,000 100,000
All other collection.list methods Returns paginated results if the response is more than maxResults rows. 50 1,000

If you set maxResults to a value greater than the maximum value listed above, the results are paginated based on the maximum value.

A page is a subset of the total number of rows. If your results are more than one page of data, the result data will have a pageToken property. To retrieve the next page of results, make another list call and include the token value as a URL parameter named pageToken.

The bigquery.tabledata.list method, which is used to page through table data, uses a row offset value or a page token. See Browsing table data for information.

The following samples demonstrate paging through bigquery results.

C#

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

public int TableDataList(
    string datasetId, string tableId, int pageSize, BigQueryClient client)
{
    int recordCount = 0;
    var result = client.ListRows(datasetId, tableId, null,
        new ListRowsOptions { PageSize = pageSize });
    // If there are more rows than were returned in the first page of results, 
    // iterating over the rows will lazily evaluate the results each time, 
    // making further requests as necessary.
    foreach (var row in result)
    {
        Console.WriteLine($"{row["title"]}: {row["unique_words"]}");
        recordCount++;
    }
    return recordCount;
}

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Page<FieldValueList> tableData =
    bigquery.listTableData(datasetName, tableName, TableDataListOption.pageSize(100));
for (FieldValueList row : tableData.iterateAll()) {
  // do something with the row
}

Go

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

The Google Cloud Client Library for Go automatically paginates by default, so you do not need to implement pagination yourself, for example:

table := client.Dataset(datasetID).Table(tableID)
it := table.Read(ctx)
for {
	var row []bigquery.Value
	err := it.Next(&row)
	if err == iterator.Done {
		break
	}
	if err != nil {
		return err
	}
	fmt.Println(row)
}

Node.js

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

The Google Cloud Client Library for Node.js automatically paginates by default, so you do not need to implement pagination yourself, for example:

// Imports the Google Cloud client library
const BigQuery = require('@google-cloud/bigquery');

// The project ID to use, e.g. "your-project-id"
// const projectId = "your-project-id";

// The ID of the dataset of the table to browse, e.g. "my_dataset"
// const datasetId = "my_dataset";

// The ID of the table to browse, e.g. "my_table"
// const tableId = "my_table";

// Instantiates a client
const bigquery = BigQuery({
  projectId: projectId
});

// Lists rows in the table
bigquery
  .dataset(datasetId)
  .table(tableId)
  .getRows()
  .then((results) => {
    const rows = results[0];
    console.log('Rows:');
    rows.forEach((row) => console.log(row));
  })
  .catch((err) => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Pagination happens automatically in the Google Cloud Client Library for PHP using the generator function rows, which fetches the next page of results during iteration.

use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Browse a bigquery table.
 * Example:
 * ```
 * browse_table($projectId, $datasetId, $tableId);
 * ```
 *
 * @param string $projectId  The Google project ID.
 * @param string $datasetId  The BigQuery dataset ID.
 * @param string $tableId    The BigQuery table ID.
 * @param string $maxResults The number of results to return at a time.
 * @param string $startIndex The row index to start on.
 *
 * @return int number of rows returned
 */
function browse_table($projectId, $datasetId, $tableId, $maxResults = 10, $startIndex = 0)
{
    $options = [
        'maxResults' => $maxResults,
        'startIndex' => $startIndex
    ];
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    $numRows = 0;
    foreach ($table->rows($options) as $row) {
        print('---');
        foreach ($row as $column => $value) {
            printf('%s: %s' . PHP_EOL, $column, $value);
        }
        $numRows++;
    }

    return $numRows;
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

def list_rows(dataset_id, table_id, project=None):
    """Prints rows in the given table.

    Will print 25 rows at most for brevity as tables can contain large amounts
    of rows.

    If no project is specified, then the currently active project is used.
    """
    bigquery_client = bigquery.Client(project=project)
    dataset_ref = bigquery_client.dataset(dataset_id)
    table_ref = dataset_ref.table(table_id)

    # Get the table from the API so that the schema is available.
    table = bigquery_client.get_table(table_ref)

    # Load at most 25 results.
    rows = bigquery_client.list_rows(table, max_results=25)

    # Use format to create a simple table.
    format_string = '{!s:<16} ' * len(table.schema)

    # Print schema field names
    field_names = [field.name for field in table.schema]
    print(format_string.format(*field_names))

    for row in rows:
        print(format_string.format(*row))

Ruby

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Pagination happens automatically in the Google Cloud Client Library for Ruby using Table#data and Data#next.

# project_id = "Your Google Cloud project ID"
# dataset_id = "ID of the dataset containing table"
# table_id   = "ID of the table to display data for"

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: project_id
dataset  = bigquery.dataset dataset_id
table    = dataset.table table_id

table.data.each do |row|
  row.each      do |column_name, value|
    puts "#{column_name} = #{value}"
  end
end

Requesting arbitrary pages and avoiding redundant list calls

When you page backwards or jump to arbitrary pages using cached pageToken values, it is possible that the data in your pages might have changed since it was last viewed but there is no clear indication that the data might have changed. To mitigate this, you can use the Etag property.

Every collection.list method (except for Tabledata) returns an Etag property in the result. This property is a hash of the page results that can be used to verify whether the page has changed since last request. When you make a request to BigQuery with an Etag value, BigQuery compares the Etag value to the ETag value returned by the API and responds based on whether the ETag values match. You can use ETags to help avoid redundant list calls in the following ways:

  • If you only want to return list values if the values have changed:

    If you only want to return a page of list values if the values have changed, you can make a list call with a previously-stored ETag using the HTTP "if-none-match" header. If the ETag you provide doesn't match the ETag on the server, BigQuery returns a page of new list values. If the ETags do match, BigQuery returns a HTTP 304 "Not Modified" result and no values. An example of this might be a webpage where users might periodically fill in information that is stored in BigQuery. You can avoid making redundant list calls to BigQuery if there are no changes to your data by using the if-none-match header with ETags.

  • If you only want to return list values if the values have not changed:

    If you only want to return a page of list values if the list values have not changed, you can use the HTTP "if-match" header. BigQuery matches the ETag values and returns the page of results if the results have not changed or returns a 412 "Precondition Failed" result if the page has changed.

Note: Although ETags are a great way to avoid making redundant list calls, you can apply the same methods to identifying if any objects have changed. For example, you can perform a Get request for a specific table and use ETags to determine if the table has changed before returning the full response.


Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...