ページ分けを使用して BigQuery API でデータを読み取る

このドキュメントでは、BigQuery API でページ分けを使用してテーブルデータとクエリ結果を読み取る方法について説明します。

API を使用して結果をページ分割する

すべての *collection*.list メソッドから返される結果は、特定の状況においてページ分割されます。maxResults プロパティは、ページあたりの結果の数を制限します。

メソッド	ページ分け基準	`maxResults` のデフォルト値	`maxResults` の最大値	`maxFieldValues` の最大値
`tabledata.list`	レスポンスサイズが 10 MB¹ を超えるデータ、または `maxResults` 行を超える場合は、ページ分割された結果を返します。	無制限	無制限	無制限
その他すべての `collection.list` メソッド	レスポンスが `maxResults` 行を超え、かつ上限を下回っている場合は、ページ分割された結果を返します。	10,000	無制限	300,000

結果がバイトまたはフィールドの上限を超えると、結果は上限に合わせてカットされます。1 行がバイトまたはフィールドの上限より大きい場合、tabledata.list は最大 100 MB のデータ¹を返すことができます。これは、クエリ結果の最大行数制限に準拠しています。ページあたりの最小サイズはなく、一部のページは他のページよりも多くの行を返すことがあります。

¹ 行サイズは行データの内部表現に基づくため、概算値です。行の最大サイズに対する上限は、クエリジョブ実行の特定の段階で適用されます。

jobs.getQueryResults は、サポートを通じてより大きなデータサイズを明示的にリクエストした場合を除いて、20 MB のデータを返すことができます。

ページは全行数のサブセットです。結果が 1 ページ分のデータを超える場合、結果データには pageToken プロパティが含まれます。次のページの結果を取得するには、別の list 呼び出しを行い、トークン値を pageToken という名前の URL パラメータとして指定します。

テーブルデータのページ分けに使用する tabledata.list メソッドでは、行オフセット値またはページトークンを使用します。詳しくは、テーブルデータの閲覧をご覧ください。

クライアントライブラリの結果を反復処理する

Cloud クライアントライブラリは API のページ分けの低レベルの詳細を処理し、ページレスポンス内の個々の要素の操作を簡素化する、よりイテレータに似たエクスペリエンスを提供します。

BigQuery テーブルデータをページ分割するサンプルを以下に示します。

C#

このサンプルを試す前に、クライアントライブラリを使用した BigQuery クイックスタートにある C# の設定手順を完了してください。詳細については、BigQuery C# API のリファレンスドキュメントをご覧ください。

BigQuery に対する認証を行うには、アプリケーションのデフォルト認証情報を設定します。詳細については、クライアントライブラリの認証を設定するをご覧ください。


using Google.Api.Gax;
using Google.Apis.Bigquery.v2.Data;
using Google.Cloud.BigQuery.V2;
using System;
using System.Linq;

public class BigQueryBrowseTable
{
    public void BrowseTable(
        string projectId = "your-project-id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        TableReference tableReference = new TableReference()
        {
            TableId = "shakespeare",
            DatasetId = "samples",
            ProjectId = "bigquery-public-data"
        };
        // Load all rows from a table
        PagedEnumerable<TableDataList, BigQueryRow> result = client.ListRows(
            tableReference: tableReference,
            schema: null
        );
        // Print the first 10 rows
        foreach (BigQueryRow row in result.Take(10))
        {
            Console.WriteLine($"{row["corpus"]}: {row["word_count"]}");
        }
    }
}

Java

このサンプルを試す前に、クライアントライブラリを使用した BigQuery クイックスタートにある Java の設定手順を完了してください。詳細については、BigQuery Java API のリファレンスドキュメントをご覧ください。

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQuery.TableDataListOption;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableResult;

// Sample to directly browse a table with optional paging
public class BrowseTable {

  public static void runBrowseTable() {
    // TODO(developer): Replace these variables before running the sample.
    String table = "MY_TABLE_NAME";
    String dataset = "MY_DATASET_NAME";
    browseTable(dataset, table);
  }

  public static void browseTable(String dataset, String table) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Identify the table itself
      TableId tableId = TableId.of(dataset, table);

      // Page over 100 records. If you don't need pagination, remove the pageSize parameter.
      TableResult result = bigquery.listTableData(tableId, TableDataListOption.pageSize(100));

      // Print the records
      result
          .iterateAll()
          .forEach(
              row -> {
                row.forEach(fieldValue -> System.out.print(fieldValue.toString() + ", "));
                System.out.println();
              });

      System.out.println("Query ran successfully");
    } catch (BigQueryException e) {
      System.out.println("Query failed to run \n" + e.toString());
    }
  }
}

Go

このサンプルを試す前に、クライアントライブラリを使用した BigQuery クイックスタートにある Go の設定手順を完了してください。詳細については、BigQuery Go API のリファレンスドキュメントをご覧ください。

Go 用の Cloud クライアントライブラリでは、デフォルトで自動的にページ分けされるため、ページ分けを自分で実装する必要はありません。次に例を示します。

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

// browseTable demonstrates reading data from a BigQuery table directly without the use of a query.
// For large tables, we also recommend the BigQuery Storage API.
func browseTable(w io.Writer, projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	table := client.Dataset(datasetID).Table(tableID)
	it := table.Read(ctx)
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, row)
	}
	return nil
}

Node.js

このサンプルを試す前に、クライアントライブラリを使用した BigQuery クイックスタートにある Node.js の設定手順を完了してください。詳細については、BigQuery Node.js API のリファレンスドキュメントをご覧ください。

Node.js 用の Cloud クライアントライブラリでは、デフォルトで自動的にページ分けされるため、ページ分けを自分で実装する必要はありません。次に例を示します

// Import the Google Cloud client library using default credentials
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function browseTable() {
  // Retrieve a table's rows using manual pagination.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset'; // Existing dataset
  // const tableId = 'my_table'; // Table to create

  const query = `SELECT name, SUM(number) as total_people
    FROM \`bigquery-public-data.usa_names.usa_1910_2013\`
    GROUP BY name 
    ORDER BY total_people 
    DESC LIMIT 100`;

  // Create table reference.
  const dataset = bigquery.dataset(datasetId);
  const destinationTable = dataset.table(tableId);

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationquery
  const queryOptions = {
    query: query,
    destination: destinationTable,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(queryOptions);

  // For all options, see https://cloud.google.com/bigquery/docs/reference/v2/jobs/getQueryResults
  const queryResultsOptions = {
    // Retrieve zero resulting rows.
    maxResults: 0,
  };

  // Wait for the job to finish.
  await job.getQueryResults(queryResultsOptions);

  function manualPaginationCallback(err, rows, nextQuery) {
    rows.forEach(row => {
      console.log(`name: ${row.name}, ${row.total_people} total people`);
    });

    if (nextQuery) {
      // More results exist.
      destinationTable.getRows(nextQuery, manualPaginationCallback);
    }
  }

  // For all options, see https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list
  const getRowsOptions = {
    autoPaginate: false,
    maxResults: 20,
  };

  // Retrieve all rows.
  destinationTable.getRows(getRowsOptions, manualPaginationCallback);
}
browseTable();

PHP

このサンプルを試す前に、クライアントライブラリを使用した BigQuery クイックスタートにある PHP の設定手順を完了してください。詳細については、BigQuery PHP API のリファレンスドキュメントをご覧ください。

PHP 用の Cloud クライアントライブラリでジェネレータ関数 rows を使用すると、ページ分けが自動的に行われます。この関数はイテレーション中に結果の次のページをフェッチします。

use Google\Cloud\BigQuery\BigQueryClient;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';
// $tableId   = 'The BigQuery table ID';
// $maxResults = 10;

$maxResults = 10;
$startIndex = 0;

$options = [
    'maxResults' => $maxResults,
    'startIndex' => $startIndex
];
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);
$numRows = 0;
foreach ($table->rows($options) as $row) {
    print('---');
    foreach ($row as $column => $value) {
        printf('%s: %s' . PHP_EOL, $column, $value);
    }
    $numRows++;
}

Python

このサンプルを試す前に、クライアントライブラリを使用した BigQuery クイックスタートにある Python の設定手順を完了してください。詳細については、BigQuery Python API のリファレンスドキュメントをご覧ください。

Python 用の Cloud クライアントライブラリでは、デフォルトで自動的にページ分けされるため、ページ分けを自分で実装する必要はありません。次に例を示します。


from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to browse data rows.
# table_id = "your-project.your_dataset.your_table_name"

# Download all rows from a table.
rows_iter = client.list_rows(table_id)  # Make an API request.

# Iterate over rows to make the API requests to fetch row data.
rows = list(rows_iter)
print("Downloaded {} rows from table {}".format(len(rows), table_id))

# Download at most 10 rows.
rows_iter = client.list_rows(table_id, max_results=10)
rows = list(rows_iter)
print("Downloaded {} rows from table {}".format(len(rows), table_id))

# Specify selected fields to limit the results to certain columns.
table = client.get_table(table_id)  # Make an API request.
fields = table.schema[:2]  # First two columns.
rows_iter = client.list_rows(table_id, selected_fields=fields, max_results=10)
rows = list(rows_iter)
print("Selected {} columns from table {}.".format(len(rows_iter.schema), table_id))
print("Downloaded {} rows from table {}".format(len(rows), table_id))

# Print row data in tabular format.
rows = client.list_rows(table, max_results=10)
format_string = "{!s:<16} " * len(rows.schema)
field_names = [field.name for field in rows.schema]
print(format_string.format(*field_names))  # Prints column headers.
for row in rows:
    print(format_string.format(*row))  # Prints row data.

Ruby

このサンプルを試す前に、クライアントライブラリを使用した BigQuery クイックスタートにある Ruby の設定手順を完了してください。詳細については、BigQuery Ruby API のリファレンスドキュメントをご覧ください。

Ruby 用 Cloud クライアントライブラリで Table#data と Data#next を使用すると、ページ分けが自動的に行われます。

require "google/cloud/bigquery"

def browse_table
  bigquery = Google::Cloud::Bigquery.new project_id: "bigquery-public-data"
  dataset  = bigquery.dataset "samples"
  table    = dataset.table "shakespeare"

  # Load all rows from a table
  rows = table.data

  # Load the first 10 rows
  rows = table.data max: 10

  # Print row data
  rows.each { |row| puts row }
end

任意のページをリクエストして、冗長リスト呼び出しを回避する

前のページに戻る場合、またはキャッシュに保存された pageToken の値を使用して任意のページにジャンプする場合、ページのデータが最後に表示された後で変更されている可能性がありますが、データが変更されたことは明確には示されません。これを確認するには、etag プロパティを使用できます。

すべての collection.list メソッド（テーブルデータ用を除く）は、結果で etag プロパティを返します。このプロパティは、ページが最後のリクエスト以降に変更されているかどうかを確認するために使用できるページ結果のハッシュです。ETag 値を使用して BigQuery にリクエストを行うと、BigQuery はその ETag 値と API によって返された ETag 値を比較し、ETag 値が一致しているかどうかに基づいて応答します。ETag を次のように使用して冗長リスト呼び出しを回避できます。

値が変更されている場合にリスト値を返す。

値が変更されている場合にのみリスト値のページを返す場合は、HTTP の If-None-Match ヘッダーを使用し、保存しておいた ETag を指定してリスト呼び出しを行うことができます。指定した ETag がサーバー上の ETag と一致しない場合、BigQuery は、新しいリスト値のページを返します。ETag が一致する場合、BigQuery は HTTP 304 Not Modified ステータスを返し、値を返しません。この例として、ユーザーが BigQuery に格納されている情報を定期的に入力するウェブページがあります。ETag と If-None-Match ヘッダーを使用することにより、データに変更がない場合に冗長なリスト呼び出しを避けることができます。
値が変更されていない場合にリスト値を返す。

リストの値が変更されていない場合にのみリスト値のページを返す場合は、HTTP の If-Match ヘッダーを使用できます。BigQuery は ETag 値を比較し、結果が変更されていない場合は結果のページを返します。ページが変更されている場合は、412「Precondition Failed」という結果を返します。

注: ETag は冗長なリスト呼び出しを避けるために有効な方法ですが、同じ方法を使用して、いずれかのオブジェクトが変更されたかどうかを確認できます。たとえば、特定のテーブルに対する get リクエストを実行し、ETag を使用して、テーブルが完全なレスポンスを返す前に変更されているかどうかを判断できます。

クエリ結果をページ分割する

各クエリは宛先テーブルに書き込みます。宛先テーブルが指定されていない場合、BigQuery API は宛先テーブルプロパティに一時匿名テーブルへの参照を自動的に挿入します。

API

jobs.config.query.destinationTable フィールドを読み取り、クエリ結果が書き込まれたテーブルを確認します。クエリ結果を読み取るには、tabledata.list を呼び出します。

Java

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableResult;

// Sample to run query with pagination.
public class QueryPagination {

  public static void main(String[] args) {
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String query =
        "SELECT name, SUM(number) as total_people"
            + " FROM `bigquery-public-data.usa_names.usa_1910_2013`"
            + " GROUP BY name"
            + " ORDER BY total_people DESC"
            + " LIMIT 100";
    queryPagination(datasetName, tableName, query);
  }

  public static void queryPagination(String datasetName, String tableName, String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      QueryJobConfiguration queryConfig =
          QueryJobConfiguration.newBuilder(query)
              // save results into a table.
              .setDestinationTable(tableId)
              .build();

      bigquery.query(queryConfig);

      TableResult results =
          bigquery.listTableData(tableId, BigQuery.TableDataListOption.pageSize(20));

      // First Page
      results
          .getValues()
          .forEach(row -> row.forEach(val -> System.out.printf("%s,\n", val.toString())));

      while (results.hasNextPage()) {
        // Remaining Pages
        results = results.getNextPage();
        results
            .getValues()
            .forEach(row -> row.forEach(val -> System.out.printf("%s,\n", val.toString())));
      }

      System.out.println("Query pagination performed successfully.");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}

各ページで返される行数を設定するには、GetQueryResults ジョブを使用し、渡す QueryResultsOption オブジェクトの pageSize オプションを設定します。次の例をご覧ください。

TableResult result = job.getQueryResults();
QueryResultsOption queryResultsOption = QueryResultsOption.pageSize(20);

TableResult result = job.getQueryResults(queryResultsOption);

Node.js

// Import the Google Cloud client library using default credentials
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function queryPagination() {
  // Run a query and get rows using automatic pagination.

  const query = `SELECT name, SUM(number) as total_people
  FROM \`bigquery-public-data.usa_names.usa_1910_2013\`
  GROUP BY name
  ORDER BY total_people DESC
  LIMIT 100`;

  // Run the query as a job.
  const [job] = await bigquery.createQueryJob(query);

  // Wait for job to complete and get rows.
  const [rows] = await job.getQueryResults();

  console.log('Query results:');
  rows.forEach(row => {
    console.log(`name: ${row.name}, ${row.total_people} total people`);
  });
}
queryPagination();

Python

QueryJob.result メソッドは、クエリ結果のイテラブルを返します。または

QueryJob.destination プロパティを読み取ります。このプロパティが構成されていない場合は、API によって一時匿名テーブルへの参照が設定されます。
Client.get_table メソッドでテーブルスキーマを取得します。
Client.list_rows メソッドを使用して、宛先テーブルのすべての行に対するイテラブルを作成します。


from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total_people DESC
"""
query_job = client.query(query)  # Make an API request.
query_job.result()  # Wait for the query to complete.

# Get the destination table for the query results.
#
# All queries write to a destination table. If a destination table is not
# specified, the BigQuery populates it with a reference to a temporary
# anonymous table after the query completes.
destination = query_job.destination

# Get the schema (and other properties) for the destination table.
#
# A schema is useful for converting from BigQuery types to Python types.
destination = client.get_table(destination)

# Download rows.
#
# The client library automatically handles pagination.
print("The query data:")
rows = client.list_rows(destination, max_results=20)
for row in rows:
    print("name={}, count={}".format(row["name"], row["total_people"]))

ページ分けを使用して BigQuery API でデータを読み取る

API を使用して結果をページ分割する

クライアント ライブラリの結果を反復処理する

C#

Java

Go

Node.js

PHP

Python

Ruby

任意のページをリクエストして、冗長リスト呼び出しを回避する

クエリ結果をページ分割する

API

Java

Node.js

Python

クライアントライブラリの結果を反復処理する