BigQuery Storage API Client Libraries

This page shows how to get started with the Cloud Client Libraries for the BigQuery Storage API. Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained.

Installing the client library


For more information, see Setting Up a Go Development Environment.

go get -u


For more information, see Setting Up a Java Development Environment.

If you are using Maven, add the following to your pom.xml file. For more information about BOMs, see The Google Cloud Platform Libraries BOM.



If you are using Gradle, add the following to your dependencies:

implementation platform('')

compile ''

If you are using sbt, add the following to your dependencies:

libraryDependencies += "" % "google-cloud-bigquerystorage" % "1.20.2"

If you are using IntelliJ or Eclipse, you can add the client library to your project using the following IDE plugins:



For more information, see Setting Up a Python Development Environment.

pip install --upgrade google-cloud-bigquery-storage

Setting up authentication

To run the client library, you must first set up authentication by creating a service account and setting an environment variable. Complete the following steps to set up authentication. For other ways to authenticate, see the GCP authentication documentation.

Cloud Console


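  1. In the Cloud Console, go to the Create service account page.

  2. Select a project.
  3. In the Service account name field, enter a name. The Cloud Console fills in the Service account ID field based on this name.

    In the Service account description field, enter a description. For example, Service account for quickstart.

  4. Click Create.
  5. Click the Select a role field.


  6. Click Continue.
  7. Click Done to finish creating the service account.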



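  1. In the Cloud Console, click the email address for the service account that you created.
  2. Click Keys.
  3. Click Add key, then click Create new key.
  4. Click Create. A JSON key file is downloaded to your computer.
  5. Click Close.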


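You can run the following commands using the Cloud SDK on your local machine, or in Cloud Shell.

  1. Create the service account. Replace NAME with a name for the service account.

    gcloud iam service-accounts create NAME
  2. Grant permissions to the service account. Replace PROJECT_ID with your project ID.

    gcloud projects add-iam-policy-binding PROJECT_ID --member="" --role="roles/owner"
  3. Generate the key file. Replace FILE_NAME with a name for the key file.

    gcloud iam service-accounts keys create FILE_NAME.json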

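Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. This variable applies only to your current shell session, so if you open a new session, set the variable again.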

Linux 或 macOS


Replace KEY_PATH with the path of the JSON file that contains your service account key. For example:


export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"


For PowerShell:


Replace KEY_PATH with the path of the JSON file that contains your service account key.





Replace KEY_PATH with the path of the JSON file that contains your service account key.
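
Before running a sample, it can help to confirm that the variable is set in the current session and points at a readable key file. The following is a minimal sketch, not part of the client library: the helper name check_credentials_env is hypothetical, and the check for the "type" and "client_email" fields assumes the usual layout of a downloaded service account key.

```python
import json
import os


def check_credentials_env(var_name="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return a list of problems found with the credentials variable.

    Hypothetical helper for illustration only. An empty list means these
    basic checks found nothing wrong; it does not prove the key is valid.
    """
    path = os.environ.get(var_name)
    if not path:
        return ["{} is not set in this shell session".format(var_name)]
    if not os.path.isfile(path):
        return ["{} points to a missing file: {}".format(var_name, path)]
    try:
        with open(path, "r", encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, ValueError) as exc:
        return ["could not read {} as JSON: {}".format(path, exc)]
    if not isinstance(key, dict):
        return ["{} does not contain a JSON object".format(path)]
    # Assumption: a service account key file carries these fields.
    missing = [name for name in ("type", "client_email") if name not in key]
    if missing:
        return ["key file is missing expected fields: " + ", ".join(missing)]
    return []


if __name__ == "__main__":
    for problem in check_credentials_env():
        print(problem)
```

Because the variable applies only to the current shell session, running a check like this in a new terminal is a quick way to notice that the variable needs to be set again.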

Using the client library

The following example shows basic interactions with the BigQuery Storage API.


To use this sample, prepare your machine for Go development, and complete the BigQuery Storage API quickstart. For more information, see the BigQuery Storage API Go API reference documentation.

// The bigquery_storage_quickstart application demonstrates usage of the
// BigQuery Storage read API.  It demonstrates API features such as column
// projection (limiting the output to a subset of a table's columns),
// column filtering (using simple predicates to filter records on the server
// side), establishing the snapshot time (reading data from the table at a
// specific point in time), and decoding Avro row blocks using the third party
// "" library.
package main

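import (
	"context"
	"flag"
	"fmt"
	"io"
	"log"
	"sort"
	"sync"
	"time"

	bqStorage ""
	gax ""
	goavro ""
	bqStoragepb ""
	"github.com/golang/protobuf/ptypes"
	"google.golang.org/grpc"
)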

// rpcOpts is used to configure the underlying gRPC client to accept large
// messages.  The BigQuery Storage API may send message blocks up to 128MB
// in size.
var rpcOpts = gax.WithGRPCOptions(
	grpc.MaxCallRecvMsgSize(1024 * 1024 * 129),
)

// Command-line flags.
var (
	projectID = flag.String("project_id", "",
		"Cloud Project ID, used for session creation.")
	snapshotMillis = flag.Int64("snapshot_millis", 0,
		"Snapshot time to use for reads, represented in epoch milliseconds format.  Default behavior reads current data.")
)

func main() {
	flag.Parse()
	ctx := context.Background()
	bqReadClient, err := bqStorage.NewBigQueryReadClient(ctx)
	if err != nil {
		log.Fatalf("NewBigQueryReadClient: %v", err)
	}
	defer bqReadClient.Close()

	// Verify we've been provided a parent project which will contain the read session.  The
	// session may exist in a different project than the table being read.
	if *projectID == "" {
		log.Fatalf("No parent project ID specified, please supply using the --project_id flag.")
	}

	// This example uses baby name data from the public datasets.
	srcProjectID := "bigquery-public-data"
	srcDatasetID := "usa_names"
	srcTableID := "usa_1910_current"
	readTable := fmt.Sprintf("projects/%s/datasets/%s/tables/%s",
		srcProjectID,
		srcDatasetID,
		srcTableID,
	)

	// We limit the output columns to a subset of those allowed in the table,
	// and set a simple filter to only report names from the state of
	// Washington (WA).
	tableReadOptions := &bqStoragepb.ReadSession_TableReadOptions{
		SelectedFields: []string{"name", "number", "state"},
		RowRestriction: `state = "WA"`,
	}

	createReadSessionRequest := &bqStoragepb.CreateReadSessionRequest{
		Parent: fmt.Sprintf("projects/%s", *projectID),
		ReadSession: &bqStoragepb.ReadSession{
			Table: readTable,
			// This API can also deliver data serialized in Apache Arrow format.
			// This example leverages Apache Avro.
			DataFormat:  bqStoragepb.DataFormat_AVRO,
			ReadOptions: tableReadOptions,
		},
		MaxStreamCount: 1,
	}

	// Set a snapshot time if it's been specified.
	if *snapshotMillis > 0 {
		// time.Unix takes nanoseconds, so convert from epoch milliseconds.
		ts, err := ptypes.TimestampProto(time.Unix(0, *snapshotMillis*int64(time.Millisecond)))
		if err != nil {
			log.Fatalf("Invalid snapshot millis (%d): %v", *snapshotMillis, err)
		}
		createReadSessionRequest.ReadSession.TableModifiers = &bqStoragepb.ReadSession_TableModifiers{
			SnapshotTime: ts,
		}
	}

	// Create the session from the request.
	session, err := bqReadClient.CreateReadSession(ctx, createReadSessionRequest, rpcOpts)
	if err != nil {
		log.Fatalf("CreateReadSession: %v", err)
	}
	fmt.Printf("Read session: %s\n", session.GetName())

	if len(session.GetStreams()) == 0 {
		log.Fatalf("no streams in session.  if this was a small query result, consider writing the output to a named table.")
	}

	// We'll use only a single stream for reading data from the table.  Because
	// of dynamic sharding, this will yield all the rows in the table. However,
	// if you wanted to fan out multiple readers you could do so by increasing
	// MaxStreamCount and having a reader process each stream.
	readStream := session.GetStreams()[0].Name

	ch := make(chan *bqStoragepb.AvroRows)

	// Use a waitgroup to coordinate the reading and decoding goroutines.
	var wg sync.WaitGroup

	// Start the reading in one goroutine.
	wg.Add(1)
	go func() {
		defer wg.Done()
		if err := processStream(ctx, bqReadClient, readStream, ch); err != nil {
			log.Fatalf("processStream failure: %v", err)
		}
		// Signal the decoder that no more row blocks will arrive.
		close(ch)
	}()

	// Start Avro processing and decoding in another goroutine.
	wg.Add(1)
	go func() {
		defer wg.Done()
		err := processAvro(ctx, session.GetAvroSchema().GetSchema(), ch)
		if err != nil {
			log.Fatalf("Error processing avro: %v", err)
		}
	}()

	// Wait until both the reading and decoding goroutines complete.
	wg.Wait()
}

// printDatum prints the decoded row datum.
func printDatum(d interface{}) {
	m, ok := d.(map[string]interface{})
	if !ok {
		log.Printf("failed type assertion: %v", d)
		return
	}
	// Go's map implementation returns keys in a random ordering, so we sort
	// the keys before accessing.
	keys := make([]string, len(m))
	i := 0
	for k := range m {
		keys[i] = k
		i++
	}
	sort.Strings(keys)
	for _, key := range keys {
		fmt.Printf("%s: %-20v ", key, valueFromTypeMap(m[key]))
	}
	fmt.Println()
}

// valueFromTypeMap returns the first value in the type map.  This function
// is only suitable for simple schemas, as complex typing such as arrays and
// records necessitate a more robust implementation.  See the goavro library
// and the Avro specification for more information.
func valueFromTypeMap(field interface{}) interface{} {
	m, ok := field.(map[string]interface{})
	if !ok {
		return nil
	}
	for _, v := range m {
		// Return the first value encountered.
		return v
	}
	return nil
}

// processStream reads rows from a single storage Stream, and sends the Avro
// data blocks to a channel. This function will retry on transient stream
// failures and bookmark progress to avoid re-reading data that's already been
// successfully transmitted.
func processStream(ctx context.Context, client *bqStorage.BigQueryReadClient, st string, ch chan<- *bqStoragepb.AvroRows) error {
	var offset int64

	// Streams may be long-running.  Rather than using a global retry for the
	// stream, implement a retry that resets once progress is made.
	retryLimit := 3
	// Track retries outside the reconnect loop so that consecutive failures
	// accumulate until a row block is received.
	retries := 0

	for {
		// Send the initiating request to start streaming row blocks.
		rowStream, err := client.ReadRows(ctx, &bqStoragepb.ReadRowsRequest{
			ReadStream: st,
			Offset:     offset,
		}, rpcOpts)
		if err != nil {
			return fmt.Errorf("couldn't invoke ReadRows: %v", err)
		}

		// Process the streamed responses.
		for {
			r, err := rowStream.Recv()
			if err == io.EOF {
				return nil
			}
			if err != nil {
				retries++
				if retries >= retryLimit {
					return fmt.Errorf("processStream retries exhausted: %v", err)
				}
				// break the inner loop, and try to recover by starting a new streaming
				// ReadRows call at the last known good offset.
				break
			}

			rc := r.GetRowCount()
			if rc > 0 {
				// Bookmark our progress in case of retries and send the rowblock on the channel.
				offset = offset + rc
				// We're making progress, reset retries.
				retries = 0
				ch <- r.GetAvroRows()
			}
		}
	}
}

// processAvro receives row blocks from a channel, and uses the provided Avro
// schema to decode the blocks into individual row messages for printing.  Will
// continue to run until the channel is closed or the provided context is
// cancelled.
func processAvro(ctx context.Context, schema string, ch <-chan *bqStoragepb.AvroRows) error {
	// Establish a decoder that can process blocks of messages using the
	// reference schema. All blocks share the same schema, so the decoder
	// can be long-lived.
	codec, err := goavro.NewCodec(schema)
	if err != nil {
		return fmt.Errorf("couldn't create codec: %v", err)
	}

	for {
		select {
		case <-ctx.Done():
			// Context was cancelled.  Stop.
			return nil
		case rows, ok := <-ch:
			if !ok {
				// Channel closed, no further avro messages.  Stop.
				return nil
			}
			undecoded := rows.GetSerializedBinaryRows()
			for len(undecoded) > 0 {
				datum, remainingBytes, err := codec.NativeFromBinary(undecoded)

				if err != nil {
					if err == io.EOF {
						break
					}
					return fmt.Errorf("decoding error with %d bytes remaining: %v", len(undecoded), err)
				}
				printDatum(datum)
				undecoded = remainingBytes
			}
		}
	}
}
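
The bookmark-and-retry behavior described in the processStream comments can be sketched independently of the client library. The following is a minimal Python sketch under stated assumptions, not the library's retry implementation: TransientError, read_with_bookmark, and make_flaky_stream are hypothetical names invented for illustration, and open_stream(offset) stands in for a call that starts streaming row blocks at a given row offset.

```python
class TransientError(Exception):
    """Stand-in for a retryable stream failure (hypothetical)."""


def read_with_bookmark(open_stream, retry_limit=3):
    """Collect row blocks, resuming at the last good offset after failures.

    open_stream(offset) is a hypothetical callable that returns an iterator
    of row blocks (lists of rows) starting at the given row offset.
    """
    offset = 0   # Rows received so far; used as the resume point.
    retries = 0  # Consecutive failures since the last block arrived.
    blocks = []
    while True:
        try:
            for block in open_stream(offset):
                if block:
                    offset += len(block)  # Bookmark progress.
                    retries = 0           # Progress was made, so reset.
                    blocks.append(block)
            return blocks
        except TransientError:
            retries += 1
            if retries >= retry_limit:
                raise
            # Otherwise loop and reopen the stream at the bookmarked offset.


def make_flaky_stream(rows, block_size, fail_at_offsets):
    """Build a fake open_stream that fails once at each listed offset."""
    pending = set(fail_at_offsets)

    def open_stream(offset):
        position = offset
        while position < len(rows):
            if position in pending:
                pending.discard(position)
                raise TransientError("dropped at offset {}".format(position))
            yield rows[position:position + block_size]
            position += block_size

    return open_stream


if __name__ == "__main__":
    stream = make_flaky_stream(list(range(10)), block_size=2, fail_at_offsets=[4])
    print(read_with_bookmark(stream))
```

Keeping the retry counter outside the reconnect loop, and resetting it only when a block arrives, means the limit bounds consecutive failures rather than failures over the lifetime of a long-running stream.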


Before trying this sample, follow the Java setup instructions in Setting Up a Java Development Environment. For more information, see the BigQuery Storage API Java API reference documentation.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ReadChannel;
import org.apache.arrow.vector.ipc.message.MessageSerializer;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel;

public class StorageArrowSample {

  /*
   * SimpleRowReader handles deserialization of the Apache Arrow-encoded row batches transmitted
   * from the storage API using a generic datum decoder.
   */
  private static class SimpleRowReader implements AutoCloseable {

    BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);

    // Decoder object will be reused to avoid re-allocation and too much garbage collection.
    private final VectorSchemaRoot root;
    private final VectorLoader loader;

    public SimpleRowReader(ArrowSchema arrowSchema) throws IOException {
      Schema schema =
          MessageSerializer.deserializeSchema(
              new ReadChannel(
                  new ByteArrayReadableSeekableByteChannel(
                      arrowSchema.getSerializedSchema().toByteArray())));
      List<FieldVector> vectors = new ArrayList<>();
      for (Field field : schema.getFields()) {
        vectors.add(field.createVector(allocator));
      }
      root = new VectorSchemaRoot(vectors);
      loader = new VectorLoader(root);
    }

    /**
     * Sample method for processing Arrow data which only validates decoding.
     *
     * @param batch object returned from the ReadRowsResponse.
     */
    public void processRows(ArrowRecordBatch batch) throws IOException {
      org.apache.arrow.vector.ipc.message.ArrowRecordBatch deserializedBatch =
          MessageSerializer.deserializeRecordBatch(
              new ReadChannel(
                  new ByteArrayReadableSeekableByteChannel(
                      batch.getSerializedRecordBatch().toByteArray())),
              allocator);

      loader.load(deserializedBatch);
      // Release buffers from batch (they are still held in the vectors in root).
      deserializedBatch.close();
      System.out.println(root.contentToTSVString());
      // Release buffers from vectors in root.
      root.clear();
    }

    @Override
    public void close() {
      root.close();
      allocator.close();
    }
  }

  public static void main(String... args) throws Exception {
    // Sets your Google Cloud Platform project ID.
    // String projectId = "YOUR_PROJECT_ID";
    String projectId = args[0];
    // Epoch milliseconds exceed the range of a 32-bit int, so parse the value as a Long.
    Long snapshotMillis = null;
    if (args.length > 1) {
      snapshotMillis = Long.parseLong(args[1]);
    }

    try (BigQueryReadClient client = BigQueryReadClient.create()) {
      String parent = String.format("projects/%s", projectId);

      // This example uses baby name data from the public datasets.
      String srcTable =
          String.format(
              "projects/%s/datasets/%s/tables/%s",
              "bigquery-public-data", "usa_names", "usa_1910_current");

      // We specify the columns to be projected by adding them to the selected fields,
      // and set a simple filter to restrict which rows are transmitted.
      TableReadOptions options =
          TableReadOptions.newBuilder()
              .addSelectedFields("name")
              .addSelectedFields("number")
              .addSelectedFields("state")
              .setRowRestriction("state = \"WA\"")
              .build();

      // Start specifying the read session we want created.
      ReadSession.Builder sessionBuilder =
          ReadSession.newBuilder()
              .setTable(srcTable)
              // This API can also deliver data serialized in Apache Avro format.
              // This example leverages Apache Arrow.
              .setDataFormat(DataFormat.ARROW)
              .setReadOptions(options);

      // Optionally specify the snapshot time.  When unspecified, snapshot time is "now".
      if (snapshotMillis != null) {
        Timestamp t =
            Timestamp.newBuilder()
                .setSeconds(snapshotMillis / 1000)
                .setNanos((int) ((snapshotMillis % 1000) * 1000000))
                .build();
        TableModifiers modifiers = TableModifiers.newBuilder().setSnapshotTime(t).build();
        sessionBuilder.setTableModifiers(modifiers);
      }

      // Begin building the session creation request.
      CreateReadSessionRequest.Builder builder =
          CreateReadSessionRequest.newBuilder()
              .setParent(parent)
              .setReadSession(sessionBuilder)
              .setMaxStreamCount(1);

      ReadSession session = client.createReadSession(builder.build());
      // Setup a simple reader and start a read session.
      try (SimpleRowReader reader = new SimpleRowReader(session.getArrowSchema())) {

        // Assert that there are streams available in the session.  An empty table may not have
        // data available.  If no streams are available for an anonymous (cached) table, consider
        // writing results of a query to a named table rather than consuming cached results
        // directly.
        Preconditions.checkState(session.getStreamsCount() > 0);

        // Use the first stream to perform reading.
        String streamName = session.getStreams(0).getName();

        ReadRowsRequest readRowsRequest =
            ReadRowsRequest.newBuilder().setReadStream(streamName).build();

        // Process each block of rows as they arrive and decode using our simple row reader.
        ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRowsRequest);
        for (ReadRowsResponse response : stream) {
          Preconditions.checkState(response.hasArrowRecordBatch());
          reader.processRows(response.getArrowRecordBatch());
        }
      }
    }
  }
}
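
The snapshot time in these samples is supplied as epoch milliseconds and then split into whole seconds plus a nanosecond remainder for the timestamp. The arithmetic can be sketched on its own as follows. This is an illustrative sketch only: split_epoch_millis and fits_in_int32 are hypothetical helper names, not part of any client library.

```python
INT32_MAX = 2**31 - 1


def split_epoch_millis(millis):
    """Split epoch milliseconds into (seconds, nanos) for a timestamp."""
    seconds, remainder_ms = divmod(millis, 1000)
    # One millisecond is 1,000,000 nanoseconds.
    return seconds, remainder_ms * 1_000_000


def fits_in_int32(value):
    """Report whether a value fits in a signed 32-bit integer."""
    return -INT32_MAX - 1 <= value <= INT32_MAX


if __name__ == "__main__":
    millis = 1_600_000_000_123  # An example timestamp in September 2020.
    print(split_epoch_millis(millis))
    print(fits_in_int32(millis))
```

A signed 32-bit integer holds a little under 25 days' worth of milliseconds, so current epoch-millisecond values need a 64-bit type in languages with fixed-width integers.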


Before trying this sample, follow the Python setup instructions in Setting Up a Python Development Environment. For more information, see the BigQuery Storage API Python API reference documentation.

from import BigQueryReadClient
from import types

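# TODO(developer): Set the project_id variable.
# project_id = 'your-project-id'
# The read session is created in this project. This project can be
# different from that which contains the table.
#
# TODO(developer): Optionally set snapshot_millis (epoch milliseconds).
# snapshot_millis = 0  # 0 reads current data.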

client = BigQueryReadClient()

# This example reads baby name data from the public datasets.
table = "projects/{}/datasets/{}/tables/{}".format(
    "bigquery-public-data", "usa_names", "usa_1910_current"
)

requested_session = types.ReadSession()
requested_session.table = table
# This API can also deliver data serialized in Apache Arrow format.
# This example leverages Apache Avro.
requested_session.data_format = types.DataFormat.AVRO

# We limit the output columns to a subset of those allowed in the table,
# and set a simple filter to only report names from the state of
# Washington (WA).
requested_session.read_options.selected_fields = ["name", "number", "state"]
requested_session.read_options.row_restriction = 'state = "WA"'

# Set a snapshot time if it's been specified.
if snapshot_millis > 0:
    snapshot_time = types.Timestamp()
    snapshot_time.FromMilliseconds(snapshot_millis)
    requested_session.table_modifiers.snapshot_time = snapshot_time

parent = "projects/{}".format(project_id)
session = client.create_read_session(
    parent=parent,
    read_session=requested_session,
    # We'll use only a single stream for reading data from the table. However,
    # if you wanted to fan out multiple readers you could do so by having a
    # reader process each individual stream.
    max_stream_count=1,
)
reader = client.read_rows(session.streams[0].name)

# The read stream contains blocks of Avro-encoded bytes. The rows() method
# uses the fastavro library to parse these blocks as an iterable of Python
# dictionaries. Install fastavro with the following command:
# pip install google-cloud-bigquery-storage[fastavro]
rows = reader.rows(session)

# Do any local processing by iterating over the rows. The
# google-cloud-bigquery-storage client reconnects to the API after any
# transient network errors or timeouts.
names = set()
states = set()

for row in rows:
    names.add(row["name"])
    states.add(row["state"])

print("Got {} unique names in states: {}".format(len(names), states))
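
The sample reads a single stream, and its comments note that multiple readers could each process one stream. A minimal sketch of that fan-out, using only the standard library, might look like the following. The names read_all_streams and read_stream are hypothetical: read_stream(name) is assumed to return an iterable of rows for one stream (in a real program it might wrap client.read_rows(name).rows(session)), and the session is assumed to have been created with a stream count greater than one.

```python
from concurrent.futures import ThreadPoolExecutor


def read_all_streams(stream_names, read_stream, max_workers=4):
    """Read every stream concurrently and return the combined rows.

    read_stream(name) is a hypothetical callable that returns an iterable
    of rows for the named stream. Rows are combined in the order the
    stream names were given, not in arrival order.
    """
    if not stream_names:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker drains one stream into a list.
        per_stream = pool.map(lambda name: list(read_stream(name)), stream_names)
        combined = []
        for rows in per_stream:
            combined.extend(rows)
    return combined


if __name__ == "__main__":
    # Fake streams standing in for session.streams; no API calls are made.
    fake_streams = {
        "stream-0": [{"name": "Ava", "state": "WA"}],
        "stream-1": [{"name": "Liam", "state": "WA"}],
    }
    rows = read_all_streams(sorted(fake_streams), lambda name: fake_streams[name])
    print(len(rows))
```

Threads suit this case because each reader spends most of its time waiting on the network. Whether fanning out helps in practice depends on table size and how many streams the service returns.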

Additional resources

What's next?

If you use pandas with the pandas-gbq integration for BigQuery, see the tutorial Downloading BigQuery data to pandas using the BigQuery Storage API for more information about using the storage API.