[[["わかりやすい","easyToUnderstand","thumb-up"],["問題の解決に役立った","solvedMyProblem","thumb-up"],["その他","otherUp","thumb-up"]],[["わかりにくい","hardToUnderstand","thumb-down"],["情報またはサンプルコードが不正確","incorrectInformationOrSampleCode","thumb-down"],["必要な情報 / サンプルがない","missingTheInformationSamplesINeed","thumb-down"],["翻訳に関する問題","translationIssue","thumb-down"],["その他","otherDown","thumb-down"]],["最終更新日 2025-09-04 UTC。"],[[["\u003cp\u003eFavor using the default stream for streaming scenarios due to fewer quota limitations and better scalability, and if application-created streams are necessary, maximize throughput on each before creating more.\u003c/p\u003e\n"],["\u003cp\u003eLimit the frequency of \u003ccode\u003eCreateWriteStream\u003c/code\u003e calls to avoid high latency and potential quota issues, especially during application cold starts, and implement a gradual ramp-up for stream creation.\u003c/p\u003e\n"],["\u003cp\u003eUtilize connection multiplexing when dealing with over 20 concurrent connections, particularly when using the default stream, to improve throughput and resource utilization, available in Java and Go, and for the Beam connector.\u003c/p\u003e\n"],["\u003cp\u003eManage stream offsets carefully when using application-created streams to ensure exactly-once write semantics, and be aware of \u003ccode\u003eALREADY_EXISTS\u003c/code\u003e and \u003ccode\u003eOUT_OF_RANGE\u003c/code\u003e errors, but consider if these semantics are necessary.\u003c/p\u003e\n"],["\u003cp\u003eHandle schema updates by first updating the BigQuery table schema, and be aware that the Storage Write API detects schema changes within minutes, then close existing connections and open new ones with the updated schema, or use the Java client's \u003ccode\u003eJsonStreamWriter\u003c/code\u003e for automatic reconnection.\u003c/p\u003e\n"]]],[],null,["# BigQuery Storage Write API best practices\n=========================================\n\nThis document gives best practices for using the BigQuery Storage Write API. Before\nreading this document, read\n[Overview of the BigQuery Storage Write API](/bigquery/docs/write-api#overview).\n\nLimit the rate of stream creation\n---------------------------------\n\nBefore creating a stream, consider whether you can use the\n[default stream](/bigquery/docs/write-api#default_stream). For streaming\nscenarios, the default stream has fewer quota limitations and can scale better\nthan using application-created streams. If you use an application-created\nstream, then make sure to utilize the maximum throughput on each stream before\ncreating additional streams. For example, use\n[asynchronous writes](#do_not_block_on_appendrows_calls).\n\nFor application-created streams, avoid calling `CreateWriteStream` at a high\nfrequency. Generally, if you exceed 40-50 calls per second, the latency of the\nAPI calls grows substantially (\\\u003e25s). Make sure your application can accept a\ncold start and ramp up the number of streams gradually, and limit the rate of\n`CreateWriteStream` calls. You might also set a larger deadline to wait for the\ncall to complete, so that it doesn't fail with a `DeadlineExceeded` error. There\nis also a longer-term [quota](/bigquery/quotas#createwritestream) on the maximum\nrate of `CreateWriteStream` calls. 
Connection pool management
--------------------------

The `AppendRows` method creates a bidirectional connection to a stream. You can open multiple connections on the default stream, but only a single active connection on application-created streams.

When using the default stream, you can use Storage Write API multiplexing to write to multiple destination tables with shared connections. Multiplexing pools connections for better throughput and utilization of resources. If your workflow has over 20 concurrent connections, we recommend that you use multiplexing. Multiplexing is available in Java and Go. For Java implementation details, see [Use multiplexing](/bigquery/docs/write-api-streaming#use_multiplexing). For Go implementation details, see [Connection Sharing (Multiplexing)](https://pkg.go.dev/cloud.google.com/go/bigquery/storage/managedwriter#hdr-Connection_Sharing__Multiplexing_). If you use the [Beam connector with at-least-once semantics](https://beam.apache.org/documentation/io/built-in/google-bigquery/#at-least-once-semantics), you can enable multiplexing through [UseStorageApiConnectionPool](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.html#setUseStorageApiConnectionPool-java.lang.Boolean-). The Dataproc Spark connector has multiplexing enabled by default.
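For Java, the following sketch shows the general shape of enabling the connection pool on a `JsonStreamWriter` that targets the default stream, as described in the "Use multiplexing" guide linked above. Treat it as an illustration to verify against your client library version; the column name and the blocking `get()` call are placeholders for brevity.

```java
import com.google.cloud.bigquery.storage.v1.BigQueryWriteClient;
import com.google.cloud.bigquery.storage.v1.JsonStreamWriter;
import com.google.cloud.bigquery.storage.v1.TableName;
import org.json.JSONArray;
import org.json.JSONObject;

public class MultiplexedWriter {

  public static void writeWithSharedConnections(String project, String dataset, String table)
      throws Exception {
    String parentTable = TableName.of(project, dataset, table).toString();

    try (BigQueryWriteClient client = BigQueryWriteClient.create();
        // Writing to the default stream with the connection pool enabled lets the client
        // share connections across writers, including writers for different tables.
        JsonStreamWriter writer =
            JsonStreamWriter.newBuilder(parentTable, client)
                .setEnableConnectionPool(true)
                .build()) {

      JSONArray rows = new JSONArray();
      rows.put(new JSONObject().put("customer_name", "Alice")); // hypothetical column

      // Blocking on get() keeps the sketch short; prefer asynchronous handling as
      // described in "Do not block on AppendRows calls" below.
      writer.append(rows).get();
    }
  }
}
```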
For best performance, use one connection for as many data writes as possible. Don't use one connection for just a single write, or open and close streams for many small writes.

There is a quota on the number of [concurrent connections](/bigquery/quotas#concurrent_connections) that can be open at the same time per project. Above the limit, calls to `AppendRows` fail. However, the quota for concurrent connections can be increased and should not normally be a limiting factor for scaling.

Each call to `AppendRows` creates a new data writer object. So, when using an application-created stream, the number of connections corresponds to the number of streams that have been created. Generally, a single connection supports at least 1 MBps of throughput. The upper bound depends on several factors, such as network bandwidth, the schema of the data, and server load, but can exceed 10 MBps.

There is also a quota on the [total throughput per project](/bigquery/quotas#writeapi_throughput). This quota represents the bytes per second across all connections flowing through the Storage Write API service. If your project exceeds this quota, you can [request a quota adjustment](/docs/quotas/help/request_increase). Typically this involves raising accompanying quotas, such as the concurrent connections quota, in an equal ratio.

Manage stream offsets to achieve exactly-once semantics
--------------------------------------------------------

The Storage Write API only allows writes to the current end of the stream, which moves as data is appended. The current position in the stream is specified as an offset from the start of the stream.

When you write to an application-created stream, you can specify the stream offset to achieve exactly-once write semantics.

When you specify an offset, the write operation is idempotent, which makes it safe to retry after network errors or unresponsiveness from the server. Handle the following errors related to offsets:

- `ALREADY_EXISTS` (`StorageErrorCode.OFFSET_ALREADY_EXISTS`): The row was already written. You can safely ignore this error.
- `OUT_OF_RANGE` (`StorageErrorCode.OFFSET_OUT_OF_RANGE`): A previous write operation failed. Retry from the last successful write.

These errors can also happen if you set the wrong offset value, so you have to manage offsets carefully.

Before using stream offsets, consider whether you need exactly-once semantics. For example, if your upstream data pipeline only guarantees at-least-once writes, or if you can easily detect duplicates after data ingestion, then you might not require exactly-once writes. In that case, we recommend using the default stream, which does not require keeping track of row offsets.

Do not block on `AppendRows` calls
----------------------------------

The `AppendRows` method is asynchronous. You can send a series of writes without blocking on a response for each write individually. The response messages on the bidirectional connection arrive in the same order as the requests were enqueued. For the highest throughput, call `AppendRows` without blocking to wait on the response.
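The two preceding sections combine naturally in the Java client: `JsonStreamWriter.append` returns an `ApiFuture`, so you can submit writes at explicit offsets and react to each result in a callback instead of blocking per write. The sketch below is illustrative only; the helper name, the direct executor, and the bookkeeping left as comments are assumptions, and the offset-error handling is only described in comments rather than tied to specific exception classes.

```java
import com.google.api.core.ApiFuture;
import com.google.api.core.ApiFutureCallback;
import com.google.api.core.ApiFutures;
import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
import com.google.cloud.bigquery.storage.v1.JsonStreamWriter;
import com.google.common.util.concurrent.MoreExecutors;
import org.json.JSONArray;

public class NonBlockingAppends {

  /** Appends one batch at an explicit offset and registers a callback instead of blocking. */
  public static ApiFuture<AppendRowsResponse> appendAsync(
      JsonStreamWriter writer, JSONArray rows, long offset) throws Exception {
    // append() returns immediately; responses arrive on the connection in request order.
    ApiFuture<AppendRowsResponse> future = writer.append(rows, offset);

    ApiFutures.addCallback(
        future,
        new ApiFutureCallback<AppendRowsResponse>() {
          @Override
          public void onSuccess(AppendRowsResponse response) {
            // Record the highest acknowledged offset so a retry can resume from here.
          }

          @Override
          public void onFailure(Throwable t) {
            // Per the guidance above: an OFFSET_ALREADY_EXISTS error means the rows were
            // already written and can be ignored; OFFSET_OUT_OF_RANGE means an earlier
            // write failed, so retry from the last successful offset.
          }
        },
        MoreExecutors.directExecutor()); // illustrative; use a real executor in production
    return future;
  }
}
```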
Handle schema updates
---------------------

For data streaming scenarios, table schemas are usually managed outside of the streaming pipeline. It's common for the schema to evolve over time, for example by adding new nullable fields. A robust pipeline must handle out-of-band schema updates.

The Storage Write API supports table schemas as follows:

- The first write request includes the schema.
- You send each row of data as a binary protocol buffer. BigQuery [maps](/bigquery/docs/write-api#data_type_conversions) the data to the schema.
- You can omit nullable fields, but you cannot include any fields that are not present in the current schema. If you send rows with extra fields, the Storage Write API returns a [`StorageError`](/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#google.cloud.bigquery.storage.v1.StorageError) with `StorageErrorCode.SCHEMA_MISMATCH_EXTRA_FIELD`.

If you want to send new fields in the payload, first update the table schema in BigQuery. The Storage Write API detects schema changes after a short time, on the order of minutes. When the Storage Write API detects the schema change, the [`AppendRowsResponse`](/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#google.cloud.bigquery.storage.v1.AppendRowsResponse) response message contains a `TableSchema` object that describes the new schema.

To send data using the updated schema, you must close existing connections and open new connections with the new schema.

**Java client.** The Java client library provides additional features for schema updates through the [`JsonStreamWriter`](/java/docs/reference/google-cloud-bigquerystorage/latest/com.google.cloud.bigquery.storage.v1.JsonStreamWriter) class. After a schema update, the `JsonStreamWriter` automatically reconnects with the updated schema. You don't need to explicitly close and reopen the connection. To check for schema changes programmatically, call [`AppendRowsResponse.hasUpdatedSchema`](/java/docs/reference/google-cloud-bigquerystorage/latest/com.google.cloud.bigquery.storage.v1.AppendRowsResponse#com_google_cloud_bigquery_storage_v1_AppendRowsResponse_getUpdatedSchema__) after the `append` method completes.

| **Note:** Schema updates aren't immediately visible to the client library, but are detected on the order of minutes.

You can also configure the `JsonStreamWriter` to ignore unknown fields in the input data. To set this behavior, call [`setIgnoreUnknownFields`](/java/docs/reference/google-cloud-bigquerystorage/latest/com.google.cloud.bigquery.storage.v1.JsonStreamWriter.Builder#com_google_cloud_bigquery_storage_v1_JsonStreamWriter_Builder_setIgnoreUnknownFields_boolean_). This behavior is similar to the `ignoreUnknownValues` option of the legacy [`tabledata.insertAll`](/bigquery/docs/reference/rest/v2/tabledata/insertAll) API. However, it can lead to unintentional data loss, because unknown fields are silently dropped.
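The sketch below ties together the two `JsonStreamWriter` features just described: ignoring unknown fields at the builder and checking `hasUpdatedSchema` on the append response. It is a minimal illustration under assumptions, not a complete pipeline; the table path, the column name, and the blocking `get()` call are placeholders.

```java
import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
import com.google.cloud.bigquery.storage.v1.BigQueryWriteClient;
import com.google.cloud.bigquery.storage.v1.JsonStreamWriter;
import org.json.JSONArray;
import org.json.JSONObject;

public class SchemaUpdateAwareWriter {

  public static void writeAndWatchForSchemaChanges(String parentTable) throws Exception {
    try (BigQueryWriteClient client = BigQueryWriteClient.create();
        JsonStreamWriter writer =
            JsonStreamWriter.newBuilder(parentTable, client)
                // Drop fields the table doesn't know about instead of failing the append.
                // As noted above, this silently discards data, so enable it deliberately.
                .setIgnoreUnknownFields(true)
                .build()) {

      JSONArray rows = new JSONArray();
      rows.put(new JSONObject().put("customer_name", "Alice")); // hypothetical column

      // Blocking get() keeps the sketch short; prefer the callback pattern shown earlier.
      AppendRowsResponse response = writer.append(rows).get();

      if (response.hasUpdatedSchema()) {
        // The backend detected a table schema change (on the order of minutes after the
        // schema update). JsonStreamWriter reconnects with the new schema automatically,
        // so this branch mainly signals that new fields can now be included in rows.
        System.out.println("Table schema updated: " + response.getUpdatedSchema());
      }
    }
  }
}
```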