There are several strategies which can be used to upload a Blob to Google Cloud
Storage. This class provides factories which allow you to select the appropriate strategy for
your workload.
Default (chunk-based resumable upload)
The network will only be used for the following operations:
1. Creating the Resumable Upload Session
2. Transmitting zero or more incremental chunks
3. Transmitting the final chunk and finalizing the Resumable Upload Session
If any of these operations is interrupted with a retryable error, the Resumable Upload Session will be queried to reconcile client-side state with Cloud Storage. Each chunk is retried up to the limitations specified in StorageOptions#getRetrySettings().
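A minimal sketch of exercising the default strategy through Storage#blobWriteSession(BlobInfo, BlobWriteOption...); the bucket and object names are placeholders:

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.BlobWriteSession;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class DefaultChunkedUploadSketch {
  public static void main(String[] args) throws Exception {
    // gRPC transport with the default BlobWriteSessionConfig: a chunked
    // resumable upload retried per StorageOptions#getRetrySettings().
    try (Storage storage = StorageOptions.grpc().build().getService()) {
      BlobInfo info = BlobInfo.newBuilder(BlobId.of("my-bucket", "my-object")).build();
      BlobWriteSession session = storage.blobWriteSession(info);
      try (WritableByteChannel channel = session.open()) {
        channel.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
      }
      BlobInfo created = session.getResult().get(); // metadata of the finalized object
      System.out.println(created.getBlobId());
    }
  }
}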
Buffer to disk, then upload
Buffer bytes to a temporary file on disk. On close(), upload the entire file's contents to Cloud Storage, then delete the temporary file.
Transports: gRPC, HTTP
A Resumable Upload Session will be used to upload the file on disk.
If the upload is interrupted with a retryable error, the Resumable Upload Session will be queried to restart the upload from Cloud Storage's last received byte. The file is uploaded in the fewest number of RPCs possible, retrying within the limitations specified in StorageOptions#getRetrySettings().
Journaling
Create a Resumable Upload Session; before transmitting bytes to Cloud Storage, write them to a recovery file on disk. If the stream to Cloud Storage is interrupted with a retryable error, query the offset of the Resumable Upload Session, then open the recovery file from that offset and transmit the bytes to Cloud Storage.
Transports: gRPC
The stream to Cloud Storage will be held open until either a) the write is complete, or b) the stream is interrupted.
Because the bytes are journaled to disk, the upload to Cloud Storage can only
be as fast as the disk.
Opening the stream for upload will be retried up to the limitations specified in StorageOptions#getRetrySettings(). All bytes are buffered to disk, allowing for recovery from any arbitrary offset.
Parallel composite upload
Break the stream of bytes into smaller part objects, uploading each part in parallel, then compose the parts together to make the final object.
Transports: gRPC, HTTP
Considerations:
Performing parallel composite uploads costs more money. Class A operations are performed to create each part and to perform each compose. If a storage class other than STANDARD is used, early deletion fees apply to deletion of the parts.
An illustrative example: upload a 5GiB object using 64MiB as the max size per part (see the arithmetic sketch at the end of these considerations).
1. 80 parts will be created (Class A)
2. 3 compose calls will be performed (Class A)
3. 80 parts along with 2 intermediary compose objects will be deleted (free tier, as long as the STANDARD storage class is used)
Once the parts and intermediary compose objects are deleted, there will be no storage charges related to those temporary objects.
The service account/credentials used to perform the parallel composite upload require storage.objects.delete (https://cloud.google.com/storage/docs/access-control/iam-permissions#object_permissions) in order to clean up the temporary part and intermediary compose objects.
To handle part and intermediary compose object deletion out of band, pass PartCleanupStrategy#never() to ParallelCompositeUploadBlobWriteSessionConfig#withPartCleanupStrategy(PartCleanupStrategy); this prevents automatic cleanup.
<li>
Please see the <a href="https://cloud.google.com/storage/docs/parallel-composite-uploads">
Parallel composite uploads</a> documentation for a more in depth explanation of the
limitations of Parallel composite uploads.
</li>
<li>
A failed upload can leave part and intermediary compose objects behind which will count
as storage usage, and you will be billed for it.
<p>By default if an upload fails, an attempt to cleanup the part and intermediary compose
will be made. However if the program were to crash there is no means for the client to
perform the cleanup.
<p>Every part and intermediary compose object will be created with a name which ends in
<code>.part</code>. An Object Lifecycle Management rule can be setup on your bucket to automatically
cleanup objects with the suffix after some period of time. See
<a href="https://cloud.google.com/storage/docs/lifecycle">Object Lifecycle Management</a>
for full details and a guide on how to setup a <a href="https://cloud.google.com/storage/docs/lifecycle#delete">Delete</a>
rule with a <a href="https://cloud.google.com/storage/docs/lifecycle#matchesprefix-suffix">suffix match</a> condition.
</li>
<li>
Using parallel composite uploads are not a one size fits all solution. They have very
real overhead until uploading a large enough object. The inflection point is dependent
upon many factors, and there is no one size fits all value. You will need to experiment
with your deployment and workload to determine if parallel composite uploads are useful
to you.
</li>
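A minimal arithmetic sketch of the 5GiB / 64MiB example above. The limit of 32 source objects per compose call is Cloud Storage's documented maximum; the batching shown is one way to arrive at the example's numbers, not necessarily the exact plan the library produces:

public class PcuPartArithmetic {
  public static void main(String[] args) {
    long objectSize = 5L * 1024 * 1024 * 1024; // 5 GiB
    long partSize = 64L * 1024 * 1024;         // 64 MiB max per part
    long maxPerCompose = 32;                   // documented compose source limit

    long parts = (objectSize + partSize - 1) / partSize; // 80 parts (Class A each)
    long intermediary = parts / maxPerCompose;           // 2 intermediary composes of 32 parts
    long leftover = parts % maxPerCompose;               // 16 parts left over
    // The final compose stitches the 2 intermediary objects plus the 16
    // leftover parts together (18 sources, within the limit of 32).
    long composeCalls = intermediary + 1;                // 3 compose calls (Class A each)
    long deletes = parts + intermediary;                 // 82 temporary objects to delete

    System.out.printf("parts=%d composes=%d deletes=%d%n", parts, composeCalls, deletes);
  }
}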
Automatic retries will be applied for the following:
1. Creation of each individual part
2. Performing an intermediary compose
3. Performing a delete to clean up each part and intermediary compose object
Retrying the creation of the final object is contingent upon whether an appropriate precondition is supplied when calling Storage#blobWriteSession(BlobInfo, BlobWriteOption...). Either BlobTargetOption#doesNotExist() or Storage.BlobTargetOption#generationMatch(long) should be specified in order to make the final request idempotent.
Each operation will be retried up to the limitations specified in StorageOptions#getRetrySettings().
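For instance, a minimal sketch of supplying such a precondition, assuming Storage.BlobWriteOption#doesNotExist() as the write-option counterpart of the target option named above (storage and info as in the earlier sketch):

// The object-does-not-exist precondition makes creation of the final
// object idempotent, so the library can safely retry it.
BlobWriteSession session =
    storage.blobWriteSession(info, Storage.BlobWriteOption.doesNotExist());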
Related documentation:
Parallel composite uploads: https://cloud.google.com/storage/docs/parallel-composite-uploads
Direct uploads: https://cloud.google.com/storage/docs/uploading-objects-from-memory
Compose: https://cloud.google.com/storage/docs/composite-objects
Object delete: https://cloud.google.com/storage/docs/deleting-objects
public static BidiBlobWriteSessionConfig bidiWrite()
Factory to produce a resumable upload using a bi-directional stream. This should provide a
small performance increase compared to a regular resumable upload.
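A minimal sketch of wiring this config in (imports as in the earlier default-upload sketch, plus com.google.cloud.storage.BlobWriteSessionConfigs); gRPC transport is assumed since the session uses a bi-directional stream:

Storage storage =
    StorageOptions.grpc()
        .setBlobWriteSessionConfig(BlobWriteSessionConfigs.bidiWrite())
        .build()
        .getService();
// Sessions created from this Storage now use bi-directional resumable uploads.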
public static BufferToDiskThenUpload bufferToDiskThenUpload(Path path)
Create a new BlobWriteSessionConfig which will first buffer the content of the object
to a temporary file under the specified path.
Once the file on disk is closed, the entire file will then be uploaded to Cloud Storage.
See Also: Storage#blobWriteSession(BlobInfo, BlobWriteOption...), GrpcStorageOptions.Builder#setBlobWriteSessionConfig(BlobWriteSessionConfig)
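A minimal sketch (assumes java.nio.file.Paths in addition to the earlier imports); the staging directory is a placeholder and must reside on a disk with enough free space for the buffered objects:

Path staging = Paths.get("/var/tmp/gcs-staging"); // placeholder path
Storage storage =
    StorageOptions.grpc()
        .setBlobWriteSessionConfig(BlobWriteSessionConfigs.bufferToDiskThenUpload(staging))
        .build()
        .getService();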
public static BufferToDiskThenUpload bufferToDiskThenUpload(Collection<Path> paths)
Create a new BlobWriteSessionConfig which will first buffer the content of the object
to a temporary file under one of the specified paths.
Once the file on disk is closed, the entire file will then be uploaded to Cloud Storage.
The specifics of how the work is spread across multiple paths is undefined and subject to
change.
See Also: Storage#blobWriteSession(BlobInfo, BlobWriteOption...), GrpcStorageOptions.Builder#setBlobWriteSessionConfig(BlobWriteSessionConfig)
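Likewise with multiple staging directories (assumes java.util.List; both paths are placeholders, and how work is spread across them is undefined):

BufferToDiskThenUpload config =
    BlobWriteSessionConfigs.bufferToDiskThenUpload(
        List.of(Paths.get("/mnt/disk1/staging"), Paths.get("/mnt/disk2/staging")));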
public static BlobWriteSessionConfig bufferToTempDirThenUpload()
Create a new BlobWriteSessionConfig which will first buffer the content of the object
to a temporary file under java.io.tmpdir.
Once the file on disk is closed, the entire file will then be uploaded to Cloud Storage.
See Also: Storage#blobWriteSession(BlobInfo, BlobWriteOption...), GrpcStorageOptions.Builder#setBlobWriteSessionConfig(BlobWriteSessionConfig)
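A minimal sketch; the temporary files land under the JVM's java.io.tmpdir:

Storage storage =
    StorageOptions.grpc()
        .setBlobWriteSessionConfig(BlobWriteSessionConfigs.bufferToTempDirThenUpload())
        .build()
        .getService();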
public static JournalingBlobWriteSessionConfig journaling(Collection<Path> paths)
Create a new BlobWriteSessionConfig which will journal writes to a temporary file under
one of the specified paths before transmitting the bytes to Cloud Storage.
The specifics of how the work is spread across multiple paths is undefined and subject to
change.
See Also: Storage#blobWriteSession(BlobInfo, BlobWriteOption...), GrpcStorageOptions.Builder#setBlobWriteSessionConfig(BlobWriteSessionConfig)
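A minimal sketch (journal directories are placeholders; per the strategy description above, journaling requires the gRPC transport, and upload throughput is bounded by the speed of these disks):

JournalingBlobWriteSessionConfig config =
    BlobWriteSessionConfigs.journaling(
        List.of(Paths.get("/mnt/nvme0/journal"), Paths.get("/mnt/nvme1/journal")));
Storage storage =
    StorageOptions.grpc().setBlobWriteSessionConfig(config).build().getService();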