Class ObjectWriteStream (2.29.0-rc)

Defines a std::basic_ostream<char> to write to a GCS object.

This class is used to upload objects to GCS. It can handle objects of any size, but keep the following considerations in mind:

  • This API is designed for applications that need to stream the object payload. If you have the payload as one large buffer consider using Client::InsertObject(); it is simpler and faster in most cases.
  • This API can be used to perform unformatted I/O, as well as formatted I/O using the familiar operator<< APIs.
  • Note that formatted I/O typically implies some form of buffering and data copying.
    • For best performance, consider using the .write() member function.
  • GCS expects to receive data in multiples of the upload quantum (256 KiB). Sending a buffer that is not a multiple of this quantum terminates the upload.
    • Consequently, this class must maintain an internal buffer before sending the data to the service.
    • Understanding how this buffer is used is important to get the best possible performance.
    • When using unformatted I/O, try to size your data in multiples of the upload quantum, as this often results in better performance.
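The two I/O styles mentioned above can be compared with a short sketch. It is written against std::ostream, which ObjectWriteStream derives from, and uses std::ostringstream as a stand-in so that it runs without GCS. The function name WriteRecord and the record layout are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes a small text header with formatted I/O, then a
// binary payload with unformatted I/O. It accepts any std::ostream.
void WriteRecord(std::ostream& os, int record_id,
                 std::vector<char> const& payload) {
  // Formatted I/O: operator<< converts the integer to text.
  os << "record=" << record_id << "\n";
  // Unformatted I/O: .write() hands the bytes to the stream as-is.
  os.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}
```

Calling WriteRecord(os, 7, {'a', 'b', 'c'}) on a std::ostringstream leaves "record=7\nabc" in the stream. With a real upload stream, the .write() path is the one this page recommends for large payloads.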

The maximum size of this internal buffer is configured using UploadBufferSizeOption. As with all options, this can be set when the Client object is created. The current default value is 8 MiB, but this default value can change. If the size of this buffer is important for your application please set the value explicitly. You can also provide an override when calling Client::WriteObject(). Note that this setting is expressed in bytes, but it is always rounded (up) to a multiple of the upload quantum.
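The rounding rule can be illustrated with a little arithmetic. This is a minimal sketch of rounding a positive byte count up to a multiple of 256 KiB. The names kUploadQuantum and RoundUpToQuantum are hypothetical and are not part of the library, which may implement the rounding differently.

```cpp
#include <cassert>
#include <cstddef>

// The upload quantum described above: 256 KiB.
constexpr std::size_t kUploadQuantum = 256 * 1024;

// Rounds a positive size in bytes up to the next multiple of the quantum.
constexpr std::size_t RoundUpToQuantum(std::size_t bytes) {
  return ((bytes + kUploadQuantum - 1) / kUploadQuantum) * kUploadQuantum;
}
```

Under this sketch a requested size of 300,000 bytes becomes 524,288 bytes (512 KiB), while a value that is already a multiple of the quantum, such as 8 MiB, is unchanged.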

Unformatted I/O

On a .write() call this class attempts to send the data immediately, that is, without copying it to the internal buffer. If the previously buffered data plus the data provided in the .write() call exceed the maximum size of the internal buffer, then the largest amount of data that is a multiple of the upload quantum is flushed. Any data in excess of a multiple of the upload quantum is buffered for the next upload.
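The rule in the paragraph above can be modeled with byte counts alone. The following is a toy model based on one reading of that paragraph, not the library's implementation. The type BufferModel and its members are hypothetical, and the model assumes that data which fits within the maximum buffer size simply stays buffered.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kQuantum = 256 * 1024;  // upload quantum, 256 KiB

// Toy model of the buffering rule. It tracks byte counts only, no data.
struct BufferModel {
  std::size_t max_buffer;    // maximum internal buffer size, in bytes
  std::size_t buffered = 0;  // bytes currently held in the buffer

  explicit BufferModel(std::size_t max) : max_buffer(max) {
    // The configured size is always a positive multiple of the quantum.
    assert(max > 0 && max % kQuantum == 0);
  }

  // Simulates a .write() of `count` bytes and returns the bytes flushed.
  std::size_t Write(std::size_t count) {
    std::size_t const total = buffered + count;
    if (total <= max_buffer) {  // assumption: data that fits stays buffered
      buffered = total;
      return 0;
    }
    // Flush the largest multiple of the quantum; keep the excess buffered.
    std::size_t const flushed = (total / kQuantum) * kQuantum;
    buffered = total - flushed;
    return flushed;
  }
};
```

For instance, a model created with a 256 KiB maximum that receives Write(257 * 1024) reports 256 KiB flushed and leaves 1 KiB buffered.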

These examples may clarify how this works:

  1. Consider a fresh ObjectWriteStream, configured to buffer at most 256 KiB. Assume this stream receives a .write() call with 257 KiB of data. The first 256 KiB are immediately sent and the remaining 1 KiB is buffered for a future upload.
    • If the same stream receives another .write() call with 256 KiB then it will send the buffered 1 KiB of data and the first 255 KiB from the new buffer. The last 1 KiB is buffered for a future upload.
  2. Consider a fresh ObjectWriteStream, configured to buffer at most 256 KiB. If this stream receives a .write() call with 4 MiB of data the data is sent immediately. No data is buffered, as the data size is a multiple of the upload quantum.
  3. Consider a stream configured to buffer 512 KiB before flushing. Assume this stream has 256 KiB of data in its buffer from previous buffered I/O. If this stream receives a .write() call with 1024 KiB then both the 256 KiB and the 1024 KiB of data are flushed.

Formatted I/O

When performing formatted I/O, typically used via operator<<, this class will buffer data based on the UploadBufferSizeOption setting.

Recommendations

For best performance when uploading data we recommend using the unformatted I/O API exclusively. Furthermore, we recommend that applications use data in multiples of the upload quantum in all calls to .write(). Larger buffers result in better performance. Our empirical results show that these improvements taper off around 32 MiB.

If you are planning to use unformatted I/O, and you are already planning to provide large buffers in the .write() calls, then there is no need to configure a large value for UploadBufferSizeOption. As described above, calling .write() with more data than the UploadBufferSizeOption immediately flushes the data and leaves only the remainder that is not a multiple of 256 KiB in the internal buffer.
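A chunked upload loop that follows these recommendations might look like the sketch below. It is written against std::ostream so it can be tried with std::ostringstream. The function name WriteInChunks is hypothetical, and the chunk size is left to the caller as long as it is a multiple of the upload quantum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

constexpr std::size_t kUploadQuantum = 256 * 1024;  // 256 KiB

// Writes `data` with .write() in chunks of `chunk_size` bytes, which must be
// a multiple of the quantum. Only the final chunk may be shorter. Returns the
// number of .write() calls made.
std::size_t WriteInChunks(std::ostream& os, std::string const& data,
                          std::size_t chunk_size) {
  assert(chunk_size > 0 && chunk_size % kUploadQuantum == 0);
  std::size_t calls = 0;
  for (std::size_t offset = 0; offset < data.size(); offset += chunk_size) {
    auto const n = std::min(chunk_size, data.size() - offset);
    os.write(data.data() + offset, static_cast<std::streamsize>(n));
    ++calls;
  }
  return calls;
}
```

With 600 KiB of data and a 256 KiB chunk size this makes three .write() calls: two full chunks and one final 88 KiB chunk. In a real upload a larger chunk size, up to the point where the gains level off, would be the natural choice.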

Suspending Uploads

As is customary in C++, the destructor of this class finalizes the upload. If you want to prevent the class from finalizing an upload, use the Suspend() function.

Examples
Starting a resumable upload.
  namespace gcs = ::google::cloud::storage;
  return [](gcs::Client client, std::string const& bucket_name,
            std::string const& object_name) {
    gcs::ObjectWriteStream stream = client.WriteObject(
        bucket_name, object_name, gcs::NewResumableUploadSession(),
        gcs::AutoFinalizeDisabled());
    auto session_id = stream.resumable_session_id();
    std::cout << "Created resumable upload: " << session_id << "\n";
    // Because this stream was created with `AutoFinalizeDisabled()` its
    // destructor will *not* finalize the upload, allowing a separate process or
    // function to resume and continue the upload.
    stream << "This data will not get uploaded, it is too small\n";
    return session_id;
  }
Resuming a resumable upload.
  namespace gcs = ::google::cloud::storage;
  using ::google::cloud::StatusOr;
  [](gcs::Client client, std::string const& bucket_name,
     std::string const& object_name, std::string const& session_id) {
    // Restore a resumable upload stream, the library automatically queries the
    // state of the upload and discovers the next expected byte.
    gcs::ObjectWriteStream stream =
        client.WriteObject(bucket_name, object_name,
                           gcs::RestoreResumableUploadSession(session_id));
    if (!stream.IsOpen() && stream.metadata().ok()) {
      std::cout << "The upload has already been finalized.  The object "
                << "metadata is: " << *stream.metadata() << "\n";
      // Nothing left to upload, so stop here.
      return;
    }
    if (stream.next_expected_byte() == 0) {
      // In this example we create a small object, smaller than the resumable
      // upload quantum (256 KiB), so either all the data is there or not.
      // Applications use `next_expected_byte()` to find the position in their
      // input where they need to start uploading.
      stream << R"""(
Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit
esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
)""";
    }

    stream.Close();

    StatusOr<gcs::ObjectMetadata> metadata = stream.metadata();
    if (!metadata) throw std::move(metadata).status();
    std::cout << "Upload completed, the new object metadata is: " << *metadata
              << "\n";
  }

Constructors

ObjectWriteStream()

Creates a stream not associated with any buffer.

Attempts to use this stream will result in failures.

ObjectWriteStream(std::unique_ptr< internal::ObjectWriteStreambuf >)

Creates a stream associated with the given request.

Writing to the stream will result in HTTP requests to upload the data to the GCS object.

Parameter
Name Description
buf std::unique_ptr< internal::ObjectWriteStreambuf >

an initialized ObjectWriteStreambuf to upload the data.

ObjectWriteStream(ObjectWriteStream &&)

Parameter
Name Description
rhs ObjectWriteStream &&

ObjectWriteStream(ObjectWriteStream const &)

Parameter
Name Description
ObjectWriteStream const &

Operators

operator=(ObjectWriteStream &&)

Parameter
Name Description
rhs ObjectWriteStream &&
Returns
Type Description
ObjectWriteStream &

operator=(ObjectWriteStream const &)

Parameter
Name Description
ObjectWriteStream const &
Returns
Type Description
ObjectWriteStream &

Functions

metadata() const &

Access the upload results.

Note that calling these member functions before Close() is undefined behavior.

Returns
Type Description
StatusOr< ObjectMetadata > const &

metadata() &&

Access the upload results.

Note that calling these member functions before Close() is undefined behavior.

Returns
Type Description
StatusOr< ObjectMetadata > &&

received_hash() const

The received CRC32C checksum and the MD5 hash values as reported by GCS.

When the upload is finalized (via Close()) the GCS server reports the CRC32C checksum and, if the object is not a composite object, the MD5 hash of the uploaded data. This class compares the reported hashes against locally computed hash values, and reports an error if they do not match.

The values are reported as comma-separated tag=value pairs, e.g. crc32c=AAAAAA==,md5=1B2M2Y8AsgTpgAmY7PhCfg==. The format of this string is subject to change without notice; the values are provided for informational purposes only.
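For debugging output it can be handy to split this string into its parts. The sketch below assumes the tag=value format shown in the example above, which may change without notice, so it should not be relied on for anything beyond logging. The function name ParseHashes is hypothetical. Values may end in = (base64 padding), so each pair is split at its first = only.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Splits "tag=value,tag=value" into a map. For debugging output only.
std::map<std::string, std::string> ParseHashes(std::string const& text) {
  std::map<std::string, std::string> result;
  std::istringstream is(text);
  std::string pair;
  while (std::getline(is, pair, ',')) {
    // Split at the first '=' so base64 padding stays in the value.
    auto const pos = pair.find('=');
    if (pos == std::string::npos) continue;  // skip malformed entries
    result[pair.substr(0, pos)] = pair.substr(pos + 1);
  }
  return result;
}
```

Applied to the example string above, this yields two entries: crc32c mapped to AAAAAA== and md5 mapped to 1B2M2Y8AsgTpgAmY7PhCfg==.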

See Also

https://cloud.google.com/storage/docs/hashes-etags for more information on checksums and hashes in GCS.

Returns
Type Description
std::string const &

computed_hash() const

The locally computed checksum and hashes, as a string.

This object computes the CRC32C checksum and MD5 hash of the uploaded data. There are several cases where these values may be empty or irrelevant, for example:

  • When performing resumable uploads the stream may not have had access to the full data.
  • The application may disable the CRC32C and/or the MD5 hash computation.

The string has the same format as the value returned by received_hash(). Note that the format of this string is also subject to change without notice.

See Also

https://cloud.google.com/storage/docs/hashes-etags for more information on checksums and hashes in GCS.

Returns
Type Description
std::string const &

headers() const

The headers (if any) returned by the service.

For debugging only.

Returns
Type Description
HeadersMap const &

payload() const

The returned payload as a raw string, for debugging only.

Returns
Type Description
std::string const &

swap(ObjectWriteStream &)

Parameter
Name Description
rhs ObjectWriteStream &
Returns
Type Description
void

IsOpen() const

Return true if the stream is open to write more data.

auto stream = client.WriteObject(...,
    gcs::RestoreResumableUploadSession(session_id));
if (!stream.IsOpen() && stream.metadata().ok()) {
  std::cout << "Yay! The upload was finalized previously.\n";
  return;
}
Returns
Type Description
bool

Close()

Close the stream, finalizing the upload.

Closing a stream completes an upload and creates the uploaded object. On failure it sets the badbit of the stream.

The metadata of the uploaded object, or a detailed error status, is accessible via the metadata() member function. Note that the metadata may be empty if the application creates a stream with the Fields("") parameter. Applications cannot assume that all fields in the metadata are filled on success.

Exceptions
Type Description
std::ios_base::failure if the application has enabled the exception mask.
Returns
Type Description
void
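
The interaction between badbit and the exception mask is standard iostream behavior, and it can be tried without GCS. The sketch below uses std::ostringstream as a stand-in and sets badbit by hand to imitate a failed Close(). The function name BadbitThrows is hypothetical.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>

// Returns true if marking the stream as bad raised std::ios_base::failure.
bool BadbitThrows(std::ostream& os, bool enable_exceptions) {
  // Opt in to exceptions for badbit, as an application would before Close().
  if (enable_exceptions) os.exceptions(std::ios_base::badbit);
  try {
    // Stand-in for a failed Close(); the real class sets badbit itself.
    os.setstate(std::ios_base::badbit);
  } catch (std::ios_base::failure const&) {
    return true;
  }
  return false;
}
```

With the exception mask enabled the function returns true. Without it the failure is silent and must be detected by checking the stream state, for example with bad(). In both cases badbit remains set on the stream afterwards.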

resumable_session_id() const

Returns the resumable upload session id for this upload.

Note that this is an empty string for uploads that do not use resumable upload session ids. Client::WriteObject() enables resumable uploads based on the options set by the application.

Returns
Type Description
std::string const &

next_expected_byte() const

Returns the next expected byte.

For non-resumable uploads this is always zero. Applications that use resumable uploads can use this value to resend any data not yet committed to GCS.

Returns
Type Description
std::uint64_t
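
As an illustration of how an application might use this value, the sketch below selects the part of an in-memory payload that still needs to be sent. It assumes the application holds the whole payload as a std::string. The function name RemainingPayload is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Returns the bytes of `data` starting at `next_expected_byte`. Offsets past
// the end of `data` yield an empty string.
std::string RemainingPayload(std::string const& data,
                             std::uint64_t next_expected_byte) {
  auto const offset = static_cast<std::size_t>(
      std::min<std::uint64_t>(next_expected_byte, data.size()));
  return data.substr(offset);
}
```

An application resuming an upload could pass the result to .write(). For example, with the payload "abcdef" and a next expected byte of 4, only "ef" remains to be sent.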

Suspend() &&

Suspends an upload.

This is a destructive operation. Using this object after calling this function results in undefined behavior. Applications should copy any necessary state (such as the value of resumable_session_id()) before calling this function.

  namespace gcs = ::google::cloud::storage;
  return [](gcs::Client client, std::string const& bucket_name,
            std::string const& object_name) {
    gcs::ObjectWriteStream stream = client.WriteObject(
        bucket_name, object_name, gcs::NewResumableUploadSession());
    auto session_id = stream.resumable_session_id();
    std::cout << "Created resumable upload: " << session_id << "\n";
    // As is customary in C++, the destructor automatically closes the
    // stream, which would finish the upload and create the object. For this
    // example we want to restore the session as-if the application had crashed,
    // where no destructors get called.
    stream << "This data will not get uploaded, it is too small\n";
    std::move(stream).Suspend();
    return session_id;
  }
Returns
Type Description
void

last_status() const

Returns the status of partial errors.

Applications may write multiple times before closing the stream. This function lets them check the status of those writes before the stream is closed.

This function differs from metadata(), because calling metadata() before Close() is undefined behavior.

Returns
Type Description
Status