Streaming downloads

Cloud Storage supports streaming data from a bucket to a process without requiring that the data first be saved to a file.

Using checksum validation when streaming

You should not use a streaming download if you require checksum validation prior to the data becoming accessible. This is because streaming downloads use the Range header, and Cloud Storage does not perform checksum validation on such requests.

It's recommended that you always use checksum validation, and you can manually do so after a streaming download completes; however, validating after the download completes means that any corrupted data is accessible during the time it takes to confirm the corruption and remove it.

Prerequisites

Prerequisites can vary based on the tool used:

Console

In order to complete this guide using the Google Cloud console, you must have the proper IAM permissions. If the bucket you want to access for streaming exists in a project that you did not create, you might need the project owner to give you a role that contains the necessary permissions.

For a list of permissions required for specific actions, see IAM permissions for the Google Cloud console.

For a list of relevant roles, see Cloud Storage roles. Alternatively, you can create a custom role that has specific, limited permissions.

Command line

In order to complete this guide using a command-line utility, you must have the proper IAM permissions. If the bucket you want to access for streaming exists in a project that you did not create, you might need the project owner to give you a role that contains the necessary permissions.

For a list of permissions required for specific actions, see IAM permissions for gcloud storage commands.

For a list of relevant roles, see Cloud Storage roles. Alternatively, you can create a custom role that has specific, limited permissions.

Client libraries

In order to complete this guide using the Cloud Storage client libraries, you must have the proper IAM permissions. If the bucket you want to access for streaming exists in a project that you did not create, you might need the project owner to give you a role that contains the necessary permissions.

Unless otherwise noted, client library requests are made through the JSON API and require permissions as listed in IAM permissions for JSON methods. To see which JSON API methods are invoked when you make requests using a client library, log the raw requests.

For a list of relevant IAM roles, see Cloud Storage roles. Alternatively, you can create a custom role that has specific, limited permissions.

REST APIs

JSON API

In order to complete this guide using the JSON API, you must have the proper IAM permissions. If the bucket you want to access for streaming exists in a project that you did not create, you might need the project owner to give you a role that contains the necessary permissions.

For a list of permissions required for specific actions, see IAM permissions for JSON methods.

For a list of relevant roles, see Cloud Storage roles. Alternatively, you can create a custom role that has specific, limited permissions.

Stream a download

The following examples show how to perform a download from a Cloud Storage object to a process:

Console

The Google Cloud console does not support streaming downloads. Use the gcloud CLI instead.

Command line

  1. Run the gcloud storage cp command using a dash for the destination URL, then pipe the data to the process:

    gcloud storage cp gs://BUCKET_NAME/OBJECT_NAME - | PROCESS_NAME

    Where:

    • BUCKET_NAME is the name of the bucket containing the object. For example, my_app_bucket.
    • OBJECT_NAME is the name of the object that you are streaming to the process. For example, data_measurements.
    • PROCESS_NAME is the name of the process into which you are feeding data. For example, analyze_data.

You can also stream data from a Cloud Storage object to a standard Linux command like sort:

gcloud storage cp gs://my_app_bucket/data_measurements - | sort

Client libraries

C++

For more information, see the Cloud Storage C++ API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

namespace gcs = ::google::cloud::storage;
[](gcs::Client client, std::string const& bucket_name,
   std::string const& object_name) {
  gcs::ObjectReadStream stream = client.ReadObject(bucket_name, object_name);

  int count = 0;
  std::string line;
  while (std::getline(stream, line, '\n')) {
    ++count;
  }
  if (stream.bad()) throw google::cloud::Status(stream.status());

  std::cout << "The object has " << count << " lines\n";
}

C#

For more information, see the Cloud Storage C# API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


using Google.Cloud.Storage.V1;
using System;
using System.IO;

public class DownloadFileSample
{
    public void DownloadFile(
        string bucketName = "your-unique-bucket-name",
        string objectName = "my-file-name",
        string localPath = "my-local-path/my-file-name")
    {
        var storage = StorageClient.Create();
        using var outputFile = File.OpenWrite(localPath);
        storage.DownloadObject(bucketName, objectName, outputFile);
        Console.WriteLine($"Downloaded {objectName} to {localPath}.");
    }
}

Go

For more information, see the Cloud Storage Go API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import (
	"context"
	"fmt"
	"io"
	"io/ioutil"
	"time"

	"cloud.google.com/go/storage"
)

// downloadFileIntoMemory downloads an object.
func downloadFileIntoMemory(w io.Writer, bucket, object string) ([]byte, error) {
	// bucket := "bucket-name"
	// object := "object-name"
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		return nil, fmt.Errorf("storage.NewClient: %w", err)
	}
	defer client.Close()

	ctx, cancel := context.WithTimeout(ctx, time.Second*50)
	defer cancel()

	rc, err := client.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		return nil, fmt.Errorf("Object(%q).NewReader: %w", object, err)
	}
	defer rc.Close()

	data, err := ioutil.ReadAll(rc)
	if err != nil {
		return nil, fmt.Errorf("ioutil.ReadAll: %w", err)
	}
	fmt.Fprintf(w, "Blob %v downloaded.\n", object)
	return data, nil
}

Java

For more information, see the Cloud Storage Java API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.ReadChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import com.google.common.io.ByteStreams;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class StreamObjectDownload {

  public static void streamObjectDownload(
      String projectId, String bucketName, String objectName, String targetFile)
      throws IOException {
    // The ID of your GCP project
    // String projectId = "your-project-id";

    // The ID of your GCS bucket
    // String bucketName = "your-unique-bucket-name";

    // The ID of your GCS object
    // String objectName = "your-object-name";

    // The path to the file to download the object to
    // String targetFile = "path/to/your/file";
    Path targetFilePath = Paths.get(targetFile);

    Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
    try (ReadChannel reader = storage.reader(BlobId.of(bucketName, objectName));
        FileChannel targetFileChannel =
            FileChannel.open(targetFilePath, StandardOpenOption.WRITE)) {

      ByteStreams.copy(reader, targetFileChannel);

      System.out.println(
          "Downloaded object "
              + objectName
              + " from bucket "
              + bucketName
              + " to "
              + targetFile
              + " using a ReadChannel.");
    }
  }
}

Node.js

For more information, see the Cloud Storage Node.js API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// The ID of your GCS bucket
// const bucketName = 'your-unique-bucket-name';

// The ID of your GCS file
// const fileName = 'your-file-name';

// The filename and file path where you want to download the file
// const destFileName = '/local/path/to/file.txt';

// Imports the Google Cloud client library
const {Storage} = require('@google-cloud/storage');

// Creates a client
const storage = new Storage();

async function streamFileDownload() {
  // The example below demonstrates how we can reference a remote file, then
  // pipe its contents to a local file.
  // Once the stream is created, the data can be piped anywhere (process, sdout, etc)
  await storage
    .bucket(bucketName)
    .file(fileName)
    .createReadStream() //stream is created
    .pipe(fs.createWriteStream(destFileName))
    .on('finish', () => {
      // The file download is complete
    });

  console.log(
    `gs://${bucketName}/${fileName} downloaded to ${destFileName}.`
  );
}

streamFileDownload().catch(console.error);

PHP

For more information, see the Cloud Storage PHP API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Storage\StorageClient;

/**
 * Download an object from Cloud Storage and save it as a local file.
 *
 * @param string $bucketName The name of your Cloud Storage bucket.
 *        (e.g. 'my-bucket')
 * @param string $objectName The name of your Cloud Storage object.
 *        (e.g. 'my-object')
 * @param string $destination The local destination to save the object.
 *        (e.g. '/path/to/your/file')
 */
function download_object(string $bucketName, string $objectName, string $destination): void
{
    $storage = new StorageClient();
    $bucket = $storage->bucket($bucketName);
    $object = $bucket->object($objectName);
    $object->downloadToFile($destination);
    printf(
        'Downloaded gs://%s/%s to %s' . PHP_EOL,
        $bucketName,
        $objectName,
        basename($destination)
    );
}

Python

For more information, see the Cloud Storage Python API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import storage


def download_blob_to_stream(bucket_name, source_blob_name, file_obj):
    """Downloads a blob to a stream or other file-like object."""

    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The ID of your GCS object (blob)
    # source_blob_name = "storage-object-name"

    # The stream or file (file-like object) to which the blob will be written
    # import io
    # file_obj = io.BytesIO()

    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)

    # Construct a client-side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` in that it doesn't
    # retrieve metadata from Google Cloud Storage. As we don't use metadata in
    # this example, using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_file(file_obj)

    print(f"Downloaded blob {source_blob_name} to file-like object.")

    return file_obj
    # Before reading from file_obj, remember to rewind with file_obj.seek(0).

Ruby

For more information, see the Cloud Storage Ruby API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# Downloads a blob to a stream or other file-like object.

# The ID of your GCS bucket
# bucket_name = "your-unique-bucket-name"

# Name of a file in the Storage bucket
# file_name   = "some_file.txt"

# The stream or file (file-like object) to which the contents will be written
# local_file_obj = StringIO.new

require "google/cloud/storage"

storage = Google::Cloud::Storage.new
bucket  = storage.bucket bucket_name
file    = bucket.file file_name

file.download local_file_obj, verify: :none

# rewind the object before starting to read the downloaded contents
local_file_obj.rewind
puts "The full downloaded file contents are: #{local_file_obj.read.inspect}"

REST APIs

JSON API

To perform a streaming download, follow the instructions for downloading an object with the following considerations:

  • Before beginning the download, retrieve the object's metadata and save the object's generation number. Include this generation number in each of your requests to ensure that you don't download data from two different generations in the event the original gets overwritten.

  • Use the Range header in your request to retrieve a piece of the overall object, which you can send to the desired local process.

  • Continue making requests for successive pieces of the object, until the entire object has been retrieved.

XML API

To perform a streaming download, follow the instructions for downloading an object with the following considerations:

  • Before beginning the download, retrieve the object's metadata and save the object's generation number. Include this generation number in each of your requests to ensure that you don't download data from two different generations in the event the original gets overwritten.

  • Use the Range header in your request to retrieve a piece of the overall object, which you can send to the desired local process.

  • Continue making requests for successive pieces of the object, until the entire object has been retrieved.

What's next