Streaming downloads

Cloud Storage supports streaming data from a bucket to a process without requiring that the data first be saved to a file.

Using checksum validation when streaming

You shouldn't use a streaming download if you require checksum validation prior to the data becoming accessible. This is because streaming downloads use the Range header, and Cloud Storage does not return checksums in the response that apply to only the requested portion of object data.

It's recommended that you always use checksum validation, and you can do so after a streaming download completes; however, validating after the download completes means that any corrupted data is accessible during the time it takes to confirm the corruption and remove it.
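As a rough illustration of validating after the stream completes, the following sketch hashes each chunk while handing it to a consumer, then compares the result with an expected value. It assumes the expected value is the object's base64-encoded MD5 hash taken from the object's metadata, which not every object has. The names stream_and_verify, chunks, and sink are hypothetical. A CRC32C check would follow the same pattern with a third-party CRC32C library.

```python
import base64
import hashlib
from typing import Callable, Iterable


def stream_and_verify(
    chunks: Iterable[bytes],
    sink: Callable[[bytes], None],
    expected_md5_b64: str,
) -> bool:
    """Pass each chunk to sink, then report whether the stream's MD5 matches.

    The consumer sees data before validation finishes, so the caller must
    discard whatever the sink produced when this returns False.
    """
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)  # hash incrementally; nothing is buffered
        sink(chunk)  # the data is already visible to the consumer here
    actual_md5_b64 = base64.b64encode(digest.digest()).decode("ascii")
    return actual_md5_b64 == expected_md5_b64
```

Because the sink receives every chunk before the comparison runs, a False result means that corrupted data has already been consumed and must be cleaned up by the caller.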

Required roles

To get the permissions that you need to stream downloads, ask your administrator to grant you the Storage Object Viewer (roles/storage.objectViewer) role on the bucket.

This role contains the following permission, which is required to stream downloads:

  • storage.objects.get

You might also be able to get this permission with other predefined roles or custom roles.

For instructions on granting roles on buckets, see Use IAM with buckets.

Stream a download

The following examples show how to stream a download from a Cloud Storage object to a process:

Console

The Google Cloud console does not support streaming downloads. Use the Google Cloud CLI instead.

Command line

  1. Run the gcloud storage cp command using a dash for the destination URL, then pipe the data to the process:

    gcloud storage cp gs://BUCKET_NAME/OBJECT_NAME - | PROCESS_NAME

    Where:

    • BUCKET_NAME is the name of the bucket containing the object. For example, my_app_bucket.
    • OBJECT_NAME is the name of the object that you are streaming to the process. For example, data_measurements.
    • PROCESS_NAME is the name of the process into which you are feeding data. For example, analyze_data.

You can also stream data from a Cloud Storage object to a standard Linux command like sort:

gcloud storage cp gs://my_app_bucket/data_measurements - | sort
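A process on the receiving end of the pipe only needs to read standard input. The following is a minimal sketch of what a consumer such as the hypothetical analyze_data might do in Python. The function name count_lines and the 64 KiB chunk size are illustrative assumptions, not part of the gcloud CLI, and an in-memory buffer stands in for the pipe.

```python
import io
from typing import BinaryIO


def count_lines(stream: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Count newline characters in a binary stream, one chunk at a time."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means the upstream side closed the pipe
            break
        total += chunk.count(b"\n")
    return total


# In-memory stand-in for the pipe, for illustration only.
sample = io.BytesIO(b"3\n1\n2\n")
print(count_lines(sample))  # prints 3
```

In a real script, passing sys.stdin.buffer instead of the in-memory buffer would read the bytes that gcloud storage cp writes to the pipe, without holding the whole object in memory.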

Client libraries

C++

For more information, see the Cloud Storage C++ API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

namespace gcs = ::google::cloud::storage;
[](gcs::Client client, std::string const& bucket_name,
   std::string const& object_name) {
  gcs::ObjectReadStream stream = client.ReadObject(bucket_name, object_name);

  int count = 0;
  std::string line;
  while (std::getline(stream, line, '\n')) {
    ++count;
  }
  if (stream.bad()) throw google::cloud::Status(stream.status());

  std::cout << "The object has " << count << " lines\n";
}

C#

For more information, see the Cloud Storage C# API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


using Google.Cloud.Storage.V1;
using System;
using System.IO;

public class DownloadFileSample
{
    public void DownloadFile(
        string bucketName = "your-unique-bucket-name",
        string objectName = "my-file-name",
        string localPath = "my-local-path/my-file-name")
    {
        var storage = StorageClient.Create();
        using var outputFile = File.OpenWrite(localPath);
        storage.DownloadObject(bucketName, objectName, outputFile);
        Console.WriteLine($"Downloaded {objectName} to {localPath}.");
    }
}

Go

For more information, see the Cloud Storage Go API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import (
	"context"
	"fmt"
	"io"
	"time"

	"cloud.google.com/go/storage"
)

// downloadFileIntoMemory downloads an object and returns its contents.
func downloadFileIntoMemory(w io.Writer, bucket, object string) ([]byte, error) {
	// bucket := "bucket-name"
	// object := "object-name"
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		return nil, fmt.Errorf("storage.NewClient: %w", err)
	}
	defer client.Close()

	ctx, cancel := context.WithTimeout(ctx, time.Second*50)
	defer cancel()

	rc, err := client.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		return nil, fmt.Errorf("Object(%q).NewReader: %w", object, err)
	}
	defer rc.Close()

	// io.ReadAll replaces ioutil.ReadAll, which is deprecated as of Go 1.16.
	data, err := io.ReadAll(rc)
	if err != nil {
		return nil, fmt.Errorf("io.ReadAll: %w", err)
	}
	fmt.Fprintf(w, "Blob %v downloaded.\n", object)
	return data, nil
}

Java

For more information, see the Cloud Storage Java API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.ReadChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import com.google.common.io.ByteStreams;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class StreamObjectDownload {

  public static void streamObjectDownload(
      String projectId, String bucketName, String objectName, String targetFile)
      throws IOException {
    // The ID of your GCP project
    // String projectId = "your-project-id";

    // The ID of your GCS bucket
    // String bucketName = "your-unique-bucket-name";

    // The ID of your GCS object
    // String objectName = "your-object-name";

    // The path to the file to download the object to
    // String targetFile = "path/to/your/file";
    Path targetFilePath = Paths.get(targetFile);

    Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
    try (ReadChannel reader = storage.reader(BlobId.of(bucketName, objectName));
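        // CREATE makes the target file if it does not exist yet; WRITE alone would fail.
        FileChannel targetFileChannel =
            FileChannel.open(
                targetFilePath, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {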

      ByteStreams.copy(reader, targetFileChannel);

      System.out.println(
          "Downloaded object "
              + objectName
              + " from bucket "
              + bucketName
              + " to "
              + targetFile
              + " using a ReadChannel.");
    }
  }
}

Node.js

For more information, see the Cloud Storage Node.js API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// The ID of your GCS bucket
// const bucketName = 'your-unique-bucket-name';

// The ID of your GCS file
// const fileName = 'your-file-name';

// The filename and file path where you want to download the file
// const destFileName = '/local/path/to/file.txt';

// Imports the Google Cloud client library
const {Storage} = require('@google-cloud/storage');

// Node.js built-ins used to write the stream to a local file
const fs = require('fs');
const {pipeline} = require('stream/promises');

// Creates a client
const storage = new Storage();

async function streamFileDownload() {
  // The example below demonstrates how we can reference a remote file, then
  // pipe its contents to a local file.
  // Once the stream is created, the data can be piped anywhere (process, stdout, etc.)
  const readStream = storage
    .bucket(bucketName)
    .file(fileName)
    .createReadStream(); // stream is created

  // pipeline() resolves when the write finishes and rejects on a stream error,
  // so the message below is logged only after the download is complete.
  await pipeline(readStream, fs.createWriteStream(destFileName));

  console.log(
    `gs://${bucketName}/${fileName} downloaded to ${destFileName}.`
  );
}

streamFileDownload().catch(console.error);

PHP

For more information, see the Cloud Storage PHP API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Storage\StorageClient;

/**
 * Download an object from Cloud Storage and save it as a local file.
 *
 * @param string $bucketName The name of your Cloud Storage bucket.
 *        (e.g. 'my-bucket')
 * @param string $objectName The name of your Cloud Storage object.
 *        (e.g. 'my-object')
 * @param string $destination The local destination to save the object.
 *        (e.g. '/path/to/your/file')
 */
function download_object(string $bucketName, string $objectName, string $destination): void
{
    $storage = new StorageClient();
    $bucket = $storage->bucket($bucketName);
    $object = $bucket->object($objectName);
    $object->downloadToFile($destination);
    printf(
        'Downloaded gs://%s/%s to %s' . PHP_EOL,
        $bucketName,
        $objectName,
        basename($destination)
    );
}

Python

For more information, see the Cloud Storage Python API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import storage


def download_blob_to_stream(bucket_name, source_blob_name, file_obj):
    """Downloads a blob to a stream or other file-like object."""

    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The ID of your GCS object (blob)
    # source_blob_name = "storage-object-name"

    # The stream or file (file-like object) to which the blob will be written
    # import io
    # file_obj = io.BytesIO()

    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)

    # Construct a client-side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` in that it doesn't
    # retrieve metadata from Google Cloud Storage. As we don't use metadata in
    # this example, using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_file(file_obj)

    print(f"Downloaded blob {source_blob_name} to file-like object.")

    return file_obj
    # Before reading from file_obj, remember to rewind with file_obj.seek(0).

Ruby

For more information, see the Cloud Storage Ruby API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# Downloads a blob to a stream or other file-like object.

# The ID of your GCS bucket
# bucket_name = "your-unique-bucket-name"

# Name of a file in the Storage bucket
# file_name   = "some_file.txt"

# The stream or file (file-like object) to which the contents will be written
# local_file_obj = StringIO.new

require "google/cloud/storage"

storage = Google::Cloud::Storage.new
bucket  = storage.bucket bucket_name
file    = bucket.file file_name

file.download local_file_obj, verify: :none

# rewind the object before starting to read the downloaded contents
local_file_obj.rewind
puts "The full downloaded file contents are: #{local_file_obj.read.inspect}"

REST APIs

JSON API

To perform a streaming download, follow the instructions for downloading an object with the following considerations:

  • Before beginning the download, retrieve the object's metadata and save the object's generation number. Include this generation number in each of your requests to ensure that you don't download data from two different generations in the event the original gets overwritten.

  • Use the Range header in your request to retrieve a piece of the overall object, which you can send to a local process.

  • Continue making requests for successive pieces of the object, until the entire object has been retrieved.

XML API

To perform a streaming download, follow the instructions for downloading an object with the following considerations:

  • Before beginning the download, retrieve the object's metadata and save the object's generation number. Include this generation number in each of your requests to ensure that you don't download data from two different generations in the event the original gets overwritten.

  • Use the Range header in your request to retrieve a piece of the overall object, which you can send to a local process.

  • Continue making requests for successive pieces of the object, until the entire object has been retrieved.
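
The considerations above, which are the same for both APIs, can be sketched as a loop over byte ranges. This is a minimal illustration, not a complete client: fetch_range is a hypothetical callable standing in for an authenticated HTTP GET that sends the given Range header and pins the given generation, and object_size and generation are assumed to come from the metadata request made before the download starts.

```python
from typing import Callable, Iterator


def range_headers(object_size: int, chunk_size: int) -> Iterator[str]:
    """Yield Range header values that cover the object in successive pieces."""
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1  # byte ranges are inclusive
        yield f"bytes={start}-{end}"


def stream_object(
    fetch_range: Callable[[str, int], bytes],  # hypothetical HTTP helper
    object_size: int,
    generation: int,
    chunk_size: int,
    sink: Callable[[bytes], None],
) -> None:
    """Fetch each piece of one fixed generation and pass it to sink."""
    for header in range_headers(object_size, chunk_size):
        # The same generation is sent with every request so that an overwrite
        # during the download cannot mix data from two versions of the object.
        sink(fetch_range(header, generation))
```

For a 10-byte object and a chunk size of 4, range_headers yields bytes=0-3, bytes=4-7, and bytes=8-9, and each piece reaches the sink as soon as it arrives.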

What's next