Loading Data from Cloud Storage

This page describes how to load data from Google Cloud Storage into BigQuery.

Before you begin

To load data into BigQuery, you must set up billing and ensure that you have read access to the data source and write access to the destination table.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Projects page

  3. Enable billing for your project.

    Enable billing

  4. Note the path to the Google Cloud Storage bucket that contains the data you are loading into BigQuery. The path takes the following form: gs://[BUCKET]/[OBJECT]. For information on viewing buckets and objects in Google Cloud Storage, see Quickstart Using the Console.
  5. Ensure that you have read access to your data source. If you are loading content from Google Cloud Storage, and you are an owner of the project that contains your data source, you probably have read access.

    To set READ access on a Cloud Storage object, see Creating and Managing Access Control Lists (ACLs) in the Cloud Storage documentation. A sample gsutil command follows this list.

  6. Ensure that you have write access to your destination table. If you are the owner of the dataset that contains your destination table, you probably have write access.

    To set write access to a dataset in BigQuery:

    1. Go to the BigQuery web UI.
    2. In the navigation, hover over a dataset ID, click the down arrow icon next to the ID, and then click Share dataset.
    3. Add a person and give that person edit access, then click Save changes.
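
For example, to grant READ access on a Cloud Storage object from the command line, you can use gsutil. This is a minimal sketch: the user jane@example.com and the object gs://example-bucket/data.csv are placeholders for your own values.

gsutil acl ch -u jane@example.com:R gs://example-bucket/data.csv

You can then confirm that the object is readable, for example with gsutil ls gs://example-bucket/data.csv.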

Loading data from Google Cloud Storage

BigQuery supports loading data from these storage classes:

  • Multi-Regional
  • Regional
  • Nearline
  • Standard
  • Durable Reduced Availability

To load data from Google Cloud Storage:

Web UI

  1. Go to the BigQuery web UI.
  2. In the navigation, hover over a dataset ID, click the down arrow icon next to the ID, and then click Create new table.
  3. Under Source Data, select Google Cloud Storage for the Location.
  4. Specify the location of the source data using the path:
    gs://[BUCKET]/[OBJECT]
  5. Under Destination Table, enter a value for the destination table name.
  6. In the Schema section, input the table's schema.
  7. Click the Create Table button.

Command-line

Use the bq load command and include a Cloud Storage URI for the source argument:

bq load [DATASET].[TABLE_NAME] [PATH_TO_SOURCE] [SCHEMA]

where:

  • [DATASET].[TABLE_NAME] is a fully qualified table name, where [DATASET] represents an existing dataset.
  • [PATH_TO_SOURCE] is a fully qualified Cloud Storage URI.
  • [SCHEMA] is a valid schema. The schema can be provided in a local file or typed inline as part of the command (an example schema file follows the commands below).

For example, the following commands all load data from Cloud Storage:

bq load ds.new_tbl gs://mybucket/info.csv ./info_schema.json
bq load ds.small gs://mybucket/small.csv name:integer,value:string
bq load ds.small gs://mybucket/small.csv field1,field2,field3
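
The first example reads the schema from a local file, ./info_schema.json. As a minimal sketch (the field names here are illustrative, not the actual contents of info.csv), a schema file is a JSON array of field definitions:

[
  {"name": "name", "type": "STRING", "mode": "NULLABLE"},
  {"name": "value", "type": "INTEGER", "mode": "NULLABLE"}
]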

C#

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

public void ImportDataFromCloudStorage(string projectId, string datasetId,
    string tableId, BigqueryClient client, string fileName, string folder = null)
{
    StorageClient gcsClient = StorageClient.Create();

    using (var stream = new MemoryStream())
    {
        // Set the Cloud Storage bucket name. This sample uses a bucket named the same as the project.
        string bucket = projectId;
        // If a folder is passed in, add it to the Cloud Storage file path using the "/" character.
        string filePath = string.IsNullOrEmpty(folder) ? fileName : folder + "/" + fileName;
        // Download the Google Cloud Storage object into the stream.
        gcsClient.DownloadObject(bucket, filePath, stream);
        // Rewind the stream so that the upload below reads it from the beginning
        // (DownloadObject leaves the stream positioned at the end of the downloaded data).
        stream.Position = 0;

        // This example uploads data to an existing table. If the upload will create a new table
        // or if the schema in the JSON isn't identical to the schema in the table,
        // create a schema to pass into the call instead of passing in a null value.
        BigqueryJob job = client.UploadJson(datasetId, tableId, null, stream);
        // Use the job to find out when the data has finished being inserted into the table,
        // report errors etc.

        // Wait for the job to complete.
        job.PollUntilCompleted();
    }
}

Go

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

// For example, "gs://data-bucket/path/to/data.csv"
gcsRef := bigquery.NewGCSReference(gcsURI)
gcsRef.AllowJaggedRows = true
// TODO: set other options on the GCSReference.

loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
loader.CreateDisposition = bigquery.CreateNever
// TODO: set other options on the Loader.

job, err := loader.Run(ctx)
if err != nil {
	return err
}
status, err := job.Wait(ctx)
if err != nil {
	return err
}
if err := status.Err(); err != nil {
	return err
}

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

// This snippet assumes an existing BigQuery Table object (table) and a Cloud Storage
// source URI (sourceUri), for example "gs://my-bucket/my-data.csv".
Job job = table.load(FormatOptions.csv(), sourceUri);
// Wait for the job to complete
try {
  Job completedJob = job.waitFor(WaitForOption.checkEvery(1, TimeUnit.SECONDS),
      WaitForOption.timeout(3, TimeUnit.MINUTES));
  if (completedJob != null && completedJob.getStatus().getError() == null) {
    // Job completed successfully
  } else {
    // Handle error case
  }
} catch (InterruptedException | TimeoutException e) {
  // Handle interrupted wait
}

Node.js

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

// Imports the Google Cloud client libraries (the versions used by this sample
// export factory functions rather than classes).
const BigQuery = require('@google-cloud/bigquery');
const Storage = require('@google-cloud/storage');

function importFileFromGCS (datasetId, tableId, bucketName, fileName, projectId) {
  // Instantiates clients
  const bigquery = BigQuery({
    projectId: projectId
  });
  const storage = Storage({
    projectId: projectId
  });

  // References an existing dataset, e.g. "my_dataset"
  const dataset = bigquery.dataset(datasetId);
  // References an existing table, e.g. "my_table"
  const table = dataset.table(tableId);
  // References an existing bucket, e.g. "my-bucket"
  const bucket = storage.bucket(bucketName);
  // References an existing file, e.g. "file.txt"
  const file = bucket.file(fileName);

  let job;

  // Imports data from a GCS file into a table
  return table.import(file)
    .then((results) => {
      job = results[0];
      console.log(`Job ${job.id} started.`);
      return job.promise();
    })
    .then((results) => {
      console.log(`Job ${job.id} completed.`);
      return results;
    });
}

PHP

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

use Google\Cloud\ServiceBuilder;
use Google\Cloud\ExponentialBackoff;

/**
 * @param string $projectId  The Google project ID.
 * @param string $datasetId  The BigQuery dataset ID.
 * @param string $tableId    The BigQuery table ID.
 * @param string $bucketName The Cloud Storage bucket Name.
 * @param string $objectName The Cloud Storage object Name.
 */
function import_from_storage($projectId, $datasetId, $tableId, $bucketName, $objectName)
{
    // determine the import options from the object name
    $options = [];
    if ('.backup_info' === substr($objectName, -12)) {
        $options['jobConfig'] = ['sourceFormat' => 'DATASTORE_BACKUP'];
    } elseif ('.json' === substr($objectName, -5)) {
        $options['jobConfig'] = ['sourceFormat' => 'NEWLINE_DELIMITED_JSON'];
    }
    // instantiate the bigquery table service
    $builder = new ServiceBuilder([
        'projectId' => $projectId,
    ]);
    $bigQuery = $builder->bigquery();
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    // load the storage object
    $storage = $builder->storage();
    $object = $storage->bucket($bucketName)->object($objectName);
    // create the import job
    $job = $table->loadFromStorage($object, $options);
    // poll the job until it is complete
    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($job) {
        print('Waiting for job to complete' . PHP_EOL);
        $job->reload();
        if (!$job->isComplete()) {
            throw new Exception('Job has not yet completed', 500);
        }
    });
    // check if the job has errors
    if (isset($job->info()['status']['errorResult'])) {
        $error = $job->info()['status']['errorResult']['message'];
        printf('Error running job: %s' . PHP_EOL, $error);
    } else {
        print('Data imported successfully' . PHP_EOL);
    }
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import time
import uuid

from google.cloud import bigquery


def load_data_from_gcs(dataset_name, table_name, source):
    bigquery_client = bigquery.Client()
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)
    job_name = str(uuid.uuid4())

    job = bigquery_client.load_table_from_storage(
        job_name, table, source)

    job.begin()

    wait_for_job(job)

    print('Loaded {} rows into {}:{}.'.format(
        job.output_rows, dataset_name, table_name))


def wait_for_job(job):
    while True:
        job.reload()
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)

Ruby

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# project_id   = "Your Google Cloud project ID"
# dataset_id   = "ID of the dataset containing table"
# table_id     = "ID of the table to import data into"
# storage_path = "Storage path to file to import, eg. gs://bucket/file.csv"

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: project_id
dataset  = bigquery.dataset dataset_id
table    = dataset.table table_id

puts "Importing data from Cloud Storage file: #{storage_path}"
load_job = table.load storage_path

puts "Waiting for load job to complete: #{load_job.job_id}"
load_job.wait_until_done!

puts "Data imported"

For more information about loading data with a POST request, see loading data with a POST request.
