Loading CSV Data from Google Cloud Storage

Loading CSV files from Cloud Storage

When you load CSV data from Cloud Storage, you can load the data into a new table or partition, or you can append to or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into columnar format for Capacitor (BigQuery's storage format).

When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.

For information about loading CSV data from a local file, see Loading data into BigQuery from a local data source.

Limitations

When you load CSV data from Cloud Storage into BigQuery, note the following:

  • CSV files do not support nested or repeated data.
  • If you use gzip compression, BigQuery cannot read the data in parallel. Loading compressed CSV data into BigQuery is slower than loading uncompressed data.
  • When you load CSV or JSON data, values in DATE columns must use the dash (-) separator and the date must be in the following format: YYYY-MM-DD (year-month-day).
  • When you load JSON or CSV data, values in TIMESTAMP columns must use a dash (-) separator for the date portion of the timestamp, and the date must be in the following format: YYYY-MM-DD (year-month-day). The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator. An example of both formats follows this list.
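For illustration, the following Python snippet (standard library only) writes a CSV row whose DATE and TIMESTAMP values use these formats; the file and column names are hypothetical:

import csv
from datetime import date, datetime

# Write one header row and one data row using the accepted DATE and
# TIMESTAMP formats (YYYY-MM-DD and YYYY-MM-DD hh:mm:ss).
with open('mydata.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['order_id', 'order_date', 'created_at'])
    writer.writerow([
        1,
        date(2018, 7, 30).isoformat(),                                   # 2018-07-30
        datetime(2018, 7, 30, 14, 5, 0).strftime('%Y-%m-%d %H:%M:%S'),   # 2018-07-30 14:05:00
    ])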

CSV encoding

BigQuery expects CSV data to be UTF-8 encoded. If you have CSV files with data encoded in ISO-8859-1 (also known as Latin-1) format, you should explicitly specify the encoding when you load your data so it can be converted to UTF-8.

Delimiters in CSV files can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF-8. BigQuery converts the string to ISO-8859-1 encoding and uses the first byte of the encoded string to split the data in its raw, binary state.
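For example, here is a hedged sketch using the Python client library (shown in the code samples later on this page) that loads a Latin-1 encoded, pipe-delimited CSV file; the bucket, dataset, and table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
table_ref = client.dataset('mydataset').table('mytable')  # placeholder dataset/table

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.encoding = 'ISO-8859-1'   # Latin-1 input is converted to UTF-8 on load
job_config.field_delimiter = '|'     # any ISO-8859-1 single-byte character
job_config.autodetect = True

load_job = client.load_table_from_uri(
    'gs://mybucket/mydata_latin1.csv',  # placeholder URI
    table_ref,
    job_config=job_config)
load_job.result()  # waits for the load to complete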

Required permissions

When you load data into BigQuery, you need project or dataset-level permissions that allow you to load data into new or existing BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need access to the bucket that contains your data.

BigQuery permissions

When you are loading data into BigQuery from Cloud Storage, you must be granted the bigquery.dataOwner or bigquery.dataEditor role at the project level or at the dataset level. Both roles grant users and groups permission to load data into a new table or to append to or overwrite an existing table.

Granting the roles at the project level gives the user or group permission to load data into tables in every dataset in the project. Granting the roles at the dataset level gives the user or group the ability to load data only into tables in that dataset.

For more information on configuring dataset access, see Assigning access controls to datasets. For more information on IAM roles in BigQuery, see Access Control.

Cloud Storage permissions

In order to load data from a Cloud Storage bucket, you must be granted storage.objects.get permissions at the project level or on that individual bucket. If you are using a URI wildcard, you must also have storage.objects.list permissions.

The predefined IAM role storage.objectViewer can be granted to provide storage.objects.get and storage.objects.list permissions.

Loading CSV data into a new table

To load CSV data from Google Cloud Storage into a new BigQuery table:

Web UI

  1. Go to the BigQuery web UI.

  2. In the navigation panel, hover on a dataset, click the down arrow icon, and click Create new table. The process for loading data is the same as the process for creating an empty table.

  3. On the Create Table page, in the Source Data section:

    • For Location, select Google Cloud Storage and in the source field, enter the Cloud Storage URI. Note that you cannot include multiple URIs in the BigQuery web UI, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating.
    • For File format, select Comma-separated values (CSV).
  4. On the Create Table page, in the Destination Table section:
    • For Table name, choose the appropriate dataset, and in the table name field, enter the name of the table you're creating in BigQuery.
    • Verify that Table type is set to Native table.
  5. In the Schema section, enter the schema definition.

    • For CSV files, you can check the Auto-detect option to enable schema auto-detection.

    • You can also enter schema information manually by:

      • Clicking Edit as text and entering the table schema as a JSON array.

      • Using Add Field to manually input the schema.

  6. Select applicable items in the Options section and then click Create Table. For information on the available options, see CSV options.

Command-line

Use the bq load command, specify CSV as the source_format, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs or a URI containing a wildcard.

Supply the --location flag and set the value to your location.

bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]

where:

  • [LOCATION] is your location. The --location flag is optional if your data is in the US or the EU multi-region location. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • [FORMAT] is CSV.
  • [DATASET] is an existing dataset.
  • [TABLE] is the name of the table into which you're loading data.
  • [PATH_TO_SOURCE] is a fully-qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
  • [SCHEMA] is a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

In addition, you can add flags for CSV options that allow you to control how BigQuery parses your data. For example, you can use the --skip_leading_rows flag to ignore header rows in the CSV file, and you can use the --encoding flag to identify the character encoding of the data.

Examples:

  • The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.json. mybucket and mydataset were created in the US multi-region location.

    bq --location=US load --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
    
  • The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined inline in the format [FIELD]:[DATA_TYPE], [FIELD]:[DATA_TYPE]. mybucket and mydataset were created in the US multi-region location.

    bq --location=US load --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv qtr:STRING,sales:FLOAT,year:STRING
    

    When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a field description, and you cannot specify the field mode. All field modes default to NULLABLE. To include field descriptions, modes, and RECORD types, supply a JSON schema file instead; a client-library sketch showing field descriptions and modes follows these examples.

  • The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined using schema auto-detection. mybucket and mydataset were created in the US multi-region location.

    bq --location=US load --autodetect --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv
    
  • The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The Cloud Storage URI uses a wildcard, and the schema is defined using schema auto-detection. mybucket and mydataset were created in the asia-northeast1 region.

    bq --location=asia-northeast1 load --autodetect --source_format=CSV mydataset.mytable gs://mybucket/mydata*.csv
    
  • The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The command includes a comma-separated list of Cloud Storage URIs, and the schema is defined using schema auto-detection. mybucket and mydataset were created in the asia-northeast1 region.

    bq --location=asia-northeast1 load --autodetect --source_format=CSV mydataset.mytable "gs://mybucket/myfile.csv,gs://mybucket/myfile2.csv"
    
  • The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.json, and the --skip_leading_rows flag is used to ignore the first two header rows in the CSV file. mybucket and mydataset were created in the asia-northeast1 region.

    bq --location=asia-northeast1 load --skip_leading_rows=2 --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
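To illustrate the note above about field descriptions, modes, and RECORD (STRUCT) types, here is a hedged sketch that sets descriptions and modes with the Python client library used in the code samples later on this page; the bucket, dataset, table, and field names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
# Field descriptions and non-default modes cannot be expressed in an inline
# command-line schema, but they can be set here (or in a JSON schema file).
job_config.schema = [
    bigquery.SchemaField('qtr', 'STRING', mode='REQUIRED',
                         description='Fiscal quarter'),
    bigquery.SchemaField('sales', 'FLOAT', mode='NULLABLE',
                         description='Total sales'),
    # A RECORD field would be declared with fields=[...], but remember that
    # CSV source files cannot contain nested or repeated data.
]

load_job = client.load_table_from_uri(
    'gs://mybucket/mydata.csv',                      # placeholder URI
    client.dataset('mydataset').table('mytable'),    # placeholder table
    job_config=job_config)
load_job.result()  # waits for completion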
    

API

Set the following properties to load CSV data using the API.

  1. Create a load job that points to the source data in Google Cloud Storage.

  2. Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs must be fully-qualified, in the format gs://[BUCKET]/[OBJECT]. Each URI can contain one '*' wildcard character.

  4. Specify the CSV data format by setting the configuration.load.sourceFormat property to CSV.

  5. To check the job status, call jobs.get([JOB_ID]), where [JOB_ID] is the ID of the job returned by the initial request.

    • If status.state = DONE, the job completed successfully.
    • If the status.errorResult property is present, the request failed, and that object will include information describing what went wrong. When a request fails, no table is created and no data is added.
    • If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors, such as problems importing a few rows. Non-fatal errors are listed in the returned job object's status.errors property.
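As an illustration only, the job resource described in these steps might be expressed as the following Python dictionary for the jobs.insert request body; the project, job ID, bucket, dataset, and table names are placeholders:

# Placeholder values; this dictionary mirrors the jobs.insert request body.
load_job_body = {
    'jobReference': {
        'projectId': 'my-project',
        'jobId': 'my_csv_load_job_001',   # client-generated, unique per job
        'location': 'US',                 # location of the dataset and bucket
    },
    'configuration': {
        'load': {
            'sourceUris': ['gs://mybucket/mydata*.csv'],  # one '*' wildcard allowed per URI
            'sourceFormat': 'CSV',
            'destinationTable': {
                'projectId': 'my-project',
                'datasetId': 'mydataset',
                'tableId': 'mytable',
            },
            'skipLeadingRows': 1,
            'autodetect': True,
        },
    },
}
# This body would be POSTed with jobs.insert, and the job would then be
# polled with jobs.get until status.state is DONE.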

API notes:

  • Load jobs are atomic and consistent; if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert() to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

  • Calling jobs.insert() on a given job ID is idempotent; in other words, you can retry as many times as you like on the same job ID, and at most one of those operations will succeed.
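A minimal sketch of this best practice with the Python client library; the names are placeholders, and the job_id argument of load_table_from_uri supplies the client-generated ID:

import uuid
from google.cloud import bigquery

client = bigquery.Client()
job_id = 'csv_load_{}'.format(uuid.uuid4())  # unique, client-generated job ID

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.autodetect = True

load_job = client.load_table_from_uri(
    'gs://mybucket/mydata.csv',                      # placeholder URI
    client.dataset('mydataset').table('mytable'),    # placeholder table
    job_id=job_id,
    job_config=job_config)

# If the network fails here, the same job_id can be used to poll or retry;
# retrying jobs.insert with an existing job ID is idempotent.
load_job.result()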

Code samples

C#

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

public void ImportDataFromCloudStorage(string projectId, string datasetId,
    string tableId, BigQueryClient client, string fileName, string folder = null)
{
    StorageClient gcsClient = StorageClient.Create();

    using (var stream = new MemoryStream())
    {
        // Set Cloud Storage Bucket name. This uses a bucket named the same as the project.
        string bucket = projectId;
        // If folder is passed in, add it to Cloud Storage File Path using "/" character
        string filePath = string.IsNullOrEmpty(folder) ? fileName : folder + "/" + fileName;
        // Download Google Cloud Storage object into stream
        gcsClient.DownloadObject(projectId, filePath, stream);
        // Rewind the stream so that the upload reads it from the beginning.
        stream.Position = 0;

        // This example uploads data to an existing table. If the upload will create a new table
        // or if the schema in the JSON isn't identical to the schema in the table,
        // create a schema to pass into the call instead of passing in a null value.
        BigQueryJob job = client.UploadJson(datasetId, tableId, null, stream);
        // Use the job to find out when the data has finished being inserted into the table,
        // report errors etc.

        // Wait for the job to complete.
        job.PollUntilCompleted();
    }
}

Go

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.csv")
gcsRef.SkipLeadingRows = 1
gcsRef.Schema = bigquery.Schema{
	{Name: "name", Type: bigquery.StringFieldType},
	{Name: "post_abbr", Type: bigquery.StringFieldType},
}
loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
loader.WriteDisposition = bigquery.WriteEmpty

job, err := loader.Run(ctx)
if err != nil {
	return err
}
status, err := job.Wait(ctx)
if err != nil {
	return err
}

if status.Err() != nil {
	return fmt.Errorf("Job completed with error: %v", status.Err())
}

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Job job = table.load(FormatOptions.csv(), sourceUri);
// Wait for the job to complete
try {
  Job completedJob =
      job.waitFor(
          RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
          RetryOption.totalTimeout(Duration.ofMinutes(3)));
  if (completedJob != null && completedJob.getStatus().getError() == null) {
    // Job completed successfully
  } else {
    // Handle error case
  }
} catch (InterruptedException e) {
  // Handle interrupted wait
}

Node.js

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Use the Table.load() method to load data from a CSV file in Cloud Storage. Supply an explicit schema definition via the metadata parameter:

// Imports the Google Cloud client libraries
const BigQuery = require('@google-cloud/bigquery');
const Storage = require('@google-cloud/storage');
// Node.js standard assert module, used below to verify the job state
const assert = require('assert');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = "your-project-id";
// const datasetId = "my_dataset";
// const tableId = "my_table";

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.csv';

// Instantiates clients
const bigquery = new BigQuery({
  projectId: projectId,
});

const storage = new Storage({
  projectId: projectId,
});

// Configure the load job. For full list of options, see:
// https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load
const metadata = {
  sourceFormat: 'CSV',
  skipLeadingRows: 1,
  schema: {
    fields: [
      {name: 'name', type: 'STRING'},
      {name: 'post_abbr', type: 'STRING'},
    ],
  },
};

// Loads data from a Google Cloud Storage file into the table
bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata)
  .then(results => {
    const job = results[0];

    // load() waits for the job to finish
    assert.equal(job.status.state, 'DONE');
    console.log(`Job ${job.id} completed.`);

    // Check the job's status for errors
    const errors = job.status.errors;
    if (errors && errors.length > 0) {
      throw errors;
    }
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Set the autodetect value to true so that BigQuery infers the schema from a sample of the input data:

// Imports the Google Cloud client libraries
const BigQuery = require('@google-cloud/bigquery');
const Storage = require('@google-cloud/storage');
// Node.js standard assert module, used below to verify the job state
const assert = require('assert');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = "your-project-id";
// const datasetId = "my_dataset";
// const tableId = "my_table";

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.csv';

// Instantiates clients
const bigquery = new BigQuery({
  projectId: projectId,
});

const storage = new Storage({
  projectId: projectId,
});

// Configure the load job. For full list of options, see:
// https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load
const metadata = {
  sourceFormat: 'CSV',
  skipLeadingRows: 1,
  autodetect: true,
};

// Loads data from a Google Cloud Storage file into the table
bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata)
  .then(results => {
    const job = results[0];

    // load() waits for the job to finish
    assert.equal(job.status.state, 'DONE');
    console.log(`Job ${job.id} completed.`);

    // Check the job's status for errors
    const errors = job.status.errors;
    if (errors && errors.length > 0) {
      throw errors;
    }
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Core\ExponentialBackoff;

/**
 * @param string $projectId  The Google project ID.
 * @param string $datasetId  The BigQuery dataset ID.
 * @param string $tableId    The BigQuery table ID.
 * @param string $bucketName The Cloud Storage bucket Name.
 * @param string $objectName The Cloud Storage object Name.
 */
function import_from_storage($projectId, $datasetId, $tableId, $bucketName, $objectName)
{
    // instantiate the bigquery table service
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    // load the storage object
    $storage = new StorageClient([
        'projectId' => $projectId,
    ]);
    $object = $storage->bucket($bucketName)->object($objectName);
    // create the import job
    $loadConfig = $table->loadFromStorage($object);
    // determine the source format from the object name
    if ('.backup_info' === substr($objectName, -12)) {
        $loadConfig->sourceFormat('DATASTORE_BACKUP');
    } elseif ('.json' === substr($objectName, -5)) {
        $loadConfig->sourceFormat('NEWLINE_DELIMITED_JSON');
    }
    $job = $table->runJob($loadConfig);
    // poll the job until it is complete
    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($job) {
        print('Waiting for job to complete' . PHP_EOL);
        $job->reload();
        if (!$job->isComplete()) {
            throw new Exception('Job has not yet completed', 500);
        }
    });
    // check if the job has errors
    if (isset($job->info()['status']['errorResult'])) {
        $error = $job->info()['status']['errorResult']['message'];
        printf('Error running job: %s' . PHP_EOL, $error);
    } else {
        print('Data imported successfully' . PHP_EOL);
    }
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Use the Client.load_table_from_uri() method to load data from a CSV file in Cloud Storage. Supply an explicit schema definition by setting the LoadJobConfig.schema property to a list of SchemaField objects.

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
job_config = bigquery.LoadJobConfig()
job_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING')
]
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'

load_job = client.load_table_from_uri(
    uri,
    dataset_ref.table('us_states'),
    job_config=job_config)  # API request

assert load_job.job_type == 'load'

load_job.result()  # Waits for table load to complete.

assert load_job.state == 'DONE'
assert client.get_table(dataset_ref.table('us_states')).num_rows == 50

Set the LoadJobConfig.autodetect property to True so that BigQuery infers the schema from a sample of the input data:

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'my_dataset'

dataset_ref = client.dataset(dataset_id)
job_config = bigquery.LoadJobConfig()
job_config.autodetect = True
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'
load_job = client.load_table_from_uri(
    uri,
    dataset_ref.table('us_states'),
    job_config=job_config)  # API request

assert load_job.job_type == 'load'

load_job.result()  # Waits for table load to complete.

assert load_job.state == 'DONE'
assert client.get_table(dataset_ref.table('us_states')).num_rows == 50

Ruby

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# project_id   = "Your Google Cloud project ID"
# dataset_id   = "ID of the dataset containing table"
# table_id     = "ID of the table to import data into"
# storage_path = "Storage path to file to import, eg. gs://bucket/file.csv"

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: project_id
dataset  = bigquery.dataset dataset_id
table    = dataset.table table_id

puts "Importing data from Cloud Storage file: #{storage_path}"
load_job = table.load_job storage_path

puts "Waiting for load job to complete: #{load_job.job_id}"
load_job.wait_until_done!

puts "Data imported"

Appending to or overwriting a table with CSV data

You can load additional data into a table either from source files or by appending query results. If the schema of the data does not match the schema of the destination table or partition, you can update the schema when you append to it or overwrite it.

If you update the schema when appending data, BigQuery allows you to:

  • Add new fields
  • Relax REQUIRED fields to NULLABLE

If you are overwriting a table, the schema is always overwritten. Schema updates are not restricted when you overwrite a table.
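The following is a hedged sketch of such an append using the Python client library shown in the code samples below; the URI, dataset, and table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
table_ref = client.dataset('mydataset').table('mytable')  # placeholder table

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
# Allow the load to add new fields and to relax REQUIRED fields to NULLABLE.
job_config.schema_update_options = [
    bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
    bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION,
]
job_config.autodetect = True

load_job = client.load_table_from_uri(
    'gs://mybucket/mydata.csv',   # placeholder URI
    table_ref,
    job_config=job_config)
load_job.result()  # waits for the append to complete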

In the BigQuery web UI, you use the Write preference option to specify what action to take when you load data from a source file or from a query result. The CLI and API include the following options:

  • Write if empty: no CLI flag; API property WRITE_EMPTY. Writes the data only if the table is empty.
  • Append to table: CLI flag --noreplace or --replace=false (if --[no]replace is unspecified, the default is append); API property WRITE_APPEND. (Default) Appends the data to the end of the table.
  • Overwrite table: CLI flag --replace or --replace=true; API property WRITE_TRUNCATE. Erases all existing data in a table before writing the new data.

To load CSV data from Google Cloud Storage and to append to or overwrite a BigQuery table:

Web UI

  1. Go to the BigQuery web UI.

  2. In the navigation panel, hover on a dataset, click the down arrow icon, and click Create new table. The process for loading data is the same as the process for creating an empty table.

  3. On the Create Table page, in the Source Data section:

    • For Location, select Google Cloud Storage and in the source field, enter the Cloud Storage URI. Note that you cannot include multiple URIs in the UI, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're appending or overwriting.
    • For File format, select Comma-separated values (CSV).
  4. On the Create Table page, in the Destination Table section:
    • For Table name, choose the appropriate dataset, and in the table name field, enter the name of the table you're appending or overwriting.
    • Verify that Table type is set to Native table.
  5. In the Schema section, enter the schema definition.

    • For CSV files, you can check the Auto-detect option to enable schema auto-detection.

    • You can also enter schema information manually by:

      • Clicking Edit as text and entering the table schema as a JSON array.

      • Using Add Field to manually input the schema.

  6. In the Options section, for Write preference, choose Write if empty, Append to table, or Overwrite table.

  7. Click Create Table.

Command-line

Use the bq load command with the --replace flag to overwrite the table, or with the --noreplace flag to append data to the table; if neither flag is specified, the default is to append data. Supply the --location flag and set the value to your location.

When appending or overwriting a table, you can use the --schema_update_option flag to update the schema of the destination table with the schema of the new data. The following options can be used with the --schema_update_option flag:

  • ALLOW_FIELD_ADDITION: Adds new fields to the schema; new fields cannot be REQUIRED
  • ALLOW_FIELD_RELAXATION: Relaxes REQUIRED fields to NULLABLE; repeat the --schema_update_option flag to specify more than one schema update option

bq --location=[LOCATION] load --[no]replace --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]

where:

  • [LOCATION] is your location. The --location flag is optional if your data is in the US or the EU multi-region location. You can set a default value for the location using the .bigqueryrc file.
  • [FORMAT] is CSV.
  • [DATASET] is an existing dataset.
  • [TABLE] is the name of the table into which you're loading data.
  • [PATH_TO_SOURCE] is a fully-qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
  • [SCHEMA] is a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

In addition, you can add flags for CSV options that allow you to control how BigQuery parses your CSV data. For example, you can use the --skip_leading_rows flag to ignore header rows in the CSV file, and you can use the --encoding flag to identify the character encoding of the data.

Examples:

  • The following command loads data from gs://mybucket/mydata.csv and overwrites a table named mytable in mydataset. The schema is defined using schema auto-detection. mybucket and mydataset were created in the US multi-region location.

    bq --location=US load --autodetect --replace --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv
    
  • The following command loads data from gs://mybucket/mydata.csv and appends data to a table named mytable in mydataset. The schema is defined in a local JSON schema file named myschema.json. mybucket and mydataset were created in the US multi-region location.

    bq --location=US load --noreplace --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
    
  • The following command loads data from gs://mybucket/mydata.csv and appends data to a table named mytable in mydataset. A local JSON schema file named myschema.json is used. The schema definition contains new fields not present in the destination table. mybucket and mydataset were created in the asia-northeast1 region.

    bq --location=asia-northeast1 load --noreplace --schema_update_option=ALLOW_FIELD_ADDITION --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
    
  • The following command loads data from gs://mybucket/mydata.csv and appends data to a table named mytable in mydataset. A local JSON schema file named myschema.json is used. The schema definition changes (relaxes) two REQUIRED fields to NULLABLE. mybucket and mydataset were created in the asia-northeast1 region.

    bq --location=asia-northeast1 load --noreplace --schema_update_option=ALLOW_FIELD_RELAXATION --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
    

API

Set the following properties to load CSV data using the API.

  1. Create a load job that points to the source data in Google Cloud Storage.

  2. Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs must be fully-qualified, in the format gs://[BUCKET]/[OBJECT]. You can include multiple URIs as a comma-separated list. Note that wildcards are also supported when loading CSV data from Google Cloud Storage.

  4. Specify the data format by setting the configuration.load.sourceFormat property to CSV.

  5. Specify the write preference by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE, WRITE_APPEND, or WRITE_EMPTY.

  6. To update the schema in your load job, set the configuration.load.schemaUpdateOptions property to ALLOW_FIELD_ADDITION or ALLOW_FIELD_RELAXATION.
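As an illustration only, the configuration.load section of such a request body might look like the following Python dictionary; the project, bucket, dataset, and table names are placeholders:

# Placeholder values; this mirrors the configuration.load section of the
# jobs.insert request body for an append that also updates the schema.
load_configuration = {
    'load': {
        'sourceUris': ['gs://mybucket/mydata.csv'],
        'sourceFormat': 'CSV',
        'destinationTable': {
            'projectId': 'my-project',
            'datasetId': 'mydataset',
            'tableId': 'mytable',
        },
        'writeDisposition': 'WRITE_APPEND',
        'schemaUpdateOptions': ['ALLOW_FIELD_ADDITION', 'ALLOW_FIELD_RELAXATION'],
        'skipLeadingRows': 1,
    },
}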

Code samples

Node.js

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To append rows to an existing table, set the writeDisposition value in the metadata parameter to 'WRITE_APPEND'.

// Imports the Google Cloud client libraries
const BigQuery = require('@google-cloud/bigquery');
const Storage = require('@google-cloud/storage');
// Node.js standard assert module, used below to verify the job state
const assert = require('assert');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = "your-project-id";
// const datasetId = "my_dataset";
// const tableId = "my_table";

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.csv';

// Instantiates clients
const bigquery = new BigQuery({
  projectId: projectId,
});

const storage = new Storage({
  projectId: projectId,
});

// Configure the load job. For full list of options, see:
// https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load
const metadata = {
  sourceFormat: 'CSV',
  skipLeadingRows: 1,
  schema: {
    fields: [
      {name: 'name', type: 'STRING'},
      {name: 'post_abbr', type: 'STRING'},
    ],
  },
  // Set the write disposition to append to an existing table.
  writeDisposition: 'WRITE_APPEND',
};

// Loads data from a Google Cloud Storage file into the table
bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata)
  .then(results => {
    const job = results[0];

    // load() waits for the job to finish
    assert.equal(job.status.state, 'DONE');
    console.log(`Job ${job.id} completed.`);

    // Check the job's status for errors
    const errors = job.status.errors;
    if (errors && errors.length > 0) {
      throw errors;
    }
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

To replace the rows in an existing table, set the writeDisposition value in the metadata parameter to 'WRITE_TRUNCATE'.

// Imports the Google Cloud client libraries
const BigQuery = require('@google-cloud/bigquery');
const Storage = require('@google-cloud/storage');
// Node.js standard assert module, used below to verify the job state
const assert = require('assert');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = "your-project-id";
// const datasetId = "my_dataset";
// const tableId = "my_table";

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.csv';

// Instantiates clients
const bigquery = new BigQuery({
  projectId: projectId,
});

const storage = new Storage({
  projectId: projectId,
});

// Configure the load job. For full list of options, see:
// https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load
const metadata = {
  sourceFormat: 'CSV',
  skipLeadingRows: 1,
  schema: {
    fields: [
      {name: 'name', type: 'STRING'},
      {name: 'post_abbr', type: 'STRING'},
    ],
  },
  // Set the write disposition to overwrite the existing table.
  writeDisposition: 'WRITE_TRUNCATE',
};

// Loads data from a Google Cloud Storage file into the table
bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata)
  .then(results => {
    const job = results[0];

    // load() waits for the job to finish
    assert.equal(job.status.state, 'DONE');
    console.log(`Job ${job.id} completed.`);

    // Check the job's status for errors
    const errors = job.status.errors;
    if (errors && errors.length > 0) {
      throw errors;
    }
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To append rows to an existing table, set the LoadJobConfig.write_disposition property to the WriteDisposition constant WRITE_APPEND.

# from google.cloud import bigquery
# client = bigquery.Client()
# table_ref = client.dataset('my_dataset').table('existing_table')

previous_rows = client.get_table(table_ref).num_rows
assert previous_rows > 0

job_config = bigquery.LoadJobConfig()
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'
load_job = client.load_table_from_uri(
    uri,
    table_ref,
    job_config=job_config)  # API request

assert load_job.job_type == 'load'

load_job.result()  # Waits for table load to complete.

assert load_job.state == 'DONE'
assert client.get_table(table_ref).num_rows == previous_rows + 50

To replace the rows in an existing table, set the LoadJobConfig.write_disposition property to the WriteDisposition constant WRITE_TRUNCATE.

# from google.cloud import bigquery
# client = bigquery.Client()
# table_ref = client.dataset('my_dataset').table('existing_table')

previous_rows = client.get_table(table_ref).num_rows
assert previous_rows > 0

job_config = bigquery.LoadJobConfig()
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'
load_job = client.load_table_from_uri(
    uri,
    table_ref,
    job_config=job_config)  # API request

assert load_job.job_type == 'load'

load_job.result()  # Waits for table load to complete.

assert load_job.state == 'DONE'
assert client.get_table(table_ref).num_rows == 50

CSV options

To change how BigQuery parses CSV data, specify additional options in the UI, CLI, or API. For more information on the CSV format, see RFC 4180.

  • Field delimiter
    Web UI option: Field delimiter (Comma, Tab, Pipe, Other). CLI flag: -F or --field_delimiter. API property: fieldDelimiter.
    (Optional) The separator for fields in a CSV file. The separator can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF-8. BigQuery converts the string to ISO-8859-1 encoding, and uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is a comma (,).

  • Header rows
    Web UI option: Header rows to skip. CLI flag: --skip_leading_rows. API property: skipLeadingRows.
    (Optional) An integer indicating the number of header rows in the source data.

  • Number of bad records allowed
    Web UI option: Number of errors allowed. CLI flag: --max_bad_records. API property: maxBadRecords.
    (Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

  • Newline characters
    Web UI option: Allow quoted newlines. CLI flag: --allow_quoted_newlines. API property: allowQuotedNewlines.
    (Optional) Indicates whether to allow quoted data sections that contain newline characters in a CSV file. The default value is false.

  • Custom null values
    Web UI option: none. CLI flag: --null_marker. API property: nullMarker.
    (Optional) Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

  • Trailing optional columns
    Web UI option: Allow jagged rows. CLI flag: --allow_jagged_rows. API property: allowJaggedRows.
    (Optional) Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV; ignored for other formats.

  • Unknown values
    Web UI option: Ignore unknown values. CLI flag: --ignore_unknown_values. API property: ignoreUnknownValues.
    (Optional) Indicates whether BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. The sourceFormat property determines what BigQuery treats as an extra value: for CSV, trailing columns; for JSON, named values that don't match any column names.

  • Quote
    Web UI option: none. CLI flag: --quote. API property: quote.
    (Optional) The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double quote ("). If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.

  • Encoding
    Web UI option: none. CLI flag: -E or --encoding. API property: encoding.
    (Optional) The character encoding of the data. The supported values are UTF-8 and ISO-8859-1. The default value is UTF-8. BigQuery decodes the data after the raw, binary data has been split using the values of the quote and fieldDelimiter properties.
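As a hedged sketch, the following sets several of these options through the Python client library's LoadJobConfig; the URI, dataset, and table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.field_delimiter = '|'          # fieldDelimiter
job_config.skip_leading_rows = 1          # skipLeadingRows
job_config.max_bad_records = 10           # maxBadRecords
job_config.allow_quoted_newlines = True   # allowQuotedNewlines
job_config.null_marker = '\\N'            # nullMarker
job_config.allow_jagged_rows = True       # allowJaggedRows
job_config.ignore_unknown_values = True   # ignoreUnknownValues
job_config.quote_character = '"'          # quote
job_config.encoding = 'UTF-8'             # encoding
job_config.autodetect = True

load_job = client.load_table_from_uri(
    'gs://mybucket/mydata.csv',                      # placeholder URI
    client.dataset('mydataset').table('mytable'),    # placeholder table
    job_config=job_config)
load_job.result()  # waits for the load to complete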