Loading CSV files from Cloud Storage
When you load CSV data from Cloud Storage, you can load the data into a new table or partition, or you can append to or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into columnar format for Capacitor (BigQuery's storage format).
When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.
For information about loading CSV data from a local file, see Loading data into BigQuery from a local data source.
Limitations
When you load CSV data from Cloud Storage into BigQuery, note the following:
- CSV files do not support nested or repeated data.
- If you use gzip compression, BigQuery cannot read the data in parallel. Loading compressed CSV data into BigQuery is slower than loading uncompressed data. See Loading compressed and uncompressed data.
- You cannot include both compressed and uncompressed files in the same load job.
- The maximum size for a gzip file is 4 GB.
- When you load CSV or JSON data, values in `DATE` columns must use the dash (`-`) separator and the date must be in the following format: `YYYY-MM-DD` (year-month-day).
- When you load JSON or CSV data, values in `TIMESTAMP` columns must use a dash (`-`) separator for the date portion of the timestamp, and the date must be in the following format: `YYYY-MM-DD` (year-month-day). The `hh:mm:ss` (hour-minute-second) portion of the timestamp must use a colon (`:`) separator.
Required permissions
When you load data into BigQuery, you need permissions to run a load job and permissions that let you load data into new or existing BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need permissions to access the bucket that contains your data.
BigQuery permissions
At a minimum, the following permissions are required to load data into BigQuery. These permissions are required if you are loading data into a new table or partition, or if you are appending or overwriting a table or partition.
- `bigquery.tables.create`
- `bigquery.tables.updateData`
- `bigquery.jobs.create`

The following predefined IAM roles include both `bigquery.tables.create` and `bigquery.tables.updateData` permissions:

- `bigquery.dataEditor`
- `bigquery.dataOwner`
- `bigquery.admin`

The following predefined IAM roles include `bigquery.jobs.create` permissions:

- `bigquery.user`
- `bigquery.jobUser`
- `bigquery.admin`

In addition, if a user has `bigquery.datasets.create` permissions, when that user creates a dataset, they are granted `bigquery.dataOwner` access to it. `bigquery.dataOwner` access lets the user create and update tables in the dataset by using a load job.
For more information on IAM roles and permissions in BigQuery, see Access control.
Cloud Storage permissions
To load data from a Cloud Storage bucket, you must be granted `storage.objects.get` permissions. If you are using a URI wildcard, you must also have `storage.objects.list` permissions.

The predefined IAM role `storage.objectViewer` can be granted to provide both `storage.objects.get` and `storage.objects.list` permissions.
Loading CSV data into a table
You can load CSV data from Cloud Storage into a new BigQuery table by:
- Using the Cloud Console
- Using the `bq` command-line tool's `bq load` command
- Calling the `jobs.insert` API method and configuring a `load` job
- Using the client libraries
To load CSV data from Cloud Storage into a new BigQuery table:
Console
Open the BigQuery page in the Cloud Console.
In the Explorer panel, expand your project and select a dataset.
In the details panel, click Create table.
On the Create table page, in the Source section:
For Create table from, select Cloud Storage.
In the source field, browse to or enter the Cloud Storage URI. Note that you cannot include multiple URIs in the Cloud Console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating.
For File format, select CSV.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
Verify that Table type is set to Native table.
In the Table name field, enter the name of the table you're creating in BigQuery.
In the Schema section, for Auto detect, check Schema and input parameters to enable schema auto detection. Alternatively, you can manually enter the schema definition by:
Enabling Edit as text and entering the table schema as a JSON array.
Using Add field to manually input the schema.
(Optional) To partition the table, choose your options in the Partition and cluster settings:

- To create a partitioned table, click No partitioning, select Partition by field, and choose a `DATE` or `TIMESTAMP` column. This option is unavailable if your schema does not include a `DATE` or `TIMESTAMP` column.
- To create an ingestion-time partitioned table, click No partitioning and select Partition by ingestion time.

(Optional) For Partitioning filter, click the Require partition filter box to require users to include a `WHERE` clause that specifies the partitions to query. Requiring a partition filter may reduce cost and improve performance. For more information, see Querying partitioned tables. This option is unavailable if No partitioning is selected.

(Optional) To cluster the table, in the Clustering order box, enter between one and four field names.
(Optional) Click Advanced options.

- For Write preference, leave Write if empty selected. This option creates a new table and loads your data into it.
- For Number of errors allowed, accept the default value of `0` or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job will result in an `invalid` message and fail.
- For Unknown values, check Ignore unknown values to ignore any values in a row that are not present in the table's schema.
- For Field delimiter, choose the character that separates the cells in your CSV file: Comma, Tab, Pipe, or Custom. If you choose Custom, enter the delimiter in the Custom field delimiter box. The default value is Comma.
- For Header rows to skip, enter the number of header rows to skip at the top of the CSV file. The default value is `0`.
- For Quoted newlines, check Allow quoted newlines to allow quoted data sections that contain newline characters in a CSV file. The default value is `false`.
- For Jagged rows, check Allow jagged rows to accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. If unchecked, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is `false`.
- For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
Click Create table.
bq
Use the `bq load` command, specify `CSV` using the `--source_format` flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard. Supply the schema inline, in a schema definition file, or use schema auto-detect.

(Optional) Supply the `--location` flag and set the value to your location.
Other optional flags include:
- `--allow_jagged_rows`: When specified, accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. If unchecked, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is `false`.
- `--allow_quoted_newlines`: When specified, allows quoted data sections that contain newline characters in a CSV file. The default value is `false`.
- `--field_delimiter`: The character that indicates the boundary between columns in the data. Both `\t` and `tab` are allowed for tab delimiters. The default value is `,`.
- `--null_marker`: An optional custom string that represents a NULL value in CSV data.
- `--skip_leading_rows`: Specifies the number of header rows to skip at the top of the CSV file. The default value is `0`.
- `--quote`: The quote character to use to enclose records. The default value is `"`. To indicate no quote character, use an empty string.
- `--max_bad_records`: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is `0`. At most, five errors of any type are returned regardless of the `--max_bad_records` value.
- `--ignore_unknown_values`: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
- `--autodetect`: When specified, enable schema auto-detection for CSV and JSON data.
- `--time_partitioning_type`: Enables time-based partitioning on a table and sets the partition type. Possible values are `HOUR`, `DAY`, `MONTH`, and `YEAR`. This flag is optional when you create a table partitioned on a `DATE`, `DATETIME`, or `TIMESTAMP` column. The default partition type for time-based partitioning is `DAY`.
- `--time_partitioning_expiration`: An integer that specifies (in seconds) when a time-based partition should be deleted. The expiration time evaluates to the partition's UTC date plus the integer value.
- `--time_partitioning_field`: The `DATE` or `TIMESTAMP` column used to create a partitioned table. If time-based partitioning is enabled without this value, an ingestion-time partitioned table is created.
- `--require_partition_filter`: When enabled, this option requires users to include a `WHERE` clause that specifies the partitions to query. Requiring a partition filter may reduce cost and improve performance. For more information, see Querying partitioned tables.
- `--clustering_fields`: A comma-separated list of up to four column names used to create a clustered table.
- `--destination_kms_key`: The Cloud KMS key for encryption of the table data.

For more information, see:

- the `bq load` command
- partitioned tables
- clustered tables
- table encryption
To load CSV data into BigQuery, enter the following command:
bq --location=location load \
--source_format=format \
dataset.table \
path_to_source \
schema
Where:
- location is your location. The `--location` flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to `asia-northeast1`. You can set a default value for the location using the .bigqueryrc file.
- format is `CSV`.
- dataset is an existing dataset.
- table is the name of the table into which you're loading data.
- path_to_source is a fully-qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
- schema is a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the `--autodetect` flag instead of supplying a schema definition.
Examples:
The following command loads data from `gs://mybucket/mydata.csv` into a table named `mytable` in `mydataset`. The schema is defined in a local schema file named `myschema.json`.
bq load \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
The following command loads data from `gs://mybucket/mydata.csv` into a table named `mytable` in `mydataset`. The schema is defined in a local schema file named `myschema.json`. The CSV file includes two header rows. If `--skip_leading_rows` is unspecified, the default behavior is to assume the file does not contain headers.
bq load \
--source_format=CSV \
--skip_leading_rows=2 \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
The following command loads data from `gs://mybucket/mydata.csv` into an ingestion-time partitioned table named `mytable` in `mydataset`. The schema is defined in a local schema file named `myschema.json`.
bq load \
--source_format=CSV \
--time_partitioning_type=DAY \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
The following command loads data from `gs://mybucket/mydata.csv` into a partitioned table named `mytable` in `mydataset`. The table is partitioned on the `mytimestamp` column. The schema is defined in a local schema file named `myschema.json`.
bq load \
--source_format=CSV \
--time_partitioning_field mytimestamp \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
The following command loads data from `gs://mybucket/mydata.csv` into a table named `mytable` in `mydataset`. The schema is auto detected.
bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv
The following command loads data from `gs://mybucket/mydata.csv` into a table named `mytable` in `mydataset`. The schema is defined inline in the format `field:data_type,field:data_type`.
bq load \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
qtr:STRING,sales:FLOAT,year:STRING
The following command loads data from multiple files in `gs://mybucket/` into a table named `mytable` in `mydataset`. The Cloud Storage URI uses a wildcard. The schema is auto detected.
bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata*.csv
The following command loads data from multiple files in `gs://mybucket/` into a table named `mytable` in `mydataset`. The command includes a comma-separated list of Cloud Storage URIs with wildcards. The schema is defined in a local schema file named `myschema.json`.
bq load \
--source_format=CSV \
mydataset.mytable \
"gs://mybucket/00/*.csv","gs://mybucket/01/*.csv" \
./myschema.json
API
- Create a `load` job that points to the source data in Cloud Storage.
- (Optional) Specify your location in the `location` property in the `jobReference` section of the job resource.
- The `source URIs` property must be fully-qualified, in the format `gs://bucket/object`. Each URI can contain one '*' wildcard character.
- Specify the CSV data format by setting the `sourceFormat` property to `CSV`.
- To check the job status, call `jobs.get(job_id)`, where job_id is the ID of the job returned by the initial request.
  - If `status.state = DONE`, the job completed successfully.
  - If the `status.errorResult` property is present, the request failed, and that object will include information describing what went wrong. When a request fails, no table is created and no data is loaded.
  - If `status.errorResult` is absent, the job finished successfully, although there might have been some non-fatal errors, such as problems importing a few rows. Non-fatal errors are listed in the returned job object's `status.errors` property.
API notes:
- Load jobs are atomic and consistent; if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.
- As a best practice, generate a unique ID and pass it as `jobReference.jobId` when calling `jobs.insert` to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID.
- Calling `jobs.insert` on a given job ID is idempotent. You can retry as many times as you like on the same job ID, and at most one of those operations will succeed.
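As an illustration only, the following Python sketch shows one way to call `jobs.insert` through the Google API discovery client and then poll `jobs.get`. It assumes the google-api-python-client library is installed and Application Default Credentials are available; the project, bucket, table, and schema values are placeholders, not values from this guide.

```python
import time
import uuid

from googleapiclient.discovery import build  # assumes google-api-python-client is installed

# Placeholders for illustration only.
project_id = "your-project"
job_id = "load-mydata-{}".format(uuid.uuid4())  # A unique job ID makes retries idempotent.

bq = build("bigquery", "v2")  # Assumes Application Default Credentials are available.

body = {
    "jobReference": {"projectId": project_id, "jobId": job_id, "location": "US"},
    "configuration": {
        "load": {
            "sourceUris": ["gs://mybucket/mydata.csv"],
            "sourceFormat": "CSV",
            "skipLeadingRows": 1,
            "destinationTable": {
                "projectId": project_id,
                "datasetId": "mydataset",
                "tableId": "mytable",
            },
            "schema": {"fields": [
                {"name": "qtr", "type": "STRING"},
                {"name": "sales", "type": "FLOAT"},
                {"name": "year", "type": "STRING"},
            ]},
        }
    },
}

bq.jobs().insert(projectId=project_id, body=body).execute()

# Poll jobs.get until the job reaches the DONE state, then check errorResult.
while True:
    status = bq.jobs().get(projectId=project_id, jobId=job_id).execute()["status"]
    if status["state"] == "DONE":
        break
    time.sleep(2)

if "errorResult" in status:
    print("Load job failed: {}".format(status["errorResult"]))
else:
    print("Load job completed successfully.")
```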
C#
Before trying this sample, follow the C# setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery C# API reference documentation.
Go
Before trying this sample, follow the Go setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery Node.js API reference documentation.
PHP
Before trying this sample, follow the PHP setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery PHP API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery Python API reference documentation.
Use the `Client.load_table_from_uri()` method to load data from a CSV file in Cloud Storage. Supply an explicit schema definition by setting the `LoadJobConfig.schema` property to a list of `SchemaField` objects.
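The following sketch shows that pattern with the google-cloud-bigquery library; the table ID, bucket URI, and column names are placeholders rather than values from this guide.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder destination table and source URI.
table_id = "your-project.mydataset.mytable"
uri = "gs://mybucket/mydata.csv"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("qtr", "STRING"),
        bigquery.SchemaField("sales", "FLOAT"),
        bigquery.SchemaField("year", "STRING"),
    ],
    skip_leading_rows=1,  # Skip the CSV header row.
    source_format=bigquery.SourceFormat.CSV,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Wait for the load job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
```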
Ruby
Before trying this sample, follow the Ruby setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery Ruby API reference documentation.
Loading CSV data into a table that uses column-based time partitioning
To load CSV data from Cloud Storage into a BigQuery table that uses column-based time partitioning:
Go
Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
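As a minimal Python sketch of column-based time partitioning, the load job below sets `LoadJobConfig.time_partitioning` on a placeholder table; the column names and the 90-day partition expiration are assumptions for illustration, not requirements.

```python
from google.cloud import bigquery

client = bigquery.Client()

table_id = "your-project.mydataset.mytable"  # placeholder
uri = "gs://mybucket/mydata.csv"             # placeholder

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
        bigquery.SchemaField("date", "DATE"),
    ],
    skip_leading_rows=1,
    source_format=bigquery.SourceFormat.CSV,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="date",                            # Partition the table by this DATE column.
        expiration_ms=90 * 24 * 60 * 60 * 1000,  # Partitions expire after 90 days.
    ),
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Wait for the load job to complete.

table = client.get_table(table_id)
print("Loaded {} rows into {}.".format(table.num_rows, table_id))
```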
Appending to or overwriting a table with CSV data
You can load additional data into a table either from source files or by appending query results.
In the Cloud Console, use the Write preference option to specify what action to take when you load data from a source file or from a query result.
You have the following options when you load additional data into a table:
| Console option | `bq` tool flag | BigQuery API property | Description |
|---|---|---|---|
| Write if empty | None | `WRITE_EMPTY` | Writes the data only if the table is empty. |
| Append to table | `--noreplace` or `--replace=false`; if `--[no]replace` is unspecified, the default is append | `WRITE_APPEND` | (Default) Appends the data to the end of the table. |
| Overwrite table | `--replace` or `--replace=true` | `WRITE_TRUNCATE` | Erases all existing data in a table before writing the new data. This action also deletes the table schema and removes any Cloud KMS key. |
If you load data into an existing table, the load job can append the data or overwrite the table.
You can append or overwrite a table by:
- Using the Cloud Console
- Using the `bq` command-line tool's `bq load` command
- Calling the `jobs.insert` API method and configuring a `load` job
- Using the client libraries
Console
Open the BigQuery page in the Cloud Console.
In the Explorer panel, expand your project and select a dataset.
In the details panel, click Create table.
On the Create table page, in the Source section:
For Create table from, select Cloud Storage.
In the source field, browse to or enter the Cloud Storage URI. Note that you cannot include multiple URIs in the Cloud Console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're appending or overwriting.
For File format, select CSV.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
In the Table name field, enter the name of the table you're appending or overwriting in BigQuery.
Verify that Table type is set to Native table.
In the Schema section, for Auto detect, check Schema and input parameters to enable schema auto detection. Alternatively, you can manually enter the schema definition by:
Enabling Edit as text and entering the table schema as a JSON array.
Using Add field to manually input the schema.
For Partition and cluster settings, leave the default values. You cannot convert a table to a partitioned or clustered table by appending or overwriting it, and the Cloud Console does not support appending to or overwriting partitioned or clustered tables in a load job.
Click Advanced options.

- For Write preference, choose Append to table or Overwrite table.
- For Number of errors allowed, accept the default value of `0` or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job will result in an `invalid` message and fail.
- For Unknown values, check Ignore unknown values to ignore any values in a row that are not present in the table's schema.
- For Field delimiter, choose the character that separates the cells in your CSV file: Comma, Tab, Pipe, or Custom. If you choose Custom, enter the delimiter in the Custom field delimiter box. The default value is Comma.
- For Header rows to skip, enter the number of header rows to skip at the top of the CSV file. The default value is `0`.
- For Quoted newlines, check Allow quoted newlines to allow quoted data sections that contain newline characters in a CSV file. The default value is `false`.
- For Jagged rows, check Allow jagged rows to accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. If unchecked, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is `false`.
- For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
Click Create table.
bq
Use the `bq load` command, specify `CSV` using the `--source_format` flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard.
Supply the schema inline, in a schema definition file, or use schema auto-detect.
Specify the `--replace` flag to overwrite the table. Use the `--noreplace` flag to append data to the table. If no flag is specified, the default is to append data.
It is possible to modify the table's schema when you append or overwrite it. For more information on supported schema changes during a load operation, see Modifying table schemas.
(Optional) Supply the `--location` flag and set the value to your location.
Other optional flags include:
- `--allow_jagged_rows`: When specified, accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. If unchecked, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is `false`.
- `--allow_quoted_newlines`: When specified, allows quoted data sections that contain newline characters in a CSV file. The default value is `false`.
- `--field_delimiter`: The character that indicates the boundary between columns in the data. Both `\t` and `tab` are allowed for tab delimiters. The default value is `,`.
- `--null_marker`: An optional custom string that represents a NULL value in CSV data.
- `--skip_leading_rows`: Specifies the number of header rows to skip at the top of the CSV file. The default value is `0`.
- `--quote`: The quote character to use to enclose records. The default value is `"`. To indicate no quote character, use an empty string.
- `--max_bad_records`: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is `0`. At most, five errors of any type are returned regardless of the `--max_bad_records` value.
- `--ignore_unknown_values`: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
- `--autodetect`: When specified, enable schema auto-detection for CSV and JSON data.
- `--destination_kms_key`: The Cloud KMS key for encryption of the table data.
bq --location=location load \
--[no]replace \
--source_format=format \
dataset.table \
path_to_source \
schema
Where:

- location is your location. The `--location` flag is optional. You can set a default value for the location using the .bigqueryrc file.
- format is `CSV`.
- dataset is an existing dataset.
- table is the name of the table into which you're loading data.
- path_to_source is a fully-qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
- schema is a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the `--autodetect` flag instead of supplying a schema definition.
Examples:
The following command loads data from `gs://mybucket/mydata.csv` and overwrites a table named `mytable` in `mydataset`. The schema is defined using schema auto-detection.
bq load \
--autodetect \
--replace \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv
The following command loads data from `gs://mybucket/mydata.csv` and appends data to a table named `mytable` in `mydataset`. The schema is defined using a JSON schema file, `myschema.json`.
bq load \
--noreplace \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
API
- Create a `load` job that points to the source data in Cloud Storage.
- (Optional) Specify your location in the `location` property in the `jobReference` section of the job resource.
- The `source URIs` property must be fully-qualified, in the format `gs://bucket/object`. You can include multiple URIs as a comma-separated list. Note that wildcards are also supported.
- Specify the data format by setting the `configuration.load.sourceFormat` property to `CSV`.
- Specify the write preference by setting the `configuration.load.writeDisposition` property to `WRITE_TRUNCATE` or `WRITE_APPEND`.
Go
Before trying this sample, follow the Go setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the
BigQuery Quickstart Using Client Libraries.
For more information, see the
BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.
To replace the rows in an existing table, set the `writeDisposition` value in the `metadata` parameter to `'WRITE_TRUNCATE'`.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery PHP API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
To replace the rows in an existing table, set the `LoadJobConfig.write_disposition` property to the `WriteDisposition` constant `WRITE_TRUNCATE`.
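A minimal sketch of that call, with placeholder table and URI values:

```python
from google.cloud import bigquery

client = bigquery.Client()

table_id = "your-project.mydataset.mytable"  # placeholder
uri = "gs://mybucket/mydata.csv"             # placeholder

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # Replace the table's contents; use WRITE_APPEND to append instead.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Wait for the load job to complete.
```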
Loading hive-partitioned CSV data
BigQuery supports loading hive-partitioned CSV data stored on Cloud Storage and will populate the hive partitioning columns as columns in the destination BigQuery managed table. For more information, see Loading Externally Partitioned Data from Cloud Storage.
Details of loading CSV data
This section describes how BigQuery handles various CSV formatting options.
Encoding
BigQuery expects CSV data to be UTF-8 encoded. If you have CSV files with data encoded in ISO-8859-1 (also known as Latin-1) format, you should explicitly specify the encoding so that BigQuery can properly convert the data to UTF-8.
If you don't specify an encoding, or if you specify UTF-8 encoding when the CSV file is not UTF-8 encoded, BigQuery attempts to convert the data to UTF-8. Generally, your data will be loaded successfully, but it may not match byte-for-byte what you expect. To avoid this, specify the correct encoding by using the `--encoding` flag.
If BigQuery can't convert a character other than the ASCII `0` character, BigQuery converts the character to the standard Unicode replacement character: �.
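For example, with the Python client (a sketch; the file and table names are placeholders), a Latin-1 file can be declared explicitly so BigQuery converts it to UTF-8 correctly:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    encoding="ISO-8859-1",  # Declare the source encoding of the CSV file.
    autodetect=True,
)

client.load_table_from_uri(
    "gs://mybucket/latin1_data.csv",      # placeholder URI
    "your-project.mydataset.mytable",     # placeholder table
    job_config=job_config,
).result()
```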
Field delimiters
Delimiters in CSV files can be any single-byte character. If the source file uses ISO-8859-1 encoding, any character can be a delimiter. If the source file uses UTF-8 encoding, any character in the decimal range 1-127 (U+0001-U+007F) can be used without modification. You can insert an ISO-8859-1 character outside of this range as a delimiter, and BigQuery will interpret it correctly. However, if you use a multibyte character as a delimiter, some of the bytes will be interpreted incorrectly as part of the field value.
Generally, it's a best practice to use a standard delimiter, such as a tab, pipe, or comma. The default is a comma.
Data types
Boolean. BigQuery can parse any of the following pairs for Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case insensitive). Schema autodetection will automatically detect any of these except 0 and 1.
Date. Columns with DATE types must be in the form `YYYY-MM-DD`.

Datetime. Columns with DATETIME types must be in the form `YYYY-MM-DD HH:MM:SS[.SSSSSS]`.

Time. Columns with TIME types must be in the form `HH:MM:SS[.SSSSSS]`.
Timestamp. BigQuery accepts a variety of timestamp formats. The timestamp must include a date portion and a time portion.
- The date portion can be formatted as `YYYY-MM-DD` or `YYYY/MM/DD`.
- The timestamp portion must be formatted as `HH:MM[:SS[.SSSSSS]]` (seconds and fractions of seconds are optional).
- The date and time must be separated by a space or 'T'.
- Optionally, the date and time can be followed by a UTC offset or the UTC zone designator (`Z`). For more information, see Time zones.
For example, any of the following are valid timestamp values:
- 2018-08-19 12:11
- 2018-08-19 12:11:35
- 2018-08-19 12:11:35.22
- 2018/08/19 12:11
- 2018-07-05 12:54:00 UTC
- 2018-08-19 07:11:35.220 -05:00
- 2018-08-19T12:11:35.220Z
If you provide a schema, BigQuery also accepts Unix Epoch time for timestamp values. However, schema autodetection will not detect this case, and will treat the value as a numeric or string type instead.
Examples of Unix Epoch timestamp values:
- 1534680695
- 1.534680695e11
CSV options
To change how BigQuery parses CSV data, specify additional options in the Cloud Console, the `bq` command-line tool, or the API.
For more information on the CSV format, see RFC 4180.
| CSV option | Console option | `bq` tool flag | BigQuery API property | Description |
|---|---|---|---|---|
| Field delimiter | Field delimiter: Comma, Tab, Pipe, Custom | `-F` or `--field_delimiter` | `fieldDelimiter` | (Optional) The separator for fields in a CSV file. The separator can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF8. BigQuery converts the string to ISO-8859-1 encoding, and uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is a comma (`,`). |
| Header rows | Header rows to skip | `--skip_leading_rows` | `skipLeadingRows` | (Optional) An integer indicating the number of header rows in the source data. |
| Number of bad records allowed | Number of errors allowed | `--max_bad_records` | `maxBadRecords` | (Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid. |
| Newline characters | Allow quoted newlines | `--allow_quoted_newlines` | `allowQuotedNewlines` | (Optional) Indicates whether to allow quoted data sections that contain newline characters in a CSV file. The default value is false. |
| Custom null values | None | `--null_marker` | `nullMarker` | (Optional) Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value. |
| Trailing optional columns | Allow jagged rows | `--allow_jagged_rows` | `allowJaggedRows` | (Optional) Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats. |
| Unknown values | Ignore unknown values | `--ignore_unknown_values` | `ignoreUnknownValues` | (Optional) Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. The `sourceFormat` property determines what BigQuery treats as an extra value: CSV: trailing columns; JSON: named values that don't match any column names. |
| Quote | None | `--quote` | `quote` | (Optional) The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ('"'). If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the `allowQuotedNewlines` property to true. To include the specific quote character within a quoted value, precede it with an additional matching quote character. For example, if you want to escape the default character ' " ', use ' "" '. |
| Encoding | None | `-E` or `--encoding` | `encoding` | (Optional) The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8. BigQuery decodes the data after the raw, binary data has been split using the values of the `quote` and `fieldDelimiter` properties. |
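Many of these options also map onto `LoadJobConfig` fields in the Python client library. The sketch below sets several of them together; the specific values chosen (pipe delimiter, `\N` null marker, and so on) are arbitrary examples, not recommendations from this guide.

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="|",          # fieldDelimiter
    skip_leading_rows=1,          # skipLeadingRows
    max_bad_records=10,           # maxBadRecords
    allow_quoted_newlines=True,   # allowQuotedNewlines
    null_marker=r"\N",            # nullMarker
    allow_jagged_rows=True,       # allowJaggedRows
    ignore_unknown_values=True,   # ignoreUnknownValues
    quote_character='"',          # quote
    encoding="UTF-8",             # encoding
    autodetect=True,
)

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.csv",           # placeholder URI
    "your-project.mydataset.mytable",     # placeholder table
    job_config=job_config,
)
load_job.result()  # Wait for the load job to complete.
```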