Data Formats

This page describes the data formats that Google BigQuery supports.

Introduction

BigQuery supports these data formats:

Choose a format based upon the following factors:

  • Your data's schema.

    CSV, JSON, and Avro all support flat data. JSON and Avro also support data with nested and repeated fields, which are useful for expressing hierarchical data. Nested and repeated fields also reduce duplication when denormalizing the data.

  • Embedded newlines.

    If your data contains embedded newlines, BigQuery can load the data much faster in JSON or Avro format.

  • External limitations.

    For example, your data might come from a document store database that natively stores data in JSON format. Or, your data might come from a source that only exports in CSV format.

Data format limits

For information about quotas and limits, see Load Jobs on the Quota Policy page.

CSV format

To change how BigQuery parses CSV data, specify additional CSV properties in the configuration.load object.

CSV data type BigQuery property
Header rows skipLeadingRows
Newline characters allowQuotedNewlines
Custom field delimiters fieldDelimiter
Trailing optional columns allowJaggedRows

JSON format

One JSON object, including any nested/repeated fields, must appear on each line.

The following example shows sample nested/repeated data:

{"kind": "person", "fullName": "John Doe", "age": 22, "gender": "Male", "citiesLived": [{ "place": "Seattle", "numberOfYears": 5}, {"place": "Stockholm", "numberOfYears": 6}]}
{"kind": "person", "fullName": "Jane Austen", "age": 24, "gender": "Female", "citiesLived": [{"place": "Los Angeles", "numberOfYears": 2}, {"place": "Tokyo", "numberOfYears": 2}]}

For instructions on loading a JSON file, see Loading Nested and Repeated JSON Data.

Back to top

Avro format

Avro is an open source data format that bundles serialized data with the data's schema in the same file.

Compressed Avro files are not supported, but compressed data blocks are. BigQuery supports the DEFLATE codec.

BigQuery converts Avro data types to the following BigQuery data types:

Avro data type BigQuery data type Notes
null BigQuery ignores these values
boolean BOOLEAN
int INTEGER
long INTEGER
float FLOAT
double FLOAT
bytes BYTES
string STRING UTF-8 only
record RECORD
  • Aliases are ignored
  • Doc is ignored
  • Default values are set at read time
  • Order is ignored
enum STRING
  • The string is the symbolic value of the enum
  • Aliases are ignored
  • Doc is ignored
array repeated fields Arrays of arrays are not supported.
map<T> RECORD BigQuery converts an Avro map<T> field to a repeated RECORD that contains two fields: a key and a value. BigQuery stores the key as a STRING, and converts the value to its corresponding data type in BigQuery.
union
  • Nullable field
  • RECORD with a list of nullable fields
  • When union only has one non-null type, it converts to a nullable field.
  • Otherwise it converts to a RECORD with a list of nullable fields. Only one of these fields will be set at read time.
fixed BYTES
  • Aliases are ignored
  • Size is ignored

Cloud Datastore backups

For information about loading Cloud Datastore backup files, see Loading Data From Cloud Datastore.

What's next

Send feedback about...

BigQuery Documentation