
Having trouble loading your JSON into BigQuery? Try this!

October 6, 2023
Celia Ji

Strategic Cloud Engineer, Google Cloud

Daniel De Leo

Data Engineer, Google Cloud

The Problem

Have you ever tried to load a JSON file into BigQuery, only to find out the file wasn’t in the newline delimited format (each line a JSON object, not an array) that BigQuery expects? Well, you might still be able to load the file using only BigQuery (no other tools necessary) and a little bit of creativity!

Here are two examples of JSON formats that BigQuery will not load if you specify the file format as JSON (i.e. sourceFormat=NEWLINE_DELIMITED_JSON):

  • Newline delimited JSON arrays:
    Although the following is valid newline delimited JSON format, it will not load into BigQuery if you specify the source format as JSON because each line must be a JSON object, not an array.
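The original sample isn’t preserved on this page; a representative file of this shape (each line a valid JSON array, with illustrative values) might look like:

```json
["Alice", 28, "London"]
["Bob", 32, "Tokyo"]
```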

  • Pretty printed JSON object:
    Although the following is valid JSON format, it will not load into BigQuery if you specify the source format as JSON because it is not a newline delimited JSON object but rather a pretty printed JSON object.
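Again the original sample isn’t preserved; a representative pretty printed object (illustrative field names and values) might look like:

```json
{
  "name": "Alice",
  "age": 28,
  "city": "London"
}
```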

The Answer

So how do you actually load the JSON examples shown above without getting errors?

You can do this by loading the entire JSON file as if it were a 1-column CSV file (e.g. sourceFormat=CSV) and then using BigQuery’s native JSON functions to parse the data into the proper JSON data type. The following sections explain how to do this in more detail.

Note: Since BigQuery limits row size to 100 MB and this technique loads the JSON file’s data into single-column rows, your JSON file must not be larger than 100 MB for this loading technique to work.

Load newline delimited JSON with lines of arrays

1. Upload a file with newline delimited JSON arrays to Cloud Storage

  • Create a file named “my_newline_json_arrays.json” and copy the contents shown below into the file
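The original file contents aren’t preserved here; a representative file, matching the newline delimited arrays example above (values are illustrative), might be:

```json
["Alice", 28, "London"]
["Bob", 32, "Tokyo"]
```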

  • Copy the file into your Cloud Storage bucket
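Taken together, the two steps above might look like this from a shell. The sample rows are illustrative and YOUR_BUCKET is a placeholder for your own bucket name:

```shell
# Write the sample file locally.
cat > my_newline_json_arrays.json <<'EOF'
["Alice", 28, "London"]
["Bob", 32, "Tokyo"]
EOF

# Copy it to Cloud Storage. This requires the Cloud SDK's gsutil and an
# authenticated gcloud setup, so it is guarded here.
if command -v gsutil >/dev/null 2>&1; then
  gsutil cp my_newline_json_arrays.json gs://YOUR_BUCKET/ || echo "copy failed; check gcloud auth"
fi
```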

2. Create an external table over your JSON file in Cloud Storage

Execute the following SQL in your BigQuery console to create an external table over the JSON file you just created in Cloud Storage.

Note: Remember to replace the placeholders in the SQL below (YOUR_PROJECT, YOUR_DATASET, YOUR_BUCKET) with your own values.

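The original DDL block isn’t preserved on this page; a sketch of the idea follows, with an illustrative table name and column name (raw_json), and a control-character delimiter chosen on the assumption that it never appears in the data:

```sql
CREATE OR REPLACE EXTERNAL TABLE `YOUR_PROJECT.YOUR_DATASET.my_newline_json_arrays` (
  raw_json STRING
)
OPTIONS (
  format = 'CSV',
  -- A delimiter that never occurs in the data, so each line of the file
  -- lands unchanged in the single raw_json column.
  field_delimiter = '\x10',
  -- Disable quote handling so embedded double quotes pass through as-is.
  quote = '',
  uris = ['gs://YOUR_BUCKET/my_newline_json_arrays.json']
);
```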

3. Now query the external table and validate that the data can be read

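The original query isn’t preserved here; a query along these lines uses BigQuery’s JSON functions to split each line’s array into columns (the column names and positions assume the sample data shown earlier):

```sql
WITH parsed AS (
  -- Each row of the external table holds one JSON array as a string.
  SELECT JSON_EXTRACT_ARRAY(raw_json) AS fields
  FROM `YOUR_PROJECT.YOUR_DATASET.my_newline_json_arrays`
)
SELECT
  JSON_VALUE(fields[OFFSET(0)]) AS name,
  SAFE_CAST(JSON_VALUE(fields[OFFSET(1)]) AS INT64) AS age,
  JSON_VALUE(fields[OFFSET(2)]) AS city
FROM parsed;
```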

[Image: query results for the newline delimited JSON arrays external table (https://storage.googleapis.com/gweb-cloudblog-publish/images/1_JXXOrzE.max-1100x1100.jpg)]

Load pretty printed JSON files

1. Upload a file with pretty printed JSON to Cloud Storage

  • Create a file named “my_pretty_printed_json_data.json” and copy the contents shown below into the file.
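The original file contents aren’t preserved here; a representative pretty printed file, matching the example from earlier (values are illustrative), might be:

```json
{
  "name": "Alice",
  "age": 28,
  "city": "London"
}
```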

  • Copy the file into your Cloud Storage bucket
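As before, the two steps above might look like this from a shell; the sample object is illustrative and YOUR_BUCKET is a placeholder:

```shell
# Write the sample pretty printed file locally.
cat > my_pretty_printed_json_data.json <<'EOF'
{
  "name": "Alice",
  "age": 28,
  "city": "London"
}
EOF

# Copy it to Cloud Storage (requires an authenticated gcloud setup).
if command -v gsutil >/dev/null 2>&1; then
  gsutil cp my_pretty_printed_json_data.json gs://YOUR_BUCKET/ || echo "copy failed; check gcloud auth"
fi
```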

2. Create an external table over your JSON file in Cloud Storage

Execute the following SQL in your BigQuery console to create an external table over the JSON file you just created in Cloud Storage.

Note: Remember to replace the placeholders in the SQL below (YOUR_PROJECT, YOUR_DATASET, YOUR_BUCKET) with your own values.

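The original DDL isn’t preserved on this page; a sketch, again with an illustrative table name, a raw_json column, and a delimiter assumed never to appear in the data, might be:

```sql
CREATE OR REPLACE EXTERNAL TABLE `YOUR_PROJECT.YOUR_DATASET.my_pretty_printed_json_data` (
  raw_json STRING
)
OPTIONS (
  format = 'CSV',
  -- A delimiter that never occurs in the data, so each physical line of
  -- the pretty printed file becomes one row in raw_json.
  field_delimiter = '\x10',
  -- Disable quote handling so the JSON's double quotes pass through as-is.
  quote = '',
  uris = ['gs://YOUR_BUCKET/my_pretty_printed_json_data.json']
);
```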

3. Now query the external table and validate that the data can be read

Note: The following SQL query aggregates the entire pretty printed JSON file into one row; therefore, the JSON file must be smaller than 100 MB, BigQuery's row size limit.

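The original query isn’t preserved here; one way to do this is to concatenate every line back into a single string and parse it, which assumes the lines of the single file come back in file order:

```sql
-- Stitch the file's lines back together, then parse the result into
-- BigQuery's native JSON type.
SELECT PARSE_JSON(STRING_AGG(raw_json, '\n')) AS json_data
FROM `YOUR_PROJECT.YOUR_DATASET.my_pretty_printed_json_data`;
```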

[Image: query results for the pretty printed JSON external table (https://storage.googleapis.com/gweb-cloudblog-publish/images/2_t2jspCH.max-700x700.jpg)]

The Big Picture

Congrats! You’re unstoppable now that you can easily load all your JSON data and refocus on deriving data insights instead of figuring out how to reformat pesky JSON files. Also, don’t forget to check out all of BigQuery’s JSON functions, which are sure to save you time as you explore your data, as well as BigQuery’s guide to working with JSON data. Good luck and happy querying!
