Having trouble loading your JSON into BigQuery? Try this!
Celia Ji
Strategic Cloud Engineer, Google Cloud
Daniel De Leo
Data Engineer, Google Cloud
The Problem
Have you ever tried to load a JSON file into BigQuery, only to find out the file wasn’t in the proper newline delimited format (each line being an object, not an array) that BigQuery expects? Well, you might still be able to load the file using only BigQuery (no other tools necessary) and a little bit of creativity!
Here are two examples of JSON formats that BigQuery will not load if you specify the file format as JSON (e.g. sourceFormat=NEWLINE_DELIMITED_JSON):
- Newline delimited JSON arrays:
Although the following is valid newline delimited JSON format, it will not load into BigQuery if you specify the source format as JSON because each line must be a JSON object, not an array.
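For example, a file like the following (the records here are made up for illustration) is valid newline delimited JSON, but every line is an array rather than an object:

```json
[{"id": 1, "first_name": "John", "last_name": "Doe"}, {"id": 2, "first_name": "Jane", "last_name": "Doe"}]
[{"id": 3, "first_name": "Jim", "last_name": "Kim"}, {"id": 4, "first_name": "Ann", "last_name": "Lee"}]
```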
- Pretty printed JSON object:
Although the following is valid JSON format, it will not load into BigQuery if you specify the source format as JSON because it is not a newline delimited JSON object but rather a pretty printed JSON object.
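For example (again with a made-up record), the following is valid JSON, but a single object spans many lines instead of sitting on one line:

```json
{
  "id": 1,
  "first_name": "John",
  "last_name": "Doe",
  "addresses": [
    {
      "status": "current",
      "city": "Louisville",
      "state": "KY"
    }
  ]
}
```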
The Answer
So how do you actually load the JSON examples shown above without getting errors?
You can do this by loading the entire JSON file as if it were a 1-column CSV file (e.g. sourceFormat=CSV) and then using BigQuery’s native JSON functions to parse the data into the proper JSON data type. The following sections explain how to do this in more detail.
Note: Since BigQuery row size is limited to 100 MB and you’re loading all data from the JSON file into a 1-column row, your JSON file must not be larger than 100 MB for this loading technique to work.
Load newline delimited JSON with lines of arrays
1. Upload a file with newline delimited JSON arrays to Cloud Storage
- Create a file named “my_newline_json_arrays.json” and copy the contents shown below into the file
- Copy the file into your Cloud Storage bucket
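These two steps can be sketched from a terminal as follows. The file contents are illustrative sample records, and YOUR_BUCKET is a placeholder for your own bucket name:

```shell
# Write a sample file of newline delimited JSON arrays (records are made up).
cat > my_newline_json_arrays.json <<'EOF'
[{"id": 1, "first_name": "John", "last_name": "Doe"}, {"id": 2, "first_name": "Jane", "last_name": "Doe"}]
[{"id": 3, "first_name": "Jim", "last_name": "Kim"}, {"id": 4, "first_name": "Ann", "last_name": "Lee"}]
EOF

# Copy the file into your Cloud Storage bucket (replace YOUR_BUCKET).
gsutil cp my_newline_json_arrays.json gs://YOUR_BUCKET/
```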
2. Create an external table over your JSON file in Cloud Storage
Execute the following SQL in your BigQuery console to create an external table over the JSON file you just created in Cloud Storage.
Note: Remember to replace the placeholders in the SQL below (YOUR_PROJECT, YOUR_DATASET, YOUR_BUCKET) with your own values.
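As a sketch (the table and column names here are my own choices, not fixed requirements), the external table definition could look like this:

```sql
CREATE OR REPLACE EXTERNAL TABLE
  `YOUR_PROJECT.YOUR_DATASET.my_newline_json_arrays` (
    json_text STRING  -- each row holds one full line of the JSON file
  )
OPTIONS (
  format = 'CSV',
  -- Choose a field delimiter and disable quoting so each line of the file
  -- loads intact as a single string column. 'þ' is assumed not to appear
  -- anywhere in your JSON data; pick another character if it does.
  field_delimiter = 'þ',
  quote = '',
  uris = ['gs://YOUR_BUCKET/my_newline_json_arrays.json']
);
```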
3. Now query the external table and validate the data can be read
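Assuming an external table with a single STRING column named json_text (illustrative names), a query like the following unnests each line’s top-level array and parses every element into the JSON data type:

```sql
SELECT
  PARSE_JSON(array_element) AS json_data
FROM
  `YOUR_PROJECT.YOUR_DATASET.my_newline_json_arrays`,
  -- JSON_EXTRACT_ARRAY splits each line's top-level array into one
  -- JSON-formatted STRING per element; PARSE_JSON converts each to JSON.
  UNNEST(JSON_EXTRACT_ARRAY(json_text)) AS array_element;
```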
Load pretty printed JSON files
1. Upload a file with pretty printed JSON to Cloud Storage
- Create a file named “my_pretty_printed_json_data.json” and copy the contents shown below into the file.
- Copy the file into your Cloud Storage bucket
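As with the previous section, these two steps can be sketched from a terminal. The file contents are an illustrative sample record, and YOUR_BUCKET is a placeholder:

```shell
# Write a sample pretty printed JSON file (the record is made up).
cat > my_pretty_printed_json_data.json <<'EOF'
{
  "id": 1,
  "first_name": "John",
  "last_name": "Doe",
  "addresses": [
    {
      "status": "current",
      "city": "Louisville",
      "state": "KY"
    }
  ]
}
EOF

# Copy the file into your Cloud Storage bucket (replace YOUR_BUCKET).
gsutil cp my_pretty_printed_json_data.json gs://YOUR_BUCKET/
```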
2. Create an external table over your JSON file in Cloud Storage
Execute the following SQL in your BigQuery console to create an external table over the JSON file you just created in Cloud Storage.
Note: Remember to replace the placeholders in the SQL below (YOUR_PROJECT, YOUR_DATASET, YOUR_BUCKET) with your own values.
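A sketch of the definition (table and column names are again my own choices) could look like this:

```sql
CREATE OR REPLACE EXTERNAL TABLE
  `YOUR_PROJECT.YOUR_DATASET.my_pretty_printed_json_data` (
    json_text STRING  -- each row holds one line of the pretty printed file
  )
OPTIONS (
  format = 'CSV',
  -- Choose a field delimiter and disable quoting so each line of the file
  -- loads intact as a single string column. 'þ' is assumed not to appear
  -- anywhere in your JSON data; pick another character if it does.
  field_delimiter = 'þ',
  quote = '',
  uris = ['gs://YOUR_BUCKET/my_pretty_printed_json_data.json']
);
```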
3. Now query the external table and validate the data can be read
Note: The following SQL query aggregates the entire pretty printed JSON file into one row; therefore, the JSON file must be smaller than 100 MB, BigQuery’s maximum row size.
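Assuming the same 1-column layout (a STRING column named json_text over a table with illustrative names), the reassembly can be sketched as:

```sql
SELECT
  -- Concatenate every line of the file back together, then parse the
  -- result into a single JSON value.
  PARSE_JSON(STRING_AGG(json_text, '\n')) AS json_data
FROM
  `YOUR_PROJECT.YOUR_DATASET.my_pretty_printed_json_data`;
```

One caveat: STRING_AGG does not guarantee aggregation order. In practice the lines of a single CSV file are read in file order, but if PARSE_JSON reports errors here, line ordering is a likely culprit.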
The Big Picture
Congrats! You’re unstoppable now that you can easily load all your JSON data and refocus on deriving data insights instead of figuring out how to reformat pesky JSON files. Also, don’t forget to check out all of BigQuery’s JSON functions, which are sure to save you time as you explore your data, as well as BigQuery’s guide to working with JSON data. Good luck and happy querying!