
Introducing the BigQuery Terraform module

August 20, 2019
Tony DiGangi

Infrastructure Cloud Consultant

It’s no secret software developers love to automate their work away, and cloud development is no different. Since the release of the Cloud Foundation Toolkit (CFT), we’ve offered automation templates with Deployment Manager and Terraform to help engineers get set up with Google Cloud Platform (GCP) quickly. But as useful as the Terraform offering was, it was missing a critical module for a critical piece of GCP: BigQuery.

Fortunately, those days are over. With the BigQuery module for Terraform, you can now automate the instantiation and deployment of your BigQuery datasets and tables. This means you have an open-source option to start using BigQuery for data analytics.

In building the module, we applied the flexibility and extensibility of Terraform throughout and adhered to the following principles:

  • Referenceable templates

  • Modular, loosely coupled design for reusability

  • Provisioning and association for both datasets and tables

  • Support for full unit testing (via Kitchen-Terraform)

  • Access control (coming soon)

By including the BigQuery Terraform module in your larger CFT scripts, it’s possible for you to go effectively from zero to ML in minutes, with significantly reduced barriers to implementation. 

Let’s walk through how to set this up.

Building blocks: GCP and Terraform prerequisites

To use the BigQuery Terraform module, you’ll need—you guessed it—to have BigQuery and Terraform ready to go.

Note: The steps outlined below are applicable for Unix- and Linux-based devices, and have not been optimized for CI/CD systems or production use.

1. Download the Terraform binary that matches your system type and follow the Terraform installation process.

2. Install Google Cloud SDK on your local machine.

3. Start by creating a GCP project in your organization's folder. Try something via Terraform like the following:

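Here's a minimal sketch using the standard google_project resource; the project name, folder ID, and billing account below are all placeholders to swap for your own values.

    # A minimal sketch -- every ID below is a placeholder.
    provider "google" {
      version = "~> 2.0"
    }

    resource "google_project" "bq_demo" {
      name            = "bq-demo"
      project_id      = "bq-demo-project"      # must be globally unique
      folder_id       = "0123456789"           # numeric ID of your org's folder
      billing_account = "AAAAAA-BBBBBB-CCCCCC"
    }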

4. Let’s set up some environment variables to use. Ensure you update the values to accurately reflect your environment.

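Something along these lines; the values (and the variable names, apart from Terraform's TF_VAR_ convention) are illustrative.

    # Picked up by Terraform for any variable named project_id.
    export TF_VAR_project_id="your-project-id"

    # Tells Terraform and other Google tooling where your credentials live.
    export GOOGLE_APPLICATION_CREDENTIALS="${HOME}/terraform-key.json"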

5. Go ahead and enable the BigQuery API (or use the helpers directory in the module instead).

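With the Cloud SDK from step 2, that's one command:

    gcloud services enable bigquery.googleapis.com --project "${TF_VAR_project_id}"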

6. Establish an identity with the required IAM permissions.

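One way to do it with gcloud; the account name and role here are illustrative choices rather than requirements of the module (the setup-sa.sh helper covered in the next section automates the equivalent).

    # Create a dedicated service account for Terraform (name is illustrative).
    gcloud iam service-accounts create bq-terraform \
      --display-name "BigQuery Terraform"

    # Grant it BigQuery permissions on the project (role is an assumed choice).
    gcloud projects add-iam-policy-binding "${TF_VAR_project_id}" \
      --member "serviceAccount:bq-terraform@${TF_VAR_project_id}.iam.gserviceaccount.com" \
      --role "roles/bigquery.dataOwner"

    # Download a key for Terraform to authenticate with.
    gcloud iam service-accounts keys create "${GOOGLE_APPLICATION_CREDENTIALS}" \
      --iam-account "bq-terraform@${TF_VAR_project_id}.iam.gserviceaccount.com"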

7. Browse through the examples directory for a full list of what's possible with the module.

What’s in the box: Get to know the Terraform module

The BigQuery module is packaged in a self-contained GitHub repository for you to easily download (or reference) and deploy. Included in the repo is a central module that supports both Terraform v0.12.X and v0.11.X, allowing users (both human and GCP service accounts) to dynamically deploy datasets with any number of tables attached. (By the way, the BigQuery module has you covered in case you’re planning to partition your tables using a TIMESTAMP or DATE column to optimize for faster retrieval and lower query costs.)

To enforce naming standardization, the BigQuery module creates a single dataset that is referenced by all of the tables it creates, which streamlines the creation of multiple instances and generates an individual Terraform state file per BigQuery dataset. This is especially useful for customers with hundreds of tables across dozens of datasets who don’t want to get stuck with manual creation. That said, the module is fundamentally an opinionated method for setting up your datasets and table schemas; you’ll still need to handle your data ingestion or upload via any of the methods outlined here, as that’s not currently supported by Terraform.

In addition, the repo is packaged with a rich set of test scripts that use Kitchen-Terraform plugins, robust examples on how to use the module in your deployments, major version upgrade guides, and helper files to get users started quickly.

Putting them together: Deploying the module

Now that you have BigQuery and Terraform set up, it’s time to plug them together. 

1. Start by cloning the repository:

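The module lives in the terraform-google-modules organization on GitHub:

    git clone https://github.com/terraform-google-modules/terraform-google-bigquery.git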

2. If you didn’t enable the BigQuery API and set up a service account with permissions earlier, run the setup-sa.sh quickstart script in the helpers directory of the repo. It will create the service account, grant the permissions, and enable the BigQuery API.

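Something like the following; the arguments shown are assumptions, so check the script itself for its exact usage.

    cd terraform-google-bigquery/helpers
    # Argument names are assumptions -- see the script's usage header.
    ./setup-sa.sh <ORGANIZATION_ID> <PROJECT_NAME>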

3. Define your BigQuery table schema, or try out an example schema here.

4. Create a deployment (module) directory.

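The directory name is up to you:

    mkdir bq-deployment && cd bq-deployment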

5. Create the deployment files: main.tf, variables.tf, outputs.tf, and optionally a terraform.tfvars (in case you want to override default vars in the variables.tf file):

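From inside the deployment directory:

    touch main.tf variables.tf outputs.tf terraform.tfvars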

6. Populate the files as detailed below.

main.tf

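A sketch of a minimal main.tf; the input names here follow the module's published examples and may differ between module versions, so check the module's README for the current set.

    # Input names are based on the module's examples and may vary by version.
    module "bigquery" {
      source            = "github.com/terraform-google-modules/terraform-google-bigquery"
      dataset_id        = var.dataset_id
      dataset_name      = var.dataset_name
      description       = var.description
      project_id        = var.project_id
      table_id          = var.table_id
      schema_file       = var.schema_file
      time_partitioning = var.time_partitioning
    }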

outputs.tf

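Outputs surface whatever the module exports; the names below are assumptions to check against the module's documented outputs.

    # Assumes the module exports dataset_id and table_id outputs.
    output "dataset_id" {
      description = "ID of the dataset created by the module"
      value       = module.bigquery.dataset_id
    }

    output "table_id" {
      description = "ID of the table created by the module"
      value       = module.bigquery.table_id
    }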

terraform.tfvars

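Every value below is a placeholder; keep the keys in sync with your variables.tf.

    # Placeholder values -- replace with your own.
    project_id        = "your-project-id"
    dataset_id        = "example_dataset"
    dataset_name      = "example_dataset"
    description       = "Dataset deployed via the BigQuery Terraform module"
    table_id          = "example_table"
    schema_file       = "sample_bq_schema.json"
    time_partitioning = "DAY"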

variables.tf

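Declarations mirroring the inputs assumed in the main.tf sketch above:

    # These mirror the inputs assumed in main.tf above.
    variable "project_id" {
      description = "Project in which the dataset and table are created"
    }

    variable "dataset_id" {
      description = "Unique ID for the BigQuery dataset"
    }

    variable "dataset_name" {
      description = "Friendly name for the dataset"
    }

    variable "description" {
      description = "Description of the dataset"
    }

    variable "table_id" {
      description = "Unique ID for the table being provisioned"
    }

    variable "schema_file" {
      description = "Path to the table's JSON schema file"
    }

    variable "time_partitioning" {
      description = "Partitioning type for the table (for example, DAY)"
    }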

7. Navigate to your deployment directory.

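If you've moved elsewhere since step 4:

    cd bq-deployment   # or wherever you created the deployment files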

8. Initialize the directory and plan.

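Review the plan output carefully before moving on:

    terraform init
    terraform plan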

9. Apply the changes.

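Terraform will prompt for confirmation before creating anything:

    terraform apply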

What’s next?

That’s it! You’ve used the BigQuery Terraform module to deploy your dataset and tables, and you’re now ready to load in your data for querying. We think this fills a critical gap in our Cloud Foundation Toolkit so you can easily stand up BigQuery with an open-source, extensible solution. Set it and forget it, or update it anytime you need to change your schema or modify your table structure. Once you’ve given it a shot, if you have any questions, give us feedback by opening an issue. Watch or star the module to stay on top of future releases and enjoy all your newfound free time (we hear BQML is pretty fun).