Hadoop on Google Cloud Platform

BigQuery Connector for Hadoop

You can use the BigQuery connector to give Hadoop programmatic read/write access to Google BigQuery. This is ideal for processing data that you've already stored in BigQuery. The connector is a programmatic interface only; it does not expose command-line access.

The BigQuery connector for Hadoop is a Java library that enables Hadoop to process data from BigQuery, using abstracted versions of the Hadoop InputFormat and OutputFormat classes.

Pricing considerations

The BigQuery connector for Hadoop downloads data into your Google Cloud Storage bucket before running a Hadoop job. After the Hadoop job completes successfully, the data is deleted from Cloud Storage. While the data is stored there, you are charged for storage according to Cloud Storage pricing. To avoid excess charges, check your Cloud Storage account and make sure that unneeded temporary files have been removed, particularly after a job that did not complete successfully. By downloading the BigQuery connector for Hadoop you acknowledge and accept these additional terms.

When you use the connector, you are also charged any associated BigQuery usage fees.
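The Cloud Storage check described above can be sketched with gsutil. This is a minimal sketch, not part of the connector: the function names list_temp_files and remove_temp_files are hypothetical helpers, and the bucket and prefix arguments are your own values, since this page does not state where the connector writes its temporary files.

```shell
#!/usr/bin/env bash
# Sketch: inspect and clean up leftover connector files in Cloud Storage.
# The bucket and prefix you pass are your own values (assumptions); nothing
# here is a documented connector default.

# Print each object under gs://BUCKET/PREFIX with its size and timestamp.
list_temp_files() {
  local bucket="$1" prefix="$2"
  gsutil ls -l -r "gs://${bucket}/${prefix}"
}

# Delete everything under gs://BUCKET/PREFIX. Run list_temp_files first and
# confirm that no running job still needs these objects.
remove_temp_files() {
  local bucket="$1" prefix="$2"
  gsutil -m rm -r "gs://${bucket}/${prefix}"
}
```

After sourcing the file, a call might look like list_temp_files foo-bucket path/to/temp, where path/to/temp stands in for whatever prefix your jobs actually use. Listing before deleting keeps the removal step deliberate, because the delete is recursive.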

Getting the connector

The connector is included with the basic Hadoop tools download, or you can download the BigQuery connector for Hadoop directly.

Download the BigQuery connector javadoc reference.

Using the connector

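To configure and enable BigQuery access, edit bigquery_env.sh and pass it to bdutil with --env_var_files when you deploy the cluster. In the following command, --bucket names the Cloud Storage bucket, -n sets the number of worker nodes, and -P sets the prefix used to name the cluster's instances: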

./bdutil --bucket foo-bucket -n 5 -P my-cluster --env_var_files bigquery_env.sh deploy

Learn more about writing a MapReduce job with the connector and about running a MapReduce job with the connector.