Google BigQuery connector

You can use the Google BigQuery connector to enable programmatic read and write access to Google BigQuery from Hadoop and Spark jobs. This makes it well suited to processing data that is already stored in BigQuery. The connector does not expose command-line access. It is a Java library that enables Hadoop to process data from BigQuery using abstracted versions of the Apache Hadoop InputFormat and OutputFormat classes.

Pricing considerations

When you use the connector, you are also charged any associated BigQuery usage fees. Additionally, before running a Hadoop job, the BigQuery connector downloads data into a Google Cloud Storage bucket. The data is deleted from Cloud Storage after the Hadoop job completes successfully. You are charged for this storage according to Cloud Storage pricing. To avoid excess charges, check your Cloud Storage account and remove any unneeded temporary files.
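As a rough illustration of the storage charge, the following Java sketch prorates a per-GB-month rate over the time the staged files stay in the bucket. The rate in main is a hypothetical placeholder, not a Cloud Storage price; substitute the current rate for your bucket's storage class and location. The 730-hour month is an assumption, and the sketch ignores operation charges and any minimum storage duration.

```java
/**
 * Rough estimate of the Cloud Storage charge for temporary data that the
 * BigQuery connector stages in a bucket. The rate is supplied by the caller;
 * no real price is built in.
 */
public class StagingCostEstimate {

    // Approximate hours in a month, used to prorate a per-GB-month rate (assumption).
    static final double HOURS_PER_MONTH = 730.0;

    /**
     * @param stagedGb       size of the staged table data, in gigabytes
     * @param hoursRetained  how long the files stay in the bucket, in hours
     * @param ratePerGbMonth caller-supplied storage price per GB-month
     * @return estimated storage charge, in the same currency as the rate
     */
    static double estimateCost(double stagedGb, double hoursRetained, double ratePerGbMonth) {
        if (stagedGb < 0 || hoursRetained < 0 || ratePerGbMonth < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        // GB x (price per GB-month) x (fraction of a month the data is stored)
        return stagedGb * ratePerGbMonth * (hoursRetained / HOURS_PER_MONTH);
    }

    public static void main(String[] args) {
        double rate = 0.02; // hypothetical placeholder rate, not a quoted price

        // Staged data deleted about an hour after it was written.
        System.out.printf("deleted after 1 hour: %.4f%n", estimateCost(500, 1, rate));

        // Staged data left in the bucket for a whole month.
        System.out.printf("left for a month:     %.4f%n", estimateCost(500, HOURS_PER_MONTH, rate));
    }
}
```

Comparing the two calls shows why leftover files matter: if a job does not complete successfully and the staged data is never removed, the charge keeps accruing for as long as the files remain.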

Getting the connector

Cloud Dataproc clusters

The BigQuery connector is installed by default on all Google Cloud Dataproc clusters.

Other Spark/Hadoop clusters

You can download the BigQuery connector 1.x or the BigQuery connector 2.x. To install the connector, follow the directions in the bigdata-interop project on GitHub.

Using the connector

To get started quickly using the BigQuery connector, see the following examples:

For more detailed information, you can download the BigQuery connector Javadoc reference.
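The connector 1.x examples configure BigQuery input through Hadoop configuration properties. The sketch below collects those properties in a plain Java Map so that it runs without Hadoop on the classpath; in a real job you would copy each entry into an org.apache.hadoop.conf.Configuration with conf.set(key, value). The property names are the ones used in the connector 1.x examples and may differ in 2.x, so check them against the Javadoc for the version you install. The helper name inputProperties is hypothetical, and the project, bucket, and path values are placeholders; publicdata:samples.shakespeare is a public BigQuery sample table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Collects the Hadoop configuration properties that the BigQuery connector 1.x
 * examples set to read a table. Plain Java only; property names should be
 * verified against the connector version in use.
 */
public class BigQueryInputConfig {

    // Hypothetical helper: returns the properties in insertion order.
    static Map<String, String> inputProperties(String projectId, String bucket, String tempPath,
                                               String inputProject, String inputDataset,
                                               String inputTable) {
        Map<String, String> props = new LinkedHashMap<>();
        // Project that runs the BigQuery jobs issued by the connector.
        props.put("mapred.bq.project.id", projectId);
        // Cloud Storage bucket used to stage the exported data.
        props.put("mapred.bq.gcs.bucket", bucket);
        // Cloud Storage path for the temporary files.
        props.put("mapred.bq.temp.gcs.path", tempPath);
        // Fully qualified source table: project, dataset, table.
        props.put("mapred.bq.input.project.id", inputProject);
        props.put("mapred.bq.input.dataset.id", inputDataset);
        props.put("mapred.bq.input.table.id", inputTable);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder project, bucket, and path; the input table is a public sample.
        Map<String, String> props = inputProperties(
                "my-project",
                "my-bucket",
                "gs://my-bucket/hadoop/tmp/bigquery/job-0001",
                "publicdata", "samples", "shakespeare");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Giving each job its own temporary path (here, the hypothetical job-0001 suffix) makes it easier to find and delete leftover staged files, as recommended under Pricing considerations.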



Google Cloud Dataproc Documentation