BigQuery connector

The BigQuery connector enables programmatic read/write access to BigQuery, which makes it well suited to processing data that is stored in BigQuery. It does not expose command-line access. The connector is a Java library that lets Hadoop process data from BigQuery using abstracted versions of the Apache Hadoop InputFormat and OutputFormat classes.

Pricing considerations

When you use the connector, you are charged for any associated BigQuery usage. In addition, the BigQuery connector downloads data into a Cloud Storage bucket before running a Hadoop job, and you are charged for that storage according to Cloud Storage pricing. The data is deleted from Cloud Storage after the Hadoop job completes successfully, which means temporary files can be left behind when a job does not complete successfully. To avoid excess charges, check your Cloud Storage account and remove unneeded temporary files.
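
As a rough illustration of removing leftover temporary files, the sketch below deletes every object under a given prefix. The function name `delete_temp_objects` is hypothetical, and the prefix you pass is an assumption: use whatever temporary path your own jobs were configured with. The `bucket` argument is expected to behave like a `google.cloud.storage.Bucket` from the google-cloud-storage client library.

```python
def delete_temp_objects(bucket, prefix):
    """Delete every object under `prefix` in `bucket` and return the deleted names.

    `bucket` is expected to behave like google.cloud.storage.Bucket:
    list_blobs(prefix=...) yields objects that expose .name and .delete().
    """
    if not prefix:
        # Guard against deleting the whole bucket by accident.
        raise ValueError("prefix must be a non-empty string")

    # Materialize the listing first so deletions do not interleave with paging.
    blobs = list(bucket.list_blobs(prefix=prefix))

    deleted = []
    for blob in blobs:
        blob.delete()
        deleted.append(blob.name)
    return deleted
```

With the client library installed and credentials configured, `storage.Client().bucket("my-bucket")` (after `from google.cloud import storage`) returns a suitable bucket object; the bucket name here is a placeholder.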

Getting the connector

Cloud Dataproc clusters

The BigQuery connector is installed by default on all Cloud Dataproc 1.0-1.2 cluster nodes under /usr/lib/hadoop/lib/. It's available in both Spark and PySpark environments.

Because the BigQuery connector is not installed by default in Cloud Dataproc 1.3 and higher, make it available in one of the following ways:

  1. Install the BigQuery connector using an initialization action.
  2. Specify the BigQuery connector in the jars parameter when submitting a job:
    --jars=gs://hadoop-lib/bigquery/bigquery-connector-latest-hadoop2.jar
  3. Include the BigQuery connector classes in the application's jar-with-dependencies.
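
For the jars option, the following is a minimal sketch of how a PySpark job submission could be assembled from a script. It assumes the gcloud CLI is installed and authenticated; the function names, script name, cluster name, and region are hypothetical placeholders. Only the jar path comes from the list above.

```python
import subprocess

# Connector jar path taken from the list above.
CONNECTOR_JAR = "gs://hadoop-lib/bigquery/bigquery-connector-latest-hadoop2.jar"


def build_submit_command(script, cluster, region, jar=CONNECTOR_JAR):
    """Return the gcloud argument list for a PySpark job that includes the connector jar."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script,
        "--cluster=" + cluster,
        "--region=" + region,
        "--jars=" + jar,
    ]


def submit(script, cluster, region):
    """Run the command; raises CalledProcessError if gcloud exits with an error."""
    return subprocess.run(build_submit_command(script, cluster, region), check=True)
```

Keeping command construction separate from execution makes it easy to print and review the command before anything is submitted, for example `print(" ".join(build_submit_command("job.py", "my-cluster", "us-central1")))`.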

Other Spark/Hadoop clusters

You can download the BigQuery connector for Hadoop 1.x or the BigQuery connector for Hadoop 2.x. For more information, see bigdata-interop on GitHub.

Using the connector

To get started quickly using the BigQuery connector, see the following examples:
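
As one illustration, here is a minimal PySpark sketch of reading a BigQuery table through the connector's input format. The helper names are hypothetical. The configuration keys and the input format class name are recalled from the connector's published PySpark sample and are assumptions here, so verify them against the release notes and Javadoc for the connector version you use. The `sc` argument is an existing SparkContext on a cluster where the connector jar is on the classpath.

```python
def bigquery_input_conf(project, bucket, temp_path, input_project, dataset, table):
    """Build the Hadoop configuration dict the connector reads for an input table.

    Key names are assumptions based on the connector's PySpark sample.
    """
    return {
        "mapred.bq.project.id": project,            # project billed for the job
        "mapred.bq.gcs.bucket": bucket,             # bucket used for temporary data
        "mapred.bq.temp.gcs.path": temp_path,       # where exported data is staged
        "mapred.bq.input.project.id": input_project,
        "mapred.bq.input.dataset.id": dataset,
        "mapred.bq.input.table.id": table,
    }


def read_table(sc, conf):
    """Return an RDD of (key, JSON string) pairs for the configured table."""
    return sc.newAPIHadoopRDD(
        "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "com.google.gson.JsonObject",
        conf=conf,
    )
```

Each value in the resulting RDD is a JSON string for one row, so a typical next step is `read_table(sc, conf).map(lambda kv: json.loads(kv[1]))` after `import json`.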

Java version

The BigQuery connector requires Java 8.

Apache Maven Dependency Information

<dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>bigquery-connector</artifactId>
    <version>insert "x.x.x" connector version number here</version>
</dependency>

For more detailed information, see the BigQuery connector release notes and Javadoc reference.
