The BigQuery connector gives Hadoop jobs programmatic read/write access to BigQuery, which makes it a convenient way to process data that is stored in BigQuery. The connector is a Java library: it enables Hadoop to process BigQuery data through abstracted versions of the Apache Hadoop InputFormat and OutputFormat classes. It does not expose command-line access.
Using the connector incurs any associated BigQuery usage fees. In addition, the connector downloads data into a Cloud Storage bucket before running a Hadoop job, and you are charged for that storage according to Cloud Storage pricing. The data is deleted from Cloud Storage only after the Hadoop job completes successfully, so files from a failed or interrupted job may be left behind. To avoid excess charges, check your Cloud Storage account and remove any unneeded temporary files.
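The settings involved in the staging step described above can be sketched in plain Java. This is a minimal illustration, not the connector's API: the class `BigQueryConnectorSettings`, its `build` method, and the `gs://<bucket>/hadoop/tmp/bigquery/<job>` path layout are hypothetical, and the `mapred.bq.*` property names are assumptions that should be checked against the connector Javadoc for the version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the connector): collects the settings a
// job would copy into its Hadoop Configuration before using the connector.
class BigQueryConnectorSettings {

    static Map<String, String> build(String projectId, String bucket, String jobName,
                                     String inputDataset, String inputTable) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Assumed property names; verify them against the connector Javadoc.
        settings.put("mapred.bq.project.id", projectId);
        settings.put("mapred.bq.gcs.bucket", bucket);
        // A per-job temporary path makes leftover staged files easy to find and delete.
        settings.put("mapred.bq.temp.gcs.path",
                "gs://" + bucket + "/hadoop/tmp/bigquery/" + jobName);
        settings.put("mapred.bq.input.project.id", projectId);
        settings.put("mapred.bq.input.dataset.id", inputDataset);
        settings.put("mapred.bq.input.table.id", inputTable);
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder identifiers; substitute real project, bucket, and table names.
        Map<String, String> settings =
                build("my-project", "my-bucket", "wordcount", "my_dataset", "my_table");
        settings.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real job, each entry would be applied to the Hadoop `Configuration` with `set(key, value)`. Keeping the temporary path unique per job is a design choice that makes it simpler to audit the bucket and remove files a failed job left behind.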
Getting the connector
Cloud Dataproc clusters
The BigQuery connector is installed by default on all Cloud Dataproc clusters.
Other Spark/Hadoop clusters
Using the connector
To get started quickly using the BigQuery connector, see the following examples:
Apache Maven Dependency Information
<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>bigquery-connector</artifactId>
  <version>insert "x.x.x" connector version number here-hadoop2</version>
  <scope>compile</scope>
</dependency>
For more detailed information, see the BigQuery connector Javadoc reference.