You can use a BigQuery connector for programmatic read and write access to BigQuery, which is an ideal way to process data stored in BigQuery. The connector does not expose command-line access. It is a library that enables Spark and Hadoop applications to read data from and write data to BigQuery using each framework's native terminology.
When using the connector, charges include BigQuery usage fees. The following service-specific charges may also apply:
- Cloud Storage - the connector downloads data into a Cloud Storage bucket before or during job execution, and deletes the data from Cloud Storage after the job completes successfully. You are charged for this storage according to Cloud Storage pricing. To avoid excess charges, check your Cloud Storage buckets and remove unneeded temporary files.
- BigQuery Storage API - to achieve better performance, the connector reads data using the BigQuery Storage API. You are charged for this usage according to BigQuery Storage API pricing.
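Since leftover temporary objects can accrue storage charges, a periodic cleanup can help. The sketch below, using the `google-cloud-storage` Python client, deletes connector temp objects older than a cutoff; the bucket name and object prefix are placeholders you would replace with your own values.

```python
from datetime import datetime, timedelta, timezone

def is_stale(blob_updated, max_age_hours=24):
    """True if an object's last-update time is older than max_age_hours."""
    return datetime.now(timezone.utc) - blob_updated > timedelta(hours=max_age_hours)

def delete_stale_temp_objects(bucket_name, prefix, max_age_hours=24):
    """Delete leftover connector temp objects under `prefix` in `bucket_name`.

    Requires the google-cloud-storage client library and credentials.
    `bucket_name` and `prefix` are placeholders, not values defined by the
    connector documentation above.
    """
    from google.cloud import storage  # deferred import; only needed when actually run
    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if is_stale(blob.updated, max_age_hours):
            blob.delete()
```

Running a job like this on a schedule keeps a staging bucket from silently accumulating storage costs.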
The following BigQuery connectors are available for use in the Hadoop ecosystem:
- The Spark BigQuery Connector adds a Spark data source, which allows DataFrames to interact directly with BigQuery tables using familiar Spark APIs.
- The Hadoop BigQuery Connector allows Hadoop mappers and reducers to interact with BigQuery tables using abstracted versions of the InputFormat and OutputFormat classes.
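To illustrate the Spark data source described above, here is a minimal PySpark sketch of reading from and writing to a BigQuery table. It assumes the Spark BigQuery Connector jar is available on the cluster (for example via `--jars` or `--packages`); the `temporaryGcsBucket` option names the Cloud Storage staging bucket mentioned in the pricing notes, and the helper/table names are illustrative, not part of the connector API.

```python
def bq_table_id(project, dataset, table):
    """Build a fully qualified table ID in the project.dataset.table form."""
    return f"{project}.{dataset}.{table}"

def read_bq(spark, table_id):
    """Load a BigQuery table as a Spark DataFrame via the 'bigquery' data source."""
    return spark.read.format("bigquery").option("table", table_id).load()

def write_bq(df, table_id, temp_bucket):
    """Append a DataFrame to a BigQuery table.

    The connector stages data in `temp_bucket` (a Cloud Storage bucket)
    before loading it into BigQuery.
    """
    (df.write.format("bigquery")
       .option("table", table_id)
       .option("temporaryGcsBucket", temp_bucket)
       .mode("append")
       .save())
```

For example, `read_bq(spark, bq_table_id("bigquery-public-data", "samples", "shakespeare"))` would return a DataFrame you can transform with ordinary Spark operations before writing results back with `write_bq`.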
Using the connectors
For a quick start using the BigQuery connector, see the following examples:
- Learn more about BigQuery
- Follow the BigQuery example for Spark
- Follow the BigQuery example for Java MapReduce