BigQuery connector

You can use a BigQuery connector to enable programmatic read and write access to BigQuery. This is an ideal way to process data that is stored in BigQuery; command-line access is not exposed. The BigQuery connector is a library that enables Spark and Hadoop applications to process data from BigQuery and to write data to BigQuery using its native terminology.

Pricing considerations

When you use the connector, charges include BigQuery usage fees. The following service-specific charges may also apply:

  • Cloud Storage - the connector downloads data into a Cloud Storage bucket before or during job execution. After the job successfully completes, the data is deleted from Cloud Storage. You are charged for this storage according to Cloud Storage pricing. To avoid excess charges, check your Cloud Storage account and remove unneeded temporary files.
  • BigQuery Storage API - to achieve better performance, the connector reads data using the BigQuery Storage API. You are charged for this usage according to BigQuery Storage API pricing.
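To keep the Cloud Storage charges above in check, you can periodically look for leftover staging objects. The sketch below is a minimal, illustrative helper for filtering a bucket listing down to stale temporary objects; the `spark-staging/` prefix, object names, and the `stale_temp_objects()` function are assumptions for illustration (in practice you would obtain the listing with a Cloud Storage client and delete the matches).

```python
# Hypothetical sketch: given (name, last-updated) pairs listed from a bucket,
# return the names of objects under a staging prefix that have not been
# updated since a cutoff time. Prefix and object names are illustrative.
from datetime import datetime, timedelta


def stale_temp_objects(objects, prefix, cutoff):
    """Return names of objects under `prefix` last updated before `cutoff`."""
    return [name for name, updated in objects
            if name.startswith(prefix) and updated < cutoff]


now = datetime(2024, 1, 10)
listing = [
    ("spark-staging/part-0000.avro", datetime(2024, 1, 1)),  # stale temp file
    ("spark-staging/part-0001.avro", datetime(2024, 1, 9)),  # recent temp file
    ("data/report.csv", datetime(2023, 12, 1)),              # not a temp file
]

print(stale_temp_objects(listing, "spark-staging/", now - timedelta(days=7)))
# ['spark-staging/part-0000.avro']
```

Only objects that both match the staging prefix and predate the cutoff are flagged, so long-lived data elsewhere in the bucket is never touched.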

Available connectors

The following BigQuery connectors are available for use in the Hadoop ecosystem:

  1. The Spark BigQuery Connector adds a Spark data source, which allows DataFrames to interact directly with BigQuery tables using Spark's read and write operations.
  2. The Hive BigQuery Connector adds a Storage Handler, which allows Apache Hive to interact directly with BigQuery tables using HiveQL syntax.
  3. The Hadoop BigQuery Connector allows Hadoop mappers and reducers to interact with BigQuery tables using abstracted versions of the InputFormat and OutputFormat classes.

Using the connectors

For a quick start using the BigQuery connector, see the following examples:

What's next