You can use a BigQuery connector to enable programmatic read and write access to BigQuery. This is an ideal way to process data that is stored in BigQuery. The connector does not provide command-line access. The BigQuery connector is a library that enables Apache Spark and Apache Hadoop applications to read data from and write data to BigQuery using each framework's native terminology.
Pricing considerations
When using the connector, charges include BigQuery usage fees. The following service-specific charges may also apply:
- Cloud Storage - the connector downloads data into a Cloud Storage bucket before or during job execution. After the job successfully completes, the data is deleted from Cloud Storage. You are charged for this storage according to Cloud Storage pricing. To avoid excess charges, check your Cloud Storage buckets and remove unneeded temporary files.
- BigQuery Storage API - to achieve better performance, the connector reads data using the BigQuery Storage API. You are charged for this usage according to BigQuery Storage API pricing.
Available connectors
The following BigQuery connectors are available for use in the Hadoop ecosystem:
- The Spark BigQuery Connector adds a Spark data source, which allows DataFrames to interact directly with BigQuery tables using Spark's read and write operations.
- The Hive BigQuery Connector adds a Storage Handler, which allows Apache Hive to interact directly with BigQuery tables using HiveQL syntax.
- The Hadoop BigQuery Connector allows Hadoop mappers and reducers to interact with BigQuery tables using abstracted versions of the InputFormat and OutputFormat classes.
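As an illustration of the Spark data source described above, a read-transform-write job might look like the following sketch. It assumes the Spark BigQuery connector JAR is available on the cluster, and the output table (mydataset.wordcount) and temporary bucket (my-temp-bucket) are placeholders you would replace with your own resources.

```python
from pyspark.sql import SparkSession

# Assumes the spark-bigquery connector is on the classpath
# (for example, provided by the cluster or via --jars).
spark = SparkSession.builder.appName("bigquery-example").getOrCreate()

# Read a public BigQuery table into a DataFrame.
words = (
    spark.read.format("bigquery")
    .option("table", "bigquery-public-data.samples.shakespeare")
    .load()
)

# Aggregate in Spark, then write the result back to BigQuery.
# The connector stages the output through the named Cloud Storage bucket.
counts = words.groupBy("word").sum("word_count")
(
    counts.write.format("bigquery")
    .option("table", "mydataset.wordcount")
    .option("temporaryGcsBucket", "my-temp-bucket")
    .save()
)
```

Note that the temporaryGcsBucket staging step is what incurs the Cloud Storage charges discussed under Pricing considerations.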
Using the connectors
For a quick start using the BigQuery connectors, see the examples listed under What's next.
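For example, with the Hive BigQuery Connector's Storage Handler, you can declare a Hive external table backed by a BigQuery table and query it with ordinary HiveQL. This is a sketch; the table name myproject.mydataset.mytable and the Hive table wordcount_bq are placeholders, and the storage handler class name should be checked against the connector version you install.

```sql
-- Declare a Hive external table backed by a BigQuery table
-- (myproject.mydataset.mytable is a placeholder).
CREATE EXTERNAL TABLE wordcount_bq (word STRING, word_count BIGINT)
STORED BY 'com.google.cloud.hive.bigquery.connector.BigQueryStorageHandler'
TBLPROPERTIES ('bq.table'='myproject.mydataset.mytable');

-- Query it with standard HiveQL syntax.
SELECT word, word_count FROM wordcount_bq ORDER BY word_count DESC LIMIT 10;
```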
What's next
- Learn more about BigQuery
- Follow the BigQuery example for Spark
- Learn more about the Hive BigQuery Connector
- Follow the BigQuery example for Java MapReduce