Hadoop on Google Cloud Platform

Datastore Connector for Hadoop

The Datastore connector for Hadoop is a Java library that enables Hadoop to read data from and write data to Cloud Datastore programmatically.

Pricing considerations

You are charged per operation for each entity read from or written to Datastore, according to Datastore pricing. On average, Datastore costs more per byte than Google Cloud Storage, but it supports more sophisticated and fine-grained access patterns to the data.
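Because the charge scales with the number of entities a job reads and writes, a rough estimate before running a large job can help. The following is a minimal sketch of that arithmetic. It is a hypothetical helper, not part of the connector. It assumes prices are quoted per 100,000 entity operations and covers only the per-operation charges described above. The prices in the example call are placeholders, so substitute the current rates from the Datastore pricing page.

```shell
#!/bin/sh
# Hypothetical helper: estimate per-operation Datastore charges for a job.
# ASSUMPTION: prices are quoted per 100,000 entity operations.
# No real prices are built in; pass current rates as arguments.
set -eu

# Usage: estimate_datastore_cost READS WRITES READ_PRICE_PER_100K WRITE_PRICE_PER_100K
estimate_datastore_cost() {
  awk -v r="$1" -v w="$2" -v rp="$3" -v wp="$4" \
    'BEGIN { printf "%.4f\n", (r / 100000) * rp + (w / 100000) * wp }'
}

# Example with placeholder prices (not real rates):
# 2,000,000 reads at 1.00 per 100k plus 500,000 writes at 2.00 per 100k.
estimate_datastore_cost 2000000 500000 1.00 2.00   # prints 30.0000
```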

Getting the connector

The connector is included with the basic Hadoop tools download, or you can download the Datastore connector for Hadoop directly.

Download the Datastore connector javadoc reference.

Using the connector

To configure and enable Datastore access, modify datastore_env.sh and pass it to bdutil when you deploy the cluster:

./bdutil --bucket foo-bucket -n 5 -P my-cluster --env_var_files datastore_env.sh deploy
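If you deploy repeatedly, it can help to wrap that command so the bucket, cluster size, and prefix live in one place and the environment file is checked before any VMs are created. The following is a minimal sketch of such a wrapper. The variable names BUCKET, NUM_WORKERS, PREFIX, and ENV_FILE and the functions print_deploy_cmd and deploy_cluster are hypothetical. It uses only the bdutil flags already shown in the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the bdutil deploy command shown above.
# Defaults mirror that example; override them via environment variables.
set -eu

BUCKET="${BUCKET:-foo-bucket}"
NUM_WORKERS="${NUM_WORKERS:-5}"
PREFIX="${PREFIX:-my-cluster}"
ENV_FILE="${ENV_FILE:-datastore_env.sh}"

# Print the deploy command without running it, as a preview.
print_deploy_cmd() {
  echo "./bdutil --bucket $BUCKET -n $NUM_WORKERS -P $PREFIX --env_var_files $ENV_FILE deploy"
}

# Refuse to deploy if the Datastore env file is missing, then run bdutil.
deploy_cluster() {
  if [ ! -f "$ENV_FILE" ]; then
    echo "error: $ENV_FILE not found; edit it before deploying" >&2
    return 1
  fi
  ./bdutil --bucket "$BUCKET" -n "$NUM_WORKERS" -P "$PREFIX" \
    --env_var_files "$ENV_FILE" deploy
}

# Preview first; call deploy_cluster once the command looks right.
print_deploy_cmd
```

Running the script only prints the command. Calling deploy_cluster performs the actual deployment from the directory that contains bdutil.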

Learn more about writing a MapReduce job with the connector and about running a MapReduce job with the connector.
