Hadoop on Google Cloud Platform

Cloud Datastore Connector for Hadoop

The Google Cloud Datastore connector for Hadoop is a Java library that enables Hadoop processes to read data from and write data to Cloud Datastore programmatically.

Alternatives to the Datastore connector

Since the Cloud Datastore connector is deprecated and no longer supported, we recommend one of the following methods for accessing Cloud Datastore data.

  1. Cloud Dataflow - Dataflow pipelines can read from and write to Cloud Datastore in all modes, either when run locally or on the Dataflow service. Your pipelines can connect to, read, write, and update Cloud Datastore directly by using the Cloud Datastore SDK.
  2. Google App Engine MapReduce - Use the open source MapReduce library that runs within App Engine and works with Cloud Datastore data and Task Queues.
  3. Cloud Datastore Backups to GCS - Back up Cloud Datastore data to Google Cloud Storage (GCS) using the standard backup mechanisms. Once the data is in GCS, Hadoop clusters running on Google Compute Engine (GCE) can access it through the Google Cloud Storage connector for Hadoop. Deployment of the Cloud Storage connector is also integrated with bdutil, the Google Cloud Platform Hadoop deployment toolset.
  4. Analyze Cloud Datastore data via BigQuery - Create a data processing pipeline to export data from Cloud Datastore and load data into BigQuery.
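For the backup-to-GCS route, a Hadoop cluster has to know how to resolve gs:// paths before it can read anything from a bucket. The following is a minimal sketch that generates a core-site.xml fragment for that purpose. The property names and class names are recalled from the Cloud Storage connector's configuration and should be checked against the connector version you deploy; the helper name build_core_site and the project ID are hypothetical. If you deploy with bdutil, this wiring is normally handled for you.

```python
import xml.etree.ElementTree as ET


def build_core_site(project_id: str) -> str:
    """Return a core-site.xml document that registers the gs:// scheme.

    Hypothetical helper for illustration only; it is not part of any
    Google or Hadoop library.
    """
    # Property and class names as recalled from the Cloud Storage
    # connector's configuration; verify them against your connector version.
    properties = {
        "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
        "fs.gs.project.id": project_id,
    }
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    print(build_core_site("my-project"))
```

With the connector JAR on the Hadoop classpath and a configuration along these lines, jobs can refer to backup files by gs:// URI. Reading the backup file contents themselves may still require an input format that understands the backup encoding.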

Pricing considerations

See Cloud Datastore pricing for the per-operation costs that apply to reading from and writing to Cloud Datastore. On average, Cloud Datastore costs more per byte than Google Cloud Storage, but it supports more sophisticated and fine-grained access patterns on the data.
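To make the trade-off concrete, the sketch below adds a per-byte storage charge to a per-operation charge. It is only an illustration of the arithmetic: the function name monthly_cost and every rate passed to it are hypothetical placeholders, not real prices. Substitute the current figures from the Cloud Datastore and Cloud Storage pricing pages.

```python
def monthly_cost(stored_gb, price_per_gb_month, operations=0, price_per_operation=0.0):
    """Estimate a monthly bill as storage cost plus per-operation cost.

    All rates are caller-supplied; this function encodes no real prices.
    """
    storage_cost = stored_gb * price_per_gb_month
    operation_cost = operations * price_per_operation
    return storage_cost + operation_cost


if __name__ == "__main__":
    # Placeholder rates chosen only to show the shape of the comparison.
    datastore = monthly_cost(stored_gb=100, price_per_gb_month=0.20,
                             operations=5_000_000, price_per_operation=0.000001)
    cloud_storage = monthly_cost(stored_gb=100, price_per_gb_month=0.03)
    print(f"Datastore (placeholder rates):     {datastore:.2f}")
    print(f"Cloud Storage (placeholder rates): {cloud_storage:.2f}")
```

Running the same calculation with real rates and your own read/write volume shows whether the richer access patterns of Cloud Datastore justify the higher storage cost for a given dataset.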
