Datastore I/O

The Dataflow SDKs provide an API for reading data from and writing data to a Google Cloud Datastore database. The Datastore I/O Read and Write transforms let you read or write a PCollection of Datastore Entity objects, which are analogous to rows in a traditional database table.

Reading from Datastore

To read from Datastore, apply the Datastore Read transform, supplying the ID of the Cloud Platform project that contains your Datastore database and the query to use when reading. Optionally, you can provide a namespace to query within; the read then returns only those Datastore entities whose keys fall in the provided namespace. Datastore Read takes the following parameters:

  • Project ID: A String containing the ID of the Cloud Platform project that contains your Datastore database.
  • Query: A Datastore Query object representing the query to use when reading.
  • Namespace (optional): A String containing a namespace to query within.

Read operations using Datastore I/O return a PCollection of Datastore Entity objects. Entities are data objects in Cloud Datastore.

The following example code shows a simple read using DatastoreIO:

Java

PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Query query = ...;
String projectId = "...";

Pipeline p = Pipeline.create(options);
PCollection<Entity> entities = p.apply(
   DatastoreIO.v1().read()
       .withProjectId(projectId)
       .withQuery(query));

Note: Reads using DatastoreIO typically run on multiple workers in parallel. However, not all queries can be parallelized; for example, queries that specify a limit or contain certain inequality filters cannot be split. For such queries, the Dataflow service uses a single worker to ensure correct results, which can reduce your pipeline's read throughput.
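If your entities live in a specific namespace, the same read can be scoped to it with withNamespace. The following is a minimal sketch, assuming p, projectId, and query are defined as in the example above; "my-namespace" is a hypothetical value you would replace with your own namespace:

```java
// Read only entities whose keys live in the given namespace.
// "my-namespace" is an example value; substitute your own namespace.
PCollection<Entity> entities = p.apply(
    DatastoreIO.v1().read()
        .withProjectId(projectId)
        .withQuery(query)
        .withNamespace("my-namespace"));
```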

Writing to Datastore

To write to Datastore, format your output as a PCollection of Datastore Entity objects, and then apply a Datastore Write transform. You must also supply the ID of the Cloud Platform project that contains your Datastore database.

The following example code shows a simple write using Datastore I/O:

Java

PCollection<Entity> entities = ...;
entities.apply(DatastoreIO.v1().write().withProjectId(projectId));

The entities you write to Datastore must have complete keys. A complete key specifies the kind and either the name or the numeric ID of the entity. If you want to write an entity to a specific namespace, you'll need to specify that namespace in the partition ID of your entity's key.
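As a sketch, a complete key with a namespace can be built with the com.google.datastore.v1 builder API; the kind ("Task"), name ("task1"), and namespace ("my-namespace") used here are hypothetical example values:

```java
import com.google.datastore.v1.Entity;
import com.google.datastore.v1.Key;
import com.google.datastore.v1.PartitionId;

public class CompleteKeyExample {
  public static Entity buildEntity() {
    // A complete key: the final path element names a kind ("Task") and a
    // name ("task1"). Using setId(...) instead of setName(...) would also
    // make the key complete; omitting both leaves the key incomplete.
    Key key = Key.newBuilder()
        .setPartitionId(PartitionId.newBuilder()
            .setNamespaceId("my-namespace"))  // example namespace
        .addPath(Key.PathElement.newBuilder()
            .setKind("Task")
            .setName("task1"))
        .build();

    return Entity.newBuilder()
        .setKey(key)
        .build();
  }
}
```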

Entities you write using Dataflow are committed to Datastore as upsert (update or insert) mutation operations, meaning any entities that already exist in Datastore are overwritten, and any other entities are inserted.
