The Dataflow SDKs provide an API for reading data from and writing data to a
Google Cloud Datastore database. The Datastore I/O Read and Write transforms
let you read or write a PCollection of Datastore Entity objects, which are
analogous to rows in a traditional database table.
Reading from Datastore
To read from Datastore, apply the Datastore Read transform and supply the ID of the project that holds the target Datastore database, along with the query to use when reading. Optionally, you can provide a namespace to query within; your read will then return only Datastore entities whose key matches the provided namespace.
- Project ID: A String containing the ID of the Cloud Platform project that contains your Datastore database.
- Query: A Datastore Query object that represents the query to use when reading.
- Namespace (optional): A String containing a namespace to query within.
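The effect of the optional namespace can be pictured with a small plain-Java model. This is only a sketch of the filtering rule described above, not Datastore I/O code: SketchEntity and NamespaceReadSketch are hypothetical names invented for illustration, and the in-memory list stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stored entity; not a Dataflow SDK class.
final class SketchEntity {
    final String namespace; // namespace recorded in the entity's key
    final String kind;      // kind the query selects on
    final String name;      // key name, used here only to tell entities apart

    SketchEntity(String namespace, String kind, String name) {
        this.namespace = namespace;
        this.kind = kind;
        this.name = name;
    }
}

final class NamespaceReadSketch {
    // Models a namespace-scoped read: returns only entities of the given
    // kind whose key namespace equals the requested namespace.
    static List<SketchEntity> read(List<SketchEntity> stored, String kind, String namespace) {
        List<SketchEntity> result = new ArrayList<>();
        for (SketchEntity e : stored) {
            if (e.kind.equals(kind) && e.namespace.equals(namespace)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

In this model, reading kind "Task" within namespace "tenant-a" skips a "Task" entity stored under "tenant-b", even though the query itself would otherwise match it.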
Read operations using Datastore I/O return a PCollection of Datastore Entity
objects. Entities are data objects in Cloud Datastore.
The following example code shows a simple read using Datastore I/O:
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Query query = ...;
String projectId = "...";

Pipeline p = Pipeline.create(options);
PCollection<Entity> entities = p.apply(
    DatastoreIO.v1().read()
        .withProjectId(projectId)
        .withQuery(query));
Note: Reads using DatastoreIO typically use multiple workers to read in
parallel. However, not all queries can be parallelized: for example, a query
that specifies a limit or contains certain inequality filters cannot be split.
For such queries, the Dataflow service uses a single Dataflow worker to ensure
data correctness, which can limit your pipeline's throughput.
Writing to Datastore
To write to Datastore, format your output as Datastore Entity objects, and
then apply a Datastore Write transform. You'll need to pass the ID of the
Cloud Platform project that contains your Datastore database.
The following example code shows a simple write using Datastore I/O:
PCollection<Entity> entities = ...;
entities.apply(
    DatastoreIO.v1().write()
        .withProjectId(projectId));
The entities you write to Datastore must have complete Keys. A complete Key
fully identifies the entity: its final path element specifies the kind
together with an id or a name. If you want to write an entity to a specific
namespace, you'll need to specify that namespace in the corresponding property
of your entity's Key.
Entities you write using Dataflow are committed to Datastore as upsert (update or insert) mutation operations, meaning any entities that already exist in Datastore are overwritten, and any other entities are inserted.
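The write rules above can be illustrated with a plain-Java model. This is a minimal sketch, not the Datastore I/O implementation: SketchKey and UpsertSketch are hypothetical names, a HashMap stands in for the database, and the completeness check assumes that a key is complete when it has a kind plus either a numeric id or a string name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Datastore key; not a Dataflow SDK class.
final class SketchKey {
    final String namespace; // "" stands for the default namespace
    final String kind;
    final long id;          // 0 means no numeric id is assigned
    final String name;      // null or "" means no string name is assigned

    SketchKey(String namespace, String kind, long id, String name) {
        this.namespace = namespace;
        this.kind = kind;
        this.id = id;
        this.name = name;
    }

    // Assumption: complete means a kind plus either an id or a name.
    boolean isComplete() {
        boolean hasKind = kind != null && !kind.isEmpty();
        boolean hasName = name != null && !name.isEmpty();
        return hasKind && (id != 0L || hasName);
    }

    // Flattens the key into a string usable as a map key.
    String path() {
        boolean hasName = name != null && !name.isEmpty();
        return namespace + "/" + kind + "/" + (hasName ? "name:" + name : "id:" + id);
    }
}

final class UpsertSketch {
    // Stores the value under the key, replacing any existing value.
    // Returns true if an existing entry was overwritten, false if inserted.
    // Rejects incomplete keys, mirroring the requirement described above.
    static boolean upsert(Map<String, String> store, SketchKey key, String value) {
        if (!key.isComplete()) {
            throw new IllegalArgumentException("incomplete key: " + key.path());
        }
        return store.put(key.path(), value) != null;
    }
}
```

Writing twice under the same key leaves a single entry holding the second value, which is the overwrite half of upsert; writing under a new key adds an entry, which is the insert half.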