DatastoreIO (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class DatastoreIO


  • Deprecated. 
    Replaced by com.google.cloud.dataflow.sdk.io.datastore.DatastoreIO.

    @Deprecated
    @Experimental(value=SOURCE_SINK)
    public class DatastoreIO
    extends Object
    DatastoreIO provides an API to Read and Write PCollections of Google Cloud Datastore DatastoreV1.Entity objects.

    Google Cloud Datastore is a fully managed NoSQL data storage service. An Entity is an object in Datastore, analogous to a row in a traditional database table.

    This API currently requires an authentication workaround. To use DatastoreIO, users must use the gcloud command line tool to get credentials for Datastore:

     $ gcloud auth login
     

    To read a PCollection from a query to Datastore, use source() and its methods DatastoreIO.Source.withDataset(java.lang.String) and DatastoreIO.Source.withQuery(com.google.api.services.datastore.DatastoreV1.Query) to specify the dataset to query and the query to read from. You can optionally provide a namespace to query within using DatastoreIO.Source.withNamespace(java.lang.String) or a Datastore host using DatastoreIO.Source.withHost(java.lang.String).

    For example:

     
     // Read a query from Datastore
     PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
     Query query = ...;
     String datasetId = "...";
     String host = "...";
    
     Pipeline p = Pipeline.create(options);
     PCollection<Entity> entities = p.apply(
         Read.from(DatastoreIO.source()
             .withDataset(datasetId)
             .withQuery(query)
             .withHost(host)));
      

    or:

     
     // Read a query from Datastore using the default namespace and host
     PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
     Query query = ...;
     String datasetId = "...";
    
     Pipeline p = Pipeline.create(options);
     PCollection<Entity> entities = p.apply(DatastoreIO.readFrom(datasetId, query));
     p.run();
      
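
    The namespace mentioned above can be supplied in the same way via DatastoreIO.Source.withNamespace(java.lang.String); a minimal sketch, where the namespace value is illustrative:

     // Read a query from Datastore, restricted to a specific namespace
     PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
     Query query = ...;
     String datasetId = "...";
     String namespace = "...";  // illustrative namespace value
    
     Pipeline p = Pipeline.create(options);
     PCollection<Entity> entities = p.apply(
         Read.from(DatastoreIO.source()
             .withDataset(datasetId)
             .withQuery(query)
             .withNamespace(namespace)));
     p.run();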

    Note: Normally, a Cloud Dataflow job will read from Cloud Datastore in parallel across many workers. However, when the DatastoreV1.Query is configured with a limit using DatastoreV1.Query.Builder.setLimit(int), then all returned results will be read by a single Dataflow worker in order to ensure correct data.
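
    For instance, a minimal sketch of reading with such a limit, reusing the datasetId and pipeline p from the examples above (the limit value of 1000 is illustrative):

     // Setting a limit on the query forces all results onto a single worker.
     Query query = ...;
     Query limitedQuery = query.toBuilder().setLimit(1000).build();
    
     PCollection<Entity> entities = p.apply(
         Read.from(DatastoreIO.source()
             .withDataset(datasetId)
             .withQuery(limitedQuery)));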

    To write a PCollection to Cloud Datastore, use writeTo(java.lang.String), specifying the dataset to write to:

     
     PCollection<Entity> entities = ...;
     entities.apply(DatastoreIO.writeTo(dataset));
     p.run();
      

    To optionally change the host used to write to Datastore, use sink() to build a DatastoreIO.Sink and write to it using the Write transform:

     
     PCollection<Entity> entities = ...;
     entities.apply(Write.to(DatastoreIO.sink().withDataset(dataset).withHost(host)));
      

    Entities in the PCollection to be written must have complete Keys. A complete Key specifies the name or numeric ID of an Entity, whereas an incomplete Key does not. To write to a namespace other than the project default, specify that namespace in the Entity Keys:

    
     Key.Builder keyBuilder = DatastoreHelper.makeKey(...);
     keyBuilder.getPartitionIdBuilder().setNamespace(namespace);
     

    Entities will be committed as upsert (update or insert) mutations. Please read Entities, Properties, and Keys for more information about Entity keys.
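
    For example, a minimal sketch of building an Entity whose complete Key names an explicit namespace (the kind "MyKind" and the key name are illustrative):

     // A complete Key has a kind plus a name (or numeric ID); the namespace
     // is carried in the Key's PartitionId.
     Key.Builder keyBuilder = DatastoreHelper.makeKey("MyKind", "my-entity-name");
     keyBuilder.getPartitionIdBuilder().setNamespace(namespace);
     Entity entity = Entity.newBuilder().setKey(keyBuilder).build();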

    Permissions

    Permission requirements depend on the PipelineRunner that is used to execute the Dataflow job. Please refer to the documentation of the corresponding PipelineRunner for more details.

    Please see Cloud Datastore Sign Up for security and permission-related information specific to Datastore.

    See Also:
    PipelineRunner

