DatastoreV1 (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io.datastore

Class DatastoreV1



  • @Experimental(value=SOURCE_SINK)
    public class DatastoreV1
    extends Object
    DatastoreV1 provides an API to Read, Write and Delete PCollections of Google Cloud Datastore version v1 Entity objects.

    This API currently requires an authentication workaround. To use DatastoreV1, users must use the gcloud command line tool to get credentials for Cloud Datastore:

     $ gcloud auth login
     

    To read a PCollection from a query to Cloud Datastore, use read() and its methods DatastoreV1.Read.withProjectId(java.lang.String) and DatastoreV1.Read.withQuery(com.google.datastore.v1.Query) to specify the project to query and the query to read from. You can optionally provide a namespace to query within using DatastoreV1.Read.withNamespace(java.lang.String). You could also optionally specify how many splits you want for the query using DatastoreV1.Read.withNumQuerySplits(int).

    For example:

     
     // Read a query from Datastore
     PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
     Query query = ...;
     String projectId = "...";
    
     Pipeline p = Pipeline.create(options);
     PCollection<Entity> entities = p.apply(
         DatastoreIO.v1().read()
             .withProjectId(projectId)
             .withQuery(query));
      

    Note: Normally, a Cloud Dataflow job will read from Cloud Datastore in parallel across many workers. However, when the Query is configured with a limit using Query.Builder.setLimit(Int32Value), then all returned results will be read by a single Dataflow worker in order to ensure correct data.

    To write a PCollection to a Cloud Datastore, use write(), specifying the Cloud Datastore project to write to:

     
     PCollection<Entity> entities = ...;
     entities.apply(DatastoreIO.v1().write().withProjectId(projectId));
     p.run();
      

    To delete a PCollection of Entities from Cloud Datastore, use deleteEntity(), specifying the Cloud Datastore project to write to:

     
     PCollection<Entity> entities = ...;
     entities.apply(DatastoreIO.v1().deleteEntity().withProjectId(projectId));
     p.run();
      

    To delete entities associated with a PCollection of Keys from Cloud Datastore, use deleteKey(), specifying the Cloud Datastore project to write to:

     
     PCollection<Entity> entities = ...;
     entities.apply(DatastoreIO.v1().deleteKey().withProjectId(projectId));
     p.run();
      

    Entities in the PCollection to be written or deleted must have complete Keys. Complete Keys specify the name and id of the Entity, where incomplete Keys do not. A namespace other than projectId default may be used by specifying it in the Entity Keys.

    
     Key.Builder keyBuilder = DatastoreHelper.makeKey(...);
     keyBuilder.getPartitionIdBuilder().setNamespace(namespace);
     

    Entities will be committed as upsert (update or insert) or delete mutations. Please read Entities, Properties, and Keys for more information about Entity keys.

    Permissions

    Permission requirements depend on the PipelineRunner that is used to execute the Dataflow job. Please refer to the documentation of corresponding PipelineRunners for more details.

    Please see Cloud Datastore Sign Up for security and permission related information specific to Cloud Datastore.

    See Also:
    PipelineRunner


Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...

Cloud Dataflow