The Firestore Bulk Delete template is a pipeline that reads entities from Firestore with a given GQL query and then deletes all matching entities in the selected target project. The pipeline can optionally pass the JSON-encoded Firestore entities to your JavaScript UDF, which you can use to filter out entities by returning null values.
Pipeline requirements
- Firestore must be set up in the project prior to running the template.
- If reading and deleting from separate Firestore instances, the Dataflow Worker Service Account must have permission to read from one instance and delete from the other.
- Database writes must be enabled on the Firestore instance.
Template parameters
| Parameter | Description |
| --- | --- |
| firestoreReadGqlQuery | GQL query that specifies which entities to match for deletion. Using a keys-only query can improve performance. For example: "SELECT __key__ FROM MyKind". |
| firestoreReadProjectId | Project ID of the Firestore instance from which you want to read the entities (using your GQL query) that are matched for deletion. |
| firestoreDeleteProjectId | Project ID of the Firestore instance from which to delete the matching entities. This can be the same as firestoreReadProjectId if you want to read and delete within the same Firestore instance. |
| firestoreReadNamespace | (Optional) Namespace of the requested entities. Set as "" for the default namespace. |
| firestoreHintNumWorkers | (Optional) Hint for the expected number of workers in the Firestore ramp-up throttling step. The default is 500. |
| javascriptTextTransformGcsPath | (Optional) The Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) you want to use. For example, gs://my-bucket/my-udfs/my_file.js. |
| javascriptTextTransformFunctionName | (Optional) The name of the JavaScript user-defined function (UDF) that you want to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples. If this function returns a value of undefined or null for a given Firestore entity, that entity is not deleted. |
User-defined function
Optionally, you can extend this template by writing a user-defined function (UDF). The template calls the UDF for each input element. Element payloads are serialized as JSON strings. For more information, see Create user-defined functions for Dataflow templates.
Function specification
The UDF has the following specification:
- Input: a Firestore entity, serialized as a JSON string.
- Output: if you want to keep the entity and not delete it, return null or undefined. Otherwise, return the original entity for deletion, as in the sketch below.
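For example, the following is a minimal sketch of a UDF that keeps every entity except those whose status property equals "obsolete". The property name and the entity JSON layout (a Datastore v1-style properties map of typed values) are assumptions for illustration, so adjust them to match your own entities:

/**
 * Sketch of a filtering UDF for the Firestore Bulk Delete template.
 * Assumes Datastore v1-style entity JSON with a "properties" map.
 * @param {string} inJson one Firestore entity, serialized as a JSON string.
 * @return {?string} the original entity JSON to delete it, or null to keep it.
 */
function myTransform(inJson) {
  var entity = JSON.parse(inJson);

  // Assumed layout: properties.status.stringValue holds the status value.
  var properties = entity.properties || {};
  var status = properties.status && properties.status.stringValue;

  if (status === 'obsolete') {
    // Returning the original entity marks it for deletion.
    return inJson;
  }

  // Returning null (or undefined) keeps the entity.
  return null;
}

To wire this in, upload the file to Cloud Storage, set javascriptTextTransformGcsPath to its URI, and set javascriptTextTransformFunctionName to myTransform.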
Run the template
Console
- Go to the Dataflow Create job from template page.
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the Bulk Delete Entities in Firestore template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
gcloud
In your shell or terminal, run the template:
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/Firestore_to_Firestore_Delete \
    --region REGION_NAME \
    --parameters \
firestoreReadGqlQuery="GQL_QUERY",\
firestoreReadProjectId=FIRESTORE_READ_AND_DELETE_PROJECT_ID,\
firestoreDeleteProjectId=FIRESTORE_READ_AND_DELETE_PROJECT_ID
Replace the following:
- JOB_NAME: a unique job name of your choice
- REGION_NAME: the region where you want to deploy your Dataflow job, for example us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which is nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- GQL_QUERY: the query you'll use to match entities for deletion
- FIRESTORE_READ_AND_DELETE_PROJECT_ID: your Firestore instance project ID. This example both reads and deletes from the same Firestore instance.
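To also apply the optional UDF, add the two UDF parameters to the same command. For example, here is a hypothetical invocation in which the job name, project ID, and Cloud Storage paths are placeholders for illustration:

gcloud dataflow jobs run firestore-bulk-delete-example \
    --gcs-location gs://dataflow-templates-us-central1/latest/Firestore_to_Firestore_Delete \
    --region us-central1 \
    --parameters \
firestoreReadGqlQuery="SELECT __key__ FROM MyKind",\
firestoreReadProjectId=my-project,\
firestoreDeleteProjectId=my-project,\
javascriptTextTransformGcsPath=gs://my-bucket/my-udfs/my_file.js,\
javascriptTextTransformFunctionName=myTransform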
API
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/Firestore_to_Firestore_Delete
{
  "jobName": "JOB_NAME",
  "parameters": {
    "firestoreReadGqlQuery": "GQL_QUERY",
    "firestoreReadProjectId": "FIRESTORE_READ_AND_DELETE_PROJECT_ID",
    "firestoreDeleteProjectId": "FIRESTORE_READ_AND_DELETE_PROJECT_ID"
  },
  "environment": { "zone": "us-central1-f" }
}
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- LOCATION: the region where you want to deploy your Dataflow job, for example us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which is nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/
- GQL_QUERY: the query you'll use to match entities for deletion
- FIRESTORE_READ_AND_DELETE_PROJECT_ID: your Firestore instance project ID. This example both reads and deletes from the same Firestore instance.
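As a sketch of sending this request from the command line, you can use curl with an access token from the gcloud CLI; the project ID, location, and job name below are placeholders for illustration:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
        "jobName": "firestore-bulk-delete-example",
        "parameters": {
          "firestoreReadGqlQuery": "SELECT __key__ FROM MyKind",
          "firestoreReadProjectId": "my-project",
          "firestoreDeleteProjectId": "my-project"
        }
      }' \
  "https://dataflow.googleapis.com/v1b3/projects/my-project/locations/us-central1/templates:launch?gcsPath=gs://dataflow-templates-us-central1/latest/Firestore_to_Firestore_Delete"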
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.