The Firestore to Cloud Storage Text template is a batch pipeline that reads Firestore entities and writes them to Cloud Storage as text files. You can provide a function to process each entity as a JSON string. If you don't provide such a function, every line in the output file will be a JSON-serialized entity.
Pipeline requirements
Firestore must be set up in the project before running the pipeline.
Template parameters
Required parameters
- firestoreReadGqlQuery: A GQL (https://cloud.google.com/datastore/docs/reference/gql_reference) query that specifies which entities to grab. For example, SELECT * FROM MyKind.
- firestoreReadProjectId: The ID of the Google Cloud project that contains the Firestore instance that you want to read data from.
- textWritePrefix: The Cloud Storage path prefix that specifies where the data is written. For example, gs://mybucket/somefolder/.
Optional parameters
- firestoreReadNamespace: The namespace of the requested entities. To use the default namespace, leave this parameter blank.
- javascriptTextTransformGcsPath: The Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) to use. For example, gs://my-bucket/my-udfs/my_file.js.
- javascriptTextTransformFunctionName: The name of the JavaScript user-defined function (UDF) to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples (https://github.com/GoogleCloudPlatform/DataflowTemplates#udf-examples).
User-defined function
Optionally, you can extend this template by writing a user-defined function (UDF). The template calls the UDF for each input element. Element payloads are serialized as JSON strings. For more information, see Create user-defined functions for Dataflow templates.
Function specification
The UDF has the following specification:
- Input: a Firestore entity, serialized as a JSON string.
- Output: the string value to write to Cloud Storage.
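As a sketch of what such a function can look like, the following hypothetical UDF (the function name transformEntity and the added field are illustrative, not part of the template) parses the entity JSON, adds a field, and re-serializes it as a single output line:

/**
 * Hypothetical sample UDF for the Firestore to Cloud Storage Text template.
 * Input: one Firestore entity, serialized as a JSON string.
 * Output: the string to write as one line of the output file in Cloud Storage.
 */
function transformEntity(inJson) {
  // Parse the serialized entity so individual fields can be read or changed.
  var entity = JSON.parse(inJson);

  // Illustrative change only: record when the entity was processed.
  entity.processedAt = new Date().toISOString();

  // The returned string becomes one line in the output file.
  return JSON.stringify(entity);
}

To use a function like this, upload the .js file to Cloud Storage and pass its path as javascriptTextTransformGcsPath and the function name (here, transformEntity) as javascriptTextTransformFunctionName.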
Run the template
- Go to the Dataflow Create job from template page.
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the Firestore to Text Files on Cloud Storage template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
To run the template with the Google Cloud CLI, run the following command in your shell or terminal:
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/Firestore_to_GCS_Text \
    --region REGION_NAME \
    --parameters \
firestoreReadGqlQuery="SELECT * FROM FIRESTORE_KIND",\
firestoreReadProjectId=FIRESTORE_PROJECT_ID,\
firestoreReadNamespace=FIRESTORE_NAMESPACE,\
javascriptTextTransformGcsPath=PATH_TO_JAVASCRIPT_UDF_FILE,\
javascriptTextTransformFunctionName=JAVASCRIPT_FUNCTION,\
textWritePrefix=gs://BUCKET_NAME/output/
Replace the following:
- JOB_NAME: a unique job name of your choice
- REGION_NAME: the region where you want to deploy your Dataflow job, for example us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- BUCKET_NAME: the name of your Cloud Storage bucket
- FIRESTORE_PROJECT_ID: the Google Cloud project ID where the Firestore instance exists
- FIRESTORE_KIND: the type of your Firestore entities
- FIRESTORE_NAMESPACE: the namespace of your Firestore entities
- JAVASCRIPT_FUNCTION: the name of the JavaScript user-defined function (UDF) that you want to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples.
- PATH_TO_JAVASCRIPT_UDF_FILE: the Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) you want to use, for example gs://my-bucket/my-udfs/my_file.js
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/Firestore_to_GCS_Text
{
  "jobName": "JOB_NAME",
  "parameters": {
    "firestoreReadGqlQuery": "SELECT * FROM FIRESTORE_KIND",
    "firestoreReadProjectId": "FIRESTORE_PROJECT_ID",
    "firestoreReadNamespace": "FIRESTORE_NAMESPACE",
    "javascriptTextTransformGcsPath": "PATH_TO_JAVASCRIPT_UDF_FILE",
    "javascriptTextTransformFunctionName": "JAVASCRIPT_FUNCTION",
    "textWritePrefix": "gs://BUCKET_NAME/output/"
  },
  "environment": {
    "zone": "us-central1-f"
  }
}
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- LOCATION: the region where you want to deploy your Dataflow job, for example us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- BUCKET_NAME: the name of your Cloud Storage bucket
- FIRESTORE_PROJECT_ID: the Google Cloud project ID where the Firestore instance exists
- FIRESTORE_KIND: the type of your Firestore entities
- FIRESTORE_NAMESPACE: the namespace of your Firestore entities
- JAVASCRIPT_FUNCTION: the name of the JavaScript user-defined function (UDF) that you want to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples.
- PATH_TO_JAVASCRIPT_UDF_FILE: the Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) you want to use, for example gs://my-bucket/my-udfs/my_file.js
Template source code
The source code for this template is available in the GoogleCloudPlatform/DataflowTemplates repository on GitHub (https://github.com/GoogleCloudPlatform/DataflowTemplates).
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.