The Cloud Storage Text to Firestore template is a batch pipeline that imports from JSON documents stored in Cloud Storage to Firestore.
Pipeline requirements
Firestore must be enabled in the destination project.
Input format
Each input file must contain newline-delimited JSON, where each line contains a
JSON representation of a Datastore
Entity
data type.
For example, the following JSON represents a document in a collection named
Users
. The example is formatted for readability, but each document
must appear as a single line of input.
{ "key": { "partitionId": { "projectId": "my-project" }, "path": [ { "kind": "users", "name": "alovelace" } ] }, "properties": { "first": { "stringValue": "Ada" }, "last": { "stringValue": "Lovelace" }, "born": { "integerValue": "1815", "excludeFromIndexes": true } } }
For more information about the document model, see Entities, Properties, and Keys.
Template parameters
Required parameters
- textReadPattern : A Cloud Storage path pattern that specifies the location of your text data files. For example,
gs://mybucket/somepath/*.json
. - firestoreWriteProjectId : The ID of the Google Cloud project to write the Firestore entities to.
- errorWritePath : The error log output file to use for write failures that occur during processing. (Example: gs://your-bucket/errors/).
Optional parameters
- javascriptTextTransformGcsPath : The Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) to use. For example,
gs://my-bucket/my-udfs/my_file.js
. - javascriptTextTransformFunctionName : The name of the JavaScript user-defined function (UDF) to use. For example, if your JavaScript function code is
myTransform(inJson) { /*...do stuff...*/ }
, then the function name ismyTransform
. For sample JavaScript UDFs, see UDF Examples (https://github.com/GoogleCloudPlatform/DataflowTemplates#udf-examples). - firestoreHintNumWorkers : Hint for the expected number of workers in the Firestore ramp-up throttling step. Default is 500.
User-defined function
Optionally, you can extend this template by writing a user-defined function (UDF). The template calls the UDF for each input element. Element payloads are serialized as JSON strings. For more information, see Create user-defined functions for Dataflow templates.
Function specification
The UDF has the following specification:
- Input: a line of text from a Cloud Storage input file.
- Output: an
Entity
, serialized as a JSON string.
Run the template
Console
- Go to the Dataflow Create job from template page. Go to Create job from template
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default
region is
us-central1
.For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the Text Files on Cloud Storage to Firestore template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
gcloud
In your shell or terminal, run the template:
gcloud dataflow jobs run JOB_NAME \ --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/GCS_Text_to_Firestore \ --region REGION_NAME \ --parameters \ textReadPattern=PATH_TO_INPUT_TEXT_FILES,\ javascriptTextTransformGcsPath=PATH_TO_JAVASCRIPT_UDF_FILE,\ javascriptTextTransformFunctionName=JAVASCRIPT_FUNCTION,\ firestoreWriteProjectId=PROJECT_ID,\ errorWritePath=ERROR_FILE_WRITE_PATH
Replace the following:
JOB_NAME
: a unique job name of your choiceVERSION
: the version of the template that you want to useYou can use the following values:
latest
to use the latest version of the template, which is available in the non-dated parent folder in the bucket— gs://dataflow-templates-REGION_NAME/latest/- the version name, like
2023-09-12-00_RC00
, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket— gs://dataflow-templates-REGION_NAME/
REGION_NAME
: the region where you want to deploy your Dataflow job—for example,us-central1
PATH_TO_INPUT_TEXT_FILES
: the input files pattern on Cloud StorageJAVASCRIPT_FUNCTION
: the name of the JavaScript user-defined function (UDF) that you want to useFor example, if your JavaScript function code is
myTransform(inJson) { /*...do stuff...*/ }
, then the function name ismyTransform
. For sample JavaScript UDFs, see UDF Examples.PATH_TO_JAVASCRIPT_UDF_FILE
: the Cloud Storage URI of the.js
file that defines the JavaScript user-defined function (UDF) you want to use—for example,gs://my-bucket/my-udfs/my_file.js
ERROR_FILE_WRITE_PATH
: your desired path to error file on Cloud Storage
API
To run the template using the REST API, send an HTTP POST request. For more information on the
API and its authorization scopes, see
projects.templates.launch
.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/GCS_Text_to_Firestore { "jobName": "JOB_NAME", "parameters": { "textReadPattern": "PATH_TO_INPUT_TEXT_FILES", "javascriptTextTransformGcsPath": "PATH_TO_JAVASCRIPT_UDF_FILE", "javascriptTextTransformFunctionName": "JAVASCRIPT_FUNCTION", "firestoreWriteProjectId": "PROJECT_ID", "errorWritePath": "ERROR_FILE_WRITE_PATH" }, "environment": { "zone": "us-central1-f" } }
Replace the following:
PROJECT_ID
: the Google Cloud project ID where you want to run the Dataflow jobJOB_NAME
: a unique job name of your choiceVERSION
: the version of the template that you want to useYou can use the following values:
latest
to use the latest version of the template, which is available in the non-dated parent folder in the bucket— gs://dataflow-templates-REGION_NAME/latest/- the version name, like
2023-09-12-00_RC00
, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket— gs://dataflow-templates-REGION_NAME/
LOCATION
: the region where you want to deploy your Dataflow job—for example,us-central1
PATH_TO_INPUT_TEXT_FILES
: the input files pattern on Cloud StorageJAVASCRIPT_FUNCTION
: the name of the JavaScript user-defined function (UDF) that you want to useFor example, if your JavaScript function code is
myTransform(inJson) { /*...do stuff...*/ }
, then the function name ismyTransform
. For sample JavaScript UDFs, see UDF Examples.PATH_TO_JAVASCRIPT_UDF_FILE
: the Cloud Storage URI of the.js
file that defines the JavaScript user-defined function (UDF) you want to use—for example,gs://my-bucket/my-udfs/my_file.js
ERROR_FILE_WRITE_PATH
: your desired path to error file on Cloud Storage
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.