The BigQuery to MongoDB template is a batch pipeline that reads rows from a BigQuery table and writes them to MongoDB as documents. Currently, each row is stored as a single document.
Pipeline requirements
- The source BigQuery table must exist.
- The target MongoDB instance should be accessible from the Dataflow worker machines.
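You can confirm the first requirement from a shell before you launch a job. The following is a minimal sketch. It assumes that the bq command-line tool is installed and authenticated, and that bq show exits with a non-zero status when the table cannot be found. The function name table_exists is hypothetical and is not part of the template. This check does not cover the second requirement, because a connection from your own machine does not prove that the Dataflow workers can reach MongoDB.

```shell
# Hypothetical pre-flight check (not part of the template).
# Succeeds only if `bq show` can describe the table, which means the table
# exists and the caller can read its metadata. All output is discarded.
table_exists() {
  bq show "$1" >/dev/null 2>&1
}

# Example usage with the table spec format from the parameter list:
if table_exists "bigquery-project:dataset.input_table"; then
  echo "source table found"
else
  echo "source table missing or not readable" >&2
fi
```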
Template parameters
Required parameters
- mongoDbUri: The MongoDB connection URI in the format mongodb+srv://USERNAME:PASSWORD@HOST.
- database: The database in MongoDB in which to store the collection. For example, my-db.
- collection: The name of the collection in the MongoDB database. For example, my-collection.
- inputTableSpec: The BigQuery table to read from. For example, bigquery-project:dataset.input_table.
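A malformed inputTableSpec is a common launch mistake. The following sketch checks whether a value follows the PROJECT:DATASET.TABLE shape of the example above. The function name check_table_spec is hypothetical. The character classes are a simplification and do not encode BigQuery's complete naming rules, so passing this check does not guarantee that the table exists.

```shell
# Hypothetical helper (not part of the template): succeeds if the argument
# looks like PROJECT:DATASET.TABLE, the shape of the inputTableSpec example.
# The character classes are a simplification of BigQuery's naming rules.
check_table_spec() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9.:-]+:[A-Za-z0-9_]+\.[A-Za-z0-9_-]+$'
}

# Example usage:
if check_table_spec "bigquery-project:dataset.input_table"; then
  echo "inputTableSpec looks well formed"
fi
```

A value without the project prefix, such as dataset.input_table, fails the check because the pattern requires the colon separator.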
Run the template
- Go to the Dataflow Create job from template page.
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the BigQuery to MongoDB template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
To run the template with the gcloud CLI, run the following command in your shell or terminal:
gcloud dataflow flex-template run JOB_NAME \
    --project=PROJECT_ID \
    --region=REGION_NAME \
    --template-file-gcs-location=gs://dataflow-templates-REGION_NAME/VERSION/flex/BigQuery_to_MongoDB \
    --parameters \
inputTableSpec=INPUT_TABLE_SPEC,\
mongoDbUri=MONGO_DB_URI,\
database=DATABASE,\
collection=COLLECTION
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- REGION_NAME: the region where you want to deploy your Dataflow job, for example, us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- INPUT_TABLE_SPEC: your source BigQuery table, for example, bigquery-project:dataset.input_table
- MONGO_DB_URI: your MongoDB connection URI
- DATABASE: your MongoDB database
- COLLECTION: your MongoDB collection
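The VERSION choice only changes one path segment in the --template-file-gcs-location value of the command above. The following sketch builds that value from a region and an optional version, and falls back to latest when no version is given. The function name template_path is hypothetical. The path pattern is copied from the command above.

```shell
# Hypothetical helper (not part of the template): prints the value for
# --template-file-gcs-location, following the pattern
# gs://dataflow-templates-REGION_NAME/VERSION/flex/BigQuery_to_MongoDB
template_path() {
  tp_region="$1"
  tp_version="${2:-latest}"   # default to the non-dated "latest" folder
  printf 'gs://dataflow-templates-%s/%s/flex/BigQuery_to_MongoDB\n' \
    "$tp_region" "$tp_version"
}

template_path us-central1
template_path us-central1 2023-09-12-00_RC00
```

The two calls print gs://dataflow-templates-us-central1/latest/flex/BigQuery_to_MongoDB and gs://dataflow-templates-us-central1/2023-09-12-00_RC00/flex/BigQuery_to_MongoDB. You could pass the result to the command as --template-file-gcs-location="$(template_path us-central1)". Pinning a dated version keeps repeated runs on the same template build. The latest folder picks up new releases automatically.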
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.locations.flexTemplates.launch.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/flexTemplates:launch
{
   "launch_parameter": {
      "jobName": "JOB_NAME",
      "parameters": {
          "inputTableSpec": "INPUT_TABLE_SPEC",
          "mongoDbUri": "MONGO_DB_URI",
          "database": "DATABASE",
          "collection": "COLLECTION"
      },
      "containerSpecGcsPath": "gs://dataflow-templates-LOCATION/VERSION/flex/BigQuery_to_MongoDB"
   }
}
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- LOCATION: the region where you want to deploy your Dataflow job, for example, us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/
- INPUT_TABLE_SPEC: your source BigQuery table, for example, bigquery-project:dataset.input_table
- MONGO_DB_URI: your MongoDB connection URI
- DATABASE: your MongoDB database
- COLLECTION: your MongoDB collection
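One way to send the launch request from a shell is to generate the JSON body and post it with curl. The following is a minimal sketch. The function name launch_body is hypothetical. It assumes that none of the values contain double quotes or backslashes, because they are inserted into the JSON without escaping. The commented curl call is not executed. It assumes that an access token from gcloud auth print-access-token is acceptable for your account.

```shell
# Hypothetical helper (not part of the template): prints the
# flexTemplates:launch request body from the REST example above.
# Arguments: JOB_NAME LOCATION VERSION INPUT_TABLE_SPEC MONGO_DB_URI DATABASE COLLECTION
# Values are not JSON-escaped, so they must not contain '"' or '\'.
launch_body() {
  printf '{"launch_parameter": {"jobName": "%s", "parameters": {"inputTableSpec": "%s", "mongoDbUri": "%s", "database": "%s", "collection": "%s"}, "containerSpecGcsPath": "gs://dataflow-templates-%s/%s/flex/BigQuery_to_MongoDB"}}\n' \
    "$1" "$4" "$5" "$6" "$7" "$2" "$3"
}

# Example (not run here): POST the body with an OAuth access token.
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$(launch_body my-job us-central1 latest bigquery-project:dataset.input_table "$MONGO_DB_URI" my-db my-collection)" \
#   "https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/us-central1/flexTemplates:launch"
```

The body uses printf so that the argument values are never interpreted as format directives. For values that might contain quotes, a JSON-aware tool is a safer choice than plain string substitution.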
Template source code
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.