Parallel steps can reduce the total execution time for a workflow by performing multiple blocking calls at the same time.
Blocking calls such as sleep, HTTP calls, and callbacks can take time, from milliseconds to days. Parallel steps are intended to assist with such concurrent long-running operations. If a workflow must perform multiple blocking calls that are independent of each other, using parallel branches can reduce the total execution time by starting the calls at the same time, and waiting for all of them to complete.
For example, if your workflow must retrieve customer data from several independent systems before continuing, parallel branches allow for concurrent API requests. If there are five systems and each takes two seconds to respond, performing the steps sequentially in a workflow could take at least 10 seconds; performing them in parallel could take as little as two.
Create a parallel step
Create a parallel step to define a part of your workflow where two or more steps can execute concurrently.
YAML
- PARALLEL_STEP_NAME:
    parallel:
      exception_policy: POLICY
      shared: [VARIABLE_A, VARIABLE_B, ...]
      concurrency_limit: CONCURRENCY_LIMIT
      BRANCHES_OR_FOR:
        ...
JSON
[
  {
    "PARALLEL_STEP_NAME": {
      "parallel": {
        "exception_policy": "POLICY",
        "shared": [
          "VARIABLE_A",
          "VARIABLE_B",
          ...
        ],
        "concurrency_limit": "CONCURRENCY_LIMIT",
        "BRANCHES_OR_FOR":
        ...
      }
    }
  }
]
Replace the following:
- PARALLEL_STEP_NAME: the name of the parallel step.
- POLICY (optional): determines the action other branches will take when an unhandled exception occurs. The default policy, continueAll, results in no further action, and all other branches will attempt to run. Note that continueAll is the only policy currently supported.
- VARIABLE_A, VARIABLE_B, and so on: a list of writable variables with parent scope that allow assignments within the parallel step. For more information, see Shared variables.
- CONCURRENCY_LIMIT (optional): the maximum number of branches and iterations that can concurrently execute within a single workflow execution before further branches and iterations are queued to wait. This applies to a single parallel step only and does not cascade. Must be a positive integer and can be either a literal value or an expression. For details, see Concurrency limits.
- BRANCHES_OR_FOR: use either branches or for to indicate one of the following:
  - Branches that can run concurrently.
  - A loop where iterations can run concurrently.
Note the following:
- Parallel branches and iterations can run in any order, and might run in a different order with each execution.
- Parallel steps can include other, nested parallel steps up to the depth limit. See Quotas and limits.
- For more details, see the syntax reference page for parallel steps.
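As an illustration of the syntax above with every placeholder filled in, the following is a minimal sketch rather than an official sample. The step names, the variable name completed, and the list of sleep durations are hypothetical. It uses for as the BRANCHES_OR_FOR value and relies on the standard library calls sys.sleep and list.concat.

```yaml
main:
  steps:
    - init:
        assign:
          - completed: []                 # hypothetical shared variable
    - waitInParallel:
        parallel:
          exception_policy: continueAll   # the only policy currently supported
          shared: [completed]             # parent-scope variable made writable in the loop
          concurrency_limit: 2            # at most two iterations run at the same time
          for:
            value: seconds
            in: [1, 2, 3]                 # hypothetical sleep durations
            steps:
              - wait:
                  call: sys.sleep         # a blocking call
                  args:
                    seconds: ${seconds}
              - record:
                  assign:
                    - completed: ${list.concat(completed, seconds)}
    - done:
        return: ${completed}
```

Because iterations can run in any order, the order of the values in completed might differ between executions.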
Replace experimental function with parallel step
If you are using experimental.executions.map to support parallel work, you can migrate your workflow to use parallel steps instead, executing ordinary for loops in parallel. For examples, see Replace experimental function with parallel step.
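A rough sketch of what such a migration might look like follows. It is an assumption-laden illustration, not the documented migration: the workflow ID my-workflow, the argument values, and all step and variable names are hypothetical, and it assumes the Workflow Executions connector's run method is used to start each child execution and wait for its result. Unlike a sequential loop, the results are collected in whatever order the iterations finish.

```yaml
main:
  steps:
    - init:
        assign:
          - workflowId: my-workflow       # hypothetical child workflow
          - arguments:                    # hypothetical per-execution arguments
              - name: alpha
              - name: beta
          - results: []
    - runInParallel:
        parallel:
          shared: [results]               # written by each iteration
          for:
            value: arg
            in: ${arguments}
            steps:
              - execute:
                  # Assumed connector call; starts the child workflow and waits for it
                  call: googleapis.workflowexecutions.v1.projects.locations.workflows.executions.run
                  args:
                    workflow_id: ${workflowId}
                    argument: ${arg}
                  result: executionResult
              - save:
                  assign:
                    - results: ${list.concat(results, executionResult)}
    - done:
        return: ${results}
```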
Samples
These samples demonstrate the syntax.
Perform operations in parallel (using branches)
If your workflow has several different sets of steps that can be executed at the same time, placing them in parallel branches can decrease the total time needed to complete those steps.
In the following example, a user ID is passed as an argument to the workflow and data is retrieved in parallel from two different services. Shared variables allow values to be written in the branches and read after the branches complete:
YAML
JSON
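A minimal YAML sketch of the example described above. The example.com URLs, the input field userId (assumed to be a string), and the step and variable names are hypothetical placeholders, not real services.

```yaml
main:
  params: [input]
  steps:
    - init:
        assign:
          - userProfile: {}
          - recentItems: []
    - enrichUserData:
        parallel:
          # Both variables are shared so that the branches can write to them
          shared: [userProfile, recentItems]
          branches:
            - getUserProfileBranch:
                steps:
                  - getUserProfile:
                      call: http.get
                      args:
                        url: '${"https://example.com/users/" + input.userId}'
                      result: userProfile
            - getRecentItemsBranch:
                steps:
                  - getRecentItems:
                      call: http.get
                      args:
                        url: '${"https://example.com/items?userId=" + input.userId}'
                      result: recentItems
    - done:
        # Runs only after both branches complete
        return:
          profile: ${userProfile.body}
          items: ${recentItems.body}
```

The two HTTP calls start at the same time, so the parallel step takes roughly as long as the slower of the two calls rather than the sum of both.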
Process items in parallel (using a parallel loop)
If you need to perform the same action for each item in a list, you can complete the execution more quickly by using a parallel loop. A parallel loop allows multiple loop iterations to be performed in parallel. Note that, unlike regular for loops, iterations can be performed in any order.
In the following example, a set of user notifications is processed in a parallel for loop:
YAML
JSON
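A minimal YAML sketch of the notification loop described above. The endpoint URL, the input field notifications (assumed to be a list), and the step names are hypothetical.

```yaml
main:
  params: [input]
  steps:
    - sendNotifications:
        parallel:
          for:
            value: notification
            in: ${input.notifications}    # hypothetical list passed to the workflow
            steps:
              - notify:
                  call: http.post
                  args:
                    url: https://example.com/sendNotification   # placeholder endpoint
                    body:
                      notification: ${notification}
```

No shared list is needed here because the iterations do not write to any variable in the parent scope.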
Aggregate data (using a parallel loop)
You can process a set of items while collecting data from the operations performed on each item. For example, you might want to track the IDs of created items, or maintain a list of items with errors.
In the following example, 10 separate queries to a public BigQuery dataset each return the number of words in a document, or set of documents. A shared variable allows the count of the words to accumulate and be read after all the iterations complete. After calculating the number of words across all the documents, the workflow returns the total.
YAML
JSON
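A hedged YAML sketch of the aggregation described above. None of the following specifics are stated in the text and all are assumptions: that the public dataset is bigquery-public-data.samples.shakespeare, that the ten corpus names listed exist in it, that the count is a count of distinct words, and that the BigQuery connector's jobs.query response exposes the count at rows[0].f[0].v as a string.

```yaml
main:
  steps:
    - init:
        assign:
          - numWords: 0
          - corpuses:                     # assumed corpus names, one query each
              - sonnets
              - various
              - 1kinghenryvi
              - 2kinghenryvi
              - 3kinghenryvi
              - comedyoferrors
              - kingrichardiii
              - titusandronicus
              - tamingoftheshrew
              - loveslabourslost
    - runQueries:
        parallel:
          shared: [numWords]              # shared so that iterations can add to it
          for:
            value: corpus
            in: ${corpuses}
            steps:
              - runQuery:
                  call: googleapis.bigquery.v2.jobs.query
                  args:
                    projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                    body:
                      useLegacySql: false
                      query: ${"SELECT COUNT(DISTINCT word) FROM `bigquery-public-data.samples.shakespeare` WHERE corpus='" + corpus + "'"}
                  result: query
              - add:
                  assign:
                    # The first field of the first row holds the count
                    - numWords: ${numWords + int(query.rows[0].f[0].v)}
    - done:
        return: ${numWords}
```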
What's next
- Syntax reference: Parallel steps
- Tutorial: Run a workflow that executes other workflows in parallel
- Tutorial: Run multiple BigQuery jobs in parallel