
Stop a running pipeline

You cannot delete a Dataflow job; you can only stop it.

To stop a Dataflow job, you can use the Google Cloud console, Cloud Shell, a local terminal with the Google Cloud CLI installed, or the Dataflow REST API.

You can stop a Dataflow job in the following two ways:

  • Canceling a job. This method applies to both streaming and batch pipelines. Canceling a job stops the Dataflow service from processing any data, including buffered data. For more information, see Canceling a job.

  • Draining a job. This method applies only to streaming pipelines. Draining a job enables the Dataflow service to finish processing the buffered data while simultaneously ceasing the ingestion of new data. For more information, see Draining a job.

Cancel a job

When you cancel a job, the Dataflow service stops the job immediately.

The following actions occur when you cancel a job:

  1. The Dataflow service halts all data ingestion and data processing.

  2. The Dataflow service begins cleaning up the Google Cloud resources that are attached to your job.

    This cleanup can include shutting down Compute Engine worker instances and closing active connections to I/O sources and sinks.

Important information about canceling a job

  • Canceling a job immediately halts the processing of the pipeline.

  • You might lose in-flight data when you cancel a job. In-flight data refers to data that is already read but is still being processed by the pipeline.

  • Data that the pipeline wrote to an output sink before you canceled the job might still be accessible in that sink.

  • If data loss is not a concern, canceling your job ensures that the Google Cloud resources that are associated with your job are shut down as soon as possible.

Drain a job

When you drain a job, the Dataflow service finishes your job in its current state. If you want to prevent data loss as you bring down your streaming pipelines, the best option is to drain your job.

The following actions occur when you drain a job:

  1. Your job stops ingesting new data from input sources soon after receiving the drain request (typically within a few minutes).

  2. The Dataflow service preserves any existing resources, such as worker instances, to finish processing and writing any buffered data in your pipeline.

  3. When all pending processing and write operations are complete, the Dataflow service shuts down Google Cloud resources that are associated with your job.

Important information about draining a job

  • Draining a job is not supported for batch pipelines.

  • Your pipeline continues to incur the cost of maintaining any associated Google Cloud resources until all processing and writing is finished.

  • If your streaming pipeline code includes a looping timer, the job cannot be drained.

  • Dataflow streaming is tolerant of worker restarts and doesn't fail streaming jobs on errors. Instead, the Dataflow service retries the work until you take an action such as draining, canceling, or restarting the job. To avoid data loss, draining the job is recommended.

  • Dataflow does not acknowledge messages until the Dataflow service durably commits them. For example, with Kafka, this can be viewed as a safe handoff of ownership of the message from Kafka to Dataflow, eliminating the risk of data loss.

  • If your streaming pipeline includes a Splittable DoFn, you must truncate the result before running the drain option. For more information about truncating Splittable DoFns, see the Apache Beam documentation.

  • You can update a pipeline that is being drained.

  • Draining a job can take a significant amount of time to complete, such as when your pipeline has a large amount of buffered data.

  • You can cancel a job that is currently draining.

  • In some cases, a Dataflow job might be unable to complete the drain operation. You can consult the job logs to determine the root cause and take appropriate action.

Effects of draining a job

When you drain a streaming pipeline, Dataflow immediately closes any in-process windows and fires all triggers.

The system does not wait for any outstanding time-based windows to finish in a drain operation.

For example, if your pipeline is ten minutes into a two-hour window when you drain the job, Dataflow doesn't wait for the remainder of the window to finish. It closes the window immediately with partial results. Dataflow causes open windows to close by advancing the system watermark to infinity. This functionality also works with custom data sources.

When draining a pipeline that uses a custom data source class, Dataflow stops issuing requests for new data, advances the system watermark to infinity, and calls your source's finalize() method on the last checkpoint.

In the Google Cloud console, you can view the details of your pipeline's transforms. The following diagram shows the effects of an in-process drain operation. Note that the watermark is advanced to the maximum value.


Figure 1. A step view of a drain operation.

Stop a job

Before stopping a job, you must understand the effects of canceling or draining a job.


Console

  1. Go to the Dataflow Jobs page in the Google Cloud console.


  2. Click the job that you want to stop.

    To stop a job, the job's status must be Running.

  3. On the job details page, click Stop.

  4. Do one of the following:

    • For a batch pipeline, click Cancel.

    • For a streaming pipeline, click either Cancel or Drain.

  5. To confirm your choice, click Stop Job.


gcloud

To drain or cancel a Dataflow job, use the gcloud dataflow jobs command in Cloud Shell or in a local terminal with the gcloud CLI installed.

  1. Log in to your shell.

  2. List the job IDs for the Dataflow jobs that are currently running, and then note the job ID for the job that you want to stop:

    gcloud dataflow jobs list

    If the --region flag is not set, Dataflow jobs from all available regions are displayed.

  3. Do one of the following:

    • To drain a streaming job:

       gcloud dataflow jobs drain JOB_ID

      Replace JOB_ID with the job ID that you copied earlier.

    • To cancel a batch or streaming job:

      gcloud dataflow jobs cancel JOB_ID

      Replace JOB_ID with the job ID that you copied earlier.
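Both commands also accept a --region flag for jobs running outside the gcloud default region. As a minimal sketch, the helper below wraps the two commands; the function name, job IDs, and regions are placeholders for illustration:

```shell
# Helper that drains or cancels a Dataflow job in a given region.
# The --region flag is needed when the job doesn't run in the
# gcloud default region. MODE is "drain" or "cancel".
stop_job() {
  local mode="$1" job_id="$2" region="$3"
  gcloud dataflow jobs "$mode" "$job_id" --region="$region"
}

# Example invocations (placeholder job ID and region):
# stop_job drain  2023-07-18_00_00_00-1234567890 us-central1
# stop_job cancel 2023-07-18_00_00_00-1234567890 us-central1
```

Remember that only streaming jobs accept drain; batch jobs must be canceled.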


API

To cancel or drain a job by using the Dataflow REST API, you can use either the projects.locations.jobs.update method or the projects.jobs.update method. In the request body, pass the required job state in the requestedState field of the job instance of the chosen API.

Important: Using projects.locations.jobs.update is recommended, because projects.jobs.update only allows updating the state of jobs running in us-central1.

  • To cancel the job, set the job state to JOB_STATE_CANCELLED.

  • To drain the job, set the job state to JOB_STATE_DRAINED.
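As a sketch of the projects.locations.jobs.update call, the request is a PUT to the job's resource URL with the requestedState field in the body. The project ID, region, and job ID below are placeholder values:

```shell
# Placeholder identifiers; replace with your own values.
PROJECT_ID=my-project
REGION=us-central1
JOB_ID=2023-07-18_00_00_00-1234567890

# Resource URL for the projects.locations.jobs.update method.
URL="https://dataflow.googleapis.com/v1b3/projects/${PROJECT_ID}/locations/${REGION}/jobs/${JOB_ID}"

# Use JOB_STATE_DRAINED to drain, or JOB_STATE_CANCELLED to cancel.
BODY='{"requestedState": "JOB_STATE_DRAINED"}'

# Send the request (commented out so the sketch has no side effects):
# curl -X PUT \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$BODY" "$URL"
```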

Detect job completion

To detect when the job cancellation or draining has completed, use one of the following methods:

  • Use a workflow orchestration service such as Cloud Composer to monitor your Dataflow job.
  • Run the pipeline synchronously so that tasks are blocked until pipeline completion. For more information, see Controlling execution modes in Setting pipeline options.
  • Use the command-line tool in the Google Cloud CLI to poll the job status. To get a list of all the Dataflow jobs in your project, run the following command in your shell or terminal:

    gcloud dataflow jobs list

    The output shows the job ID, name, status (STATE), and other information for each job. For more information, see Using the Dataflow command-line interface.
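One possible polling sketch, assuming gcloud dataflow jobs describe and the standard job state names: the loop below checks a single job's state until it reaches a terminal state. The function name, job ID, and region are placeholders.

```shell
# Poll a Dataflow job until it reaches a terminal state.
# Terminal states include JOB_STATE_DONE, JOB_STATE_CANCELLED,
# JOB_STATE_DRAINED, and JOB_STATE_FAILED.
poll_until_done() {
  local job_id="$1" region="$2" state
  while true; do
    state=$(gcloud dataflow jobs describe "$job_id" \
        --region="$region" --format='value(currentState)')
    case "$state" in
      JOB_STATE_DONE|JOB_STATE_CANCELLED|JOB_STATE_DRAINED|JOB_STATE_FAILED)
        echo "Job $job_id finished with state: $state"
        return 0
        ;;
    esac
    sleep 30  # pause between checks to avoid hammering the API
  done
}

# Example (placeholder values):
# poll_until_done 2023-07-18_00_00_00-1234567890 us-central1
```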

What's next