Stopping a Running Pipeline

If you need to stop a running Dataflow job, you can do so from either the Dataflow Monitoring Interface or the Dataflow Command-line Interface. There are two commands you can issue to stop your job: Cancel and Drain.
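
If you use the Dataflow Command-line Interface, you can first list your active jobs to find the ID of the job you want to stop. The following is a minimal sketch using the gcloud tool; depending on your gcloud version, the dataflow commands may be available only under the alpha component:

    # List active Dataflow jobs and their IDs.
    gcloud dataflow jobs list --status=active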

Stopping a Job Using the Dataflow Monitoring UI

To stop a job, select the job from the jobs list in the Dataflow Monitoring Interface. On the information card for your job, click Stop Job.

Figure 1: An information card for a Dataflow job, with the Stop Job button.

The Stop Job dialog appears with your options for how to stop your job:

Figure 2: The Stop Job dialog with options for Cancel and Drain.

Select the Cancel or Drain option as appropriate, then click the Stop Job button.

Cancel

Using the Cancel option to stop your job tells the Dataflow service to abort your job immediately. The service will halt all data ingestion and processing as soon as possible and immediately begin cleaning up the Cloud Platform resources attached to your job. This may include shutting down Compute Engine worker instances and closing active connections to I/O sources or sinks.

Because Cancel immediately halts processing, you may lose any "in-flight" data (data that has been read but is still being processed by your pipeline). Data that your pipeline wrote to an output sink before you issued the Cancel command may still be accessible at that sink.

You should use the Cancel option to stop your job if you want to ensure the Cloud Platform resources associated with your job are shut down as soon as possible, and data loss isn't of particular concern.
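
From the Command-line Interface, a sketch of the equivalent of the Cancel option, where JOB_ID is a placeholder for your job's ID:

    # Cancel the job immediately; in-flight data may be lost.
    gcloud dataflow jobs cancel JOB_ID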

Drain

Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state. Your job will immediately stop ingesting new data from input sources, but the Dataflow service will preserve any existing resources (such as worker instances) to finish processing and writing any buffered data in your pipeline. When all pending processing and write operations are complete, the Dataflow service will clean up the Cloud Platform resources associated with your job.

You should use the Drain option to stop your job if you want to prevent data loss as you bring down your pipeline.
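
From the Command-line Interface, the equivalent sketch for the Drain option, again with JOB_ID as a placeholder:

    # Drain the job: stop ingesting new data, but finish processing
    # and writing any data already buffered in the pipeline.
    gcloud dataflow jobs drain JOB_ID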

Effects of Draining a Job

When you issue the Drain command, Dataflow immediately closes any in-process windows and fires all triggers. Note that the system does not wait for any outstanding time-based windows to finish. For example, if your pipeline is ten minutes into a two-hour window when you issue the Drain command, Dataflow won't wait for the remainder of the window to finish; it will close the window immediately with partial results.

In the detailed view of your pipeline's transforms, you can see the effects of an in-process Drain command:

Figure 3: A step view with Drain in progress; notice the watermark has advanced to the maximum value.
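
You can also follow the progress of a Drain from the command line. As a sketch (the exact fields in the output may vary by gcloud version), the job's currentState is reported as JOB_STATE_DRAINING while the drain is in progress and as JOB_STATE_DRAINED once all buffered data has been processed:

    # Inspect the job's current state during and after a Drain.
    gcloud dataflow jobs describe JOB_ID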
