Stopping a running pipeline

To stop a running Dataflow job, issue a stop command from either the Dataflow Monitoring Interface or the Dataflow Command-line Interface. There are two commands you can use to stop a job: Cancel and Drain.
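
With the Google Cloud CLI, the two options correspond to the `gcloud dataflow jobs cancel` and `gcloud dataflow jobs drain` commands. The Python sketch below only assembles those commands. It runs them with `subprocess` only when `dry_run` is false, and it assumes the CLI is installed and authenticated. The helper names `build_stop_command` and `stop_job`, the `JOB_ID` value, and the `us-central1` region are hypothetical placeholders.

```python
import shlex
import subprocess
from typing import List, Optional

# Each stop option maps to a gcloud subcommand of the same name.
VALID_MODES = ("cancel", "drain")


def build_stop_command(job_id: str, region: str, mode: str) -> List[str]:
    """Return the gcloud argument list that cancels or drains a Dataflow job."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be 'cancel' or 'drain', got {mode!r}")
    return ["gcloud", "dataflow", "jobs", mode, job_id, f"--region={region}"]


def stop_job(
    job_id: str, region: str, mode: str, dry_run: bool = True
) -> Optional[subprocess.CompletedProcess]:
    """Print the command (dry run) or execute it, raising if gcloud fails."""
    cmd = build_stop_command(job_id, region, mode)
    if dry_run:
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # JOB_ID and us-central1 are placeholders; substitute your own values.
    stop_job("JOB_ID", "us-central1", "drain")
```

Run as a dry run, this prints `gcloud dataflow jobs drain JOB_ID --region=us-central1` without contacting Google Cloud. Keeping the command as an argument list avoids shell-quoting problems with job IDs.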

Stopping a job using the Cloud Dataflow monitoring UI

To stop a job, select the job from the jobs list in the Dataflow Monitoring Interface. On the information card for your job, click Stop Job.

Figure 1: An information card for a Dataflow job, with the Stop Job button.

The Stop Job dialog appears with your options for how to stop your job:

Figure 2: The Stop Job dialog with options for Cancel and Drain.

Select either the Cancel or the Drain option, as appropriate, and click the Stop Job button.
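
You can also make the same choice programmatically. The following is a minimal sketch that assumes the Dataflow v1b3 REST method `projects.locations.jobs.update`, where `requestedState` is set to `JOB_STATE_CANCELLED` for Cancel or `JOB_STATE_DRAINED` for Drain. Check these values against the API reference before relying on them. The `service` argument is assumed to be a client built with `googleapiclient.discovery.build("dataflow", "v1b3")`, and the helper names are hypothetical.

```python
from typing import Any, Dict

# Assumed requestedState values for the two stop options (verify in the API reference).
REQUESTED_STATES = {
    "cancel": "JOB_STATE_CANCELLED",
    "drain": "JOB_STATE_DRAINED",
}


def stop_request_body(mode: str) -> Dict[str, str]:
    """Build the partial Job resource sent to jobs.update to stop a job."""
    try:
        return {"requestedState": REQUESTED_STATES[mode]}
    except KeyError:
        raise ValueError(f"mode must be 'cancel' or 'drain', got {mode!r}") from None


def request_stop(service: Any, project: str, region: str, job_id: str, mode: str) -> Any:
    """Send the stop request through a googleapiclient Dataflow service object."""
    request = service.projects().locations().jobs().update(
        projectId=project,
        location=region,
        jobId=job_id,
        body=stop_request_body(mode),
    )
    return request.execute()
```

Separating the body builder from the network call lets you check the request you are about to send without touching a live job.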

Cancel

Using the Cancel option to stop your job tells the Dataflow service to cancel your job immediately. The service halts all data ingestion and processing as soon as possible and immediately begins cleaning up the Google Cloud resources attached to your job. This cleanup can include shutting down Compute Engine worker instances and closing active connections to I/O sources or sinks.

Because Cancel immediately halts processing, you may lose any "in-flight" data. "In-flight" data refers to data that has been read but is still being processed by your pipeline. Data written from your pipeline to an output sink before you issued the Cancel command may still be accessible on your output sink.
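
To make "in-flight" data concrete, here is a toy, in-memory model. It is not how the Dataflow service is implemented. Records move from a source into an in-flight buffer and reach the sink only after they are processed. `ToyPipeline` and its methods are hypothetical names, and multiplying by 10 stands in for real processing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPipeline:
    """Hypothetical in-memory pipeline: source -> in-flight buffer -> sink."""
    source: List[int]
    in_flight: List[int] = field(default_factory=list)
    sink: List[int] = field(default_factory=list)

    def read(self, n: int) -> None:
        # Ingest up to n records; they are now "in flight".
        self.in_flight.extend(self.source[:n])
        del self.source[:n]

    def write(self, n: int) -> None:
        # Finish processing up to n in-flight records and write them to the sink.
        self.sink.extend(record * 10 for record in self.in_flight[:n])
        del self.in_flight[:n]

    def cancel(self) -> List[int]:
        # Cancel halts immediately: buffered records are dropped, not written.
        lost, self.in_flight = self.in_flight, []
        return lost


pipeline = ToyPipeline(source=[1, 2, 3, 4, 5])
pipeline.read(4)   # records 1-4 are in flight
pipeline.write(2)  # 1 and 2 reach the sink as 10 and 20
lost = pipeline.cancel()
print("written:", pipeline.sink, "lost:", lost)
```

Here the sink keeps `[10, 20]`, which was written before the cancel. Records 3 and 4 had been read but not yet written, so they are lost.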

If data loss is not a concern, use the Cancel option to stop your job and ensure that the Google Cloud resources associated with it are shut down as soon as possible.

Drain

Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state. Your job stops ingesting new data from input sources soon after receiving the drain request (typically within a few minutes). However, the Dataflow service preserves any existing resources, such as worker instances, to finish processing and writing any buffered data in your pipeline. When all pending processing and write operations are complete, the Dataflow service cleans up the Google Cloud resources associated with your job.
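
The same kind of toy model shows how draining differs from canceling. It is hypothetical and does not describe the service's implementation. Buffered records are still processed and written. Only records the job never read remain unread. The `drain` function and the multiply-by-10 "processing" step are illustrative assumptions.

```python
from typing import List, Tuple


def drain(source: List[int], in_flight: List[int], sink: List[int]) -> Tuple[List[int], List[int]]:
    """Toy drain: stop ingesting, then process and write every buffered record.

    Returns (records in the sink afterward, records the job never read).
    """
    # Existing workers keep running until the buffer is empty;
    # multiplying by 10 stands in for real processing.
    written = sink + [record * 10 for record in in_flight]
    # Ingestion has stopped, so anything still in the source is left unread.
    unread = list(source)
    return written, unread


written, unread = drain(source=[5], in_flight=[3, 4], sink=[10, 20])
print("written:", written, "unread:", unread)
```

Start from the same state as a job that has read four records and written two. Draining writes the two buffered records, giving `[10, 20, 30, 40]` instead of losing them.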

If you want to prevent data loss as you bring down your pipelines, use the Drain option to stop your job.

Effects of draining a job

When you issue the Drain command, Dataflow immediately closes any in-process windows and fires all triggers. The system does not wait for any outstanding time-based windows to finish. For example, if your pipeline is ten minutes into a two-hour window when you issue the Drain command, Dataflow won't wait for the remainder of the window to finish. It closes the window immediately with partial results.
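
The window behavior can be sketched with a toy fixed-window sum. The sketch assumes a default event-time trigger that fires a window once the watermark passes its end. It models Drain as advancing the watermark to infinity, which is consistent with the maximum watermark value shown in Figure 3. The function names and sample events are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SIZE = 2 * 60 * 60  # two-hour fixed windows, in seconds


def window_start(timestamp: int) -> int:
    """Start of the fixed window that contains the given event timestamp."""
    return timestamp - timestamp % WINDOW_SIZE


def fire_ready_windows(events: List[Tuple[int, int]], watermark: float) -> Dict[int, int]:
    """Sum values per window and emit only windows whose end is at or before the watermark."""
    sums: Dict[int, int] = defaultdict(int)
    for timestamp, value in events:
        sums[window_start(timestamp)] += value
    return {start: total for start, total in sums.items() if start + WINDOW_SIZE <= watermark}


events = [(0, 1), (300, 2), (540, 3)]  # (event time in seconds, value)

# Ten minutes into the first window: nothing has fired yet.
print(fire_ready_windows(events, watermark=600))       # {}
# Drain: the watermark jumps to its maximum, so the window fires with partial results.
print(fire_ready_windows(events, watermark=math.inf))  # {0: 6}
```

With a watermark of 600 seconds, the two-hour window starting at 0 is still open. Setting the watermark to `math.inf` closes it immediately and emits the partial sum of the events seen so far.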

In the detailed view of your pipeline's transforms, you can see the effects of an in-process Drain command:

Figure 3: A step view with Drain in progress; notice the watermark has advanced to the maximum value.
