If you need to stop a running Dataflow job, you can do so by issuing a command using either the Dataflow Monitoring Interface or the Dataflow Command-line Interface. There are two possible commands you can issue to stop your job: Cancel and Drain.
Stopping a job using the Cloud Dataflow monitoring UI
To stop a job, select the job from the jobs list in the Dataflow Monitoring Interface. On the top panel, click Stop.
The Stop Job dialog appears with your options for how to stop your job:
Select the Cancel or Drain option as appropriate, and then click the Stop Job button.
Using the Cancel option to stop your job tells the Dataflow service to cancel your job immediately. The service halts all data ingestion and processing as soon as possible and immediately begins cleaning up the Google Cloud resources attached to your job. This cleanup may include shutting down Compute Engine worker instances and closing active connections to I/O sources or sinks.
Because Cancel immediately halts processing, you may lose any "in-flight" data. "In-flight" data refers to data that has been read but is still being processed by your pipeline. Data written from your pipeline to an output sink before you issued the Cancel command may still be accessible on your output sink.
If data loss is not a concern, use the Cancel option to stop your job. Doing so ensures that the Google Cloud resources associated with your job are shut down as soon as possible.
Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state. Your job stops ingesting new data from input sources soon after receiving the drain request (typically within a few minutes). However, the Dataflow service preserves any existing resources, such as worker instances, to finish processing and writing any buffered data in your pipeline. When all pending processing and write operations are complete, the Dataflow service cleans up the Google Cloud resources associated with your job.
If you want to prevent data loss as you bring down your pipelines, use the Drain option to stop your job.
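From the command line, the same two choices correspond to the `gcloud dataflow jobs cancel` and `gcloud dataflow jobs drain` subcommands. The following is a minimal sketch of how a script might wrap them, assuming the gcloud CLI is installed and authenticated. The helper names `build_stop_command` and `stop_job`, as well as the job ID and region in the example, are hypothetical and used only for illustration.

```python
import shlex
import subprocess

# The two stop options described above, as gcloud subcommand names.
VALID_MODES = ("cancel", "drain")


def build_stop_command(job_id, region, mode="drain"):
    """Return the gcloud argument list that stops a job with the given mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return ["gcloud", "dataflow", "jobs", mode, job_id, f"--region={region}"]


def stop_job(job_id, region, mode="drain"):
    """Run the stop command; raises CalledProcessError if gcloud fails."""
    subprocess.run(build_stop_command(job_id, region, mode), check=True)


# Hypothetical job ID and region, shown only to illustrate the command shape.
# This prints the command rather than running it.
command = build_stop_command(
    "2024-01-01_00_00_00-1234567890", "us-central1", mode="drain"
)
print(shlex.join(command))
```

Defaulting `mode` to `"drain"` is a design choice in this sketch: it favors the option that avoids data loss, so a caller has to ask for `"cancel"` explicitly.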
Effects of draining a job
When you issue the Drain command, Dataflow immediately closes any in-process windows and fires all triggers. The system does not wait for any outstanding time-based windows to finish. For example, if your pipeline is ten minutes into a two-hour window when you issue the Drain command, Dataflow won't wait for the remainder of the window to finish. It closes the window immediately with partial results.
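The window behavior described above can be illustrated with a small toy model. This sketch does not use the Dataflow or Apache Beam APIs; the `FixedWindow` class and the `drain` function are hypothetical stand-ins that only mimic the idea of closing a two-hour window ten minutes in and emitting whatever has been aggregated so far.

```python
from dataclasses import dataclass, field


@dataclass
class FixedWindow:
    """Toy model of a fixed time window that collects values to sum."""
    start_min: int
    length_min: int
    values: list = field(default_factory=list)

    @property
    def end_min(self):
        return self.start_min + self.length_min

    def add(self, timestamp_min, value):
        # Accept only elements whose timestamp falls inside the window.
        if not (self.start_min <= timestamp_min < self.end_min):
            raise ValueError("element falls outside this window")
        self.values.append(value)


def drain(window, now_min):
    """Close the window at drain time instead of waiting for end_min."""
    return {
        "sum": sum(window.values),
        "closed_at_min": now_min,
        "partial": now_min < window.end_min,
    }


# A two-hour window that has received three elements in its first ten minutes.
window = FixedWindow(start_min=0, length_min=120)
for timestamp_min, value in [(1, 5), (4, 7), (9, 3)]:
    window.add(timestamp_min, value)

# Draining at minute 10 emits the sum so far and marks the result as partial.
result = drain(window, now_min=10)
print(result)  # {'sum': 15, 'closed_at_min': 10, 'partial': True}
```

In this model, the result covers only the elements that arrived before the drain request, which is why downstream consumers should expect partial aggregates from a drained pipeline.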
In the detailed view of your pipeline's transforms, you can see the effects of an in-process Drain command: