If you need to stop a running Cloud Dataflow job, you can do so by issuing a command using either the Cloud Dataflow Monitoring Interface or the Cloud Dataflow Command-line Interface. There are two possible commands you can issue to stop your job: Cancel and Drain.
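As a rough illustration of the command-line route, the sketch below builds the `gcloud dataflow jobs cancel` or `gcloud dataflow jobs drain` invocation for a given job. The helper names `build_stop_command` and `stop_job`, the `dry_run` flag, and the sample job ID and region are assumptions made for this example; they are not part of any Cloud Dataflow library.

```python
import subprocess
from typing import List, Optional

# The two stop commands described in this document.
VALID_MODES = ("cancel", "drain")


def build_stop_command(job_id: str, mode: str, region: Optional[str] = None) -> List[str]:
    """Return the gcloud argv that would stop `job_id` using `mode`."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    argv = ["gcloud", "dataflow", "jobs", mode, job_id]
    if region:
        # Regional endpoint of the job; omitted when not supplied.
        argv.append(f"--region={region}")
    return argv


def stop_job(job_id: str, mode: str, region: Optional[str] = None,
             dry_run: bool = True) -> List[str]:
    """Build the command and, unless dry_run is set, execute it with gcloud."""
    argv = build_stop_command(job_id, mode, region)
    if not dry_run:
        # Requires an installed and authenticated gcloud CLI.
        subprocess.run(argv, check=True)
    return argv


# Hypothetical job ID and region, shown as a dry run only.
print(" ".join(stop_job("my-job-id", "drain", region="us-central1")))
```

Keeping the command construction separate from its execution makes it possible to log or review the exact command before a job is actually stopped.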
Stopping a job using the Cloud Dataflow monitoring UI
To stop a job, select the job from the jobs list in the Cloud Dataflow Monitoring Interface. On the information card for your job, click Stop Job.
The Stop Job dialog appears, showing the options for stopping your job.
Select the Cancel or Drain option as appropriate, then click the Stop Job button.
Using the Cancel option to stop your job tells the Cloud Dataflow service to cancel your job immediately. The service will halt all data ingestion and processing as soon as possible and immediately begin cleaning up the Google Cloud Platform (GCP) resources attached to your job. This cleanup may include shutting down Compute Engine worker instances and closing active connections to I/O sources or sinks.
Because Cancel immediately halts processing, you may lose any "in-flight" data. "In-flight" data refers to data that has been read but is still being processed by your pipeline. Data written from your pipeline to an output sink before you issued the Cancel command may still be accessible on your output sink.
If data loss is not a concern, use the Cancel option to stop your job. Doing so ensures that the GCP resources associated with your job are shut down as soon as possible.
Using the Drain option to stop your job tells the Cloud Dataflow service to finish your job in its current state. Your job will stop ingesting new data from input sources soon after receiving the drain request (typically within a few minutes). However, the Cloud Dataflow service will preserve any existing resources, such as worker instances, to finish processing and writing any buffered data in your pipeline. When all pending processing and write operations are complete, the Cloud Dataflow service will clean up the GCP resources associated with your job.
Use the Drain option to stop your job if you want to prevent data loss as you bring down your pipeline.
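The difference between the two options can be pictured with a toy model. The sketch below is not the Cloud Dataflow service or the SDK; `ToyPipeline` and `run` are hypothetical names, and the "pipeline" is just three Python lists standing in for a source, an in-flight buffer, and a sink.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyPipeline:
    """Toy stand-in for a streaming job: source -> in-flight buffer -> sink."""
    source: List[str]
    in_flight: List[str] = field(default_factory=list)
    sink: List[str] = field(default_factory=list)

    def read(self, n: int) -> None:
        """Move up to n elements from the source into the in-flight buffer."""
        self.in_flight.extend(self.source[:n])
        del self.source[:n]

    def process(self, n: int) -> None:
        """Finish up to n in-flight elements and write them to the sink."""
        self.sink.extend(self.in_flight[:n])
        del self.in_flight[:n]

    def cancel(self) -> List[str]:
        """Halt immediately and return the in-flight elements that are lost."""
        lost, self.in_flight = self.in_flight, []
        return lost

    def drain(self) -> List[str]:
        """Stop reading, finish everything already buffered, lose nothing."""
        self.process(len(self.in_flight))
        return []


def run(mode: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (sink, lost, unread source) after stopping with `mode`."""
    p = ToyPipeline(source=["a", "b", "c", "d", "e"])
    p.read(4)      # a, b, c, d have been read
    p.process(2)   # a, b have been written; c, d are still in flight
    lost = p.cancel() if mode == "cancel" else p.drain()
    return p.sink, lost, p.source


print(run("cancel"))  # (['a', 'b'], ['c', 'd'], ['e'])
print(run("drain"))   # (['a', 'b', 'c', 'd'], [], ['e'])
```

In both cases the element `e` is never ingested. Only the treatment of the already-read elements `c` and `d` differs, and data written before the stop request (`a`, `b`) remains in the sink either way.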
Effects of draining a job
When you issue the Drain command, Cloud Dataflow immediately closes any in-process windows and fires all triggers. The system does not wait for any outstanding time-based windows to finish. For example, if your pipeline is ten minutes into a two-hour window when you issue the Drain command, Cloud Dataflow won't wait for the remainder of the window to finish. It will close the window immediately with partial results.
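The two-hour window example above can be worked through with a small simulation. This is a simplified model written for illustration, not Dataflow's windowing implementation: the function `window_result`, the single fixed window starting at time zero, and the sample events are all assumptions, and event arrival time is treated as identical to event timestamp.

```python
from typing import List, Optional, Tuple

WINDOW_SECONDS = 2 * 60 * 60  # the two-hour window from the example above


def window_result(
    events: List[Tuple[int, int]],
    drain_at: Optional[int] = None,
) -> Tuple[int, int, bool]:
    """Sum the values of one fixed window that starts at t=0.

    events: (timestamp_seconds, value) pairs.
    drain_at: time in seconds at which Drain was issued, or None for no drain.
    Returns (close_time, total, is_partial).
    """
    # Without a drain the window closes at its natural end; with a drain it
    # closes as soon as the drain is issued.
    close_time = WINDOW_SECONDS if drain_at is None else min(drain_at, WINDOW_SECONDS)
    total = sum(value for ts, value in events if ts < close_time)
    return close_time, total, close_time < WINDOW_SECONDS


# Hypothetical events at 1 min, 5 min, 30 min, and roughly 1 h 57 min.
events = [(60, 5), (300, 7), (1800, 11), (7000, 2)]

print(window_result(events))                # (7200, 25, False)
print(window_result(events, drain_at=600))  # (600, 12, True)
```

Draining ten minutes in closes the window at 600 seconds with a partial sum of 12. The events that would have arrived later in the window are never ingested by the drained job.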
In the detailed view of your pipeline's transforms, you can see the effects of an in-process Drain command: