Using the Dataflow Command-line Interface

When you execute your pipeline using the Cloud Dataflow managed service, you can obtain information about your Dataflow job (and any other jobs in your project) by using the Dataflow Command-line Interface. The Dataflow Command-line Interface is part of the gcloud command-line tool in the Google Cloud SDK.

NOTE: If you'd rather view and interact with your Dataflow jobs using the web-based UI, use the Dataflow Monitoring Interface.

Installing the Dataflow Command-line Component

To use the Dataflow Command-line Interface, you'll first need to install the Beta components in the gcloud tool. In your shell or terminal window, enter:

  gcloud components update beta

Type y when prompted to continue.

Running the Available Commands

You interact with the Dataflow Command-line Interface by running the available commands. To see the available Dataflow commands, type the following into your shell or terminal:

  gcloud beta dataflow

The Dataflow Command-line Interface has three major subcommand groups: jobs, logs, and metrics.

Jobs Commands

The jobs subcommands group lets you view and interact with the Dataflow jobs in your Cloud Platform project. You can use these commands to view a list of your jobs, cancel a job, show a description of a specific job, and others. For example, to view a list of all your Dataflow jobs, type the following command into your shell or terminal:

  gcloud beta dataflow jobs list

The gcloud tool returns a list of your current jobs, as follows:

  ID                                        NAME                                    TYPE   CREATION_TIME        STATE
  2015-06-03_16_39_22-4020553808241078833   wordcount-janedoe-0603233849            Batch  2015-06-03 16:39:22  Done
  2015-06-03_16_38_28-4363652261786938862   wordcount-johndoe-0603233820            Batch  2015-06-03 16:38:28  Done
  2015-05-21_16_24_11-17823098268333533078  bigquerytornadoes-johndoe-0521232402    Batch  2015-05-21 16:24:11  Done
  2015-05-21_13_38_06-16409850040969261121  bigquerytornadoes-johndoe-0521203801    Batch  2015-05-21 13:38:06  Done
  2015-05-21_13_17_18-18349574013243942260  bigquerytornadoes-johndoe-0521201710    Batch  2015-05-21 13:17:18  Done
  2015-05-21_12_49_37-9791290545307959963   wordcount-johndoe-0521194928            Batch  2015-05-21 12:49:37  Done
  2015-05-20_15_54_51-15905022415025455887  wordcount-johndoe-0520225444            Batch  2015-05-20 15:54:51  Failed
  2015-05-20_15_47_02-14774624590029708464  wordcount-johndoe-0520224637            Batch  2015-05-20 15:47:02  Done
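
If you need to act on this list from a script, the whitespace-separated layout above is straightforward to post-process. The sketch below is a minimal example in Python, assuming output in the format shown above; the sample rows are hardcoded here for illustration, whereas a real script would capture them from the gcloud invocation (for example, with subprocess):

```python
# Sketch: parse `gcloud beta dataflow jobs list` output to find failed jobs.
# The sample output is hardcoded for illustration; in practice you would
# capture it, e.g. with subprocess.run(["gcloud", "beta", "dataflow",
# "jobs", "list"], capture_output=True, text=True).

SAMPLE = """\
ID                                        NAME                          TYPE   CREATION_TIME        STATE
2015-06-03_16_39_22-4020553808241078833   wordcount-janedoe-0603233849  Batch  2015-06-03 16:39:22  Done
2015-05-20_15_54_51-15905022415025455887  wordcount-johndoe-0520225444  Batch  2015-05-20 15:54:51  Failed
"""

def parse_jobs(listing):
    """Yield one dict per job row, skipping the header line."""
    for line in listing.strip().splitlines()[1:]:
        # Columns: ID, NAME, TYPE, then CREATION_TIME (two tokens), then STATE.
        job_id, name, job_type, date, time, state = line.split()
        yield {"id": job_id, "name": name, "type": job_type,
               "created": f"{date} {time}", "state": state}

# Collect the IDs of jobs that did not finish successfully.
failed = [job["id"] for job in parse_jobs(SAMPLE) if job["state"] == "Failed"]
print(failed)
```

This relies only on the column layout shown above; if the listing format changes, the field splitting would need to be adjusted.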

Using the job ID, you can run the describe command to display more information about a job.

  export JOBID=<X>
  gcloud beta dataflow jobs describe $JOBID

For example, if you run the command for job ID 2015-02-09_11_39_40-15635991037808002875, the gcloud tool returns the following information:

  createTime: '2015-02-09T19:39:41.140Z'
  currentState: JOB_STATE_DONE
  currentStateTime: '2015-02-09T19:56:39.510Z'
  id: 2015-02-09_11_39_40-15635991037808002875
  name: tfidf-bchambers-0209193926
  projectId: google.com:clouddfe
  type: JOB_TYPE_BATCH

You can run the command with the --format=json option to format the result as JSON.

  gcloud --format=json beta dataflow jobs describe $JOBID

The gcloud tool returns the following formatted information:

  {
    "createTime": "2015-02-09T19:39:41.140Z",
    "currentState": "JOB_STATE_DONE",
    "currentStateTime": "2015-02-09T19:56:39.510Z",
    "id": "2015-02-09_11_39_40-15635991037808002875",
    "name": "tfidf-bchambers-0209193926",
    "projectId": "google.com:clouddfe",
    "type": "JOB_TYPE_BATCH"
  }
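
The JSON form is convenient for scripting. As a minimal sketch, the Python snippet below loads the describe output and pulls out individual fields; the sample document is hardcoded from the output shown above, whereas a real script would capture it from the gcloud invocation:

```python
import json

# Sample output of `gcloud --format=json beta dataflow jobs describe $JOBID`,
# hardcoded for illustration; a real script would capture it via subprocess.
RAW = """
{
  "createTime": "2015-02-09T19:39:41.140Z",
  "currentState": "JOB_STATE_DONE",
  "currentStateTime": "2015-02-09T19:56:39.510Z",
  "id": "2015-02-09_11_39_40-15635991037808002875",
  "name": "tfidf-bchambers-0209193926",
  "projectId": "google.com:clouddfe",
  "type": "JOB_TYPE_BATCH"
}
"""

job = json.loads(RAW)
# Individual fields are now directly addressable by key.
print(job["currentState"])
```

This is the main advantage of --format=json over the default YAML-style output: any JSON-aware tool or library can consume it without custom parsing.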

For a complete list of jobs commands, see the gcloud beta dataflow jobs command in the Google Cloud SDK documentation.

Logs Commands

The logs commands display log entries for jobs run on the Dataflow Service.

For example, you can use the list command to print the logs that provide information about what your job is doing.

  export JOBID=<X>
  gcloud beta dataflow logs list $JOBID

For job ID 2015-02-09_11_39_40-15635991037808002875, the gcloud tool returns:

  Listed 0 items.

In this example, no logs appear at the default severity level (Warning). You can include BASIC-level logs by running the list command with the --importance=detailed option.

  gcloud beta dataflow logs list $JOBID --importance=detailed

The gcloud tool prints out the following logs:

  d 2016-08-29T09:33:28 2015-02-09_11_39_40-15635991037808002875_00000156d72606f7 (39b2a31f5e883423): Starting worker pool synchronously
  d 2016-08-29T09:33:28 2015-02-09_11_39_40-15635991037808002875_00000156d7260871 (39b2a31f5e883ce9): Worker pool is running
  d 2016-08-29T09:33:28 2015-02-09_11_39_40-15635991037808002875_00000156d7260874 (39b2a31f5e883b77): Executing operation Count.PerElement/Sum.PerKey/GroupByKey/GroupByKeyOnly…
  ...
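
Each detailed log line appears to follow a fixed shape: a one-letter severity marker, a timestamp, a work item ID, a hex tag in parentheses, and the message text. The sketch below splits a single line along that assumed shape using a regular expression; the sample line is hardcoded from the output above:

```python
import re

# One detailed log line from `gcloud beta dataflow logs list $JOBID
# --importance=detailed`, hardcoded for illustration. The leading "d" is
# the severity marker on this line.
LINE = ("d 2016-08-29T09:33:28 "
        "2015-02-09_11_39_40-15635991037808002875_00000156d7260871 "
        "(39b2a31f5e883ce9): Worker pool is running")

# Assumed line shape: severity, timestamp, work item ID,
# a hex tag in parentheses, then the free-form message.
LOG_RE = re.compile(r"^(\S+) (\S+) (\S+) \(([0-9a-f]+)\): (.*)$")

m = LOG_RE.match(LINE)
severity, timestamp, work_id, tag, message = m.groups()
print(severity, timestamp, message)
```

A pattern like this lets a monitoring script filter log entries by timestamp or message content rather than eyeballing the raw listing.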

For a complete list of logs commands, see the gcloud beta dataflow logs command in the Google Cloud SDK documentation.

Metrics Commands

The metrics commands allow you to view the metrics for a given Dataflow job.

NOTE: The metrics command names are subject to change, and certain metrics are subject to deletion.

You can use the list command to get information about the steps in your job.

  gcloud beta dataflow metrics list $JOBID

For this command, the gcloud tool returns:

  ---
  name:
    name: s09-s14-start-msecs
    origin: dataflow/v1b3
  scalar: 137
  updateTime: '2016-08-29T16:35:50.007Z'
  ---
  name:
    context:
      output_user_name: WordCount.CountWords/Count.PerElement/Init-out0
    name: ElementCount
    origin: dataflow/v1b3
  scalar: 26181
  updateTime: '2016-08-29T16:35:50.007Z'
  ---
  name:
    context:
      step: s2
    name: emptyLines
    origin: user
  scalar: 1080
  updateTime: '2016-08-29T16:35:50.007Z'
  ...
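
The output is a stream of YAML documents. A YAML library would parse it directly, but the dependency-free sketch below (assuming only the record shapes shown above, with two sample records hardcoded for illustration) pairs each metric's inner name with its scalar value using regular expressions:

```python
import re

# Sample records from `gcloud beta dataflow metrics list $JOBID`,
# hardcoded for illustration; a real script would capture the output.
RAW = """\
---
name:
  name: s09-s14-start-msecs
  origin: dataflow/v1b3
scalar: 137
updateTime: '2016-08-29T16:35:50.007Z'
---
name:
  context:
    step: s2
  name: emptyLines
  origin: user
scalar: 1080
updateTime: '2016-08-29T16:35:50.007Z'
"""

metrics = {}
for record in RAW.split("---"):
    # The metric's display name is the `name:` key indented two spaces;
    # `scalar:` at the top level holds its current value.
    name = re.search(r"^  name: (\S+)$", record, re.MULTILINE)
    scalar = re.search(r"^scalar: (\d+)$", record, re.MULTILINE)
    if name and scalar:
        metrics[name.group(1)] = int(scalar.group(1))

print(metrics)
```

This makes it easy, for example, to track a user-defined aggregator such as emptyLines across successive runs of the command.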

You can use the gcloud beta dataflow metrics list command to obtain tentative metrics while your job is running (or shortly after it finishes). To view tentative metrics, run the command with the --tentative flag. A metric marked tentative is updated frequently as worker instances process your pipeline's data, but its value may decrease if a worker experiences an error. Tentative metrics become committed values when a worker finishes processing and commits the results.

For a complete list of metrics commands, see the gcloud beta dataflow metrics command in the Google Cloud SDK documentation.
