NAME
    gcloud dataflow sql query - execute the user-specified SQL query on Dataflow

SYNOPSIS
    gcloud dataflow sql query QUERY --job-name=JOB_NAME --region=REGION
        ([--bigquery-table=BIGQUERY_TABLE : --bigquery-dataset=BIGQUERY_DATASET
          --bigquery-project=BIGQUERY_PROJECT]
         [--pubsub-topic=PUBSUB_TOPIC : --pubsub-project=PUBSUB_PROJECT])
        [--bigquery-write-disposition=BIGQUERY_WRITE_DISPOSITION;
          default="write-empty"]
        [--dataflow-kms-key=DATAFLOW_KMS_KEY] [--disable-public-ips] [--dry-run]
        [--max-workers=MAX_WORKERS] [--network=NETWORK]
        [--num-workers=NUM_WORKERS]
        [--pubsub-create-disposition=PUBSUB_CREATE_DISPOSITION;
          default="create-if-not-found"]
        [--service-account-email=SERVICE_ACCOUNT_EMAIL]
        [--subnetwork=SUBNETWORK] [--worker-machine-type=WORKER_MACHINE_TYPE]
        [--parameter=PARAMETER | --parameters-file=PARAMETERS_FILE]
        [--worker-region=WORKER_REGION | --worker-zone=WORKER_ZONE | --zone=ZONE]
        [GCLOUD_WIDE_FLAG …]
DESCRIPTION
    (DEPRECATED) This command is deprecated and will be removed January 31,
    2025. Please see Beam YAML and Beam notebooks for alternatives.

    Execute the user-specified SQL query on Dataflow. Queries must comply
    with the ZetaSQL dialect (https://github.com/google/zetasql). Results may
    be written to either BigQuery or Cloud Pub/Sub.
EXAMPLES
    To execute a simple SQL query on Dataflow that reads from and writes to
    BigQuery, run:

        gcloud dataflow sql query \
            'SELECT word FROM bigquery.table.`my-project`.input_dataset.input_table where count > 3' \
            --job-name=my-job --region=us-west1 \
            --bigquery-dataset=my_output_dataset \
            --bigquery-table=my_output_table

    To execute a simple SQL query on Dataflow that reads from and writes to
    Cloud Pub/Sub, run:

        gcloud dataflow sql query \
            'SELECT word FROM pubsub.topic.`my-project`.input_topic where count > 3' \
            --job-name=my-job --region=us-west1 \
            --pubsub-topic=my_output_topic

    To join data from BigQuery and Cloud Pub/Sub and write the result to
    Cloud Pub/Sub, run:

        gcloud dataflow sql query \
            'SELECT bq.name AS name FROM pubsub.topic.`my-project`.input_topic p INNER JOIN bigquery.table.`my-project`.input_dataset.input_table bq ON p.id = bq.id' \
            --job-name=my-job --region=us-west1 \
            --pubsub-topic=my_output_topic

    To execute a parameterized SQL query that reads from and writes to
    BigQuery, run:

        gcloud dataflow sql query \
            'SELECT word FROM bigquery.table.`my-project`.input_dataset.input_table where count > @threshold' \
            --parameter=threshold:INT64:5 --job-name=my-job \
            --region=us-west1 --bigquery-dataset=my_output_dataset \
            --bigquery-table=my_output_table

POSITIONAL ARGUMENTS
    QUERY
        The SQL query to execute.
REQUIRED FLAGS
    --job-name=JOB_NAME
        The unique name to assign to the Cloud Dataflow job.

    --region=REGION
        Region ID of the job's regional endpoint. Defaults to 'us-central1'.
    The destination(s) for the output of the query. At least one of these
    must be specified:

    BigQuery table resource - The BigQuery table to write query output to.
    The arguments in this group can be used to specify the attributes of
    this resource.

      --bigquery-table=BIGQUERY_TABLE
          ID of the BigQuery table or fully qualified identifier for the
          BigQuery table.

          To set the bigquery-table attribute:
          - provide the argument --bigquery-table on the command line.

          This flag argument must be specified if any of the other arguments
          in this group are specified.

      --bigquery-dataset=BIGQUERY_DATASET
          The BigQuery dataset ID.

          To set the bigquery-dataset attribute:
          - provide the argument --bigquery-table on the command line with a
            fully specified name;
          - provide the argument --bigquery-dataset on the command line.

      --bigquery-project=BIGQUERY_PROJECT
          The BigQuery project ID.

          To set the bigquery-project attribute:
          - provide the argument --bigquery-table on the command line with a
            fully specified name;
          - provide the argument --bigquery-project on the command line.
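      As an illustrative sketch only (my-project, my_dataset, and my_table
      are placeholder names, not values defined on this page), the three
      attribute flags above can be combined to address a single output
      table:

        # Write query results to the table my_table in my_dataset of my-project
        gcloud dataflow sql query \
            'SELECT word FROM bigquery.table.`my-project`.input_dataset.input_table' \
            --job-name=my-job --region=us-west1 \
            --bigquery-project=my-project \
            --bigquery-dataset=my_dataset \
            --bigquery-table=my_table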
    Pub/Sub topic resource - The Cloud Pub/Sub topic to write query output
    to. The arguments in this group can be used to specify the attributes of
    this resource.

      --pubsub-topic=PUBSUB_TOPIC
          ID of the Pub/Sub topic or fully qualified identifier for the
          Pub/Sub topic.

          To set the pubsub-topic attribute:
          - provide the argument --pubsub-topic on the command line.

          This flag argument must be specified if any of the other arguments
          in this group are specified.

      --pubsub-project=PUBSUB_PROJECT
          The Pub/Sub project ID.

          To set the pubsub-project attribute:
          - provide the argument --pubsub-topic on the command line with a
            fully specified name;
          - provide the argument --pubsub-project on the command line.
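      Similarly, as a sketch (my-project and my_output_topic are placeholder
      names), the topic's project can be given separately from the topic ID:

        # Publish query results to my_output_topic in my-project
        gcloud dataflow sql query \
            'SELECT word FROM pubsub.topic.`my-project`.input_topic' \
            --job-name=my-job --region=us-west1 \
            --pubsub-project=my-project \
            --pubsub-topic=my_output_topic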
OPTIONAL FLAGS
    --bigquery-write-disposition=BIGQUERY_WRITE_DISPOSITION; default="write-empty"
        The behavior of the BigQuery write operation.
        BIGQUERY_WRITE_DISPOSITION must be one of: write-empty,
        write-truncate, write-append.

    --dataflow-kms-key=DATAFLOW_KMS_KEY
        The Cloud KMS key to protect the job resources.

    --disable-public-ips
        The Cloud Dataflow workers must not use public IP addresses.
        Overrides the default dataflow/disable_public_ips property value for
        this command invocation.

    --dry-run
        Construct but do not run the SQL pipeline, for smoke testing.
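    As a sketch (the job, dataset, and table names are placeholders reused
    from the examples above), --dry-run can be used to validate a pipeline
    that would otherwise append to an existing table:

        # Construct the pipeline and check it without launching a job
        gcloud dataflow sql query \
            'SELECT word FROM bigquery.table.`my-project`.input_dataset.input_table' \
            --job-name=my-job --region=us-west1 \
            --bigquery-dataset=my_output_dataset --bigquery-table=my_output_table \
            --bigquery-write-disposition=write-append \
            --dry-run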
    --max-workers=MAX_WORKERS
        The maximum number of workers to run.

    --network=NETWORK
        The Compute Engine network for launching instances to run your
        pipeline.

    --num-workers=NUM_WORKERS
        The initial number of workers to use.

    --pubsub-create-disposition=PUBSUB_CREATE_DISPOSITION; default="create-if-not-found"
        The behavior of the Pub/Sub create operation.
        PUBSUB_CREATE_DISPOSITION must be one of: create-if-not-found,
        fail-if-not-found.

    --service-account-email=SERVICE_ACCOUNT_EMAIL
        The service account to run the workers as.

    --subnetwork=SUBNETWORK
        The Compute Engine subnetwork for launching instances to run your
        pipeline.

    --worker-machine-type=WORKER_MACHINE_TYPE
        The type of machine to use for workers. Defaults to
        server-specified.
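    For illustration only (the subnetwork path, machine type, and worker
    counts below are placeholder values, not recommendations from this
    page), the worker environment can be tuned at submission time:

        # Run workers without public IPs on a chosen subnetwork, with bounded scaling
        gcloud dataflow sql query \
            'SELECT word FROM bigquery.table.`my-project`.input_dataset.input_table' \
            --job-name=my-job --region=us-west1 \
            --bigquery-dataset=my_output_dataset --bigquery-table=my_output_table \
            --subnetwork=regions/us-west1/subnetworks/my-subnet \
            --disable-public-ips \
            --num-workers=2 --max-workers=10 \
            --worker-machine-type=n1-standard-2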
    At most one of these can be specified:

      --parameter=PARAMETER
          Parameters to pass to a query. Parameters must use the format
          name:type:value, for example min_word_count:INT64:250.

      --parameters-file=PARAMETERS_FILE
          Path to a file containing query parameters in JSON format, e.g.
          [{"parameterType": {"type": "STRING"}, "parameterValue": {"value":
          "foo"}, "name": "x"}, {"parameterType": {"type": "FLOAT64"},
          "parameterValue": {"value": "1.0"}, "name": "y"}]
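      As a sketch (params.json is a placeholder file name; its contents
      follow the JSON format shown above), a query parameter can also be
      supplied from a file instead of inline:

        # params.json: [{"parameterType": {"type": "INT64"},
        #                "parameterValue": {"value": "5"}, "name": "threshold"}]
        gcloud dataflow sql query \
            'SELECT word FROM bigquery.table.`my-project`.input_dataset.input_table where count > @threshold' \
            --parameters-file=params.json \
            --job-name=my-job --region=us-west1 \
            --bigquery-dataset=my_output_dataset --bigquery-table=my_output_table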
    Worker location options. At most one of these can be specified:

      --worker-region=WORKER_REGION
          The region to run the workers in.

      --worker-zone=WORKER_ZONE
          The zone to run the workers in.

      --zone=ZONE
          (DEPRECATED) The zone to run the workers in.

          The --zone option is deprecated; use --worker-region or
          --worker-zone instead.
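      For illustration (the regions shown are placeholders), the job's
      regional endpoint and the workers' location may differ:

        # Submit to the us-west1 endpoint while running workers in us-central1
        gcloud dataflow sql query \
            'SELECT word FROM bigquery.table.`my-project`.input_dataset.input_table' \
            --job-name=my-job --region=us-west1 \
            --bigquery-dataset=my_output_dataset --bigquery-table=my_output_table \
            --worker-region=us-central1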
GCLOUD WIDE FLAGS
    These flags are available to all commands: --access-token-file,
    --account, --billing-project, --configuration, --flags-file, --flatten,
    --format, --help, --impersonate-service-account, --log-http, --project,
    --quiet, --trace-token, --user-output-enabled, --verbosity.

    Run $ gcloud help for details.

NOTES
    This variant is also available:

        gcloud beta dataflow sql query