This page explains how to run parameterized Dataflow SQL queries.
Dataflow SQL supports named and positional query parameters. Specify query parameters with the following syntax:
- Named parameters with the
@
character followed by an identifier, such as@param_name
- Positional parameters with the
?
character
You can use either named or positional parameters in a query but not both.
Running parameterized queries
To run a parameterized query, define the data type and value of the query parameters. The type and value of the query parameters must match the type of the database objects that the parameters represent.
You can define query parameters using the Google Cloud console or Google Cloud CLI:
Console
To define query parameters, use the Dataflow SQL UI:
Go to the Dataflow SQL UI.
Enter the Dataflow SQL query into the query editor.
For example, the following query selects taxi rides that cost at least the specified price:
SELECT * FROM pubsub.topic.`pubsub-public-data`.`taxirides-realtime` WHERE ride_status = "dropoff" AND meter_reading >= @price_min
Click Create Cloud Dataflow job to open a panel of job options.
In the SQL query parameters section of the panel, fill in the Name, Type, and Value fields. The name of the parameter in the example is
price_min
and the type isFLOAT64
.If the parameter is positional, omit the Name field.
(Optional) If the query contains more parameters, click Add parameter.
The Google Cloud console does not support ARRAY
or STRUCT
parameters. To specify ARRAY
or STRUCT
parameters, use the
Google Cloud CLI.
gcloud
To define query parameters, use the
gcloud dataflow sql query
command and --parameter
flags.
You can repeat the --parameter
flag for each query parameter. Set each
--parameter
flag to
name:type:value
.
The following command runs a parameterized query on a Pub/Sub stream of taxi rides:
gcloud dataflow sql query \ --job-name=job-name \ --region=region \ --bigquery-dataset=destination-dataset \ --bigquery-table=destination-table \ --parameter=status:STRING:dropoff \ --parameter=price_min:FLOAT64:5.5 \ 'SELECT * FROM pubsub.topic.`pubsub-public-data`.`taxirides-realtime` WHERE ride_status = @status AND meter_reading >= @price_min'
If the parameter is positional, set --parameter
flags to
:type:value
.
Using arrays in parameterized queries
To use an array type in a query parameter, set the type to ARRAY<data-type>
.
Construct the value as a comma-separated list of elements enclosed in
square brackets, such as ["pickup", "enroute", "dropoff"]
.
Console
The Dataflow SQL UI does not support ARRAY
parameters in Dataflow SQL
queries.
gcloud
To define an array parameter, set the --parameter
flag of the
gcloud dataflow sql query
command to name:ARRAY<data-type>:value
The following command runs query with a named array parameter on a Pub/Sub stream of taxi rides:
gcloud dataflow sql query \ --job-name=job-name --region=region --bigquery-dataset=destination-dataset \ --bigquery-table=destination-table \ --parameter='status:ARRAY<STRING>:["pickup", "enroute", "dropoff"]' \ 'SELECT * FROM pubsub.topic.`pubsub-public-data`.`taxirides-realtime` WHERE ride_status IN UNNEST(@status)'
For more information, see the Dataflow SQL reference for ARRAY
types.
Using timestamps in parameterized queries
To use a timestamp in a query parameter, set the type to STRING
but construct
the value in the canonical timestamp format.
Console
To define timestamps parameters, use the Dataflow SQL UI:
Go to the Dataflow SQL UI.
Enter the Dataflow SQL query into the query editor.
For example, the following query selects taxi rides that occurred after the specified date:
SELECT * FROM pubsub.topic.`pubsub-public-data`.`taxirides-realtime` WHERE event_timestamp >= TIMESTAMP (@date_min)
Click Create Cloud Dataflow job to open a panel of job options.
In the SQL query parameters section, click Add parameter.
In the Type field, select
STRING
.In the Value field, enter a timestamp.
gcloud
To define an timestamp parameter, set the --parameter
flag of the
gcloud dataflow sql query
command to name:STRING:value
The following command runs query with a named timestamp parameter on a Pub/Sub stream of taxi rides:
gcloud dataflow sql query \ --job-name=job-name \ --region=region \ --bigquery-dataset=destination-dataset \ --bigquery-table=destination-table \ --parameter='date_min:STRING:2020-01-01 00:00:00.000 UTC' \ 'SELECT * FROM pubsub.topic.`pubsub-public-data`.`taxirides-realtime` WHERE event_timestamp >= TIMESTAMP (@date_min)'
For more information, see the Dataflow SQL reference for the TIMESTAMP
types.
Using structs in parameterized queries
To use a struct in a query parameter, set the type to STRUCT<field-name data-type, ...>
.
Construct the value as a comma-separated list of key-value pairs.
For example, STRUCT<date_min TIMESTAMP, status STRING>
defines a struct with a
field named date_min
of type TIMESTAMP
and a field named status
of type
STRING
.
Console
The Dataflow SQL UI does not support STRUCT
parameters in Dataflow SQL
queries.
gcloud
To define an array parameter, set the --parameter
flag of the
gcloud dataflow sql query
command to name:STRUCT<field-name data-type, ...>:value
The following command runs query with a named struct parameter on a Pub/Sub stream of taxi rides:
gcloud dataflow sql query \ --job-name=job-name \ --region=region \ --bigquery-dataset=destination-dataset \ --bigquery-table=destination-table \ --parameter='rides:STRUCT<riders_min INT64, status STRING>: {"riders_min": 2, "status": "dropoff"}' \ 'SELECT * FROM pubsub.topic.`pubsub-public-data`.`taxirides-realtime` WHERE passenger_count >= @rides.riders_min AND ride_status = @rides.status'
For more information, see the Dataflow SQL reference for the STRUCT
types.