Problem
When a Cloud Dataflow pipeline tries to read query results from BigQuery, it fails with a permission error:
User does not have bigquery.datasets.create permission in project...
Environment
- Cloud Dataflow
- BigQuery
Solution
- Grant the Dataflow worker service account the necessary BigQuery permissions (including bigquery.datasets.create) at the project level.
- If you do not want to give the service account broad, project-wide BigQuery permissions, you can instead force Dataflow to stage the query results in a specific temporary dataset.
- In Python, this is done by passing the temp_dataset parameter to ReadFromBigQuery or ReadAllFromBigQuery.
- In Java, this is done by chaining withQueryTempDataset onto the read created with the fromQuery method.
A possible implementation in Python may look like:

```python
import apache_beam as beam
from apache_beam.io.gcp.internal.clients import bigquery

...

beam.io.ReadFromBigQuery(
    query=query,
    use_standard_sql=True,
    temp_dataset=bigquery.DatasetReference(
        projectId="PROJECT_ID",
        datasetId="DATASET"))
```
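If the project and dataset IDs are kept in a single "PROJECT_ID.DATASET" string (for example, read from a pipeline option), they can be split before building the reference. The parse_dataset_spec helper below is a hypothetical convenience for illustration, not part of the Beam API:

```python
def parse_dataset_spec(spec):
    """Split a "PROJECT_ID.DATASET" string into (project_id, dataset_id).

    Hypothetical helper for illustration; Beam's temp_dataset parameter
    itself expects a bigquery.DatasetReference object, not a string.
    """
    parts = spec.split(".", 1)
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected a PROJECT_ID.DATASET spec, got: %r" % spec)
    return parts[0], parts[1]


# Example: split a spec into the two IDs DatasetReference needs.
project_id, dataset_id = parse_dataset_spec("my-project.dataflow_temp")
```

The two values can then be passed as the projectId and datasetId arguments of bigquery.DatasetReference.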
Cause
When Dataflow reads query results from BigQuery, it first materializes the results into temporary tables inside a temporary dataset, and then consumes those tables as the pipeline's input. Creating that dataset is what requires the bigquery.datasets.create permission; specifying a temporary dataset explicitly removes the need for Dataflow to create one.