Problem
When a Cloud Dataflow pipeline tries to read query results from BigQuery, it fails with a permission error:
User does not have bigquery.datasets.create permission in project ...
Environment
- Cloud Dataflow
- BigQuery as Input
Solution
- Grant the service account the necessary BigQuery permissions at the project level, that is, a role that includes the bigquery.datasets.create permission.
- For Python and Java: If you do not wish to grant the service account broad project-level BigQuery permissions, you can instead force Dataflow to use a specific, pre-created temporary dataset to store the query results. In Python, this is done by specifying the temp_dataset parameter of ReadAllFromBigQuery or ReadFromBigQuery. In Java, this is achieved by calling withQueryTempDataset on the read transform returned by the fromQuery method.
You can then create a separate dataset for the temporary data and grant the service account only dataset-level permissions to create tables and modify data inside that dataset.
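As a sketch of the dataset-level grant (the project ID, dataset name, and service account email below are placeholders): you can export the dataset's access entries with bq show --format=prettyjson PROJECT_ID:DATASET, add a WRITER entry for the service account to the "access" array, and re-apply the file with bq update --source. The WRITER dataset role corresponds to roles/bigquery.dataEditor and allows the account to create tables and write data in that dataset only:

```json
{
  "role": "WRITER",
  "userByEmail": "dataflow-sa@PROJECT_ID.iam.gserviceaccount.com"
}
```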
A possible implementation in Python may look like:

from apache_beam.io.gcp.internal.clients import bigquery
...
beam.io.ReadFromBigQuery(
    query=query,
    use_standard_sql=True,
    temp_dataset=bigquery.DatasetReference(
        projectId="PROJECT_ID",
        datasetId="DATASET"))
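For Java, a comparable sketch using the Beam BigQueryIO connector may look like the following; the query string, dataset ID, and class name are placeholders, not values from the original article:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ReadWithTempDataset {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadFromBigQuery",
        BigQueryIO.readTableRows()
            .fromQuery("SELECT ... FROM ...")  // placeholder query
            .usingStandardSql()
            // Store the query results in this existing dataset instead of
            // letting Dataflow create a temporary dataset in the project.
            .withQueryTempDataset("DATASET"));

    p.run().waitUntilFinish();
  }
}
```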
Cause
When Dataflow reads data from BigQuery with a query, it needs to create a temporary dataset and tables to store the query results before it can consume this data as pipeline input. By default, creating that dataset requires the bigquery.datasets.create permission in the project.