Dataflow pipeline reading from BigQuery fails due to missing permission

Problem

When a Cloud Dataflow pipeline tries to read query results from BigQuery, it fails with a permission error:

User does not have bigquery.datasets.create permission in project...

Environment

  • Cloud Dataflow
  • BigQuery

Solution

  1. Grant the necessary BigQuery permissions to the Dataflow service account at the project level (for example, a role that includes the bigquery.datasets.create permission).
  2. If you do not want to grant the service account broad project-level BigQuery permissions, you can force Dataflow to use a specific temporary dataset to store the query results.
Create a separate dataset for the temporary data, and grant the service account only dataset-level permissions to create tables and modify data inside that dataset.
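The two options above can be sketched with the gcloud and bq command-line tools. PROJECT_ID, DATASET, and SA_EMAIL are placeholders, and the roles shown are one reasonable choice, not the only one:

```shell
# Option 1 (broad): grant a project-level role that includes the
# bigquery.datasets.create permission, such as roles/bigquery.user.
# PROJECT_ID and SA_EMAIL are placeholders for your project and
# the Dataflow worker service account.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SA_EMAIL" \
  --role="roles/bigquery.user"

# Option 2 (narrow): create a dedicated dataset for the temporary data...
bq mk --dataset PROJECT_ID:DATASET

# ...then grant the service account write access on that dataset only,
# by editing the dataset's access list:
bq show --format=prettyjson PROJECT_ID:DATASET > dataset.json
# Add an entry such as {"role": "WRITER", "userByEmail": "SA_EMAIL"}
# to the "access" array in dataset.json, then apply the change:
bq update --source dataset.json PROJECT_ID:DATASET
```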

A possible implementation in Python may look like:
import apache_beam as beam
from apache_beam.io.gcp.internal.clients import bigquery
...
beam.io.ReadFromBigQuery(
    query=query,
    use_standard_sql=True,
    temp_dataset=bigquery.DatasetReference(
        projectId="PROJECT_ID",
        datasetId="DATASET"))

Cause

When Dataflow reads the results of a query from BigQuery, it needs to create a temporary dataset and tables to store the query results before it can consume this data as pipeline input. By default, Dataflow creates this temporary dataset itself, which requires the bigquery.datasets.create permission in the project.