Similar BigQuery queries take widely different times to run

Problem

Similar BigQuery queries that scan similar data take widely different times to run, with some jobs running fast and some very slow. Some may even time out after 6 hours.

Environment

  • BigQuery queries run from the Cloud Console.
  • No apparent differences in query plans.

Solution

There is no general solution.

Workaround
  1. Partitioning the data to ensure equally-sized partitions may help.

Cause

It may happen that the root cause of query slowness in this case is the way BigQuery splits left join input.

When splitting left join input, BigQuery stops splitting when it reaches ~1000 sinks. The way the split happens is non-deterministic and depends on the order of reading the left side input. Because of this, sometimes some large sinks are not getting split, which causes the query to become slow.