This document describes how to manage jobs, datasets, and projects.
Jobs are used to start all potentially long-running actions, for instance: queries, table import, and export requests. Shorter actions, such as list or get requests, are not managed by a job resource.
To perform a job-managed action, create a job of the appropriate type, then periodically request the job resource and examine its status property to learn when the job is complete. Once it is complete, check whether it finished successfully. Some wrapper functions manage the status requests for you: for example, jobs.query creates the job and periodically polls for DONE status for a specified period of time.
Jobs in BigQuery persist forever. This includes jobs that are running or completed, whether they have succeeded or failed. You can only list or get information about jobs that you have started, unless you are a project owner, who can perform all actions on any jobs associated with their project.
Every job is associated with a specific project that you specify; this project is billed for any usage incurred by the job. In order to run a job of any kind, you must have READ permissions on the job's project.
Here is how to run a standard job:
- Start the job by calling the jobs.insert method using a unique job ID generated by your client code. The server will generate a job ID for you if you omit it, but we recommend generating it on the client side to allow reliable retry of the jobs.insert call.
- Check job status by calling jobs.get with the job ID, and check the status.state value to learn the job status. When status.state=DONE, the job has stopped running; however, a DONE status does not mean that the job completed successfully, only that it is no longer running.
- Check for job success. If the job has a status.errorResult property, the job has failed; this property holds information describing what went wrong in a failed job. If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors, such as problems importing a few rows in an import request. Non-fatal errors are listed in the returned job's status.errors list.
See the asynchronous query as an example of starting and polling a job.
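The check-status and check-success steps above can be sketched as a small polling loop. This is a minimal sketch, not the client library's own implementation: get_job is a hypothetical callable standing in for a jobs.get request, and the names wait_for_job, poll_interval, and timeout are assumptions made for illustration.

```python
import time


def wait_for_job(get_job, job_id, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll until the job reaches DONE, then report success or failure.

    get_job is a hypothetical stand-in for jobs.get: it takes a job ID and
    returns the job resource as a dict.
    """
    waited = 0.0
    while True:
        job = get_job(job_id)
        status = job.get("status", {})
        if status.get("state") == "DONE":
            # DONE only means the job stopped running; errorResult marks a failure.
            if "errorResult" in status:
                raise RuntimeError("job %s failed: %s" % (job_id, status["errorResult"]))
            # A successful job may still carry non-fatal errors for the caller to inspect.
            return job
        if waited >= timeout:
            raise TimeoutError("job %s not DONE after %.0f seconds" % (job_id, timeout))
        sleep(poll_interval)
        waited += poll_interval
```

Passing the request function in as an argument keeps the loop independent of any particular client library, and the sleep parameter makes the wait easy to replace in tests.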
There is no single-call method to re-run a job; if you want to re-run a specific job:
- Call jobs.get to retrieve the resource for the job to re-run.
- Remove the id, status, and statistics fields. Change the jobId field to a new value generated by your client code. Change any other fields as necessary.
- Call jobs.insert with the modified resource and the new job ID to start the new job.
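The resource edits in the steps above can be sketched as a plain dictionary transformation. This is an illustrative sketch only: the function name prepare_rerun and the "rerun_" prefix are hypothetical, and it assumes the job resource has already been fetched as a dict with the job ID stored under jobReference.jobId.

```python
import copy
import uuid


def prepare_rerun(job_resource, new_job_id=None):
    """Return a copy of a fetched job resource that is ready to be resubmitted."""
    new_job = copy.deepcopy(job_resource)
    # Drop the fields that the steps above say to remove.
    for field in ("id", "status", "statistics"):
        new_job.pop(field, None)
    # Assumption: the job ID lives under jobReference.jobId in the resource.
    if new_job_id is None:
        new_job_id = "rerun_%s" % uuid.uuid4().hex
    new_job.setdefault("jobReference", {})["jobId"] = new_job_id
    return new_job
```

The returned dict would then be passed to jobs.insert. The original resource is left untouched because the function works on a deep copy.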
Running or pending jobs can be cancelled by calling jobs.cancel. Cancelling a running query job may incur charges up to the full cost for the query were it allowed to run to completion.
See jobs in the reference section for more information.
Generating a job ID
As a best practice, you should generate a job ID using your client code and send that job ID when you call jobs.insert. If you call jobs.insert without specifying a job ID, BigQuery will create a job ID for you, but you will not be able to check the status of that job until the call returns. Moreover, it may be difficult to tell whether the job was successfully inserted or not. If you use your own job ID, you can check the status of the job at any time and you can retry on the same job ID to ensure that the job starts exactly once.
The job ID is a string comprising letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-), with a maximum length of 1,024 characters. Job IDs must be unique within any given project.
A common approach to generating a unique job ID is to use a human-readable prefix and a suffix consisting of a timestamp or a GUID. For example: "daily_import_job_1447971251". An example of a method that generates GUIDs can be found in the Python UUID module. For an example of using the Python uuid4() method with jobs.insert, see the example code in Loading data from Google Cloud Storage.
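As a rough illustration of the prefix-plus-GUID approach, the sketch below builds a job ID with the standard library uuid module and checks it against the character and length rules stated above. The names make_job_id and JOB_ID_PATTERN are hypothetical helpers, not part of any BigQuery client library.

```python
import re
import uuid

# Letters, digits, underscores, and dashes, up to 1,024 characters.
JOB_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,1024}")


def make_job_id(prefix):
    """Build a job ID from a human-readable prefix and a random GUID suffix."""
    job_id = "%s_%s" % (prefix, uuid.uuid4().hex)
    if not JOB_ID_PATTERN.fullmatch(job_id):
        raise ValueError("invalid job ID: %r" % job_id)
    return job_id
```

Calling make_job_id("daily_import_job") yields the prefix followed by an underscore and 32 hexadecimal characters. Generate the ID once and reuse the same value when retrying an insert, so that a retry cannot start a second job.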
A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest-level unit of access control; you cannot control access at the table level. Read more about datasets in the reference section. A dataset is contained within a specific project. You can list the datasets to which you have access by calling bigquery.datasets.list.
Choosing a location
You can optionally choose the geographic location for your dataset when the dataset is created. All tables within the dataset inherit the same location value. Possible options include:
- "US": United States
- "EU": European Union
For legal information about the location feature, see the Google Cloud Platform Service Specific Terms.
- You can only set the geographic location at creation time. After a dataset has been created, the location becomes immutable and can't be changed.
- All tables referenced in a query must be stored in the same location.
- You can stream data into a US or EU dataset, but inserting data across these locations can increase latency and error rates.
- When copying a table, the destination dataset must reside in the same location.
- Google Cloud Logging is unsupported for EU datasets.
- Google Analytics Premium customers who export their data to BigQuery must use a US-based BigQuery dataset as the destination.
Setting the location
To set the dataset location:
BigQuery web UI
When creating a dataset, select the location from the Data location dropdown.
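When creating a dataset through the API rather than the web UI, the location is supplied in the dataset resource sent with the creation request. The sketch below only builds that request body as a dict; it assumes the resource uses a top-level location field next to datasetReference, and the helper name dataset_body is hypothetical. Restricting the value to the two options listed in this document is also an assumption of the sketch.

```python
# Assumption: only the two locations listed in this document are accepted.
ALLOWED_LOCATIONS = ("US", "EU")


def dataset_body(project_id, dataset_id, location=None):
    """Build a dataset resource dict, optionally pinned to a location."""
    body = {
        "datasetReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
        }
    }
    if location is not None:
        if location not in ALLOWED_LOCATIONS:
            raise ValueError("unsupported location: %r" % location)
        # The location can only be set at creation time.
        body["location"] = location
    return body
```

Because the location is immutable after creation, validating it before sending the request avoids creating a dataset in the wrong place and having to recreate it.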
A project holds a group of datasets. Projects are created and managed in the APIs console. Jobs are billed to the project to which they are assigned. You can list projects to which you have access by calling bigquery.projects.list.