
Managing Jobs, Datasets, and Projects

This document describes how to manage jobs, datasets, and projects.

Jobs

Jobs are used to start all potentially long-running actions, such as queries, table imports, and export requests. Shorter actions, such as list or get requests, are not managed by a job resource.

To perform a job-managed action, create a job of the appropriate type, then periodically request the job resource and examine its status property to learn when the job is complete, and finally check whether it finished successfully. Note that some wrapper functions manage the status requests for you: for example, jobs.query creates the job and polls for DONE status for a specified period of time.
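
For instance, a synchronous query can be issued through the jobs.query wrapper. The following is a minimal sketch using the Google APIs Client Library for Python; the project ID is a placeholder and credential setup is assumed to be handled by the environment.

  from googleapiclient.discovery import build

  # Build a BigQuery API v2 client; assumes default credentials are available.
  bigquery = build('bigquery', 'v2')

  # jobs.query creates the query job and waits up to timeoutMs for it to finish.
  response = bigquery.jobs().query(
      projectId='my-project-id',                      # placeholder project ID
      body={'query': 'SELECT 17', 'timeoutMs': 10000}).execute()

  if response['jobComplete']:
      print(response['rows'])
  else:
      # The job is still running; fall back to polling jobs.get as described below.
      print('Query still running, job ID: %s' % response['jobReference']['jobId'])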

Jobs in BigQuery persist forever, whether they are running or have completed, and whether they succeeded or failed. You can only list or get information about jobs that you have started, unless you are a project owner; project owners can perform all actions on any job associated with their project.

Every job is associated with a project that you specify; this project is billed for any usage incurred by the job. To run a job of any kind, you must have READ permissions on the job's project.

Here is how to run a standard job:

  1. Start the job by calling the generic jobs.insert method; the method call returns immediately with the job resource, which includes a jobId that is used to identify this job later.
  2. Check job status by calling jobs.get with the jobId returned by the initial request and examine the status.state value to learn the job status. When status.state=DONE, the job has stopped running; however, a DONE status does not mean that the job completed successfully, only that it is no longer running.
  3. Check for job success. If the job has a status.errorResult property, the job has failed; this property holds information describing what went wrong in a failed job. If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors, such as problems importing a few rows in an import request. Non-fatal errors are listed in the returned job's status.errors list.

See the asynchronous query sample for an example of starting and polling a job.
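
The following is a minimal sketch of the insert-and-poll pattern above, using the Google APIs Client Library for Python. The project ID, query text, and polling interval are placeholders, and credential setup is assumed to be handled by the environment.

  import time
  from googleapiclient.discovery import build

  bigquery = build('bigquery', 'v2')        # assumes default credentials
  project_id = 'my-project-id'              # placeholder project ID

  # 1. Start the job with jobs.insert; the response includes the generated jobId.
  job = bigquery.jobs().insert(
      projectId=project_id,
      body={'configuration': {'query': {'query': 'SELECT 17'}}}).execute()
  job_id = job['jobReference']['jobId']

  # 2. Poll jobs.get until status.state is DONE.
  while job['status']['state'] != 'DONE':
      time.sleep(1)
      job = bigquery.jobs().get(projectId=project_id, jobId=job_id).execute()

  # 3. DONE only means the job stopped; check errorResult to see whether it succeeded.
  if 'errorResult' in job['status']:
      print('Job failed: %s' % job['status']['errorResult'])
  else:
      print('Job succeeded; non-fatal errors: %s' % job['status'].get('errors', []))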

There is no single-call method to re-run a job; if you want to re-run a specific job:

  1. Call jobs.get to retrieve the resource for the job to re-run.
  2. Remove the id, jobId, status, and statistics fields. Change any other fields as necessary.
  3. Call jobs.insert with the modified resource to start the new job.
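
A minimal sketch of these steps with the Google APIs Client Library for Python follows; the project ID and job ID are placeholders, credentials are assumed to be configured, and field names follow the v2 Job resource, where jobId lives under jobReference.

  from googleapiclient.discovery import build

  bigquery = build('bigquery', 'v2')        # assumes default credentials
  project_id = 'my-project-id'              # placeholder project ID

  # 1. Retrieve the resource for the job to re-run.
  job = bigquery.jobs().get(projectId=project_id, jobId='existing-job-id').execute()

  # 2. Remove the fields listed above; dropping jobReference.jobId lets the
  #    service assign a new job ID to the re-run.
  for field in ('id', 'status', 'statistics'):
      job.pop(field, None)
  job.get('jobReference', {}).pop('jobId', None)

  # 3. Insert the modified resource to start the new job.
  new_job = bigquery.jobs().insert(projectId=project_id, body=job).execute()
  print(new_job['jobReference']['jobId'])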

Running or pending jobs can be cancelled by calling jobs.cancel. Cancelling a running query job may incur charges up to the full cost of the query had it been allowed to run to completion.
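
For example, with the Google APIs Client Library for Python (placeholder project and job IDs; credentials assumed to be configured):

  from googleapiclient.discovery import build

  bigquery = build('bigquery', 'v2')        # assumes default credentials

  # Request cancellation; the response wraps the (possibly still-running) job resource.
  response = bigquery.jobs().cancel(
      projectId='my-project-id', jobId='job-to-cancel').execute()
  print(response['job']['status']['state'])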

See jobs in the reference section for more information.


Datasets

A dataset is a grouping mechanism that holds zero or more tables and is contained within a specific project. Datasets are the lowest-level unit of access control; you cannot control access at the table level. You can list the datasets to which you have access by calling bigquery.datasets.list. Read more about datasets in the reference section.
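
For example, a minimal sketch of listing datasets with the Google APIs Client Library for Python (placeholder project ID; credentials assumed to be configured):

  from googleapiclient.discovery import build

  bigquery = build('bigquery', 'v2')        # assumes default credentials

  # datasets.list returns the datasets in the project that you have access to.
  response = bigquery.datasets().list(projectId='my-project-id').execute()
  for dataset in response.get('datasets', []):
      print(dataset['datasetReference']['datasetId'])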

Choosing a location

You can optionally choose the geographic location for your dataset when the dataset is created. All tables within the dataset inherit the same location value. Possible options include:

  • "US": United States
  • "EU": European Union

For legal information about the location feature, see the Google Cloud Platform Service Specific Terms.

Location limitations

  • You can only set the geographic location at creation time. After a dataset has been created, the location becomes immutable and can't be changed by the patch or update methods.
  • All tables referenced in a query must be stored in the same location.
  • It is not possible to stream data into an EU dataset.
  • When copying a table, the source and destination datasets must reside in the same location.
  • Google Cloud Logging is unsupported for EU datasets.
  • Google Analytics Premium customers who export their data to BigQuery must use a US-based BigQuery dataset as the destination.

Setting the location

To set the dataset location:

BigQuery web UI

When creating a dataset, select the location from the Data location dropdown.

BigQuery command-line tool

Use the --data_location=<location> flag.
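
For example, assuming the flag is supplied when the dataset is created with bq mk (my_dataset is a placeholder dataset name):

  bq mk --data_location=EU my_dataset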

BigQuery API

Set the location property.

Examples

Java

This sample uses the Google APIs Client Library for Java.

Python

This sample uses the Google APIs Client Library for Python.
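
A minimal sketch of creating a dataset with the location property set, using datasets.insert (placeholder project and dataset IDs; credentials assumed to be configured):

  from googleapiclient.discovery import build

  bigquery = build('bigquery', 'v2')        # assumes default credentials

  # The location property can only be set at creation time.
  dataset = bigquery.datasets().insert(
      projectId='my-project-id',
      body={
          'datasetReference': {'datasetId': 'my_dataset'},
          'location': 'EU',
      }).execute()
  print(dataset['location'])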

Projects

A project holds a group of datasets. Projects are created and managed in the APIs console. Jobs are billed to the project to which they are assigned. You can list projects to which you have access by calling bigquery.projects.list.

See projects in the reference section and Managing Projects in the APIs Console help for more information.

Example

Java

This sample uses the Google APIs Client Library for Java.

Python

This sample uses the Google APIs Client Library for Python.
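
A minimal sketch of listing accessible projects with bigquery.projects.list (credentials assumed to be configured):

  from googleapiclient.discovery import build

  bigquery = build('bigquery', 'v2')        # assumes default credentials

  # projects.list returns the projects you have access to.
  response = bigquery.projects().list().execute()
  for project in response.get('projects', []):
      print(project['id'], project.get('friendlyName', ''))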