Running and Managing Jobs

Jobs are actions that you construct and that BigQuery executes on your behalf to load data, export data, query data, or copy data.

Because jobs can potentially take a long time to complete, they execute asynchronously and can be polled for their status. Shorter actions, such as listing resources or getting metadata are not managed by a job resource.

When you perform actions in the web UI or CLI, a job resource is automatically created and run. When you use the API, you start a job by calling the jobs.insert method using a unique job ID that is generated by your client code.

Billing

Every job is associated with a specific project that you specify. The billing account attached to the associated project is billed for any usage incurred by the job. If you share access to a project, any jobs run in the project are billed to the attached billing account.

For more information, see Pricing.

Running jobs

In order to run a job, you must have bigquery.jobs.create permissions. The following predefined IAM roles include bigquery.jobs.create permissions:

For more information on IAM roles and permissions in BigQuery, see access control.

You are not required to run jobs when you perform actions in the BigQuery web UI or CLI. When you submit queries, load data, export data, or copy data, a job is automatically generated and submitted on your behalf.

If you need to run jobs programmatically, use the REST API and client libraries.

After you submit a BigQuery job, it can be in one of three states:

  • PENDING (scheduled)
  • RUNNING
  • DONE (reported as SUCCESS or FAILURE if the job completed with errors)

You can retrieve the job status by viewing job data.

Running jobs programmatically

To perform a job-managed action, you:

  1. Create a job of the appropriate type
  2. Periodically request the job resource and examine the status property to learn when the job is complete
  3. Check to see whether the job finished successfully

To run a job programmatically:

  1. Start the job by calling the jobs.insert method using a unique job ID generated by your client code. The server generates a job ID for you if you omit it, but it is a best practice to generate the job ID on the client side to allow reliable retry of the jobs.insert call.

  2. Check job status by calling jobs.get with the job ID and check the status.state value to learn the job status. When status.state is DONE, the job has stopped running; however, a DONE status does not mean that the job completed successfully, only that it is no longer running.

  3. Check for job success. If the job has a status.errorResult property, the job has failed. The status.errorResult property holds information describing what went wrong in a failed job. If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors, such as problems importing a few rows in a load job. Non-fatal errors are returned in the job's status.errors list.

For more information, see:

Generating a job ID

As a best practice, generate a job ID using your client code and send that job ID when you call jobs.insert. If you call jobs.insert without specifying a job ID, BigQuery will create a job ID for you, but you will not be able to check the status of that job until the call returns. As well, it might be difficult to tell if the job was successfully inserted. If you use your own job ID, you can check the status of the job at any time, and you can retry on the same job ID to ensure that the job starts exactly one time.

The job ID is a string comprising letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-), with a maximum length of 1,024 characters. Job IDs must be unique within a project.

A common approach to generating a unique job ID is to use a human-readable prefix and a suffix consisting of a timestamp or a GUID. For example: daily_import_job_1447971251.

An example of a method that generates GUIDs can be found in the Python UUID module. For an example of using the Python uuid4() method with jobs.insert, see Loading data from Google Cloud Storage.

Cancelling a job

You can cancel a RUNNING or PENDING job in the web UI, CLI, or API. However, not all job types can be cancelled. If the job cannot be cancelled, an error is returned.

Even if the job can be cancelled, success is not guaranteed. The job might have completed by the time the cancel request is submitted, or the job might be in a stage where it cannot be cancelled.

In order to cancel a job, you must have bigquery.jobs.update permissions. The following predefined IAM role includes bigquery.jobs.update permissions:

The following roles can cancel self-created jobs:

For more information on IAM roles and permissions in BigQuery, see access control.

Web UI

  1. In the navigation pane, click Job History.

  2. In the Recent Jobs section, click the job you're cancelling. The most recent jobs appear at the top of the list.

  3. In the job details, click Cancel Job.

    Cancel job

CLI

Issue the bq cancel command with the job_id parameter. You can request cancellation and return immediately by using the --nosync flag. By default, cancellation requests wait for completion.

The following command requests job cancellation and waits for completion:

bq cancel [JOB_ID]

The following command requests job cancellation and returns immediately:

bq --nosync cancel [JOB_ID]

Where:

  • [JOB_ID] is the ID of the job you're cancelling.

For example, the following command cancels job bquijob_123x456_123y123z123c running in myproject and waits for completion:

bq cancel myproject:bquijob_123x456_123y123z123c

API

Call jobs.cancel and provide the jobId and projectId parameters.

Viewing job data

You can view job data and metadata by using the web UI, CLI, and API. This data includes details such as the job type, the job state, and the user who ran the job.

In order to get job data and metadata, you must have bigquery.jobs.get permissions. The following predefined IAM role includes bigquery.jobs.get permissions:

The following roles are granted bigquery.jobs.get permissions for self-created jobs:

For more information on IAM roles and permissions in BigQuery, see access control.

Web UI

  1. In the navigation pane, click Job History.

  2. In the Recent Jobs section, click the job to view the details.

CLI

Issue the bq show command with the -j flag and the job_id parameter.

The following command requests information about a job:

bq show -j [JOB_ID]

Where:

  • [JOB_ID] is the ID of the job you're getting.

For example, the following command gets summary information about job bquijob_123x456_123y123z123c running in myproject:

bq show -j myproject:bquijob_123x456_123y123z123c

The output looks like the following:

 Job Type    State      Start Time      Duration      User Email       Bytes Processed   Bytes Billed   Billing Tier   Labels


extract SUCCESS 06 Jul 11:32:10 0:01:41 user@example.com

To see full job details, enter:

bq show --format=prettyjson -j myproject:bquijob_123x456_789y123z456c

The output looks like the following:

{
  "configuration": {
    "extract": {
      "compression": "NONE",
      "destinationUri": "[URI removed]",
      "destinationUris": [
        "[URI removed]"
      ],
      "sourceTable": {
        "datasetId": "github_repos",
        "projectId": "bigquery-public-data",
        "tableId": "commits"
      }
    }
  },
  "etag": "\"[etag removed]\"",
  "id": "myproject:bquijob_123x456_789y123z456c",
  "jobReference": {
    "jobId": "bquijob_123x456_789y123z456c",
    "projectId": "[Project ID removed]"
  },
  "kind": "bigquery#job",
  "selfLink": "https://www.googleapis.com/bigquery/v2/projects/federated-testing/jobs/bquijob_123x456_789y123z456c",
  "statistics": {
    "creationTime": "1499365894527",
    "endTime": "1499365894702",
    "startTime": "1499365894702"
  },
  "status": {
    "errorResult": {
      "debugInfo": "[Information removed for readability]",
      "message": "Operation cannot be performed on a nested schema. Field: author",
      "reason": "invalid"
    },
    "errors": [
      {
        "message": "Operation cannot be performed on a nested schema. Field: author",
        "reason": "invalid"
      }
    ],
    "state": "DONE"
  },
  "user_email": "user@example.com"
}

API

Call jobs.get and provide the jobId and projectId parameters.

Listing jobs

Your project maintains your job history for all jobs created in the past six months. To request automatic deletion of jobs that are more than 50 days old, contact support.

You can view your BigQuery job history via the Google Cloud Platform Console, the CLI, or the API. This history includes jobs that are in the RUNNING state and jobs that are DONE (indicated by reporting the state as SUCCESS or FAILURE).

In order to list jobs, you must have bigquery.jobs.list permissions. The following predefined IAM roles include bigquery.jobs.list permissions:

The following role is granted bigquery.jobs.list permissions for self-created jobs:

For more information on IAM roles and permissions in BigQuery, see access control.

Web UI

  1. In the navigation pane, click Job History.

  2. In the Recent Jobs section, your jobs are listed by creation time with the most recent jobs at the top.

CLI

Issue the bq ls command with the -j flag. The -j flag lists all accessible jobs in a project. To limit the results, use the -n flag.

The following command lists all jobs in a project:

bq ls -j [PROJECT_ID]

The following command lists the most recent jobs in a project:

bq ls -j -n [INTEGER] [PROJECT_ID]

Where:

  • [PROJECT_ID] is the ID of the project that contains the listed jobs.
  • [INTEGER] is an integer that indicates the number of jobs returned.

For example, the following command lists the 10 most recent jobs in myproject:

bq ls -j -n 10 myproject-12312324

API

Call jobs.list and provide the projectId parameter.

Repeating a job

It is not possible to repeat a job using the same job ID. Instead you create a new job with the same configuration. When you submit the new job in the web UI or CLI, a new job ID is assigned. When you submit the job using the API or client libraries, you must generate a new job ID.

In order to run a job, you must have bigquery.jobs.create permissions. The following predefined IAM roles include bigquery.jobs.create permissions:

For more information on IAM roles and permissions in BigQuery, see access control.

Web UI

To repeat a query job:

  1. In the navigation pane, click Query History.

  2. In the Queries section, to the right of the query, click Open Query.

  3. Click Run Query.

To repeat a load job:

  1. In the navigation pane, click Job History.

  2. In the Recent Jobs section, click the job you want to repeat. The most recent jobs appear at the top of the list.

  3. In the job details, click Repeat Load Job.

CLI

Issue your command again and BigQuery automatically generates a job with a new job ID.

API

There is no single-call method to repeat a job; if you want to repeat a specific job:

  1. Call jobs.get to retrieve the resource for the job to repeat.

  2. Remove the id, status, and statistics field. Change the jobId field to a new value generated by your client code. Change any other fields as necessary.

  3. Call jobs.insert with the modified resource and the new job ID to start the new job.

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...