Running BigQuery Jobs Programmatically

To run a BigQuery job programmatically using the REST API or client libraries, you:

  1. Call the jobs.insert method using a unique job ID that is generated by your client code
  2. Periodically request the job resource and examine the status property to learn when the job is complete
  3. Check to see whether the job finished successfully
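
For example, the following Python sketch follows these three steps using the google-cloud-bigquery client library. The project ID, query, and location are placeholders, and the one-second polling loop is a simplification (the client library can also wait for job completion for you).

  import time
  import uuid

  from google.cloud import bigquery

  # Placeholder project ID; the client picks up Application Default Credentials.
  client = bigquery.Client(project="my_project")

  # 1. Insert the job with a job ID generated on the client side.
  job_id = "example_query_job_" + str(uuid.uuid4())
  client.query("SELECT 1", job_id=job_id, location="US")

  # 2. Periodically request the job resource and examine its state.
  while True:
      job = client.get_job(job_id, location="US")
      if job.state == "DONE":
          break
      time.sleep(1)

  # 3. DONE only means the job stopped running; check whether it succeeded.
  if job.error_result:
      print("Job failed:", job.error_result)
  else:
      print("Job finished successfully.")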

Required permissions

To run BigQuery jobs, grant bigquery.jobs.create permissions to the user or service account used to submit the job, or grant that account a project-level predefined IAM role that includes bigquery.jobs.create permissions. The following predefined IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

For more information on IAM roles and permissions in BigQuery, see access control.

Running jobs

To run a job programmatically:

  1. Start the job by calling the jobs.insert method. When you call the jobs.insert method, include a job resource representation that contains:

    • Your location in the location property in the jobReference section.
    • The job ID generated by your client code. The server generates a job ID for you if you omit it, but it is a best practice to generate the job ID on the client side to allow reliable retry of the jobs.insert call (see the sketch after this list).

      For example:

      {
        "jobReference": {
          "projectId": "my_project",
          "jobId": "job_123",
          "location": "asia-northeast1"
        },
        "configuration":
        {
          // ..
        }
      }
      

  2. In the configuration section of the job resource, include a child property that specifies the job type — load, query, extract, or copy.

  3. After calling the jobs.insert method, check the job status by calling jobs.get with the job ID and location, and check the status.state value to learn the job status. When status.state is DONE, the job has stopped running; however, a DONE status does not mean that the job completed successfully, only that it is no longer running.

  4. Check for job success. If the job has a status.errorResult property, the job has failed. The status.errorResult property holds information describing what went wrong in a failed job. If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors, such as problems importing a few rows in a load job. Non-fatal errors are returned in the job's status.errors list.
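
Put together, the whole flow might look like the following Python sketch, which calls jobs.insert and jobs.get through the google-api-python-client discovery client. The project ID, location, and query are placeholders, and the fixed one-second polling loop and lack of retry logic are simplifications.

  import time
  import uuid

  from googleapiclient.discovery import build

  # Discovery-based client for the BigQuery REST API; uses Application
  # Default Credentials. The project ID and location are placeholders.
  bigquery = build("bigquery", "v2")
  project_id = "my_project"
  location = "asia-northeast1"

  # Step 1: call jobs.insert with a job resource. The client-generated job ID
  # makes it safe to retry this call if it fails with a transient error.
  job_id = "example_query_job_" + str(uuid.uuid4())
  job_resource = {
      "jobReference": {"projectId": project_id, "jobId": job_id, "location": location},
      # Step 2: the child property of "configuration" names the job type (query here).
      "configuration": {"query": {"query": "SELECT 1", "useLegacySql": False}},
  }
  bigquery.jobs().insert(projectId=project_id, body=job_resource).execute()

  # Step 3: poll jobs.get until status.state is DONE.
  while True:
      job = bigquery.jobs().get(
          projectId=project_id, jobId=job_id, location=location
      ).execute()
      if job["status"]["state"] == "DONE":
          break
      time.sleep(1)

  # Step 4: DONE means the job stopped, not that it succeeded.
  if "errorResult" in job["status"]:
      print("Job failed:", job["status"]["errorResult"])
  else:
      print("Job finished successfully.")
      for error in job["status"].get("errors", []):
          print("Non-fatal error:", error)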

Generating a job ID

As a best practice, generate a job ID using your client code and send that job ID when you call jobs.insert. If you call jobs.insert without specifying a job ID, BigQuery creates a job ID for you, but you cannot check the status of that job until the call returns, and it might be difficult to tell whether the job was successfully inserted. If you use your own job ID, you can check the status of the job at any time, and you can retry the call with the same job ID to ensure that the job starts exactly once.

The job ID is a string comprising letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-), with a maximum length of 1,024 characters. Job IDs must be unique within a project.

A common approach to generating a unique job ID is to use a human-readable prefix and a suffix consisting of a timestamp or a GUID. For example: daily_import_job_1447971251.

An example of a method that generates GUIDs can be found in the Python UUID module. For an example of using the Python uuid4() method with jobs.insert, see Loading data from Google Cloud Storage.
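
For illustration, either style of job ID might be generated like this in Python (the daily_import_job prefix is just an example):

  import time
  import uuid

  # Prefix plus timestamp, as in the example above. Job IDs may contain only
  # letters, digits, underscores, and dashes, up to 1,024 characters, and must
  # be unique within the project.
  job_id = "daily_import_job_{}".format(int(time.time()))

  # Or prefix plus GUID, using the uuid module.
  job_id = "daily_import_job_{}".format(uuid.uuid4())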

Next steps

  • See Running queries for a code example that starts and polls a query job.
  • For more information on creating a job resource representation, see the Jobs overview page in the API reference.