API uploads

The media upload feature allows the BigQuery API to store data in the cloud and make it available to the server. The kinds of data that you might want to upload include photos, videos, PDF files, zip files, or any other type of data.

Upload options

The BigQuery API allows you to upload certain types of binary data, or media. The specific characteristics of the data you can upload are specified on the reference page for any method that supports media uploads:

  • Maximum upload file size: The maximum amount of data you can store with this method.
  • Accepted media MIME types: The types of binary data you can store using this method.

You can make upload requests in any of the following ways. Specify the method you are using with the uploadType request parameter.

  • Multipart upload: uploadType=multipart. For quick transfer of smaller files and metadata; transfers the file along with metadata that describes it, all in a single request.
  • Resumable upload: uploadType=resumable. For reliable transfer, especially important with larger files. With this method, you use a session-initiating request, which can optionally include metadata. This is a good strategy for most applications, since it also works for smaller files at the cost of one additional HTTP request per upload.

When you upload media, you use a special URI. In fact, methods that support media uploads have two URI endpoints:

  • The /upload URI, for the media. The format of the upload endpoint is the standard resource URI with an “/upload” prefix. Use this URI when transferring the media data itself.

    Example: POST /upload/bigquery/v2/projects/projectId/jobs

  • The standard resource URI, for the metadata. If the resource contains any data fields, those fields are used to store metadata describing the uploaded file. You can use this URI when creating or updating metadata values.

    Example: POST /bigquery/v2/projects/projectId/jobs
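As a minimal sketch of how the two endpoints relate, the following Python helper derives the /upload URI from a standard resource URI and appends the uploadType parameter. The function name upload_uri is hypothetical and is not part of any client library; it uses only the Python standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def upload_uri(resource_uri: str, upload_type: str) -> str:
    """Derive the /upload endpoint from a standard resource URI.

    Hypothetical helper: prefixes the path with "/upload" and adds the
    uploadType query parameter described in this section.
    """
    if upload_type not in ("multipart", "resumable"):
        raise ValueError("upload_type must be 'multipart' or 'resumable'")
    parts = urlsplit(resource_uri)
    # Keep any query parameters already present, then add uploadType.
    query = parse_qsl(parts.query) + [("uploadType", upload_type)]
    return urlunsplit(
        (parts.scheme, parts.netloc, "/upload" + parts.path, urlencode(query), "")
    )


print(upload_uri(
    "https://www.googleapis.com/bigquery/v2/projects/projectId/jobs",
    "multipart",
))
# prints https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=multipart
```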

Multipart upload

If you have metadata that you want to send along with the data to upload, you can make a single multipart/related request. This is a good choice if the data you are sending is small enough to upload again in its entirety if the connection fails.

To use multipart upload, make a POST request to the method's /upload URI and add the query parameter uploadType=multipart, for example:

POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=multipart

The top-level HTTP headers to use when making a multipart upload request include:

  • Content-Type. Set to multipart/related and include the boundary string you're using to identify the parts of the request.
  • Content-Length. Set to the total number of bytes in the request body. The media portion of the request must be less than the maximum file size specified for this method.

The body of the request is formatted as a multipart/related content type [RFC2387] and contains exactly two parts. The parts are identified by a boundary string, and the final boundary string is followed by two hyphens.

Each part of the multipart request needs an additional Content-Type header:

  1. Metadata part: Must come first, and Content-Type must match one of the accepted metadata formats.
  2. Media part: Must come second, and Content-Type must match one of the method's accepted media MIME types.
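The body layout described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name build_multipart_related is hypothetical, the metadata is assumed to be JSON, and the default boundary and the "*/*" media type are simply the placeholder values used in this page's example request.

```python
import json


def build_multipart_related(metadata, media, media_type="*/*",
                            boundary="foo_bar_baz"):
    """Return (headers, body) for a two-part multipart/related request.

    Hypothetical helper. Part 1 is the JSON metadata, part 2 is the media,
    and the final boundary is followed by two hyphens.
    """
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    if delimiter in media:
        # The boundary must not occur inside the payload itself.
        raise ValueError("boundary string appears in the media data")
    body = b"".join([
        delimiter + crlf,
        b"Content-Type: application/json; charset=UTF-8" + crlf + crlf,
        json.dumps(metadata).encode("utf-8") + crlf,
        delimiter + crlf,
        b"Content-Type: " + media_type.encode("ascii") + crlf + crlf,
        media + crlf,
        delimiter + b"--" + crlf,
    ])
    headers = {
        "Content-Type": "multipart/related; boundary=" + boundary,
        # Content-Length covers the entire request body, not just the media.
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_multipart_related(
    {"configuration": {"load": {"sourceFormat": "NEWLINE_DELIMITED_JSON"}}},
    b'{"f1": "a", "f2": 1}\n',
)
print(headers["Content-Type"])
# prints multipart/related; boundary=foo_bar_baz
```

The returned headers and body can then be passed to whichever HTTP client you use, together with the Authorization header.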

See the API reference for each method's list of accepted media MIME types and size limits for uploaded files.

Note: To create or update the metadata portion only, without uploading the associated data, simply send a POST or PUT request to the standard resource endpoint: https://www.googleapis.com/bigquery/v2/projects/projectId/jobs

Example: Multipart upload

The example below shows a multipart upload request for the BigQuery API.

POST /upload/bigquery/v2/projects/projectId/jobs?uploadType=multipart HTTP/1.1
Host: www.googleapis.com
Authorization: Bearer your_auth_token
Content-Type: multipart/related; boundary=foo_bar_baz
Content-Length: number_of_bytes_in_entire_request_body

--foo_bar_baz
Content-Type: application/json; charset=UTF-8

{
  "configuration": {
    "load": {
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "schema": {
        "fields": [
          {"name": "f1", "type": "STRING"},
          {"name": "f2", "type": "INTEGER"}
        ]
      },
      "destinationTable": {
        "projectId": "projectId",
        "datasetId": "datasetId",
        "tableId": "tableId"
      }
    }
  }
}


--foo_bar_baz
Content-Type: */*

CSV, JSON, AVRO, PARQUET, or ORC data
--foo_bar_baz--

If the request succeeds, the server returns the HTTP 200 OK status code along with any metadata:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "configuration": {
    "load": {
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "schema": {
        "fields": [
          {"name": "f1", "type": "STRING"},
          {"name": "f2", "type": "INTEGER"}
        ]
      },
      "destinationTable": {
        "projectId": "projectId",
        "datasetId": "datasetId",
        "tableId": "tableId"
      }
    }
  }
}

Resumable upload

To upload data files more reliably, you can use the resumable upload protocol. This protocol allows you to resume an upload operation after a communication failure has interrupted the flow of data. It is especially useful if you are transferring large files and the likelihood of a network interruption or some other transmission failure is high, for example, when uploading from a mobile client app. It can also reduce your bandwidth usage in the event of network failures because you don't have to restart large file uploads from the beginning.

The steps for using resumable upload include:

  1. Start a resumable session. Make an initial request to the upload URI that includes the metadata, if any.
  2. Save the resumable session URI. Save the session URI returned in the response of the initial request; you'll use it for the remaining requests in this session.
  3. Upload the file. Send the media file to the resumable session URI.

In addition, apps that use resumable upload need to have code to resume an interrupted upload. If an upload is interrupted, find out how much data was successfully received, and then resume the upload starting from that point.

Note: An upload URI expires after one week.

Step 1: Start a resumable session

To initiate a resumable upload, make a POST request to the method's /upload URI and add the query parameter uploadType=resumable, for example:

POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=resumable

For this initiating request, the body is either empty or it contains the metadata only; you'll transfer the actual contents of the file you want to upload in subsequent requests.

Use the following HTTP headers with the initial request:

  • X-Upload-Content-Type. Set to the media MIME type of the upload data to be transferred in subsequent requests.
  • X-Upload-Content-Length. Set to the number of bytes of upload data to be transferred in subsequent requests. If the length is unknown at the time of this request, you can omit this header.
  • If providing metadata: Content-Type. Set according to the metadata's data type.
  • Content-Length. Set to the number of bytes provided in the body of this initial request. Not required if you are using chunked transfer encoding.

See the API reference for each method's list of accepted media MIME types and size limits for uploaded files.

Example: Resumable session initiation request

The following example shows how to initiate a resumable session for the BigQuery API.

POST /upload/bigquery/v2/projects/projectId/jobs?uploadType=resumable HTTP/1.1
Host: www.googleapis.com
Authorization: Bearer your_auth_token
Content-Length: number_of_bytes_in_request_body
Content-Type: application/json; charset=UTF-8
X-Upload-Content-Type: */*
X-Upload-Content-Length: 2000000

{
  "configuration": {
    "load": {
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "schema": {
        "fields": [
          {"name": "f1", "type": "STRING"},
          {"name": "f2", "type": "INTEGER"}
        ]
      },
      "destinationTable": {
        "projectId": "projectId",
        "datasetId": "datasetId",
        "tableId": "tableId"
      }
    }
  }
}

Note: For an initial resumable upload request without metadata, leave the body of the request empty, and set the Content-Length header to 0.
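The header rules for the initiating request, including the empty-body case, can be sketched in Python as follows. The function name build_session_request is hypothetical, the metadata is assumed to be JSON, and your_auth_token is the same placeholder used in the example above. The function only builds the headers and body; sending them is left to the HTTP client of your choice.

```python
import json


def build_session_request(token, metadata=None, media_type="*/*",
                          media_length=None):
    """Return (headers, body) for a resumable session initiation request.

    Hypothetical helper. The body holds metadata only (or nothing); the
    media itself is sent in later requests.
    """
    body = b"" if metadata is None else json.dumps(metadata).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + token,
        "X-Upload-Content-Type": media_type,
        # Length of this request's body only; "0" when there is no metadata.
        "Content-Length": str(len(body)),
    }
    if media_length is not None:
        # Omitted when the size of the upload data is not yet known.
        headers["X-Upload-Content-Length"] = str(media_length)
    if metadata is not None:
        headers["Content-Type"] = "application/json; charset=UTF-8"
    return headers, body


headers, body = build_session_request("your_auth_token", media_length=2000000)
print(headers["Content-Length"], headers["X-Upload-Content-Length"])
# prints 0 2000000
```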

The next section describes how to handle the response.

Step 2: Save the resumable session URI

If the session initiation request succeeds, the API server responds with a 200 OK HTTP status code. In addition, it provides a Location header that specifies your resumable session URI. The Location header, shown in the example below, includes an upload_id query parameter that gives the unique upload ID to use for this session.

Example: Resumable session initiation response

Here is the response to the request in Step 1:

HTTP/1.1 200 OK
Location: https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=resumable&upload_id=xa298sd_sdlkj2
Content-Length: 0

The value of the Location header, as shown in the above example response, is the session URI you'll use as the HTTP endpoint for doing the actual file upload or querying the upload status.

Copy and save the session URI so you can use it for subsequent requests.
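A minimal Python sketch of this step is shown below. It assumes the response status and headers are already available as an integer and a plain dictionary; the function name session_from_response is hypothetical. Header names are compared case-insensitively, since HTTP header names are not case-sensitive.

```python
from urllib.parse import parse_qs, urlsplit


def session_from_response(status, headers):
    """Return (session_uri, upload_id) from a session initiation response.

    Hypothetical helper: expects a 200 status and a Location header.
    """
    if status != 200:
        raise RuntimeError("session initiation failed with status %d" % status)
    normalized = {name.lower(): value for name, value in headers.items()}
    session_uri = normalized.get("location")
    if not session_uri:
        raise RuntimeError("response has no Location header")
    ids = parse_qs(urlsplit(session_uri).query).get("upload_id")
    if not ids:
        raise RuntimeError("Location header has no upload_id parameter")
    return session_uri, ids[0]


uri, upload_id = session_from_response(200, {
    "Location": "https://www.googleapis.com/upload/bigquery/v2/projects/"
                "projectId/jobs?uploadType=resumable&upload_id=xa298sd_sdlkj2",
    "Content-Length": "0",
})
print(upload_id)
# prints xa298sd_sdlkj2
```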

Step 3: Upload the file

To upload the file, send a PUT request to the upload URI that you obtained in the previous step. The format of the upload request is:

PUT session_uri

The HTTP headers to use when making the resumable file upload request include Content-Length. Set this to the number of bytes you are uploading in this request, which is generally the upload file size.

Example: Resumable file upload request

Here is a resumable request to upload the entire 2,000,000-byte CSV, JSON, AVRO, PARQUET, or ORC file for the current example.

PUT https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=resumable&upload_id=xa298sd_sdlkj2 HTTP/1.1
Content-Length: 2000000
Content-Type: */*

bytes 0-1999999

If the request succeeds, the server responds with an HTTP 201 Created, along with any metadata associated with this resource. If the initial request of the resumable session had been a PUT, to update an existing resource, the success response would be 200 OK, along with any metadata associated with this resource.

If the upload request is interrupted, or if you receive an HTTP 503 Service Unavailable or any other 5xx response from the server, follow the procedure for resuming an interrupted upload.
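The request and the response handling described in this step can be sketched in Python as follows. Both function names are hypothetical. Treating every status other than 200, 201, and 5xx as a failure is an assumption made for this sketch; this section does not say how other status codes should be handled.

```python
def build_upload_request(session_uri, data, media_type="*/*"):
    """Return (method, uri, headers, body) for a single-request upload.

    Hypothetical helper: sends the whole file to the saved session URI.
    """
    headers = {
        # Number of bytes sent in this request, here the full file size.
        "Content-Length": str(len(data)),
        "Content-Type": media_type,
    }
    return "PUT", session_uri, headers, data


def next_action(status):
    """Map an upload response status to what the client should do next."""
    if status in (200, 201):
        return "done"      # upload accepted; response carries the metadata
    if 500 <= status <= 599:
        return "resume"    # interrupted or server error: resume the upload
    return "fail"          # assumption: anything else is not retried here


method, uri, headers, body = build_upload_request(
    "https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs"
    "?uploadType=resumable&upload_id=xa298sd_sdlkj2",
    b"x" * 2000000,
)
print(method, headers["Content-Length"], next_action(503))
# prints PUT 2000000 resume
```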


Uploading the file in chunks

With resumable uploads, you can break a file into chunks and send a series of requests to upload each chunk in sequence. This is not the preferred approach since there are performance costs associated with the additional requests, and it is generally not needed. However, you might need to use chunking to reduce the amount of data transferred in any single request. This is helpful when there is a fixed time limit for individual requests, as is true for certain classes of Google App Engine requests. It also lets you do things like providing upload progress indications for legacy browsers that don't have upload progress support by default.
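As a rough illustration of chunking, the following Python sketch splits a file of a given size into consecutive inclusive byte ranges, one per request. The function name chunk_ranges is hypothetical, and the 524,288-byte chunk size is an arbitrary value chosen for the example; check the API reference for any constraints on chunk sizes. The same running total can drive a progress indicator.

```python
def chunk_ranges(total_size, chunk_size):
    """Return inclusive (first_byte, last_byte) ranges covering the file.

    Hypothetical helper: every chunk is chunk_size bytes long except
    possibly the last one, which holds the remainder.
    """
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


ranges = chunk_ranges(2000000, 524288)
for first, last in ranges:
    percent = 100 * (last + 1) // 2000000
    print("bytes %d-%d (%d%% after this chunk)" % (first, last, percent))
# last line printed: bytes 1572864-1999999 (100% after this chunk)
```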