This page discusses loading data from a local data source.
For tutorials on loading data from a local data source, see:
- Cloud Console: Quickstart using the web UI
- Command line: Quickstart using the bq command-line tool
Overview
You can load data from a readable data source (such as your local machine) by:
- Using the Cloud Console or the classic BigQuery web UI
- Using the CLI's bq load command
- Using the API
- Using the client libraries
When you load data using the Cloud Console, the classic BigQuery web UI, or the CLI, a load job is automatically created.
Limitations
Loading data from a local data source is subject to the following limitations:
- Wildcards and comma-separated lists are not supported when you load files from a local data source. Files must be loaded individually.
- When using the classic BigQuery web UI, files loaded from a local data source must be 10 MB or less and must contain fewer than 16,000 rows.
Required permissions
At a minimum, to load data into BigQuery, you must be granted the following permissions:
- bigquery.tables.create to create a new table
- bigquery.tables.updateData if you are overwriting or appending a table
- bigquery.jobs.create to run the load job
The following predefined Cloud IAM roles include both bigquery.tables.create and bigquery.tables.updateData permissions:
- bigquery.dataEditor
- bigquery.dataOwner
- bigquery.admin
The following predefined Cloud IAM roles include bigquery.jobs.create permissions:
- bigquery.user
- bigquery.jobUser
- bigquery.admin
In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to load data into tables in the dataset.
For more information on Cloud IAM roles and permissions in BigQuery, see Access control.
Loading data from a local data source
To load data from a local data source:
Console
Open the BigQuery web UI in the Cloud Console.
In the navigation panel, in the Resources section, expand your project and select a dataset.
On the right side of the window, in the details panel, click Create table. The process for loading data is the same as the process for creating an empty table.
On the Create table page, in the Source section:
For Create table from, select Upload.
Below Select file, click Browse.
Browse to the file, and click Open. Note that wildcards and comma-separated lists are not supported for local files.
For File format, select CSV, JSON (newline delimited), Avro, Parquet, or ORC.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
In the Table name field, enter the name of the table you're creating in BigQuery.
Verify that Table type is set to Native table.
In the Schema section, enter the schema definition.
For CSV and JSON files, you can check the Auto-detect option to enable schema auto-detect. Schema information is self-described in the source data for other supported file types.
You can also enter schema information manually by:
Clicking Edit as text and entering the table schema as a JSON array.
Using Add Field to manually input the schema.
Select applicable items in the Advanced options section and then click Create Table. For information on the available options, see CSV options and JSON options.
Classic UI
Go to the BigQuery web UI.
In the navigation panel, hover over a dataset, click the down arrow icon, and click Create new table. The process for loading data is the same as the process for creating an empty table.
On the Create Table page, in the Source Data section:
- For Location, select File upload, click Choose file, browse to the file, and click Open. Note that wildcards and comma-separated lists are not supported for local files.
- For File format, select CSV, JSON (newline delimited), Avro, Parquet, or ORC.
On the Create Table page, in the Destination Table section:
- For Table name, choose the appropriate dataset, and in the table name field, enter the name of the table you're creating in BigQuery.
- Verify that Table type is set to Native table.
In the Schema section, enter the schema definition.
For CSV and JSON files, you can check the Auto-detect option to enable schema auto-detect. Schema information is self-described in the source data for other supported file types.
You can also enter schema information manually by:
Clicking Edit as text and entering the table schema as a JSON array.
Using Add Field to manually input the schema.
Select applicable items in the Options section and then click Create Table. For information on the available options, see CSV options and JSON options.
CLI
Use the bq load command, specify the source_format, and include the path to the local file.
(Optional) Supply the --location flag and set the value to your location.
If you are loading data in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.
bq --location=location load \
--source_format=format \
project_id:dataset.table \
path_to_source \
schema
where:
- location is your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
- format is CSV, AVRO, PARQUET, ORC, or NEWLINE_DELIMITED_JSON.
- project_id is your project ID.
- dataset is an existing dataset.
- table is the name of the table into which you're loading data.
- path_to_source is the path to the local file.
- schema is a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.
In addition, you can add flags for options that allow you to control how BigQuery parses your data. For example, you can use the --skip_leading_rows flag to ignore header rows in a CSV file. For more information, see CSV options and JSON options.
Examples:
The following command loads a newline-delimited JSON file (mydata.json) from your local machine into a table named mytable in mydataset in your default project. The schema is defined in a local schema file named myschema.json.
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
./mydata.json \
./myschema.json
The following command loads a CSV file (mydata.csv) from your local machine into a table named mytable in mydataset in myotherproject. The schema is defined inline in the format field:data_type,field:data_type.
bq load \
--source_format=CSV \
myotherproject:mydataset.mytable \
./mydata.csv \
qtr:STRING,sales:FLOAT,year:STRING
The following command loads a CSV file (mydata.csv) from your local machine into a table named mytable in mydataset in your default project. The schema is defined using schema auto-detection.
bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
./mydata.csv
C#
Before trying this sample, follow the C# setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery C# API reference documentation.
To load a file in a different format, use the options class for that format in place of UploadCsvOptions.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.
To load a file in a different format, set the source format on the NewReaderSource to the appropriate format.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.
To load a file in a different format, set the metadata parameter of the load function to the appropriate format.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery PHP API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Ruby API reference documentation.
To load a file in a different format, set the format parameter of the Table#load_job method to the appropriate format.
Appending to or overwriting a table using a local file
You can load additional data into a table either from source files or by appending query results. If the schema of the data does not match the schema of the destination table or partition, you can update the schema when you append to it or overwrite it.
If you update the schema when appending data, BigQuery allows you to:
- Add new fields
- Relax REQUIRED fields to NULLABLE
If you are overwriting a table, the schema is always overwritten. Schema updates are not restricted when you overwrite a table.
In the console or the classic BigQuery web UI, you use the Write preference option to specify what action to take when you load data from a source file or from a query result. The CLI and API include the following options:
| Console option | Classic UI option | CLI flag | BigQuery API property | Description |
|---|---|---|---|---|
| Write if empty | Write if empty | None | WRITE_EMPTY | Writes the data only if the table is empty. |
| Append to table | Append to table | --noreplace or --replace=false; if --replace is unspecified, the default is append | WRITE_APPEND | (Default) Appends the data to the end of the table. |
| Overwrite table | Overwrite table | --replace or --replace=true | WRITE_TRUNCATE | Erases all existing data in a table before writing the new data. |
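In the client libraries, the same choice corresponds to the load job configuration's write disposition. Here is a minimal Python sketch, assuming the google-cloud-bigquery client and the placeholder table mydataset.mytable:
# Minimal sketch: map the write preferences above onto a Python load job.
# WRITE_EMPTY, WRITE_APPEND, and WRITE_TRUNCATE mirror the API property values.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    # Overwrite the table; use WRITE_APPEND to append or WRITE_EMPTY to
    # write only if the table is empty.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

with open("./mydata.json", "rb") as source_file:
    client.load_table_from_file(
        source_file, "mydataset.mytable", job_config=job_config
    ).result()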
To load CSV, JSON, Avro, Parquet, or ORC data from a local file and to append to or overwrite a BigQuery table:
Console
Open the BigQuery web UI in the Cloud Console.
In the navigation panel, in the Resources section, expand your project and select a dataset.
On the right side of the window, in the details panel, click Create table. The process for loading data is the same as the process for creating an empty table.
On the Create table page, in the Source section:
For Create table from, select Upload.
Below Select file, click Browse.
Browse to the file, and click Open. Note that wildcards and comma-separated lists are not supported for local files.
For File format, select CSV, JSON (newline delimited), Avro, Parquet, or ORC.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
In the Table name field, enter the name of the table you're creating in BigQuery.
Verify that Table type is set to Native table.
In the Schema section, enter the schema definition.
For CSV and JSON files, you can check the Auto-detect option to enable schema auto-detect. Schema information is self-described in the source data for other supported file types.
You can also enter schema information manually by:
Clicking Edit as text and entering the table schema as a JSON array.
Using Add Field to manually input the schema.
In the Advanced options section, for Write preference, choose Write if empty, Append to table, or Overwrite table.
Click Create Table.
Classic UI
- On the Create Table page, in the Source Data section:
- For Location, select File upload, click Choose file, browse to the file, and click Open. Note that wildcards and comma-separated lists are not supported for local files.
- For File format, select CSV, JSON (newline delimited), Avro, Parquet, or ORC.
- On the Create Table page, in the Destination Table section:
- For Table name, choose the appropriate dataset, and in the table name field, enter the name of the table you're appending or overwriting.
- Verify that Table type is set to Native table.
In the Schema section, enter the schema definition. To update the schema, you can add new fields or change (relax) fields from REQUIRED to NULLABLE.
For JSON files, you can check the Auto-detect option to enable schema auto-detection.
You can also enter schema information manually by:
Clicking Edit as text and entering the table schema as a JSON array.
Using Add Field to manually input the schema.
In the Options section, for Write preference, choose Write if empty, Append to table, or Overwrite table.
Click Create Table.
CLI
Enter the bq load command with the --replace flag to overwrite the table. Use the --noreplace flag to append data to the table. If no flag is specified, the default is to append data.
(Optional) Supply the --location flag and set the value to your location.
When appending or overwriting a table, you can use the --schema_update_option flag to update the schema of the destination table with the schema of the new data. The following options can be used with the --schema_update_option flag:
- ALLOW_FIELD_ADDITION: Adds new fields to the schema; new fields cannot be REQUIRED
- ALLOW_FIELD_RELAXATION: Relaxes required fields to nullable; repeat this option to specify a list of values
bq --location=location load \
--[no]replace \
dataset.table \
path_to_source \
schema
where:
- location is your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
- dataset is an existing dataset.
- table is the name of the table into which you're loading data.
- path_to_source is the path to the local file. Note that wildcards and comma-separated lists are not supported for local files.
- schema is a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.
In addition, you can add flags for JSON options and CSV options that allow you to control how BigQuery parses your data.
Examples:
The following command loads data from mydata.json and overwrites a table named mytable in mydataset. The schema is defined using schema auto-detection.
bq load \
--autodetect \
--replace \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
./mydata.json
The following command loads data from mydata.json and appends data to a table named mytable in mydataset. The schema is defined using a local JSON schema file named myschema.json.
bq load \
--noreplace \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
./mydata.json \
./myschema.json
The following command loads data from mydata.json and appends data to a table named mytable in mydataset. A local JSON schema file named myschema.json is used. The schema definition contains new fields not present in the destination table.
bq load \
--noreplace \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
./mydata.json \
./myschema.json
The following command loads data from mydata.csv and appends data to a table named mytable in mydataset. A local JSON schema file named myschema.json is used. The schema definition changes (relaxes) two REQUIRED fields to NULLABLE.
bq load \
--noreplace \
--schema_update_option=ALLOW_FIELD_RELAXATION \
--source_format=CSV \
mydataset.mytable \
./mydata.csv \
./myschema.json
API uploads
The media upload feature allows the BigQuery API to store data in the cloud and make it available to the server. The kinds of data that you might want to upload include photos, videos, PDF files, zip files, or any other type of data.
Upload options
The BigQuery API allows you to upload certain types of binary data, or media. The specific characteristics of the data you can upload are specified on the reference page for any method that supports media uploads:
- Maximum upload file size: The maximum amount of data you can store with this method.
- Accepted media MIME types: The types of binary data you can store using this method.
You can make upload requests in any of the following ways. Specify the method you are using with the uploadType request parameter.
- Multipart upload: uploadType=multipart. For quick transfer of smaller files and metadata; transfers the file along with metadata that describes it, all in a single request.
- Resumable upload: uploadType=resumable. For reliable transfer, especially important with larger files. With this method, you use a session initiating request, which optionally can include metadata. This is a good strategy to use for most applications, since it also works for smaller files at the cost of one additional HTTP request per upload.
When you upload media, you use a special URI. In fact, methods that support media uploads have two URI endpoints:
- The /upload URI, for the media. The format of the upload endpoint is the standard resource URI with an "/upload" prefix. Use this URI when transferring the media data itself. Example: POST /upload/bigquery/v2/projects/projectId/jobs.
- The standard resource URI, for the metadata. If the resource contains any data fields, those fields are used to store metadata describing the uploaded file. You can use this URI when creating or updating metadata values. Example: POST /bigquery/v2/projects/projectId/jobs.
Multipart upload
If you have metadata that you want to send along with the data to upload, you can make a single multipart/related request. This is a good choice if the data you are sending is small enough to upload again in its entirety if the connection fails.
To use multipart upload, make a POST request to the method's /upload URI and add the query parameter uploadType=multipart, for example:
POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=multipart
The top-level HTTP headers to use when making a multipart upload request include:
- Content-Type. Set to multipart/related and include the boundary string you're using to identify the parts of the request.
- Content-Length. Set to the total number of bytes in the request body. The media portion of the request must be less than the maximum file size specified for this method.
The body of the request is formatted as a multipart/related content type [RFC 2387] and contains exactly two parts. The parts are identified by a boundary string, and the final boundary string is followed by two hyphens.
Each part of the multipart request needs an additional Content-Type header:
- Metadata part: Must come first, and Content-Type must match one of the accepted metadata formats.
- Media part: Must come second, and Content-Type must match one of the method's accepted media MIME types.
See the API reference for each method's list of accepted media MIME types and size limits for uploaded files.
Note: To create or update the metadata portion only, without uploading the associated data, simply send a POST or PUT request to the standard resource endpoint:
https://www.googleapis.com/bigquery/v2/projects/projectId/jobs
Example: Multipart upload
The example below shows a multipart upload request for the BigQuery API.
POST /upload/bigquery/v2/projects/projectId/jobs?uploadType=multipart HTTP/1.1
Host: www.googleapis.com
Authorization: Bearer your_auth_token
Content-Type: multipart/related; boundary=foo_bar_baz
Content-Length: number_of_bytes_in_entire_request_body

--foo_bar_baz
Content-Type: application/json; charset=UTF-8

{
  "configuration": {
    "load": {
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "schema": {
        "fields": [
          {"name": "f1", "type": "STRING"},
          {"name": "f2", "type": "INTEGER"}
        ]
      },
      "destinationTable": {
        "projectId": "projectId",
        "datasetId": "datasetId",
        "tableId": "tableId"
      }
    }
  }
}

--foo_bar_baz
Content-Type: */*

CSV, JSON, AVRO, PARQUET, or ORC data

--foo_bar_baz--
If the request succeeds, the server returns the HTTP 200 OK status code along with any metadata:
HTTP/1.1 200
Content-Type: application/json

{
  "configuration": {
    "load": {
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "schema": {
        "fields": [
          {"name": "f1", "type": "STRING"},
          {"name": "f2", "type": "INTEGER"}
        ]
      },
      "destinationTable": {
        "projectId": "projectId",
        "datasetId": "datasetId",
        "tableId": "tableId"
      }
    }
  }
}
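The same request can be issued from any HTTP client. The following rough Python sketch builds the multipart/related body by hand using the third-party requests library; the project, dataset, and table IDs, the token, and the file name are placeholders.
# Rough sketch: multipart upload with a hand-built multipart/related body.
# projectId, datasetId, tableId, the token, and mydata.json are placeholders.
import json
import requests

access_token = "your_auth_token"
url = ("https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs"
       "?uploadType=multipart")

# Metadata part: the load job configuration, as in the example above.
metadata = {
    "configuration": {
        "load": {
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "schema": {
                "fields": [
                    {"name": "f1", "type": "STRING"},
                    {"name": "f2", "type": "INTEGER"},
                ]
            },
            "destinationTable": {
                "projectId": "projectId",
                "datasetId": "datasetId",
                "tableId": "tableId",
            },
        }
    }
}

boundary = "foo_bar_baz"
with open("mydata.json", "rb") as f:
    media = f.read()

# Exactly two parts, separated by the boundary; the final boundary string
# is followed by two hyphens.
body = (
    "--{b}\r\n"
    "Content-Type: application/json; charset=UTF-8\r\n\r\n"
    "{meta}\r\n"
    "--{b}\r\n"
    "Content-Type: */*\r\n\r\n"
).format(b=boundary, meta=json.dumps(metadata)).encode("utf-8")
body += media + "\r\n--{b}--".format(b=boundary).encode("utf-8")

response = requests.post(
    url,
    data=body,
    headers={
        "Authorization": "Bearer " + access_token,
        "Content-Type": "multipart/related; boundary=" + boundary,
    },
)
response.raise_for_status()
print(response.json())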
Resumable upload
To upload data files more reliably, you can use the resumable upload protocol. This protocol allows you to resume an upload operation after a communication failure has interrupted the flow of data. It is especially useful if you are transferring large files and the likelihood of a network interruption or some other transmission failure is high, for example, when uploading from a mobile client app. It can also reduce your bandwidth usage in the event of network failures because you don't have to restart large file uploads from the beginning.
The steps for using resumable upload include:
- Start a resumable session. Make an initial request to the upload URI that includes the metadata, if any.
- Save the resumable session URI. Save the session URI returned in the response of the initial request; you'll use it for the remaining requests in this session.
- Upload the file. Send the media file to the resumable session URI.
In addition, apps that use resumable upload need to have code to resume an interrupted upload. If an upload is interrupted, find out how much data was successfully received, and then resume the upload starting from that point.
Note: An upload URI expires after one week.
Step 1: Start a resumable session
To initiate a resumable upload, make a POST request to the method's /upload URI and add the query parameter uploadType=resumable, for example:
POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=resumable
For this initiating request, the body is either empty or it contains the metadata only; you'll transfer the actual contents of the file you want to upload in subsequent requests.
Use the following HTTP headers with the initial request:
- X-Upload-Content-Type. Set to the media MIME type of the upload data to be transferred in subsequent requests.
- X-Upload-Content-Length. Set to the number of bytes of upload data to be transferred in subsequent requests. If the length is unknown at the time of this request, you can omit this header.
- If providing metadata: Content-Type. Set according to the metadata's data type.
- Content-Length. Set to the number of bytes provided in the body of this initial request. Not required if you are using chunked transfer encoding.
See the API reference for each method's list of accepted media MIME types and size limits for uploaded files.
Example: Resumable session initiation request
The following example shows how to initiate a resumable session for the BigQuery API.
POST /upload/bigquery/v2/projects/projectId/jobs?uploadType=resumable HTTP/1.1
Host: www.googleapis.com
Authorization: Bearer your_auth_token
Content-Length: 38
Content-Type: application/json; charset=UTF-8
X-Upload-Content-Type: */*
X-Upload-Content-Length: 2000000

{
  "configuration": {
    "load": {
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "schema": {
        "fields": [
          {"name": "f1", "type": "STRING"},
          {"name": "f2", "type": "INTEGER"}
        ]
      },
      "destinationTable": {
        "projectId": "projectId",
        "datasetId": "datasetId",
        "tableId": "tableId"
      }
    }
  }
}
Note: For an initial resumable upload request without metadata, leave the body of the request empty, and set the Content-Length header to 0.
The next section describes how to handle the response.
Step 2: Save the resumable session URI
If the session initiation request succeeds, the API server responds with a 200 OK HTTP status code. In addition, it provides a Location header that specifies your resumable session URI. The Location header, shown in the example below, includes an upload_id query parameter portion that gives the unique upload ID to use for this session.
Example: Resumable session initiation response
Here is the response to the request in Step 1:
HTTP/1.1 200 OK
Location: https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=resumable&upload_id=xa298sd_sdlkj2
Content-Length: 0
The value of the Location header, as shown in the above example response, is the session URI you'll use as the HTTP endpoint for doing the actual file upload or querying the upload status.
Copy and save the session URI so you can use it for subsequent requests.
Step 3: Upload the file
To upload the file, send a PUT request to the upload URI that you obtained in the previous step. The format of the upload request is:
PUT session_uri
The HTTP headers to use when making the resumable file upload requests include Content-Length. Set this to the number of bytes you are uploading in this request, which is generally the upload file size.
Example: Resumable file upload request
Here is a resumable request to upload the entire 2,000,000 byte CSV, JSON, AVRO, PARQUET, or ORC file for the current example.
PUT https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs?uploadType=resumable&upload_id=xa298sd_sdlkj2 HTTP/1.1
Content-Length: 2000000
Content-Type: */*

bytes 0-1999999
If the request succeeds, the server responds with an HTTP 201 Created, along with any metadata associated with this resource. If the initial request of the resumable session had been a PUT, to update an existing resource, the success response would be 200 OK, along with any metadata associated with this resource.
If the upload request is interrupted or if you receive an HTTP 503 Service Unavailable or any other 5xx response from the server, follow the procedure outlined in resume an interrupted upload.
Uploading the file in chunks
With resumable uploads, you can break a file into chunks and send a series of requests to upload each chunk in sequence. This is not the preferred approach since there are performance costs associated with the additional requests, and it is generally not needed. However, you might need to use chunking to reduce the amount of data transferred in any single request. This is helpful when there is a fixed time limit for individual requests, as is true for certain classes of Google App Engine requests. It also lets you do things like providing upload progress indications for legacy browsers that don't have upload progress support by default.
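A rough Python sketch of the chunked variant, assuming a session URI obtained as in Step 1 and the requests library: each chunk carries a Content-Range header, and the server answers 308 Resume Incomplete until the final chunk is accepted.
# Rough sketch: upload a file in fixed-size chunks to a resumable session URI.
# session_uri and access_token come from the session initiation step.
import os
import requests

CHUNK_SIZE = 8 * 1024 * 1024  # chunk size; the last chunk may be smaller

def upload_in_chunks(session_uri, file_path, access_token):
    total = os.path.getsize(file_path)
    with open(file_path, "rb") as f:
        offset = 0
        while offset < total:
            chunk = f.read(CHUNK_SIZE)
            end = offset + len(chunk) - 1
            response = requests.put(
                session_uri,
                data=chunk,
                headers={
                    "Authorization": "Bearer " + access_token,
                    "Content-Length": str(len(chunk)),
                    "Content-Range": "bytes {}-{}/{}".format(offset, end, total),
                },
            )
            if response.status_code == 308:
                # Resume Incomplete: assume the full chunk was accepted and
                # continue with the next one (a stricter client would read the
                # Range header, as described under "Resume an interrupted upload").
                offset = end + 1
            elif response.status_code in (200, 201):
                return response  # final chunk accepted
            else:
                response.raise_for_status()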
Resume an interrupted upload
If an upload request is terminated before receiving a response or if you receive an HTTP 503 Service Unavailable response from the server, then you need to resume the interrupted upload. To do this:
- Request status. Query the current status of the upload by issuing an empty PUT request to the upload URI. For this request, the HTTP headers should include a Content-Range header indicating that the current position in the file is unknown. For example, set the Content-Range to */2000000 if your total file length is 2,000,000. If you don't know the full size of the file, set the Content-Range to */*. Note: You can request the status between chunks, not just if the upload is interrupted. This is useful, for example, if you want to show upload progress indications for legacy browsers.
- Get number of bytes uploaded. Process the response from the status query. The server uses the Range header in its response to specify which bytes it has received so far. For example, a Range header of 0-299999 indicates that the first 300,000 bytes of the file have been received.
- Upload remaining data. Finally, now that you know where to resume the request, send the remaining data or current chunk. Note that you need to treat the remaining data as a separate chunk in either case, so you need to send the Content-Range header when you resume the upload.
Example: Resuming an interrupted upload
1) Request the upload status.
The following request uses the Content-Range header to indicate that the current position in the 2,000,000 byte file is unknown.
PUT {session_uri} HTTP/1.1
Content-Length: 0
Content-Range: bytes */2000000
2) Extract the number of bytes uploaded so far from the response.
The server's response uses the Range header to indicate that it has received the first 43 bytes of the file so far. Use the upper value of the Range header to determine where to start the resumed upload.
HTTP/1.1 308 Resume Incomplete
Content-Length: 0
Range: 0-42
Note: It is possible that the status response could be 201 Created or 200 OK if the upload is complete. This could happen if the connection broke after all bytes were uploaded but before the client received a response from the server.
3) Resume the upload from the point where it left off.
The following request resumes the upload by sending the remaining bytes of the file, starting at byte 43.
PUT {session_uri} HTTP/1.1
Content-Length: 1999957
Content-Range: bytes 43-1999999/2000000

bytes 43-1999999
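The same exchange expressed as a rough Python sketch, assuming the requests library and a saved session URI:
# Rough sketch: query upload status, then resume from the first missing byte.
import os
import requests

def resume_upload(session_uri, file_path, access_token):
    total = os.path.getsize(file_path)

    # 1) Request status with an unknown current position.
    status = requests.put(
        session_uri,
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Length": "0",
            "Content-Range": "bytes */{}".format(total),
        },
    )
    if status.status_code in (200, 201):
        return status  # upload already completed

    # 2) Parse the Range header, e.g. "0-42" means 43 bytes were received.
    received = status.headers.get("Range")
    next_byte = int(received.split("-")[-1]) + 1 if received else 0

    # 3) Send the remaining bytes with a Content-Range header.
    with open(file_path, "rb") as f:
        f.seek(next_byte)
        remaining = f.read()
    return requests.put(
        session_uri,
        data=remaining,
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Length": str(len(remaining)),
            "Content-Range": "bytes {}-{}/{}".format(next_byte, total - 1, total),
        },
    )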
Best practices
When uploading media, it is helpful to be aware of some best practices related to error handling.
- Resume or retry uploads that fail due to connection interruptions or any 5xx errors, including:
  - 500 Internal Server Error
  - 502 Bad Gateway
  - 503 Service Unavailable
  - 504 Gateway Timeout
- Use an exponential backoff strategy if any 5xx server error is returned when resuming or retrying upload requests. These errors can occur if a server is getting overloaded. Exponential backoff can help alleviate these kinds of problems during periods of high volume of requests or heavy network traffic.
- Other kinds of requests should not be handled by exponential backoff, but you can still retry a number of them. When retrying these requests, limit the number of times you retry them. For example, your code could limit to ten retries or fewer before reporting an error.
- Handle 404 Not Found and 410 Gone errors when doing resumable uploads by starting the entire upload over from the beginning.
Exponential backoff
Exponential backoff is a standard error handling strategy for network applications in which the client periodically retries a failed request over an increasing amount of time. If a high volume of requests or heavy network traffic causes the server to return errors, exponential backoff may be a good strategy for handling those errors. Conversely, it is not a relevant strategy for dealing with errors unrelated to network volume or response times, such as invalid authorization credentials or file not found errors.
Used properly, exponential backoff increases the efficiency of bandwidth usage, reduces the number of requests required to get a successful response, and maximizes the throughput of requests in concurrent environments.
The flow for implementing simple exponential backoff is as follows:
- Make a request to the API.
- Receive an HTTP 503 response, which indicates you should retry the request.
- Wait 1 second + random_number_milliseconds and retry the request.
- Receive an HTTP 503 response, which indicates you should retry the request.
- Wait 2 seconds + random_number_milliseconds, and retry the request.
- Receive an HTTP 503 response, which indicates you should retry the request.
- Wait 4 seconds + random_number_milliseconds, and retry the request.
- Receive an HTTP 503 response, which indicates you should retry the request.
- Wait 8 seconds + random_number_milliseconds, and retry the request.
- Receive an HTTP 503 response, which indicates you should retry the request.
- Wait 16 seconds + random_number_milliseconds, and retry the request.
- Stop. Report or log an error.
In the above flow, random_number_milliseconds is a random number of milliseconds less than or equal to 1000. This is necessary, since introducing a small random delay helps distribute the load more evenly and avoid the possibility of stampeding the server. The value of random_number_milliseconds must be redefined after each wait.
Note: The wait is always (2 ^ n) + random_number_milliseconds, where n is a monotonically increasing integer initially defined as 0. The integer n is incremented by 1 for each iteration (each request).
The algorithm is set to terminate when n is 5. This ceiling prevents clients from retrying infinitely, and results in a total delay of around 32 seconds before a request is deemed "an unrecoverable error." A larger maximum number of retries is fine, especially if a long upload is in progress; just be sure to cap the retry delay at something reasonable, say, less than one minute.
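A rough Python sketch of this flow, using the same ceiling of five retries and up to 1000 milliseconds of jitter; make_request is a placeholder for whatever call you are retrying and is assumed to return an object with a status_code attribute:
# Rough sketch: retry a request with exponential backoff plus random jitter.
# make_request is a placeholder callable that performs the HTTP request.
import random
import time

def call_with_backoff(make_request, max_retries=5):
    response = make_request()
    for n in range(max_retries):
        if response.status_code != 503:
            return response
        # Wait (2 ** n) seconds plus up to 1000 ms of random jitter, then retry.
        time.sleep((2 ** n) + random.randint(0, 1000) / 1000.0)
        response = make_request()
    if response.status_code == 503:
        raise RuntimeError("Retries exhausted; report or log an error.")
    return response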