Best Practices

This document contains recommendations on how to provide images to the Google Cloud Vision API. These guidelines are designed for greater efficiency and accuracy, as well as reasonable response times from the service. The Vision API works best when images fall within the parameters described in this document. We therefore recommend that you inspect your images beforehand and, where possible, preprocess them to meet these parameters before sending them to the Vision API.

For information on Vision API quotas on image size, bandwidth, and number of requests, consult the Usage Limits documentation.

Image Formatting

Digital images come in a wide variety of shapes and sizes. For image recognition, the following image characteristics are particularly important:

  • Image Types
  • Image Data
  • Image Sizing
  • File Sizes

The following sections will discuss each of these image characteristics.

Image Types

Currently, the Google Cloud Vision API supports the following image types:

  • JPEG
  • PNG8
  • PNG24
  • GIF
  • Animated GIF (first frame only)
  • BMP
  • WEBP
  • RAW
  • ICO

Note that some of these image formats are "lossy" (for example, JPEG). Reducing file sizes for such lossy formats may result in a degradation of image quality, and hence, Vision API accuracy.
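One way to inspect images beforehand is to compare a file's leading bytes against well-known format signatures. The following is a minimal sketch and not part of the Vision API: the function name sniff_image_type is hypothetical, PNG8 and PNG24 are not distinguished, and RAW is omitted because camera raw formats do not share a single signature.

```python
def sniff_image_type(header: bytes):
    """Guess an image type from the first bytes of a file.

    Hypothetical helper for illustration; returns None when unrecognized.
    """
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if header.startswith(b"BM"):
        return "BMP"
    # WEBP files are RIFF containers with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WEBP"
    if header.startswith(b"\x00\x00\x01\x00"):
        return "ICO"
    return None
```

Reading the first 12 bytes of a file opened in binary mode is enough for every signature checked here.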

Image Data

Images sent to the Google Cloud Vision API can be supplied in two ways:

  • Using Google Cloud Storage URIs of the form gs://bucketname/path/to/image_filename
  • As image data sent within the JSON request. Because image data must be supplied as ASCII text, all image data should be encoded using base64.

Google Cloud Storage Image Files

A simple way to provide image data is to reference a Google Cloud Storage URI, which consists of the bucket and object name:

        "image": {
          "source": { "gcsImageUri": "gs://bucketname/path/to/image_filename" }
        },
        "features": [ ... ]

Only one image can be sent per source field. You may batch images by sending several request objects, but you must ensure that the overall request is not too large (see File Sizes below for more information).

Note that an object in Google Cloud Storage is a single entity; permissions affect only that object. "Directory permissions" do not exist (though default bucket permissions do exist). Make sure the code which performs your request has access to that image.
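As an illustration of the Cloud Storage form, the sketch below builds a complete request body in Python. The field names (requests, image, source, gcsImageUri, features, type, maxResults) follow the v1 REST request format as we understand it; the helper name build_gcs_request and its defaults are hypothetical, and the URI is the placeholder used earlier in this document.

```python
import json

def build_gcs_request(gcs_uri, feature_type="LABEL_DETECTION", max_results=1):
    """Build a request body that points at an image in Cloud Storage.

    Hypothetical helper; it only assembles JSON and sends nothing.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a URI of the form gs://bucketname/path/to/image_filename")
    return {
        "requests": [
            {
                # One image per source field.
                "image": {"source": {"gcsImageUri": gcs_uri}},
                "features": [{"type": feature_type, "maxResults": max_results}],
            }
        ]
    }

body = build_gcs_request("gs://bucketname/path/to/image_filename")
print(json.dumps(body, indent=2))
```

Because the request carries only a URI, its size stays small regardless of the image; the credentials used for the request still need read access to the object.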

Base64 Encoding

How you perform base64 encoding of image files depends on the way in which you send your requests.

The base64 command-line tool encodes a binary image as ASCII text, and most development environments include it. The syntax differs by platform. To encode an image file on Linux (GNU base64 wraps its output at 76 characters by default; pass -w 0 to produce a single line):

base64 input.jpg > output.txt

On macOS:

base64 -i input.jpg -o output.txt

You could then use this encoded image data directly within the JSON request:

        "image": {
          "content": "base64-encoded data"
        },
        "features": [ ... ]

Each programming language has its own way of base64 encoding image files:


In Python, you can base64 encode image files as follows:

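# Import the base64 encoding library.
import base64

# Pass the image data to an encoding function.
# image is assumed to be a file object opened in binary mode ('rb').
def encode_image(image):
  image_content = image.read()
  # b64encode returns bytes; decode to get text suitable for a JSON request.
  return base64.b64encode(image_content).decode('ascii')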


In Node.js, you can base64 encode image files as follows:

// Read the file into memory.
var fs = require('fs');
var imageFile = fs.readFileSync('/path/to/file');

// readFileSync already returns a Buffer, so base64 encode it directly.
var encoded = imageFile.toString('base64');


In Java, the Google APIs client library provides an encodeContent(byte[] content) method, which will base64 encode image bytes in the request.

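// Import the Google Cloud Vision API client library.
import com.google.api.services.vision.v1.model.Image;

// Encode the image. data is assumed to be a byte[] holding the raw image file.
Image image = new Image().encodeContent(data);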
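To tie the encoding step to the request format, the following Python sketch base64 encodes raw image bytes and places them in the content field of a request body. The helper name build_inline_request is hypothetical, the surrounding field names follow the v1 REST request format as we understand it, and the placeholder bytes stand in for a real image read from disk.

```python
import base64
import json

def build_inline_request(image_bytes, feature_type="LABEL_DETECTION"):
    """Build a request body that embeds base64-encoded image data.

    Hypothetical helper; it only assembles JSON and sends nothing.
    """
    # b64encode returns bytes; JSON needs text, so decode to ASCII.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": feature_type}],
            }
        ]
    }

# Placeholder bytes; in practice use open(path, "rb").read().
body = build_inline_request(b"\x89PNG\r\n\x1a\n")
payload = json.dumps(body)
```

Note that base64 encoding produces roughly four output characters for every three input bytes, so an embedded image makes the request about a third larger than the file itself.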

Image Sizing

To enable accurate image detection within the Google Cloud Vision API, images should generally be a minimum of 640 x 480 pixels (about 300k pixels). Full details for different types of Vision API Feature requests are shown below:

Vision API Feature                          Recommended Size *   Notes
FACE_DETECTION                              1600 x 1200          Distance between eyes is most important
TEXT_DETECTION and DOCUMENT_TEXT_DETECTION  1024 x 768           OCR requires more resolution to detect characters

* Note: generally, the Vision API requires images to be a sufficient size so that important features within the request can be easily distinguished. Sizes smaller or larger than these recommended sizes may work. However, smaller sizes may result in lower accuracy, while larger sizes may increase processing time and bandwidth usage without providing comparable benefits in accuracy.

These recommended sizes differ based on the feature being detected. For example, FACE_DETECTION requests generally require larger image sizes because the features being detected (faces) are smaller than the image itself. LABEL_DETECTION requests, on the other hand, generally evaluate an entire image.

In practice, a standard size of 640 x 480 pixels works well in most cases; larger sizes may add little accuracy while greatly reducing throughput. Whenever possible, preprocess your images to reduce them to these minimum sizes.
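The sizing guidance above can be turned into a small calculation that picks target dimensions before resizing. This is a sketch under stated assumptions: the helper name target_dimensions is hypothetical, the image is scaled to fit within the recommended size while keeping its aspect ratio, portrait images are compared long side to long side, and images are never enlarged. The actual resizing would be done by an image library of your choice.

```python
# Recommended sizes from the table above; other features fall back to 640 x 480.
RECOMMENDED_SIZES = {
    "FACE_DETECTION": (1600, 1200),
    "TEXT_DETECTION": (1024, 768),
    "DOCUMENT_TEXT_DETECTION": (1024, 768),
}
DEFAULT_SIZE = (640, 480)

def target_dimensions(width, height, feature):
    """Return (width, height) scaled down to fit the recommended size.

    Hypothetical helper: keeps the aspect ratio and never upscales.
    """
    long_side, short_side = RECOMMENDED_SIZES.get(feature, DEFAULT_SIZE)
    scale = min(
        long_side / max(width, height),
        short_side / min(width, height),
        1.0,  # never enlarge a small image
    )
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000 x 3000 photo sent for FACE_DETECTION would be reduced to 1600 x 1200, while an image already smaller than the recommendation is returned unchanged.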

File Sizes

Image files sent to the Google Cloud Vision API should not exceed 4 MB. Reducing your file size can significantly improve throughput; however, be careful not to reduce image quality in the process. If you are batching images and sending them in one request, also note that the Vision API imposes an 8 MB per request limit.

Most digital cameras currently default to "raw" output whose file size scales with the sensor's megapixel count, so uncompressed photos often exceed 4 MB. Be sure to preprocess such images, resizing them to reasonable pixel dimensions and compressing them to a reasonable file size.
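The two limits above can be checked before sending anything. The following sketch groups images into batches that respect both limits. It rests on several assumptions: the helper name batch_by_size is hypothetical, a megabyte is taken as 1024 * 1024 bytes, and sizes are those of the data as sent (for embedded images, that means the base64-encoded size, which is about a third larger than the file).

```python
MB = 1024 * 1024  # assumption: the limits are binary megabytes
MAX_IMAGE_BYTES = 4 * MB
MAX_REQUEST_BYTES = 8 * MB

def batch_by_size(image_sizes):
    """Group (name, size_in_bytes) pairs into batches under the request limit.

    Hypothetical helper; raises ValueError for any single oversized image.
    """
    batches, current, current_total = [], [], 0
    for name, size in image_sizes:
        if size > MAX_IMAGE_BYTES:
            raise ValueError(f"{name} is {size} bytes; reduce it to {MAX_IMAGE_BYTES} or less")
        # Start a new batch when adding this image would exceed the request limit.
        if current and current_total + size > MAX_REQUEST_BYTES:
            batches.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += size
    if current:
        batches.append(current)
    return batches
```

The grouping is greedy and keeps the input order; it does not account for the small overhead of the surrounding JSON, so leaving some headroom below the limit is prudent.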
