The Vision API supports the following image types:
- Animated GIF (first frame only)
Note that some of these image formats are "lossy" (for example, JPEG). Reducing the file size of a lossy image, for instance by compressing it more aggressively, can degrade image quality and, in turn, Vision API accuracy.
To enable accurate image detection within the Vision API, images should generally be at least 640 x 480 pixels (about 300k pixels). Recommended sizes for each type of Vision API feature request are shown below:
| Vision API Feature | Recommended Size (pixels) | Notes |
|---|---|---|
| FACE_DETECTION | 1600 x 1200 | Distance between eyes is most important |
| LANDMARK_DETECTION | 640 x 480 | |
| LOGO_DETECTION | 640 x 480 | |
| LABEL_DETECTION | 640 x 480 | |
| TEXT_DETECTION and DOCUMENT_TEXT_DETECTION | 1024 x 768 | OCR requires more resolution to detect characters |
| SAFE_SEARCH_DETECTION | 640 x 480 | |
These recommended sizes differ based on the feature being detected. For example,
FACE_DETECTION requests generally require larger image sizes because the
features being detected (faces) are smaller than the image itself.
LABEL_DETECTION requests, on the other hand, generally evaluate an entire
image, so a smaller image is usually sufficient.
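
The recommendations in the table above can be encoded as a simple lookup, which may be handy when one image is sent with several features at once. The following is a minimal sketch rather than part of the Vision API: the names `RECOMMENDED_SIZES`, `recommended_size`, and `meets_recommendation` are hypothetical, and comparing the long and short sides independently of orientation is an assumption made here, not documented behavior.

```python
# Recommended sizes (width, height in pixels), copied from the table above.
RECOMMENDED_SIZES = {
    "FACE_DETECTION": (1600, 1200),
    "LANDMARK_DETECTION": (640, 480),
    "LOGO_DETECTION": (640, 480),
    "LABEL_DETECTION": (640, 480),
    "TEXT_DETECTION": (1024, 768),
    "DOCUMENT_TEXT_DETECTION": (1024, 768),
    "SAFE_SEARCH_DETECTION": (640, 480),
}


def recommended_size(features):
    """Return the largest recommended (width, height) across the given features."""
    sizes = [RECOMMENDED_SIZES[name] for name in features]
    return max(w for w, _ in sizes), max(h for _, h in sizes)


def meets_recommendation(width, height, features):
    """Check an image against the recommendation, ignoring orientation.

    Assumption: a portrait image is treated like its landscape equivalent,
    so only the long side and the short side are compared.
    """
    rec_long, rec_short = sorted(recommended_size(features), reverse=True)
    img_long, img_short = sorted((width, height), reverse=True)
    return img_long >= rec_long and img_short >= rec_short
```

For example, `recommended_size(["LABEL_DETECTION", "TEXT_DETECTION"])` picks the larger OCR recommendation, since the most demanding feature determines the size worth keeping.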
In practice, a standard size of 640 x 480 pixels works well in most cases; sizes larger than this may not gain much in accuracy, while greatly diminishing throughput. When at all possible, pre-process your images to reduce their size to these minimum standards.
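
As an illustration of this pre-processing step, the sketch below computes the dimensions to resize to, so that the image shrinks to just cover a target size while keeping its aspect ratio. The function name `downscale_dimensions` and the choice never to upscale are assumptions for this example; the actual resampling would be done with an imaging library of your choice (for instance, Pillow's `Image.resize`).

```python
def downscale_dimensions(width, height, target=(640, 480)):
    """Return (new_width, new_height) for an aspect-preserving downscale.

    The result is the smallest size whose long and short sides both still
    reach the target (orientation is ignored). Images that are already at
    or below the target are returned unchanged; this sketch never upscales.
    """
    long_side, short_side = max(width, height), min(width, height)
    target_long, target_short = max(target), min(target)

    # Use the larger of the two ratios so neither side falls below target.
    scale = max(target_long / long_side, target_short / short_side)
    if scale >= 1:
        return width, height

    # Round to the nearest whole pixel.
    return round(width * scale), round(height * scale)
```

A 4000 x 3000 photo maps to 640 x 480, while a 1920 x 1080 frame maps to 853 x 480 because its short side is the limiting dimension. For features with a larger recommendation, pass a different `target`, such as `(1600, 1200)` for FACE_DETECTION.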
Image files sent to the Vision API should not exceed 20MB. Reducing your file size can significantly improve throughput; however, be careful not to reduce image quality in the process. Note that the Vision API imposes a 10MB JSON request size limit; larger files should be hosted on Cloud Storage or on the web, rather than being passed as base64-encoded content in the JSON itself.
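
The two limits interact because base64 encoding inflates data by roughly one third, so an image well under 20MB can still push a JSON request past 10MB. The following is a rough sketch of choosing between inline content and a hosted URI. It assumes the JSON request shape of the `images:annotate` REST method (`image.content` versus `image.source.imageUri`); the helper names, the interpretation of MB as 2**20 bytes, and the 1 KiB of headroom reserved for the rest of the JSON body are all assumptions made for illustration.

```python
import base64

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # 20MB file limit (assuming MB = 2**20 bytes)
MAX_JSON_BYTES = 10 * 1024 * 1024   # 10MB JSON request limit (same assumption)
ENVELOPE_HEADROOM = 1024            # assumed allowance for the surrounding JSON


def base64_length(num_bytes):
    """Number of characters in the padded base64 encoding of num_bytes bytes."""
    return 4 * ((num_bytes + 2) // 3)


def build_annotate_request(image_bytes, feature_type, fallback_uri=None):
    """Build a request body, inlining the image only when it fits in the JSON.

    fallback_uri is a Cloud Storage or web URI where the same image is
    already hosted; it is used when the encoded image would be too large.
    """
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 20MB file size limit")

    if base64_length(len(image_bytes)) + ENVELOPE_HEADROOM <= MAX_JSON_BYTES:
        image = {"content": base64.b64encode(image_bytes).decode("ascii")}
    elif fallback_uri is not None:
        image = {"source": {"imageUri": fallback_uri}}
    else:
        raise ValueError("image too large to inline; host it and pass fallback_uri")

    return {"requests": [{"image": image, "features": [{"type": feature_type}]}]}
```

Under these assumptions, files larger than about 7.5MB no longer fit inline and need to be referenced by URI instead.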