Features list

Cloud Vision API currently allows you to use the following features:

All feature types

Face detection 1

image with 2 faces with and without annotations
  • Locates faces with bounding polygons, and identifies specific facial "landmarks" such as eyes, ears, nose, mouth, etc. along with their corresponding confidence values.
  • Returns likelihood ratings for emotion (joy, sorrow, anger, surprise) and general image properties (underexposed, blurred, headwear present).
  • Likelihood ratings are expressed as one of 6 values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
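
Because the six likelihood values form an ordered scale, they can be compared once mapped to integers. The following is a minimal sketch, assuming the face annotation is available as a JSON-style dictionary; the field names (joyLikelihood and so on) and the helper strongest_emotion are illustrative assumptions, not definitions taken from this page.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    """The six likelihood values, ordered from least to most likely."""
    UNKNOWN = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


# Assumed field names for the four emotion ratings of one detected face.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def strongest_emotion(face, minimum=Likelihood.POSSIBLE):
    """Return the highest-rated emotion, or None if none reaches `minimum`."""
    best_name, best_rating = None, Likelihood.UNKNOWN
    for name, field in EMOTION_FIELDS.items():
        # Missing fields are treated as UNKNOWN.
        rating = Likelihood[face.get(field, "UNKNOWN")]
        if rating > best_rating:
            best_name, best_rating = name, rating
    return best_name if best_rating >= minimum else None


# Hand-written sample, not real API output.
sample_face = {
    "joyLikelihood": "VERY_LIKELY",
    "sorrowLikelihood": "VERY_UNLIKELY",
    "angerLikelihood": "UNLIKELY",
    "surpriseLikelihood": "POSSIBLE",
}
print(strongest_emotion(sample_face))
```

Treating UNKNOWN as the lowest value is a design choice of this sketch; an application may prefer to handle UNKNOWN separately.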

Landmark detection 2

St Basil's Cathedral image
  • Provides the name of the landmark, a confidence score, and a bounding box in the image for the landmark.
  • Gives geographic coordinates (latitude and longitude) for the detected entity.

Logo detection 3

annotated logo
  • Provides a textual description of the entity identified, a confidence score, and a bounding polygon for the logo in the file.

Label detection 4

Shanghai street image
  • Provides generalized labels for an image.
  • For each label, returns a textual description, a confidence score, and a topicality rating.

Text detection

Road sign image
  • Optical character recognition (OCR) for an image; text recognition and conversion to machine-coded text. Identifies and extracts UTF-8 text in an image.
  • Images: Optimized for sparse areas of text within a larger image.
  • Response: Returns both a list of identified words with their text, bounding boxes, and confidence scores (textAnnotations), and the structural hierarchy of the OCR-detected text (fullTextAnnotation).
    • Hierarchy of extracted text structure:
      • TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol.
      • Each structural component from Page on may also have its own properties, such as detected languages, breaks, etc.
  • Languages supported: Works with currently supported, mapped, and experimental languages.
  • Feature enum value: TEXT_DETECTION.
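
The hierarchy above can be walked with nested loops. The following is a rough sketch, assuming the fullTextAnnotation has been parsed into a dictionary whose keys (pages, blocks, paragraphs, words, symbols, text) mirror the hierarchy; those key names and the helper iter_words are assumptions made for illustration.

```python
def iter_words(full_text_annotation):
    """Yield each word's text by walking Page -> Block -> Paragraph -> Word -> Symbol."""
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word is the concatenation of its symbols' text.
                    yield "".join(
                        symbol.get("text", "") for symbol in word.get("symbols", [])
                    )


# Hand-written sample shaped like the hierarchy (not real API output).
sample = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": c} for c in "STOP"]},
                    {"symbols": [{"text": c} for c in "AHEAD"]},
                ]
            }]
        }]
    }]
}

words = list(iter_words(sample))
print(words)
```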

Document text detection (dense text / handwriting)

Dense image with annotations
handwriting image
  • Optical character recognition (OCR) for a file (PDF/TIFF) or dense text image; dense text recognition and conversion to machine-coded text.
  • Files: Optimized for document files (PDF/TIFF).
  • Images: Optimized for dense areas of text in an image (images that are documents), and images that contain handwriting.
  • Response: Returns the structural hierarchy for the OCR detected text (fullTextAnnotation).
    • Hierarchy of extracted text structure:
      • TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol.
      • Each structural component from Page on may also have its own properties, such as detected languages, breaks, etc.
  • Languages supported: Works with currently supported, mapped, and experimental languages.
  • Feature enum value: DOCUMENT_TEXT_DETECTION.
    • Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are requested.
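
Since DOCUMENT_TEXT_DETECTION takes precedence when both text features are requested, a client can drop the redundant entry before sending. The sketch below builds a JSON request body under the assumption that the endpoint accepts a "requests" list whose items carry a base64-encoded "image.content" and a "features" list of {"type": ...} objects; the helper build_annotate_request is hypothetical, and no network call is made.

```python
import base64
import json


def build_annotate_request(image_bytes, feature_types):
    """Build a request body, dropping TEXT_DETECTION if the document variant is present."""
    # De-duplicate while preserving the caller's order.
    types = list(dict.fromkeys(feature_types))
    if "DOCUMENT_TEXT_DETECTION" in types and "TEXT_DETECTION" in types:
        types.remove("TEXT_DETECTION")
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": feature_type} for feature_type in types],
        }]
    }


# Placeholder bytes stand in for a real image file.
body = build_annotate_request(
    b"fake image bytes",
    ["TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"],
)
print(json.dumps(body, indent=2))
```

Removing the entry client-side is optional; it only makes the request reflect what the service would do anyway.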

Image properties 5

Bali image with properties
  • Returns dominant colors in an image.
  • Each color is represented in the RGBA color space, has a confidence score, and reports the fraction of pixels it occupies, in the range [0, 1].
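
As an illustration of working with these values, the sketch below ranks colors by pixel fraction and formats each as a hex code. It assumes each entry is a dictionary with a "color" object (red, green, blue in 0 to 255), a "score", and a "pixelFraction"; these key names and the helper describe_colors are assumptions.

```python
def describe_colors(colors, top=3):
    """Return (hex_code, pixel_fraction) pairs for the `top` most widespread colors."""
    ranked = sorted(colors, key=lambda c: c.get("pixelFraction", 0.0), reverse=True)
    result = []
    for entry in ranked[:top]:
        rgb = entry.get("color", {})
        hex_code = "#{:02x}{:02x}{:02x}".format(
            int(rgb.get("red", 0)),
            int(rgb.get("green", 0)),
            int(rgb.get("blue", 0)),
        )
        result.append((hex_code, entry.get("pixelFraction", 0.0)))
    return result


# Hand-written sample values, not real API output.
sample_colors = [
    {"color": {"red": 10, "green": 120, "blue": 200}, "score": 0.30, "pixelFraction": 0.25},
    {"color": {"red": 240, "green": 230, "blue": 210}, "score": 0.45, "pixelFraction": 0.40},
    {"color": {"red": 20, "green": 80, "blue": 30}, "score": 0.10, "pixelFraction": 0.05},
]

top_two = describe_colors(sample_colors, top=2)
print(top_two)
```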

Object localization 6

image with bounding boxes
  • Provides general label and bounding box annotations for multiple objects recognized in a single image.
  • For each object detected, the following elements are returned: a textual description, a confidence score, and normalized vertices [0,1] for the bounding polygon around the object.
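
Normalized vertices need to be scaled by the image dimensions before they can be drawn. A minimal sketch, assuming each vertex is a dictionary with "x" and "y" in [0, 1] and that a missing coordinate means 0 (both assumptions; the helper to_pixel_box is hypothetical):

```python
def to_pixel_box(normalized_vertices, width, height):
    """Convert normalized polygon vertices to a (left, top, right, bottom) pixel box."""
    xs = [vertex.get("x", 0.0) * width for vertex in normalized_vertices]
    ys = [vertex.get("y", 0.0) * height for vertex in normalized_vertices]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# Hand-written polygon covering the lower middle of an 800x600 image.
vertices = [
    {"x": 0.25, "y": 0.5},
    {"x": 0.75, "y": 0.5},
    {"x": 0.75, "y": 1.0},
    {"x": 0.25, "y": 1.0},
]

box = to_pixel_box(vertices, 800, 600)
print(box)
```

The function returns the axis-aligned box enclosing the polygon, which is sufficient for rectangular polygons and a conservative bound for any other shape.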

Crop hint detection 7

image with cropped version
  • For each request, provides a bounding polygon for the cropped image, a confidence score, and an importance fraction of this salient region relative to the original image.
  • You can provide up to 16 image ratio values (width:height) for a single image.
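
Ratios written as width:height can be converted to single numbers before being placed in a request. The sketch below assumes the service expects each ratio as a floating-point width/height value under a "cropHintsParams.aspectRatios" field; that field name and the helper parse_aspect_ratios are assumptions, and the limit of 16 comes from the text above.

```python
def parse_aspect_ratios(ratios, limit=16):
    """Convert 'width:height' strings to floats, enforcing the per-image limit."""
    if len(ratios) > limit:
        raise ValueError(f"at most {limit} aspect ratios are allowed, got {len(ratios)}")
    values = []
    for ratio in ratios:
        width, height = ratio.split(":")
        values.append(float(width) / float(height))
    return values


values = parse_aspect_ratios(["16:9", "4:3", "1:1"])

# Assumed placement of the ratios in the request's image context.
image_context = {"cropHintsParams": {"aspectRatios": values}}
print(image_context)
```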

Web entities and pages 8

image with web entities table
  • Provides a series of Web content related to an image.
  • Returns the following information:
    • Web entities: Inferred entities (labels/descriptions) from similar images on the Web.
    • Full matching images: A list of URLs for fully matching images of any size on the Internet.
    • Partial matching images: A list of URLs for images that share key-point features, such as a cropped version of the original image.
    • Pages with matching images: A list of Webpages (identified by page URL, page title, matching image URL) with an image that satisfies the conditions described above.
    • Visually similar images: A list of URLs for images that share some features with the original image.
    • Best guess label: A best guess as to the topic of the requested image inferred from similar images on the Internet.

Explicit content detection (Safe Search)

  • Provides likelihood ratings for the following explicit content categories: adult, spoof, medical, violence, and racy.
  • Likelihood ratings are expressed as one of 6 values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
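
One common use of these ratings is to flag an image when any category reaches a chosen threshold. A minimal sketch, assuming the annotation is a dictionary keyed by category name with likelihood strings as values (the dictionary shape and the helper flagged_categories are assumptions):

```python
# The six likelihood values, ordered from least to most likely.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]
CATEGORIES = ["adult", "spoof", "medical", "violence", "racy"]


def flagged_categories(annotation, threshold="LIKELY"):
    """Return the categories whose rating is at or above `threshold`."""
    cutoff = LIKELIHOOD_ORDER.index(threshold)
    return [
        category
        for category in CATEGORIES
        # Missing categories are treated as UNKNOWN.
        if LIKELIHOOD_ORDER.index(annotation.get(category, "UNKNOWN")) >= cutoff
    ]


# Hand-written sample, not real API output.
sample_annotation = {
    "adult": "VERY_UNLIKELY",
    "spoof": "POSSIBLE",
    "medical": "UNLIKELY",
    "violence": "LIKELY",
    "racy": "VERY_LIKELY",
}

flagged = flagged_categories(sample_annotation)
print(flagged)
```

The right threshold depends on the application; a stricter policy might pass threshold="POSSIBLE".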

1. Image credit: Himanshu Singh Gurjar on Unsplash (annotations added).

2. Image credit: Nikolay Vorobyev on Unsplash (annotations added).

3. Image credit: Robert Scoble (CC BY 2.0, annotation added).

4. Image credit: Alex Knight on Unsplash.

5. Image credit: Jeremy Bishop on Unsplash.

6. Image credit: Bogdan Dada on Unsplash (annotations added).

7. Image credit: Yasmin Dangor on Unsplash (original and cropped image shown).

8. Image credit: Quinten de Graaf on Unsplash.
