Features list

Vision API currently allows you to use the following features:

All feature types

Text detection

Road sign image
  • Optical character recognition (OCR) for an image; text recognition and conversion to machine-coded text. Identifies and extracts UTF-8 text in an image.
  • Images: Optimized for sparse areas of text within a larger image.
  • Response: Returns both a list of the identified words with their text and bounding boxes (textAnnotations) and the structural hierarchy of the OCR-detected text (fullTextAnnotation).
    • Hierarchy of extracted text structure:
      • TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol.
      • Each structural component from Page onward may have its own properties, such as detected languages and breaks.
  • Languages supported: Works with currently supported, mapped, and experimental languages.
  • Feature enum value: TEXT_DETECTION.
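As a rough illustration of how the feature enum value is used, the sketch below builds a JSON request body that asks for TEXT_DETECTION on one image. The field names (`requests`, `image.source.imageUri`, `features[].type`) are assumed to follow the REST `images:annotate` request shape, and the bucket path is hypothetical; check both against the API reference before relying on them.

```python
import json


def build_annotate_request(image_uri, feature_types):
    """Build a JSON-serializable body for a single-image annotate request.

    Assumption: field names mirror the REST images:annotate request shape.
    """
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": t} for t in feature_types],
            }
        ]
    }


# Hypothetical image location, used only for illustration.
body = build_annotate_request("gs://example-bucket/road-sign.jpg", ["TEXT_DETECTION"])
print(json.dumps(body, indent=2))
```

The same helper accepts several feature types at once, so one request can ask for more than one kind of annotation on the same image.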

Document text detection (dense text / handwriting)

Dense image with annotations
handwriting image
  • Optical character recognition (OCR) for a file (PDF/TIFF) or dense text image; dense text recognition and conversion to machine-coded text.
  • Files: Optimized for document files (PDF/TIFF).
  • Images: Optimized for dense areas of text in an image (images that are documents), and images that contain handwriting.
  • Response: Returns the structural hierarchy for the OCR detected text (fullTextAnnotation).
    • Hierarchy of extracted text structure:
      • TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol.
      • Each structural component from Page onward may have its own properties, such as detected languages and breaks.
  • Languages supported: Works with currently supported, mapped, and experimental languages.
  • Feature enum value: DOCUMENT_TEXT_DETECTION.
    • Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are requested.
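The Page -> Block -> Paragraph -> Word -> Symbol hierarchy can be walked with nested loops. The sketch below does this over a plain dictionary shaped like a `fullTextAnnotation` JSON object; the key names (`pages`, `blocks`, `paragraphs`, `words`, `symbols`, `text`) are assumed from the JSON response format, and the sample data is hand-written rather than real API output. For simplicity it joins words with single spaces and ignores any detected-break properties.

```python
def words_by_paragraph(full_text_annotation):
    """Walk Page -> Block -> Paragraph -> Word -> Symbol and rebuild each paragraph."""
    paragraphs = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                # A word is the concatenation of its symbols' text.
                words = [
                    "".join(symbol["text"] for symbol in word.get("symbols", []))
                    for word in paragraph.get("words", [])
                ]
                # Simplification: separate words with one space, ignoring breaks.
                paragraphs.append(" ".join(words))
    return paragraphs


# Hand-written sample in the assumed response shape (not real API output).
sample = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": "N"}, {"text": "O"}]},
                    {"symbols": [{"text": "E"}, {"text": "X"}, {"text": "I"}, {"text": "T"}]},
                ]
            }]
        }]
    }]
}

paragraphs = words_by_paragraph(sample)
print(paragraphs)
```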

Landmark detection 1

St Basil's Cathedral image
  • Provides the name of the landmark, a confidence score, and a bounding box for the landmark in the image.
  • Gives geographic coordinates (latitude and longitude) for the detected entity.

Logo detection 2

annotated logo
  • Provides a textual description of the entity identified, a confidence score, and a bounding polygon for the logo in the file.

Label detection 3

Shanghai street image
  • Provides generalized labels for an image.
  • For each label returns a textual description, confidence score, and topicality rating.
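A common way to consume label results is to keep only labels above a confidence threshold. The sketch below assumes each label is a dictionary with `description`, `score`, and `topicality` keys, as in the JSON response; the sample values and the 0.8 threshold are made up for illustration.

```python
def confident_labels(label_annotations, min_score=0.8):
    """Return labels whose confidence score meets the threshold, highest first."""
    kept = [a for a in label_annotations if a.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda a: a["score"], reverse=True)


# Made-up sample values in the assumed response shape.
labels = [
    {"description": "Street", "score": 0.87, "topicality": 0.87},
    {"description": "Night", "score": 0.62, "topicality": 0.62},
    {"description": "City", "score": 0.95, "topicality": 0.95},
]

names = [a["description"] for a in confident_labels(labels)]
print(names)
```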

Image properties 4

Bali image with properties
  • Returns dominant colors in an image.
  • Each color is represented in the RGBA color space, has a confidence score, and reports the fraction of pixels it occupies, in the range [0, 1].
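To make the color fields concrete, the sketch below ranks dominant colors by pixel fraction and renders each as a hex string. It assumes each entry has a `color` object with `red`, `green`, and `blue` channels in the 0 to 255 range, plus `score` and `pixelFraction`; those names and the channel range are assumptions about the JSON response, and the sample values are invented.

```python
def to_hex(color):
    """Format an RGB color dict as #rrggbb (assumes 0-255 channel values)."""
    r, g, b = (int(round(color.get(ch, 0))) for ch in ("red", "green", "blue"))
    return f"#{r:02x}{g:02x}{b:02x}"


def rank_colors(colors):
    """Sort colors by the fraction of pixels they occupy, largest first."""
    ranked = sorted(colors, key=lambda c: c["pixelFraction"], reverse=True)
    return [(to_hex(c["color"]), c["pixelFraction"]) for c in ranked]


# Invented sample values in the assumed response shape.
colors = [
    {"color": {"red": 10, "green": 120, "blue": 200}, "score": 0.3, "pixelFraction": 0.05},
    {"color": {"red": 255, "green": 255, "blue": 255}, "score": 0.2, "pixelFraction": 0.4},
]

ranked = rank_colors(colors)
print(ranked)
```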

Object localization 5

image with bounding boxes
  • Provides general label and bounding box annotations for multiple objects recognized in a single image.
  • For each object detected, the following elements are returned: a textual description, a confidence score, and normalized vertices [0,1] for the bounding polygon around the object.
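Because the vertices are normalized to [0,1], they need to be scaled by the image dimensions before drawing. The sketch below shows that conversion, assuming each vertex is a dictionary with `x` and `y` keys; a missing key is treated as 0, since zero-valued fields may be omitted from JSON output. The image size and polygon are example values.

```python
def to_pixels(normalized_vertices, width, height):
    """Scale normalized [0,1] vertices to integer pixel coordinates."""
    return [
        (round(v.get("x", 0.0) * width), round(v.get("y", 0.0) * height))
        for v in normalized_vertices
    ]


# Example polygon covering the lower middle of an 800x600 image.
polygon = [
    {"x": 0.25, "y": 0.5},
    {"x": 0.75, "y": 0.5},
    {"x": 0.75, "y": 1.0},
    {"x": 0.25, "y": 1.0},
]

pixels = to_pixels(polygon, width=800, height=600)
print(pixels)
```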

Crop hint detection 6

image with cropped version
  • Provides a bounding polygon for the cropped image, a confidence score, and an importance fraction of this salient region with respect to the original image for each request.
  • You can provide up to 16 aspect ratio values (width:height) for a single image.
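The ratio limit can be enforced on the client before a request is sent. The sketch below converts "width:height" strings into floating-point ratios and rejects lists longer than 16. The `cropHintsParams.aspectRatios` field name is assumed from the REST request format, and the helper names are hypothetical.

```python
MAX_ASPECT_RATIOS = 16  # limit stated in the feature description


def parse_ratio(text):
    """Convert a 'width:height' string such as '16:9' to width / height."""
    width, height = text.split(":")
    return float(width) / float(height)


def crop_hints_params(ratios):
    """Build the crop-hint parameters, enforcing the 16-ratio limit."""
    if len(ratios) > MAX_ASPECT_RATIOS:
        raise ValueError(f"at most {MAX_ASPECT_RATIOS} aspect ratios are allowed")
    # Assumption: this object is placed under the request's image context.
    return {"cropHintsParams": {"aspectRatios": [parse_ratio(r) for r in ratios]}}


params = crop_hints_params(["16:9", "1:1"])
print(params)
```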

Web entities and pages 7

image with web entities table
  • Provides a series of Web content related to an image.
  • Returns the following information:
    • Web entities: Inferred entities (labels/descriptions) from similar images on the Web.
    • Full matching images: A list of URLs for fully matching images of any size on the Internet.
    • Partial matching images: A list of URLs for images that share key-point features, such as a cropped version of the original image.
    • Pages with matching images: A list of Webpages (identified by page URL, page title, matching image URL) with an image that satisfies the conditions described above.
    • Visually similar images: A list of URLs for images that share some features with the original image.
    • Best guess label: A best guess as to the topic of the requested image inferred from similar images on the Internet.

Explicit content detection (SafeSearch)

  • Provides likelihood ratings for the following explicit content categories: adult, spoof, medical, violence, and racy.
  • Likelihood ratings are expressed as one of 6 values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
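Since the likelihood values form an ordered scale, a threshold check is a natural way to act on them. The sketch below flags every category rated at or above a chosen level. It assumes the SafeSearch result is a dictionary mapping the five category names to likelihood strings, treats UNKNOWN as the lowest rank (a design choice, not something the API prescribes), and uses invented sample ratings.

```python
# Ordered from least to most likely; UNKNOWN is ranked lowest by choice.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]


def at_least(rating, threshold):
    """Return True if rating is at or above threshold on the likelihood scale."""
    return LIKELIHOOD_ORDER.index(rating) >= LIKELIHOOD_ORDER.index(threshold)


def flagged_categories(safe_search, threshold="LIKELY"):
    """List the categories whose rating meets the threshold, alphabetically."""
    return sorted(cat for cat, rating in safe_search.items() if at_least(rating, threshold))


# Invented sample ratings for the five categories.
ratings = {
    "adult": "VERY_UNLIKELY",
    "spoof": "POSSIBLE",
    "medical": "UNLIKELY",
    "violence": "LIKELY",
    "racy": "VERY_LIKELY",
}

flagged = flagged_categories(ratings)
print(flagged)
```

Lowering the threshold to POSSIBLE makes the check stricter, which may suit applications that prefer false positives over missed content.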

Face detection

sample image with face detection
  • Locates faces with bounding polygons and identifies specific facial "landmarks" such as eyes, ears, nose, and mouth, along with their corresponding confidence values.
  • Returns likelihood ratings for emotion (joy, sorrow, anger, surprise) and general image properties (underexposed, blurred, headwear present).
  • Likelihood ratings are expressed as one of 6 values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
  • Recognition of specific individuals (facial recognition) is not supported.

1. Image credit: Nikolay Vorobyev on Unsplash (annotations added).

2. Image credit: Robert Scoble (CC BY 2.0, annotation added).

3. Image credit: Alex Knight on Unsplash.

4. Image credit: Jeremy Bishop on Unsplash.

5. Image credit: Bogdan Dada on Unsplash (annotations added).

6. Image credit: Yasmin Dangor on Unsplash (original and cropped image shown).

7. Image credit: Quinten de Graaf on Unsplash.