Features list

Vision API currently allows you to use the following features:

All feature types
Text detection	Optical character recognition (OCR) for an image; text recognition and conversion to machine-coded text. Identifies and extracts UTF-8 text in an image. Images: Optimized for sparse areas of text within a larger image. Response: Returns both a list of identified words, with their text and bounding boxes (`textAnnotations`), and the structural hierarchy of the OCR-detected text (`fullTextAnnotation`). Hierarchy of extracted text structure: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component from Page downward may have its own properties, such as detected languages and breaks. Languages supported: Works with currently supported, mapped, and experimental languages. Feature enum value: `TEXT_DETECTION`.
Document text detection (dense text / handwriting)	Optical character recognition (OCR) for a file (PDF/TIFF) or dense text image; dense text recognition and conversion to machine-coded text. Files: Optimized for document files (PDF/TIFF). Images: Optimized for *dense* areas of text in an image (images that are documents) and for images that contain handwriting. Response: Returns the structural hierarchy of the OCR-detected text (`fullTextAnnotation`). Hierarchy of extracted text structure: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component from Page downward may have its own properties, such as detected languages and breaks. Languages supported: Works with currently supported, mapped, and experimental languages. Feature enum value: `DOCUMENT_TEXT_DETECTION`. If both `DOCUMENT_TEXT_DETECTION` and `TEXT_DETECTION` are requested, `DOCUMENT_TEXT_DETECTION` takes precedence. If you are detecting text in scanned documents, try Document AI for optical character recognition, structured form parsing, and entity extraction. You can use the Document AI Toolbox to convert output from the Document AI format to the Cloud Vision format.
Landmark detection ¹	Provides the name of the landmark, a confidence score, a bounding box for the landmark in the image, and geographic coordinates (latitude and longitude) for the detected entity.
Logo detection ²	Provides a textual description of the entity identified, a confidence score, and a bounding polygon for the logo in the file.
Label detection ³	Provides generalized labels for an image. For each label, returns a textual description, a confidence score, and a topicality rating.
Image properties ⁴	Returns the dominant colors in an image. Each color is represented in the RGBA color space, has a confidence score, and includes the fraction of the image's pixels it occupies, in the range [0, 1].
Object localization ⁵	Provides general label and bounding box annotations for multiple objects recognized in a single image. For each detected object, the following elements are returned: a textual description, a confidence score, and normalized vertices [0, 1] for the bounding polygon around the object. Need customized object detection? With AutoML Vision Object Detection, you can create a custom machine learning model for your specific image object detection use case.
Crop hint detection ⁶	Provides a bounding polygon for the cropped image, a confidence score, and an importance fraction of this salient region with respect to the original image for each request. You can provide up to 16 image ratio values (width:height) for a single image.
Web entities and pages ⁷	Provides a series of Web content related to an image. Returns the following information: Web entities: Inferred entities (labels/descriptions) from similar images on the Web. Full matching images: A list of URLs for fully matching images of any size on the Internet. Partial matching images: A list of URLs for images that share key-point features, such as a cropped version of the original image. Pages with matching images: A list of Webpages (identified by page URL, page title, and matching image URL) containing an image that satisfies the conditions described above. Visually similar images: A list of URLs for images that share some features with the original image. Best guess label: A best guess as to the topic of the requested image, inferred from similar images on the Internet.
Explicit content detection (SafeSearch)	Provides likelihood ratings for the following explicit content categories: `adult`, `spoof`, `medical`, `violence`, and `racy`. Likelihood ratings are expressed as one of six values: `UNKNOWN`, `VERY_UNLIKELY`, `UNLIKELY`, `POSSIBLE`, `LIKELY`, or `VERY_LIKELY`.
Face detection	Locates faces with bounding polygons and identifies specific facial "landmarks", such as eyes, ears, nose, and mouth, along with their corresponding confidence values. Returns likelihood ratings for emotion (joy, sorrow, anger, surprise) and general image properties (underexposed, blurred, headwear present). Likelihood ratings are expressed as one of six values: `UNKNOWN`, `VERY_UNLIKELY`, `UNLIKELY`, `POSSIBLE`, `LIKELY`, or `VERY_LIKELY`. Facial recognition (identifying a specific individual) is not supported.

¹ Image credit: Nikolay Vorobyev on Unsplash (annotations added).

² Image credit: Robert Scoble (CC BY 2.0, annotation added).

³ Image credit: Alex Knight on Unsplash.

⁴ Image credit: Jeremy Bishop on Unsplash.

⁵ Image credit: Bogdan Dada on Unsplash (annotations added).

⁶ Image credit: Yasmin Dangor on Unsplash (original and cropped image shown).

⁷ Image credit: Quinten de Graaf on Unsplash.
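Each feature in the table above is selected by its feature enum value (for example `TEXT_DETECTION` or `DOCUMENT_TEXT_DETECTION`). The sketch below shows roughly how those values could appear in a JSON body for the REST `images:annotate` method. It assumes the `requests` / `image.content` / `features[].type` / `maxResults` request shape, and the helper name `build_annotate_request` is hypothetical. The sketch only builds the body and makes no network call.

```python
import base64
import json


def build_annotate_request(image_bytes, feature_types, max_results=None):
    """Build a JSON body for images:annotate (sketch; request shape is assumed)."""
    features = []
    for feature_type in feature_types:
        feature = {"type": feature_type}  # e.g. "TEXT_DETECTION"
        if max_results is not None:
            feature["maxResults"] = max_results
        features.append(feature)
    request = {
        # Inline image bytes are sent base64-encoded.
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": features,
    }
    return {"requests": [request]}


body = build_annotate_request(b"fake-image-bytes", ["DOCUMENT_TEXT_DETECTION"])
print(json.dumps(body, indent=2))
```

Because `DOCUMENT_TEXT_DETECTION` takes precedence when both text features are requested, a client would normally list only the one it actually needs.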
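The text detection rows describe the `fullTextAnnotation` hierarchy TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Here is a minimal sketch of walking that hierarchy in a JSON response. It assumes the nested `pages`, `blocks`, `paragraphs`, `words`, and `symbols` arrays, each symbol carrying a `text` field. The function name `words_by_paragraph` and the sample data are hypothetical.

```python
def words_by_paragraph(full_text_annotation):
    """Walk Page -> Block -> Paragraph -> Word -> Symbol and rebuild word strings."""
    paragraphs = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                words = []
                for word in paragraph.get("words", []):
                    # A word is a sequence of symbols (typically single characters).
                    symbols = word.get("symbols", [])
                    words.append("".join(s.get("text", "") for s in symbols))
                paragraphs.append(words)
    return paragraphs


sample = {
    "pages": [{"blocks": [{"paragraphs": [{"words": [
        {"symbols": [{"text": "H"}, {"text": "i"}]},
        {"symbols": [{"text": "!"}]},
    ]}]}]}]
}
print(words_by_paragraph(sample))  # [['Hi', '!']]
```

Per-component properties such as detected languages and breaks are ignored here. A fuller version could read them to insert spaces and line breaks between words.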
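The landmark detection row mentions coordinates for the detected entity. As a sketch, assuming each item in `landmarkAnnotations` has a `description` and a `locations` list whose entries hold `latLng.latitude` / `latLng.longitude`, the coordinates could be flattened like this. The helper name and the sample landmark are hypothetical.

```python
def landmark_locations(landmark_annotations):
    """Return (description, latitude, longitude) for every reported location."""
    out = []
    for landmark in landmark_annotations:
        for location in landmark.get("locations", []):
            lat_lng = location.get("latLng", {})
            out.append((
                landmark.get("description", ""),
                lat_lng.get("latitude", 0.0),
                lat_lng.get("longitude", 0.0),
            ))
    return out


sample = [{
    "description": "Example Landmark",  # hypothetical sample, not real output
    "score": 0.8,
    "locations": [{"latLng": {"latitude": 1.5, "longitude": 2.5}}],
}]
print(landmark_locations(sample))  # [('Example Landmark', 1.5, 2.5)]
```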
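The label detection row returns a description, confidence score, and topicality rating for each label. A client typically keeps only confident labels. The sketch below assumes `labelAnnotations` entries with `description`, `score`, and `topicality` fields. The 0.7 threshold and the helper name are arbitrary illustrative choices, not API defaults.

```python
def top_labels(label_annotations, min_score=0.7, limit=5):
    """Keep labels scoring at least min_score, highest score first."""
    kept = [label for label in label_annotations if label.get("score", 0.0) >= min_score]
    kept.sort(key=lambda label: label.get("score", 0.0), reverse=True)
    return [(label.get("description", ""), label.get("topicality", 0.0)) for label in kept[:limit]]


labels = [
    {"description": "Sky", "score": 0.95, "topicality": 0.95},
    {"description": "Tree", "score": 0.4, "topicality": 0.4},
    {"description": "Cloud", "score": 0.8, "topicality": 0.8},
]
print(top_labels(labels))  # [('Sky', 0.95), ('Cloud', 0.8)]
```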
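The image properties row says each dominant color has a confidence score and a pixel fraction in [0, 1]. This sketch picks the color covering the most pixels and formats it as hex. It assumes the `dominantColors.colors[]` layout with `color.red` / `green` / `blue`, `score`, and `pixelFraction` fields, and treats an omitted channel as 0. The helper name is hypothetical.

```python
def dominant_color_hex(image_properties):
    """Return the hex RGB of the color with the largest pixel fraction, or None."""
    colors = image_properties.get("dominantColors", {}).get("colors", [])
    if not colors:
        return None
    # Rank by area covered (pixelFraction) rather than by score.
    top = max(colors, key=lambda c: c.get("pixelFraction", 0.0))
    rgb = top.get("color", {})
    channels = (int(rgb.get(name, 0)) for name in ("red", "green", "blue"))
    return "#{:02x}{:02x}{:02x}".format(*channels)


props = {"dominantColors": {"colors": [
    {"color": {"red": 255, "green": 128}, "score": 0.6, "pixelFraction": 0.1},
    {"color": {"red": 10, "green": 20, "blue": 30}, "score": 0.3, "pixelFraction": 0.5},
]}}
print(dominant_color_hex(props))  # #0a141e
```

Ranking by `score` instead would answer a different question (the most representative color rather than the largest one), so the choice depends on the use case.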
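The object localization row returns normalized vertices in [0, 1], so they must be scaled by the image size before drawing. The sketch below assumes each object carries `name` and `boundingPoly.normalizedVertices` with `x` / `y` fields. It also assumes a zero coordinate may be omitted from the JSON, so missing values default to 0.0. The helper name and sample object are hypothetical.

```python
def to_pixel_vertices(normalized_vertices, width, height):
    """Scale normalized [0, 1] vertices to integer pixel coordinates."""
    # A coordinate equal to 0 may be absent from the JSON, so default to 0.0.
    return [
        (round(v.get("x", 0.0) * width), round(v.get("y", 0.0) * height))
        for v in normalized_vertices
    ]


obj = {
    "name": "Bicycle",
    "score": 0.9,
    "boundingPoly": {"normalizedVertices": [
        {"x": 0.1, "y": 0.2}, {"x": 0.5, "y": 0.2}, {"x": 0.5, "y": 0.8}, {"y": 0.8},
    ]},
}
print(to_pixel_vertices(obj["boundingPoly"]["normalizedVertices"], 640, 480))
# [(64, 96), (320, 96), (320, 384), (0, 384)]
```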
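The crop hint row allows up to 16 width:height ratios per image. As a sketch, assuming ratios are passed as floats (width divided by height) under `imageContext.cropHintsParams.aspectRatios`, a request could be built like this. The helper name and the `gs://example-bucket/photo.jpg` URI are hypothetical.

```python
def build_crop_hints_request(image_uri, ratios):
    """Build a CROP_HINTS request body from (width, height) pairs (sketch)."""
    if len(ratios) > 16:
        raise ValueError("at most 16 aspect ratios are allowed per image")
    # Each width:height pair becomes a single float, e.g. 16:9 -> 1.777...
    aspect_ratios = [width / height for width, height in ratios]
    return {"requests": [{
        "image": {"source": {"imageUri": image_uri}},
        "features": [{"type": "CROP_HINTS"}],
        "imageContext": {"cropHintsParams": {"aspectRatios": aspect_ratios}},
    }]}


body = build_crop_hints_request("gs://example-bucket/photo.jpg", [(16, 9), (1, 1)])
print(body["requests"][0]["imageContext"])
```

Validating the 16-ratio limit on the client side gives a clear error before any request is sent.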
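The Web entities and pages row lists several result groups. The sketch below condenses them into one summary. It assumes a `webDetection` object with `bestGuessLabels[].label`, `webEntities[].description`, `fullMatchingImages[].url`, and `pagesWithMatchingImages[]` entries carrying `url` and `pageTitle`. It skips entities without a description, which is an assumption about sparse results. The helper name and example.com URLs are hypothetical.

```python
def summarize_web_detection(web_detection):
    """Collect best-guess labels, entity descriptions, full matches, and pages."""
    return {
        "best_guess": [g.get("label", "") for g in web_detection.get("bestGuessLabels", [])],
        # Some entities may carry only an ID, so keep those with a description.
        "entities": [e["description"] for e in web_detection.get("webEntities", []) if "description" in e],
        "full_matches": [img.get("url", "") for img in web_detection.get("fullMatchingImages", [])],
        "pages": [(p.get("url", ""), p.get("pageTitle", "")) for p in web_detection.get("pagesWithMatchingImages", [])],
    }


sample = {
    "bestGuessLabels": [{"label": "mountain lake"}],
    "webEntities": [{"entityId": "/m/0", "description": "Lake"}, {"entityId": "/m/1"}],
    "fullMatchingImages": [{"url": "https://example.com/lake.jpg"}],
    "pagesWithMatchingImages": [{"url": "https://example.com/trip", "pageTitle": "Trip"}],
}
print(summarize_web_detection(sample))
```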
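The SafeSearch row expresses each category as one of six ordered likelihood values, which makes threshold checks straightforward. This sketch assumes a `safeSearchAnnotation` object keyed by the five category names. It treats `UNKNOWN` (or a missing key) as the lowest level, which is an assumption; a stricter policy might route `UNKNOWN` to manual review. The helper name and the default `LIKELY` threshold are hypothetical.

```python
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
CATEGORIES = ("adult", "spoof", "medical", "violence", "racy")


def flagged_categories(safe_search, threshold="LIKELY"):
    """Return categories rated at or above the threshold, sorted by name."""
    limit = LIKELIHOOD_ORDER.index(threshold)
    return sorted(
        category for category in CATEGORIES
        # A missing rating is treated as UNKNOWN, the lowest level here.
        if LIKELIHOOD_ORDER.index(safe_search.get(category, "UNKNOWN")) >= limit
    )


rating = {"adult": "VERY_UNLIKELY", "spoof": "LIKELY", "medical": "UNLIKELY",
          "violence": "VERY_LIKELY", "racy": "POSSIBLE"}
print(flagged_categories(rating))  # ['spoof', 'violence']
```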
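The face detection row returns bounding polygons plus emotion likelihoods per face. As a sketch, assuming `faceAnnotations` entries with `detectionConfidence`, `joyLikelihood`, and `boundingPoly.vertices` fields, the following keeps the boxes of confidently detected faces rated `LIKELY` or `VERY_LIKELY` for joy. The 0.5 confidence cutoff and the helper name are hypothetical. This only reads emotion ratings and does not identify anyone, consistent with facial recognition being unsupported.

```python
JOY_LEVELS = {"LIKELY", "VERY_LIKELY"}


def joyful_faces(face_annotations, min_confidence=0.5):
    """Return bounding-box vertices of confident faces that are likely joyful."""
    result = []
    for face in face_annotations:
        if face.get("detectionConfidence", 0.0) < min_confidence:
            continue  # skip low-confidence detections
        if face.get("joyLikelihood") in JOY_LEVELS:
            vertices = face.get("boundingPoly", {}).get("vertices", [])
            result.append([(v.get("x", 0), v.get("y", 0)) for v in vertices])
    return result


faces = [
    {"detectionConfidence": 0.9, "joyLikelihood": "VERY_LIKELY",
     "boundingPoly": {"vertices": [{"x": 10, "y": 20}, {"x": 50, "y": 20},
                                   {"x": 50, "y": 70}, {"x": 10, "y": 70}]}},
    {"detectionConfidence": 0.95, "joyLikelihood": "UNLIKELY"},
    {"detectionConfidence": 0.3, "joyLikelihood": "LIKELY"},
]
print(joyful_faces(faces))  # [[(10, 20), (50, 20), (50, 70), (10, 70)]]
```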