Package google.cloud.vision.v1

ImageAnnotator

Service that performs Google Cloud Vision API detection tasks over client images, such as face, landmark, logo, label, and text detection. The ImageAnnotator service returns detected entities from the images.

BatchAnnotateImages

rpc BatchAnnotateImages(BatchAnnotateImagesRequest) returns (BatchAnnotateImagesResponse)

Run image detection and annotation for a batch of images.

Authorization

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Auth Guide.
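
As an illustration only, the following is a minimal sketch of calling this RPC through the google-cloud-vision Python client. The file name image.jpg, the chosen features, and max_results are illustrative assumptions, not part of this reference.

    # Minimal sketch (not part of the reference): one BatchAnnotateImages
    # call carrying a single AnnotateImageRequest. Assumes the
    # google-cloud-vision Python client and default credentials.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()

    with open("image.jpg", "rb") as f:  # illustrative local file
        content = f.read()

    request = vision.AnnotateImageRequest(
        image=vision.Image(content=content),
        features=[
            vision.Feature(type_=vision.Feature.Type.LABEL_DETECTION,
                           max_results=5),
            vision.Feature(type_=vision.Feature.Type.TEXT_DETECTION),
        ],
    )

    # The batch may contain many requests; responses come back in order.
    response = client.batch_annotate_images(requests=[request])
    for annotated in response.responses:
        for label in annotated.label_annotations:
            print(label.description, label.score)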

AnnotateImageRequest

Request for performing Google Cloud Vision API tasks over a user-provided image, with user-requested features.

Fields
image

Image

The image to be processed.

features[]

Feature

Requested features.

image_context

ImageContext

Additional context that may accompany the image.

AnnotateImageResponse

Response to an image annotation request.

Fields
face_annotations[]

FaceAnnotation

If present, face detection has completed successfully.

landmark_annotations[]

EntityAnnotation

If present, landmark detection has completed successfully.

logo_annotations[]

EntityAnnotation

If present, logo detection has completed successfully.

label_annotations[]

EntityAnnotation

If present, label detection has completed successfully.

text_annotations[]

EntityAnnotation

If present, text (OCR) detection has completed successfully.

full_text_annotation

TextAnnotation

If present, text (OCR) detection or document (OCR) text detection has completed successfully. This annotation provides the structural hierarchy for the OCR detected text.

safe_search_annotation

SafeSearchAnnotation

If present, safe-search annotation has completed successfully.

image_properties_annotation

ImageProperties

If present, image properties were extracted successfully.

crop_hints_annotation

CropHintsAnnotation

If present, crop hints have completed successfully.

web_detection

WebDetection

If present, web detection has completed successfully.

error

Status

If set, represents the error message for the operation. Note that filled-in image annotations are guaranteed to be correct, even when error is set.
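
Because a batch can fail per image, a client typically checks each response's error before reading its annotations. A minimal sketch, assuming batch_response is a BatchAnnotateImagesResponse obtained as in the example above:

    # Sketch: per-image error handling. Per the note above, annotations
    # that were filled in remain correct even when error is set.
    from google.cloud import vision

    def report(batch_response: vision.BatchAnnotateImagesResponse) -> None:
        for i, resp in enumerate(batch_response.responses):
            if resp.error.message:
                # error is a google.rpc.Status (code, message, details).
                print(f"image {i} failed: {resp.error.message}")
            for label in resp.label_annotations:
                print(f"image {i}: {label.description} ({label.score:.2f})")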

BatchAnnotateImagesRequest

Multiple image annotation requests are batched into a single service call.

Fields
requests[]

AnnotateImageRequest

Individual image annotation requests for this batch.

BatchAnnotateImagesResponse

Response to a batch image annotation request.

Fields
responses[]

AnnotateImageResponse

Individual responses to image annotation requests within the batch.

Block

Logical element on the page.

Fields
property

TextProperty

Additional information detected for the block.

bounding_box

BoundingPoly

The bounding box for the block. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

  • when the text is horizontal it might look like:

        0----1
        |    |
        3----2

  • when it's rotated 180 degrees around the top-left corner it becomes:

        2----3
        |    |
        1----0

and the vertex order will still be (0, 1, 2, 3).

paragraphs[]

Paragraph

List of paragraphs in this block (if this block is of type text).

block_type

BlockType

Detected block type (text, image, etc.) for this block.

BlockType

Type of a block (text, image, etc.) as identified by OCR.

Enums
UNKNOWN Unknown block type.
TEXT Regular text block.
TABLE Table block.
PICTURE Image block.
RULER Horizontal/vertical line box.
BARCODE Barcode block.

BoundingPoly

A bounding polygon for the detected image annotation.

Fields
vertices[]

Vertex

The bounding polygon vertices.

ColorInfo

Color information consists of RGB channels, score, and the fraction of the image that the color occupies.

Fields
color

Color

RGB components of the color.

score

float

Image-specific score for this color. Value in range [0, 1].

pixel_fraction

float

The fraction of pixels the color occupies in the image. Value in range [0, 1].

CropHint

Single crop hint that is used to generate a new crop when serving an image.

Fields
bounding_poly

BoundingPoly

The bounding polygon for the crop region. The coordinates of the bounding box are in the original image's scale, as returned in ImageParams.

confidence

float

Confidence of this being a salient region. Range [0, 1].

importance_fraction

float

Fraction of importance of this salient region with respect to the original image.

CropHintsAnnotation

Set of crop hints that are used to generate new crops when serving images.

Fields
crop_hints[]

CropHint

Crop hint results.

CropHintsParams

Parameters for crop hints annotation request.

Fields
aspect_ratios[]

float

Aspect ratios in floats, representing the ratio of the width to the height of the image. For example, if the desired aspect ratio is 4/3, the corresponding float value should be 1.33333. If not specified, the best possible crop is returned. The number of provided aspect ratios is limited to a maximum of 16; any aspect ratios provided after the 16th are ignored.

DominantColorsAnnotation

Set of dominant colors and their corresponding scores.

Fields
colors[]

ColorInfo

RGB color values with their score and pixel fraction.
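
A short sketch of consuming these values, assuming the Python client's image_properties convenience method (which issues a single IMAGE_PROPERTIES request); photo.png is illustrative:

    # Sketch: print each dominant color's RGB components, score, and the
    # fraction of image pixels it covers.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("photo.png", "rb") as f:  # illustrative local file
        image = vision.Image(content=f.read())

    response = client.image_properties(image=image)
    for info in response.image_properties_annotation.dominant_colors.colors:
        c = info.color  # google.type.Color: float red/green/blue channels
        print(f"rgb({c.red:.0f}, {c.green:.0f}, {c.blue:.0f}) "
              f"score={info.score:.2f} pixel_fraction={info.pixel_fraction:.2f}")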

EntityAnnotation

Set of detected entity features.

Fields
mid

string

Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API.

locale

string

The language code for the locale in which the entity textual description is expressed.

description

string

Entity textual description, expressed in its locale language.

score

float

Overall score of the result. Range [0, 1].

confidence

float

The accuracy of the entity detection in an image. For example, for an image in which the "Eiffel Tower" entity is detected, this field represents the confidence that there is a tower in the query image. Range [0, 1].

topicality

float

The relevancy of the ICA (Image Content Annotation) label to the image. For example, the relevancy of "tower" is likely higher to an image containing the detected "Eiffel Tower" than to an image containing a detected distant towering building, even though the confidence that there is a tower in each image may be the same. Range [0, 1].

bounding_poly

BoundingPoly

Image region to which this entity belongs. Currently not produced for LABEL_DETECTION features. For TEXT_DETECTION (OCR), boundingPolys are produced for the entire text detected in an image region, followed by boundingPolys for each word within the detected text.

locations[]

LocationInfo

The location information for the detected entity. Multiple LocationInfo elements can be present because one location may indicate the location of the scene in the image, and another location may indicate the location of the place where the image was taken. Location information is usually present for landmarks.

properties[]

Property

Some entities may have optional user-supplied Property (name/value) fields, such as a score or string that qualifies the entity.

FaceAnnotation

A face annotation object contains the results of face detection.

Fields
bounding_poly

BoundingPoly

The bounding polygon around the face. The coordinates of the bounding box are in the original image's scale, as returned in ImageParams. The bounding box is computed to "frame" the face in accordance with human expectations. It is based on the landmarker results. Note that one or more x and/or y coordinates may not be generated in the BoundingPoly (the polygon will be unbounded) if only a partial face appears in the image to be annotated.

fd_bounding_poly

BoundingPoly

The fd_bounding_poly bounding polygon is tighter than the boundingPoly, and encloses only the skin part of the face. Typically, it is used to eliminate the face from any image analysis that detects the "amount of skin" visible in an image. It is not based on the landmarker results, only on the initial face detection, hence the fd (face detection) prefix.

landmarks[]

Landmark

Detected face landmarks.

roll_angle

float

Roll angle, which indicates the amount of clockwise/anti-clockwise rotation of the face relative to the image vertical about the axis perpendicular to the face. Range [-180,180].

pan_angle

float

Yaw angle, which indicates the leftward/rightward angle that the face is pointing relative to the vertical plane perpendicular to the image. Range [-180,180].

tilt_angle

float

Pitch angle, which indicates the upwards/downwards angle that the face is pointing relative to the image's horizontal plane. Range [-180,180].

detection_confidence

float

Detection confidence. Range [0, 1].

landmarking_confidence

float

Face landmarking confidence. Range [0, 1].

joy_likelihood

Likelihood

Joy likelihood.

sorrow_likelihood

Likelihood

Sorrow likelihood.

anger_likelihood

Likelihood

Anger likelihood.

surprise_likelihood

Likelihood

Surprise likelihood.

under_exposed_likelihood

Likelihood

Under-exposed likelihood.

blurred_likelihood

Likelihood

Blurred likelihood.

headwear_likelihood

Likelihood

Headwear likelihood.
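
For example, a minimal sketch of reading a few of these fields, assuming the Python client's face_detection convenience method; selfie.jpg is illustrative:

    # Sketch: print each detected face's confidence, joy likelihood, and
    # bounding polygon. Likelihood fields are bucketized enums.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("selfie.jpg", "rb") as f:  # illustrative local file
        image = vision.Image(content=f.read())

    response = client.face_detection(image=image)
    for face in response.face_annotations:
        verts = [(v.x, v.y) for v in face.bounding_poly.vertices]
        print(f"confidence={face.detection_confidence:.2f} "
              f"joy={face.joy_likelihood.name} bounds={verts}")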

Landmark

A face-specific landmark (for example, a face feature). Landmark positions may fall outside the bounds of the image if the face is near one or more edges of the image. Therefore it is NOT guaranteed that 0 <= x < width or 0 <= y < height.

Fields
type

Type

Face landmark type.

position

Position

Face landmark position.

Type

Face landmark (feature) type. Left and right are defined from the vantage of the viewer of the image without considering mirror projections typical of photos. So, LEFT_EYE, typically, is the person's right eye.

Enums
UNKNOWN_LANDMARK Unknown face landmark detected. Should not be filled.
LEFT_EYE Left eye.
RIGHT_EYE Right eye.
LEFT_OF_LEFT_EYEBROW Left of left eyebrow.
RIGHT_OF_LEFT_EYEBROW Right of left eyebrow.
LEFT_OF_RIGHT_EYEBROW Left of right eyebrow.
RIGHT_OF_RIGHT_EYEBROW Right of right eyebrow.
MIDPOINT_BETWEEN_EYES Midpoint between eyes.
NOSE_TIP Nose tip.
UPPER_LIP Upper lip.
LOWER_LIP Lower lip.
MOUTH_LEFT Mouth left.
MOUTH_RIGHT Mouth right.
MOUTH_CENTER Mouth center.
NOSE_BOTTOM_RIGHT Nose, bottom right.
NOSE_BOTTOM_LEFT Nose, bottom left.
NOSE_BOTTOM_CENTER Nose, bottom center.
LEFT_EYE_TOP_BOUNDARY Left eye, top boundary.
LEFT_EYE_RIGHT_CORNER Left eye, right corner.
LEFT_EYE_BOTTOM_BOUNDARY Left eye, bottom boundary.
LEFT_EYE_LEFT_CORNER Left eye, left corner.
RIGHT_EYE_TOP_BOUNDARY Right eye, top boundary.
RIGHT_EYE_RIGHT_CORNER Right eye, right corner.
RIGHT_EYE_BOTTOM_BOUNDARY Right eye, bottom boundary.
RIGHT_EYE_LEFT_CORNER Right eye, left corner.
LEFT_EYEBROW_UPPER_MIDPOINT Left eyebrow, upper midpoint.
RIGHT_EYEBROW_UPPER_MIDPOINT Right eyebrow, upper midpoint.
LEFT_EAR_TRAGION Left ear tragion.
RIGHT_EAR_TRAGION Right ear tragion.
LEFT_EYE_PUPIL Left eye pupil.
RIGHT_EYE_PUPIL Right eye pupil.
FOREHEAD_GLABELLA Forehead glabella.
CHIN_GNATHION Chin gnathion.
CHIN_LEFT_GONION Chin left gonion.
CHIN_RIGHT_GONION Chin right gonion.

Feature

Users describe the type of Google Cloud Vision API tasks to perform over images by using Features. Each Feature indicates a type of image detection task to perform. Features encode the Cloud Vision API vertical to operate on and the number of top-scoring results to return.

Fields
type

Type

The feature type.

max_results

int32

Maximum number of results of this type.

Type

Type of image feature.

Enums
TYPE_UNSPECIFIED Unspecified feature type.
FACE_DETECTION Run face detection.
LANDMARK_DETECTION Run landmark detection.
LOGO_DETECTION Run logo detection.
LABEL_DETECTION Run label detection.
TEXT_DETECTION Run OCR.
DOCUMENT_TEXT_DETECTION Run dense text document OCR. Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are present.
SAFE_SEARCH_DETECTION Run computer vision models to compute image safe-search properties.
IMAGE_PROPERTIES Compute a set of image properties, such as the image's dominant colors.
CROP_HINTS Run crop hints.
WEB_DETECTION Run web detection.

Image

Client image to perform Google Cloud Vision API tasks over.

Fields
content

bytes

Image content, represented as a stream of bytes. Note: as with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.

source

ImageSource

Google Cloud Storage image location. If both content and source are provided for an image, content takes precedence and is used to perform the image annotation request.
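
A sketch of both ways to supply an image; the file and bucket names are illustrative. In proto form, content carries raw bytes, while a hand-built JSON/REST body would carry the same bytes base64-encoded:

    # Sketch: two equivalent ways to populate Image. If both were set,
    # content would take precedence, per the note above.
    import base64
    from google.cloud import vision

    # 1) Inline bytes ("cat.jpg" is an illustrative local file).
    with open("cat.jpg", "rb") as f:
        raw = f.read()
    inline = vision.Image(content=raw)

    # 2) A Cloud Storage (or public HTTP/HTTPS) location via ImageSource.
    remote = vision.Image(
        source=vision.ImageSource(image_uri="gs://my-bucket/cat.jpg")
    )

    # For a hand-built JSON/REST body, the same bytes must be base64 text.
    json_content = base64.b64encode(raw).decode("ascii")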

ImageContext

Image context and/or feature-specific parameters.

Fields
lat_long_rect

LatLongRect

Lat/long rectangle that specifies the location of the image.

language_hints[]

string

List of languages to use for TEXT_DETECTION. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting language_hints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.

crop_hints_params

CropHintsParams

Parameters for crop hints annotation request.
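
A sketch that combines both parameters in one request; the English hint and the 4:3 aspect ratio (width/height = 1.33333) are illustrative choices, as is the bucket path:

    # Sketch: attach an ImageContext to a request mixing OCR and crop
    # hints. language_hints steers TEXT_DETECTION; aspect_ratios steers
    # CROP_HINTS.
    from google.cloud import vision

    context = vision.ImageContext(
        language_hints=["en"],
        crop_hints_params=vision.CropHintsParams(aspect_ratios=[1.33333]),
    )
    request = vision.AnnotateImageRequest(
        image=vision.Image(
            source=vision.ImageSource(image_uri="gs://my-bucket/sign.jpg")
        ),
        features=[
            vision.Feature(type_=vision.Feature.Type.TEXT_DETECTION),
            vision.Feature(type_=vision.Feature.Type.CROP_HINTS),
        ],
        image_context=context,
    )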

ImageProperties

Stores image properties, such as dominant colors.

Fields
dominant_colors

DominantColorsAnnotation

If present, dominant colors completed successfully.

ImageSource

External image source (Google Cloud Storage image location).

Fields
gcs_image_uri

string

NOTE: For new code, image_uri below is preferred. Google Cloud Storage image URI, which must be in the following form: gs://bucket_name/object_name (for details, see Google Cloud Storage Request URIs). NOTE: Cloud Storage object versioning is not supported.

image_uri

string

Image URI which supports: 1) Google Cloud Storage image URI, which must be in the following form: gs://bucket_name/object_name (for details, see Google Cloud Storage Request URIs). NOTE: Cloud Storage object versioning is not supported. 2) Publicly accessible image HTTP/HTTPS URL. This is preferred over the legacy gcs_image_uri above. When both gcs_image_uri and image_uri are specified, image_uri takes precedence.

LatLongRect

Rectangle determined by min and max LatLng pairs.

Fields
min_lat_lng

LatLng

Min lat/long pair.

max_lat_lng

LatLng

Max lat/long pair.

Likelihood

A bucketized representation of likelihood, which is intended to give clients highly stable results across model upgrades.

Enums
UNKNOWN Unknown likelihood.
VERY_UNLIKELY It is very unlikely that the image belongs to the specified vertical.
UNLIKELY It is unlikely that the image belongs to the specified vertical.
POSSIBLE It is possible that the image belongs to the specified vertical.
LIKELY It is likely that the image belongs to the specified vertical.
VERY_LIKELY It is very likely that the image belongs to the specified vertical.

LocationInfo

Detected entity location information.

Fields
lat_lng

LatLng

Lat/long location coordinates.

Page

Detected page from OCR.

Fields
property

TextProperty

Additional information detected on the page.

width

int32

Page width in pixels.

height

int32

Page height in pixels.

blocks[]

Block

List of blocks of text, images etc on this page.

Paragraph

Structural unit of text representing a number of words in a certain order.

Fields
property

TextProperty

Additional information detected for the paragraph.

bounding_box

BoundingPoly

The bounding box for the paragraph. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

  • when the text is horizontal it might look like:

        0----1
        |    |
        3----2

  • when it's rotated 180 degrees around the top-left corner it becomes:

        2----3
        |    |
        1----0

and the vertex order will still be (0, 1, 2, 3).

words[]

Word

List of words in this paragraph.

Position

A 3D position in the image, used primarily for Face detection landmarks. A valid Position must have both x and y coordinates. The position coordinates are in the same scale as the original image.

Fields
x

float

X coordinate.

y

float

Y coordinate.

z

float

Z coordinate (or depth).

Property

A Property consists of a user-supplied name/value pair.

Fields
name

string

Name of the property.

value

string

Value of the property.

uint64_value

uint64

Value of numeric properties.

SafeSearchAnnotation

Set of features pertaining to the image, computed by computer vision methods over safe-search verticals (for example, adult, spoof, medical, violence).

Fields
adult

Likelihood

Represents the adult content likelihood for the image.

spoof

Likelihood

Spoof likelihood. The likelihood that a modification was made to the image's canonical version to make it appear funny or offensive.

medical

Likelihood

Likelihood that this is a medical image.

violence

Likelihood

Violence likelihood.

Symbol

A single symbol representation.

Fields
property

TextProperty

Additional information detected for the symbol.

bounding_box

BoundingPoly

The bounding box for the symbol. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

  • when the text is horizontal it might look like:

        0----1
        |    |
        3----2

  • when it's rotated 180 degrees around the top-left corner it becomes:

        2----3
        |    |
        1----0

and the vertex order will still be (0, 1, 2, 3).

text

string

The actual UTF-8 representation of the symbol.

TextAnnotation

TextAnnotation contains a structured representation of OCR-extracted text. The hierarchy of an OCR-extracted text structure is as follows:

    TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol

Each structural component, starting from Page, may further have its own properties. Properties describe detected languages, breaks, etc. Refer to the google.cloud.vision.v1.TextAnnotation.TextProperty message definition below for more detail.

Fields
pages[]

Page

List of pages detected by OCR.

text

string

UTF-8 text detected on the pages.
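
A minimal sketch of walking this hierarchy, assuming a full_text_annotation produced by the Python client's document_text_detection convenience method; scan.png is illustrative:

    # Sketch: descend Page -> Block -> Paragraph -> Word -> Symbol and
    # reassemble each word from its symbols.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("scan.png", "rb") as f:  # illustrative local file
        image = vision.Image(content=f.read())

    response = client.document_text_detection(image=image)
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print("block type:", block.block_type.name)
            for paragraph in block.paragraphs:
                words = [
                    "".join(s.text for s in word.symbols)
                    for word in paragraph.words
                ]
                print(" ".join(words))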

DetectedBreak

Detected start or end of a structural component.

Fields
type

BreakType

Detected break type.

is_prefix

bool

True if break prepends the element.

BreakType

Enum to denote the type of break found (new line, space, etc.).

Enums
UNKNOWN Unknown break label type.
SPACE Regular space.
SURE_SPACE Sure space (very wide).
EOL_SURE_SPACE Line-wrapping break.
HYPHEN End-line hyphen that is not present in text; does not co-occur with SPACE, LEADER_SPACE, or LINE_BREAK.
LINE_BREAK Line break that ends a paragraph.

DetectedLanguage

Detected language for a structural component.

Fields
language_code

string

The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.

confidence

float

Confidence of detected language. Range [0, 1].

TextProperty

Additional information detected on the structural component.

Fields
detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

detected_break

DetectedBreak

Detected start or end of a text segment.

Vertex

A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.

Fields
x

int32

X coordinate.

y

int32

Y coordinate.

WebDetection

Relevant information for the image from the Internet.

Fields
web_entities[]

WebEntity

Deduced entities from similar images on the Internet.

full_matching_images[]

WebImage

Fully matching images from the Internet. Can include resized copies of the query image.

partial_matching_images[]

WebImage

Partial matching images from the Internet. Those images are similar enough to share some key-point features. For example, an original image will likely have partial matching for its crops.

pages_with_matching_images[]

WebPage

Web pages containing the matching images from the Internet.

visually_similar_images[]

WebImage

The visually similar image results.
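
A sketch of reading these result lists, assuming the Python client's web_detection convenience method; landmark.jpg is illustrative:

    # Sketch: summarize web detection results. Scores are relevancy
    # scores and are not comparable across different image queries.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("landmark.jpg", "rb") as f:  # illustrative local file
        image = vision.Image(content=f.read())

    wd = client.web_detection(image=image).web_detection
    for entity in wd.web_entities:
        print(f"entity: {entity.description} (score={entity.score:.2f})")
    for page in wd.pages_with_matching_images:
        print("page with match:", page.url)
    for img in wd.visually_similar_images:
        print("similar image:", img.url)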

WebEntity

Entity deduced from similar images on the Internet.

Fields
entity_id

string

Opaque entity ID.

score

float

Overall relevancy score for the entity. Not normalized and not comparable across different image queries.

description

string

Canonical description of the entity, in English.

WebImage

Metadata for online images.

Fields
url

string

The result image URL.

score

float

Overall relevancy score for the image. Not normalized and not comparable across different image queries.

WebPage

Metadata for web pages.

Fields
url

string

The result web page URL.

score

float

Overall relevancy score for the web page. Not normalized and not comparable across different image queries.

Word

A word representation.

Fields
property

TextProperty

Additional information detected for the word.

bounding_box

BoundingPoly

The bounding box for the word. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

  • when the text is horizontal it might look like:

        0----1
        |    |
        3----2

  • when it's rotated 180 degrees around the top-left corner it becomes:

        2----3
        |    |
        1----0

and the vertex order will still be (0, 1, 2, 3).

symbols[]

Symbol

List of symbols in the word. The order of the symbols follows the natural reading order.
