Method: images.annotate

Run image detection and annotation for a batch of images.

HTTP request

POST https://vision.googleapis.com/v1/images:annotate

The URL uses Google API HTTP annotation syntax.

Request body

The request body contains data with the following structure:

JSON representation
{
  "requests": [
    {
      object(AnnotateImageRequest)
    }
  ],
}
Fields
requests[]

object(AnnotateImageRequest)

Individual image annotation requests for this batch.

Response body

If successful, the response body contains data with the following structure:

Response to a batch image annotation request.

JSON representation
{
  "responses": [
    {
      object(AnnotateImageResponse)
    }
  ],
}
Fields
responses[]

object(AnnotateImageResponse)

Individual responses to image annotation requests within the batch.

Authorization

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Auth Guide.

AnnotateImageRequest

Request for performing Google Cloud Vision API tasks over a user-provided image, with user-requested features.

JSON representation
{
  "image": {
    object(Image)
  },
  "features": [
    {
      object(Feature)
    }
  ],
  "imageContext": {
    object(ImageContext)
  },
}
Fields
image

object(Image)

The image to be processed.

features[]

object(Feature)

Requested features.

imageContext

object(ImageContext)

Additional context that may accompany the image.
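As a rough illustration of the structures above, the following sketch assembles a batch request body as a plain Python dictionary and serializes it to JSON. The helper name build_annotate_request and the sample URI are hypothetical and not part of the API; authentication and the HTTP call itself are deliberately left out.

```python
import json

def build_annotate_request(image_uri, feature_types, max_results=None, image_context=None):
    """Builds one AnnotateImageRequest dict (hypothetical helper, not part of the API)."""
    features = []
    for feature_type in feature_types:
        feature = {"type": feature_type}
        if max_results is not None:
            feature["maxResults"] = max_results
        features.append(feature)
    request = {
        "image": {"source": {"imageUri": image_uri}},
        "features": features,
    }
    # imageContext is only attached when the caller supplies one.
    if image_context is not None:
        request["imageContext"] = image_context
    return request

# The batch body wraps the individual requests in a "requests" list.
body = {
    "requests": [
        build_annotate_request(
            "gs://bucket_name/object_name",  # placeholder URI in the documented form
            ["LABEL_DETECTION", "TEXT_DETECTION"],
            max_results=5,
        )
    ]
}
payload = json.dumps(body)
```

The resulting payload string is what would be sent as the POST body to the images:annotate endpoint.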

Image

Client image to perform Google Cloud Vision API tasks over.

JSON representation
{
  "content": string,
  "source": {
    object(ImageSource)
  },
}
Fields
content

string (bytes format)

Image content, represented as a stream of bytes. Note: as with all bytes fields, protobuffers use a pure binary representation, whereas JSON representations use base64.

A base64-encoded string.

source

object(ImageSource)

Google Cloud Storage image location. If both content and source are provided for an image, content takes precedence and is used to perform the image annotation request.
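Because the JSON representation carries bytes as base64 text, inline image data has to be encoded before it is placed in content. A minimal sketch, assuming the image bytes are already in memory; the helper name image_from_bytes and the sample bytes are hypothetical.

```python
import base64

def image_from_bytes(raw_bytes):
    """Returns an Image dict whose "content" is base64 text (hypothetical helper)."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"content": encoded}

# Placeholder bytes for illustration only; this is not a complete image file.
image = image_from_bytes(b"\x89PNG\r\n")
```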

ImageSource

External image source (Google Cloud Storage image location).

JSON representation
{
  "gcsImageUri": string,
  "imageUri": string,
}
Fields
gcsImageUri

string

NOTE: For new code imageUri below is preferred. Google Cloud Storage image URI, which must be in the following form: gs://bucket_name/object_name (for details, see Google Cloud Storage Request URIs). NOTE: Cloud Storage object versioning is not supported.

imageUri

string

Image URI which supports: 1) Google Cloud Storage image URI, which must be in the following form: gs://bucket_name/object_name (for details, see Google Cloud Storage Request URIs). NOTE: Cloud Storage object versioning is not supported. 2) Publicly accessible image HTTP/HTTPS URL. This is preferred over the legacy gcsImageUri above. When both gcsImageUri and imageUri are specified, imageUri takes precedence.
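The sketch below shows one way a client might build an ImageSource using the preferred imageUri field. The local checks (the gs://bucket_name/object_name shape and the HTTP/HTTPS schemes) are an assumption about what is convenient to validate client-side; the service performs its own validation, and the helper name image_source is hypothetical.

```python
def image_source(uri):
    """Returns an ImageSource dict using the preferred imageUri field."""
    if uri.startswith("gs://"):
        # Expect both a bucket name and an object name after the scheme.
        bucket, separator, object_name = uri[len("gs://"):].partition("/")
        if not (bucket and separator and object_name):
            raise ValueError("expected gs://bucket_name/object_name, got %r" % uri)
    elif not uri.startswith(("http://", "https://")):
        raise ValueError("unsupported image URI scheme: %r" % uri)
    return {"imageUri": uri}

source = image_source("gs://bucket_name/object_name")
```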

Feature

Users describe the type of Google Cloud Vision API tasks to perform over images by using Features. Each Feature indicates a type of image detection task to perform. Features encode the Cloud Vision API vertical to operate on and the number of top-scoring results to return.

JSON representation
{
  "type": enum(Type),
  "maxResults": number,
}
Fields
type

enum(Type)

The feature type.

maxResults

number

Maximum number of results of this type.

Type

Type of image feature.

Enums
TYPE_UNSPECIFIED Unspecified feature type.
FACE_DETECTION Run face detection.
LANDMARK_DETECTION Run landmark detection.
LOGO_DETECTION Run logo detection.
LABEL_DETECTION Run label detection.
TEXT_DETECTION Run OCR.
DOCUMENT_TEXT_DETECTION Run dense text document OCR. Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are present.
SAFE_SEARCH_DETECTION Run computer vision models to compute image safe-search properties.
IMAGE_PROPERTIES Compute a set of image properties, such as the image's dominant colors.
CROP_HINTS Run crop hints.
WEB_DETECTION Run web detection.

ImageContext

Image context and/or feature-specific parameters.

JSON representation
{
  "latLongRect": {
    object(LatLongRect)
  },
  "languageHints": [
    string
  ],
  "cropHintsParams": {
    object(CropHintsParams)
  },
}
Fields
latLongRect

object(LatLongRect)

Lat/long rectangle that specifies the location of the image.

languageHints[]

string

List of languages to use for TEXT_DETECTION. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.

cropHintsParams

object(CropHintsParams)

Parameters for crop hints annotation request.

LatLongRect

Rectangle determined by min and max LatLng pairs.

JSON representation
{
  "minLatLng": {
    object(LatLng)
  },
  "maxLatLng": {
    object(LatLng)
  },
}
Fields
minLatLng

object(LatLng)

Min lat/long pair.

maxLatLng

object(LatLng)

Max lat/long pair.

LatLng

An object representing a latitude/longitude pair. This is expressed as a pair of doubles representing degrees latitude and degrees longitude. Unless specified otherwise, this must conform to the WGS84 standard. Values must be within normalized ranges.

Example of normalization code in Python:

def NormalizeLongitude(longitude):
  """Wraps decimal degrees longitude to [-180.0, 180.0]."""
  q, r = divmod(longitude, 360.0)
  if r > 180.0 or (r == 180.0 and q <= -1.0):
    return r - 360.0
  return r

def NormalizeLatLng(latitude, longitude):
  """Wraps decimal degrees latitude and longitude to
  [-90.0, 90.0] and [-180.0, 180.0], respectively."""
  r = latitude % 360.0
  if r <= 90.0:
    return r, NormalizeLongitude(longitude)
  elif r >= 270.0:
    return r - 360, NormalizeLongitude(longitude)
  else:
    return 180 - r, NormalizeLongitude(longitude + 180.0)

assert 180.0 == NormalizeLongitude(180.0)
assert -180.0 == NormalizeLongitude(-180.0)
assert -179.0 == NormalizeLongitude(181.0)
assert (0.0, 0.0) == NormalizeLatLng(360.0, 0.0)
assert (0.0, 0.0) == NormalizeLatLng(-360.0, 0.0)
assert (85.0, 180.0) == NormalizeLatLng(95.0, 0.0)
assert (-85.0, -170.0) == NormalizeLatLng(-95.0, 10.0)
assert (90.0, 10.0) == NormalizeLatLng(90.0, 10.0)
assert (-90.0, -10.0) == NormalizeLatLng(-90.0, -10.0)
assert (0.0, -170.0) == NormalizeLatLng(-180.0, 10.0)
assert (0.0, -170.0) == NormalizeLatLng(180.0, 10.0)
assert (-90.0, 10.0) == NormalizeLatLng(270.0, 10.0)
assert (90.0, 10.0) == NormalizeLatLng(-270.0, 10.0)

The code in logs/storage/validator/logs_validator_traits.cc treats this type as if it were annotated as ST_LOCATION.

JSON representation
{
  "latitude": number,
  "longitude": number,
}
Fields
latitude

number

The latitude in degrees. It must be in the range [-90.0, +90.0].

longitude

number

The longitude in degrees. It must be in the range [-180.0, +180.0].
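A minimal sketch of building a LatLongRect from two latitude/longitude pairs while enforcing the documented ranges. The helper names lat_lng and lat_long_rect and the sample coordinates are hypothetical; values outside the ranges are rejected here rather than normalized.

```python
def lat_lng(latitude, longitude):
    """Returns a LatLng dict after checking the documented ranges."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude out of range [-90.0, +90.0]: %r" % latitude)
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude out of range [-180.0, +180.0]: %r" % longitude)
    return {"latitude": latitude, "longitude": longitude}

def lat_long_rect(min_pair, max_pair):
    """Returns a LatLongRect dict from (latitude, longitude) tuples."""
    return {"minLatLng": lat_lng(*min_pair), "maxLatLng": lat_lng(*max_pair)}

# Illustrative coordinates only.
rect = lat_long_rect((48.0, 2.0), (49.0, 3.0))
```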

CropHintsParams

Parameters for crop hints annotation request.

JSON representation
{
  "aspectRatios": [
    number
  ],
}
Fields
aspectRatios[]

number

Aspect ratios in floats, representing the ratio of the width to the height of the image. For example, if the desired aspect ratio is 4/3, the corresponding float value should be 1.33333. If not specified, the best possible crop is returned. The number of provided aspect ratios is limited to a maximum of 16; any aspect ratios provided after the 16th are ignored.
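Since aspect ratios beyond the 16th are ignored, a client may prefer to truncate the list itself so the behavior is explicit. A small sketch under that assumption; the helper name crop_hints_context is hypothetical.

```python
MAX_ASPECT_RATIOS = 16  # limit stated in the aspectRatios description

def crop_hints_context(aspect_ratios):
    """Returns an ImageContext dict carrying at most 16 width/height ratios."""
    ratios = [float(ratio) for ratio in aspect_ratios][:MAX_ASPECT_RATIOS]
    return {"cropHintsParams": {"aspectRatios": ratios}}

# 4:3, 16:9 and square crops.
context = crop_hints_context([4 / 3, 16 / 9, 1])
```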

AnnotateImageResponse

Response to an image annotation request.

JSON representation
{
  "faceAnnotations": [
    {
      object(FaceAnnotation)
    }
  ],
  "landmarkAnnotations": [
    {
      object(EntityAnnotation)
    }
  ],
  "logoAnnotations": [
    {
      object(EntityAnnotation)
    }
  ],
  "labelAnnotations": [
    {
      object(EntityAnnotation)
    }
  ],
  "textAnnotations": [
    {
      object(EntityAnnotation)
    }
  ],
  "fullTextAnnotation": {
    object(TextAnnotation)
  },
  "safeSearchAnnotation": {
    object(SafeSearchAnnotation)
  },
  "imagePropertiesAnnotation": {
    object(ImageProperties)
  },
  "cropHintsAnnotation": {
    object(CropHintsAnnotation)
  },
  "webDetection": {
    object(WebDetection)
  },
  "error": {
    object(Status)
  },
}
Fields
faceAnnotations[]

object(FaceAnnotation)

If present, face detection has completed successfully.

landmarkAnnotations[]

object(EntityAnnotation)

If present, landmark detection has completed successfully.

logoAnnotations[]

object(EntityAnnotation)

If present, logo detection has completed successfully.

labelAnnotations[]

object(EntityAnnotation)

If present, label detection has completed successfully.

textAnnotations[]

object(EntityAnnotation)

If present, text (OCR) detection has completed successfully.

fullTextAnnotation

object(TextAnnotation)

If present, text (OCR) detection or document (OCR) text detection has completed successfully. This annotation provides the structural hierarchy for the OCR detected text.

safeSearchAnnotation

object(SafeSearchAnnotation)

If present, safe-search annotation has completed successfully.

imagePropertiesAnnotation

object(ImageProperties)

If present, image properties were extracted successfully.

cropHintsAnnotation

object(CropHintsAnnotation)

If present, crop hints have completed successfully.

webDetection

object(WebDetection)

If present, web detection has completed successfully.

error

object(Status)

If set, represents the error message for the operation. Note that filled-in image annotations are guaranteed to be correct, even when error is set.
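Because annotations may be filled in even when error is set, a client should not discard a response just because it carries an error. The following sketch separates the two for each response in a batch; the helper name split_responses and the sample data (including the error code and message) are made up for illustration.

```python
def split_responses(batch_response):
    """Returns a list of (index, annotations, error) tuples, one per response."""
    results = []
    for index, response in enumerate(batch_response.get("responses", [])):
        error = response.get("error")  # None when the field is absent
        # Keep whatever annotations are present, even alongside an error.
        annotations = {key: value for key, value in response.items() if key != "error"}
        results.append((index, annotations, error))
    return results

# Hand-written sample, not real service output.
sample = {
    "responses": [
        {"labelAnnotations": [{"description": "tower", "score": 0.9}]},
        {"error": {"code": 3, "message": "example error"}},
    ]
}
results = split_responses(sample)
```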

FaceAnnotation

A face annotation object contains the results of face detection.

JSON representation
{
  "boundingPoly": {
    object(BoundingPoly)
  },
  "fdBoundingPoly": {
    object(BoundingPoly)
  },
  "landmarks": [
    {
      object(Landmark)
    }
  ],
  "rollAngle": number,
  "panAngle": number,
  "tiltAngle": number,
  "detectionConfidence": number,
  "landmarkingConfidence": number,
  "joyLikelihood": enum(Likelihood),
  "sorrowLikelihood": enum(Likelihood),
  "angerLikelihood": enum(Likelihood),
  "surpriseLikelihood": enum(Likelihood),
  "underExposedLikelihood": enum(Likelihood),
  "blurredLikelihood": enum(Likelihood),
  "headwearLikelihood": enum(Likelihood),
}
Fields
boundingPoly

object(BoundingPoly)

The bounding polygon around the face. The coordinates of the bounding box are in the original image's scale, as returned in ImageParams. The bounding box is computed to "frame" the face in accordance with human expectations. It is based on the landmarker results. Note that one or more x and/or y coordinates may not be generated in the BoundingPoly (the polygon will be unbounded) if only a partial face appears in the image to be annotated.

fdBoundingPoly

object(BoundingPoly)

The fdBoundingPoly bounding polygon is tighter than the boundingPoly, and encloses only the skin part of the face. Typically, it is used to eliminate the face from any image analysis that detects the "amount of skin" visible in an image. It is not based on the landmarker results, only on the initial face detection, hence the fd (face detection) prefix.

landmarks[]

object(Landmark)

Detected face landmarks.

rollAngle

number

Roll angle, which indicates the amount of clockwise/anti-clockwise rotation of the face relative to the image vertical about the axis perpendicular to the face. Range [-180,180].

panAngle

number

Yaw angle, which indicates the leftward/rightward angle that the face is pointing relative to the vertical plane perpendicular to the image. Range [-180,180].

tiltAngle

number

Pitch angle, which indicates the upwards/downwards angle that the face is pointing relative to the image's horizontal plane. Range [-180,180].

detectionConfidence

number

Detection confidence. Range [0, 1].

landmarkingConfidence

number

Face landmarking confidence. Range [0, 1].

joyLikelihood

enum(Likelihood)

Joy likelihood.

sorrowLikelihood

enum(Likelihood)

Sorrow likelihood.

angerLikelihood

enum(Likelihood)

Anger likelihood.

surpriseLikelihood

enum(Likelihood)

Surprise likelihood.

underExposedLikelihood

enum(Likelihood)

Under-exposed likelihood.

blurredLikelihood

enum(Likelihood)

Blurred likelihood.

headwearLikelihood

enum(Likelihood)

Headwear likelihood.

BoundingPoly

A bounding polygon for the detected image annotation.

JSON representation
{
  "vertices": [
    {
      object(Vertex)
    }
  ],
}
Fields
vertices[]

object(Vertex)

The bounding polygon vertices.

Vertex

A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.

JSON representation
{
  "x": number,
  "y": number,
}
Fields
x

number

X coordinate.

y

number

Y coordinate.
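A sketch of reducing a BoundingPoly to an axis-aligned rectangle. Because coordinates may be missing for partially visible faces, this version returns None rather than guessing; that policy and the helper name bounding_rect are assumptions made for illustration.

```python
def bounding_rect(bounding_poly):
    """Returns (min_x, min_y, max_x, max_y), or None if any coordinate is missing."""
    vertices = bounding_poly.get("vertices", [])
    if not vertices or any("x" not in v or "y" not in v for v in vertices):
        return None
    xs = [v["x"] for v in vertices]
    ys = [v["y"] for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)

# Hand-written sample polygon in image pixel coordinates.
poly = {
    "vertices": [
        {"x": 10, "y": 20},
        {"x": 110, "y": 20},
        {"x": 110, "y": 80},
        {"x": 10, "y": 80},
    ]
}
rect = bounding_rect(poly)
```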

Landmark

A face-specific landmark (for example, a face feature). Landmark positions may fall outside the bounds of the image if the face is near one or more edges of the image. Therefore it is NOT guaranteed that 0 <= x < width or 0 <= y < height.

JSON representation
{
  "type": enum(Type),
  "position": {
    object(Position)
  },
}
Fields
type

enum(Type)

Face landmark type.

position

object(Position)

Face landmark position.

Type

Face landmark (feature) type. Left and right are defined from the vantage of the viewer of the image without considering mirror projections typical of photos. So, LEFT_EYE, typically, is the person's right eye.

Enums
UNKNOWN_LANDMARK Unknown face landmark detected. Should not be filled.
LEFT_EYE Left eye.
RIGHT_EYE Right eye.
LEFT_OF_LEFT_EYEBROW Left of left eyebrow.
RIGHT_OF_LEFT_EYEBROW Right of left eyebrow.
LEFT_OF_RIGHT_EYEBROW Left of right eyebrow.
RIGHT_OF_RIGHT_EYEBROW Right of right eyebrow.
MIDPOINT_BETWEEN_EYES Midpoint between eyes.
NOSE_TIP Nose tip.
UPPER_LIP Upper lip.
LOWER_LIP Lower lip.
MOUTH_LEFT Mouth left.
MOUTH_RIGHT Mouth right.
MOUTH_CENTER Mouth center.
NOSE_BOTTOM_RIGHT Nose, bottom right.
NOSE_BOTTOM_LEFT Nose, bottom left.
NOSE_BOTTOM_CENTER Nose, bottom center.
LEFT_EYE_TOP_BOUNDARY Left eye, top boundary.
LEFT_EYE_RIGHT_CORNER Left eye, right corner.
LEFT_EYE_BOTTOM_BOUNDARY Left eye, bottom boundary.
LEFT_EYE_LEFT_CORNER Left eye, left corner.
RIGHT_EYE_TOP_BOUNDARY Right eye, top boundary.
RIGHT_EYE_RIGHT_CORNER Right eye, right corner.
RIGHT_EYE_BOTTOM_BOUNDARY Right eye, bottom boundary.
RIGHT_EYE_LEFT_CORNER Right eye, left corner.
LEFT_EYEBROW_UPPER_MIDPOINT Left eyebrow, upper midpoint.
RIGHT_EYEBROW_UPPER_MIDPOINT Right eyebrow, upper midpoint.
LEFT_EAR_TRAGION Left ear tragion.
RIGHT_EAR_TRAGION Right ear tragion.
LEFT_EYE_PUPIL Left eye pupil.
RIGHT_EYE_PUPIL Right eye pupil.
FOREHEAD_GLABELLA Forehead glabella.
CHIN_GNATHION Chin gnathion.
CHIN_LEFT_GONION Chin left gonion.
CHIN_RIGHT_GONION Chin right gonion.

Position

A 3D position in the image, used primarily for Face detection landmarks. A valid Position must have both x and y coordinates. The position coordinates are in the same scale as the original image.

JSON representation
{
  "x": number,
  "y": number,
  "z": number,
}
Fields
x

number

X coordinate.

y

number

Y coordinate.

z

number

Z coordinate (or depth).

Likelihood

A bucketized representation of likelihood, which is intended to give clients highly stable results across model upgrades.

Enums
UNKNOWN Unknown likelihood.
VERY_UNLIKELY It is very unlikely that the image belongs to the specified vertical.
UNLIKELY It is unlikely that the image belongs to the specified vertical.
POSSIBLE It is possible that the image belongs to the specified vertical.
LIKELY It is likely that the image belongs to the specified vertical.
VERY_LIKELY It is very likely that the image belongs to the specified vertical.
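Since likelihoods are returned as enum names rather than numbers, threshold checks need an explicit ordering. The sketch below uses the order in which the values are listed above and treats UNKNOWN as the lowest rank, which is an assumption; the helper name at_least is hypothetical.

```python
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]

def at_least(likelihood, threshold):
    """Returns True if likelihood ranks at or above threshold."""
    return LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(threshold)

# Example: flag faces whose joyLikelihood is LIKELY or stronger.
face = {"joyLikelihood": "VERY_LIKELY"}  # hand-written sample
is_joyful = at_least(face["joyLikelihood"], "LIKELY")
```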

EntityAnnotation

Set of detected entity features.

JSON representation
{
  "mid": string,
  "locale": string,
  "description": string,
  "score": number,
  "confidence": number,
  "topicality": number,
  "boundingPoly": {
    object(BoundingPoly)
  },
  "locations": [
    {
      object(LocationInfo)
    }
  ],
  "properties": [
    {
      object(Property)
    }
  ],
}
Fields
mid

string

Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API.

locale

string

The language code for the locale in which the entity textual description is expressed.

description

string

Entity textual description, expressed in its locale language.

score

number

Overall score of the result. Range [0, 1].

confidence

number

The accuracy of the entity detection in an image. For example, for an image in which the "Eiffel Tower" entity is detected, this field represents the confidence that there is a tower in the query image. Range [0, 1].

topicality

number

The relevancy of the ICA (Image Content Annotation) label to the image. For example, the relevancy of "tower" is likely higher to an image containing the detected "Eiffel Tower" than to an image containing a detected distant towering building, even though the confidence that there is a tower in each image may be the same. Range [0, 1].

boundingPoly

object(BoundingPoly)

Image region to which this entity belongs. Currently not produced for LABEL_DETECTION features. For TEXT_DETECTION (OCR), boundingPolys are produced for the entire text detected in an image region, followed by boundingPolys for each word within the detected text.

locations[]

object(LocationInfo)

The location information for the detected entity. Multiple LocationInfo elements can be present because one location may indicate the location of the scene in the image, and another location may indicate the location of the place where the image was taken. Location information is usually present for landmarks.

properties[]

object(Property)

Some entities may have optional user-supplied Property (name/value) fields, such as a score or string that qualifies the entity.
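For TEXT_DETECTION results, the boundingPoly description above implies that the annotation for the entire detected text comes first, followed by one annotation per word. A sketch that separates the two, assuming that ordering holds for the textAnnotations list; the helper name split_text_annotations and the sample data are hypothetical.

```python
def split_text_annotations(text_annotations):
    """Returns (full_text, words) from a textAnnotations list."""
    if not text_annotations:
        return "", []
    # First entry: the entire detected text. Remaining entries: individual words.
    full_text = text_annotations[0].get("description", "")
    words = [entry.get("description", "") for entry in text_annotations[1:]]
    return full_text, words

# Hand-written sample, not real service output.
annotations = [
    {"description": "Hello world"},
    {"description": "Hello"},
    {"description": "world"},
]
full_text, words = split_text_annotations(annotations)
```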

LocationInfo

Detected entity location information.

JSON representation
{
  "latLng": {
    object(LatLng)
  },
}
Fields
latLng

object(LatLng)

Lat/long location coordinates.

Property

A Property consists of a user-supplied name/value pair.

JSON representation
{
  "name": string,
  "value": string,
  "uint64Value": string,
}
Fields
name

string

Name of the property.

value

string

Value of the property.

uint64Value

string

Value of numeric properties.

TextAnnotation

TextAnnotation contains a structured representation of OCR extracted text. The hierarchy of an OCR extracted text structure is like this: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component, starting from Page, may further have its own properties. Properties describe detected languages, breaks, etc. Please refer to the google.cloud.vision.v1.TextAnnotation.TextProperty message definition below for more detail.

JSON representation
{
  "pages": [
    {
      object(Page)
    }
  ],
  "text": string,
}
Fields
pages[]

object(Page)

List of pages detected by OCR.

text

string

UTF-8 text detected on the pages.

Page

Detected page from OCR.

JSON representation
{
  "property": {
    object(TextProperty)
  },
  "width": number,
  "height": number,
  "blocks": [
    {
      object(Block)
    }
  ],
}
Fields
property

object(TextProperty)

Additional information detected on the page.

width

number

Page width in pixels.

height

number

Page height in pixels.

blocks[]

object(Block)

List of blocks of text, images, etc. on this page.

TextProperty

Additional information detected on the structural component.

JSON representation
{
  "detectedLanguages": [
    {
      object(DetectedLanguage)
    }
  ],
  "detectedBreak": {
    object(DetectedBreak)
  },
}
Fields
detectedLanguages[]

object(DetectedLanguage)

A list of detected languages together with confidence.

detectedBreak

object(DetectedBreak)

Detected start or end of a text segment.

DetectedLanguage

Detected language for a structural component.

JSON representation
{
  "languageCode": string,
  "confidence": number,
}
Fields
languageCode

string

The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.

confidence

number

Confidence of detected language. Range [0, 1].

DetectedBreak

Detected start or end of a structural component.

JSON representation
{
  "type": enum(BreakType),
  "isPrefix": boolean,
}
Fields
type

enum(BreakType)

Detected break type.

isPrefix

boolean

True if break prepends the element.

BreakType

Enum to denote the type of break found (new line, space, etc.).

Enums
UNKNOWN Unknown break label type.
SPACE Regular space.
SURE_SPACE Sure space (very wide).
EOL_SURE_SPACE Line-wrapping break.
HYPHEN End-line hyphen that is not present in text; does not co-occur with SPACE, LEADER_SPACE, or LINE_BREAK.
LINE_BREAK Line break that ends a paragraph.

Block

Logical element on the page.

JSON representation
{
  "property": {
    object(TextProperty)
  },
  "boundingBox": {
    object(BoundingPoly)
  },
  "paragraphs": [
    {
      object(Paragraph)
    }
  ],
  "blockType": enum(BlockType),
}
Fields
property

object(TextProperty)

Additional information detected for the block.

boundingBox

object(BoundingPoly)

The bounding box for the block. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

  • when the text is horizontal it might look like:

      0----1
      |    |
      3----2

  • when it's rotated 180 degrees around the top-left corner it becomes:

      2----3
      |    |
      1----0

and the vertex order will still be (0, 1, 2, 3).

paragraphs[]

object(Paragraph)

List of paragraphs in this block (if this block is of type text).

blockType

enum(BlockType)

Detected block type (text, image, etc.) for this block.

Paragraph

Structural unit of text representing a number of words in certain order.

JSON representation
{
  "property": {
    object(TextProperty)
  },
  "boundingBox": {
    object(BoundingPoly)
  },
  "words": [
    {
      object(Word)
    }
  ],
}
Fields
property

object(TextProperty)

Additional information detected for the paragraph.

boundingBox

object(BoundingPoly)

The bounding box for the paragraph. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

  • when the text is horizontal it might look like:

      0----1
      |    |
      3----2

  • when it's rotated 180 degrees around the top-left corner it becomes:

      2----3
      |    |
      1----0

and the vertex order will still be (0, 1, 2, 3).

words[]

object(Word)

List of words in this paragraph.

Word

A word representation.

JSON representation
{
  "property": {
    object(TextProperty)
  },
  "boundingBox": {
    object(BoundingPoly)
  },
  "symbols": [
    {
      object(Symbol)
    }
  ],
}
Fields
property

object(TextProperty)

Additional information detected for the word.

boundingBox

object(BoundingPoly)

The bounding box for the word. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

  • when the text is horizontal it might look like:

      0----1
      |    |
      3----2

  • when it's rotated 180 degrees around the top-left corner it becomes:

      2----3
      |    |
      1----0

and the vertex order will still be (0, 1, 2, 3).

symbols[]

object(Symbol)

List of symbols in the word. The order of the symbols follows the natural reading order.

Symbol

A single symbol representation.

JSON representation
{
  "property": {
    object(TextProperty)
  },
  "boundingBox": {
    object(BoundingPoly)
  },
  "text": string,
}
Fields
property

object(TextProperty)

Additional information detected for the symbol.

boundingBox

object(BoundingPoly)

The bounding box for the symbol. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example:

  • when the text is horizontal it might look like:

      0----1
      |    |
      3----2

  • when it's rotated 180 degrees around the top-left corner it becomes:

      2----3
      |    |
      1----0

and the vertex order will still be (0, 1, 2, 3).

text

string

The actual UTF-8 representation of the symbol.
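The Page -> Block -> Paragraph -> Word -> Symbol hierarchy can be walked to rebuild plain text from a page. The sketch below is one possible traversal; the mapping from break types to characters (BREAK_TEXT) is an illustrative choice rather than something the API prescribes, and the helper name page_text and the sample page are hypothetical.

```python
# Illustrative mapping from BreakType names to separator text (an assumption).
BREAK_TEXT = {
    "SPACE": " ",
    "SURE_SPACE": " ",
    "EOL_SURE_SPACE": "\n",
    "HYPHEN": "-\n",
    "LINE_BREAK": "\n",
}

def page_text(page):
    """Concatenates symbol text for one Page, inserting detected breaks."""
    pieces = []
    for block in page.get("blocks", []):
        for paragraph in block.get("paragraphs", []):
            for word in paragraph.get("words", []):
                for symbol in word.get("symbols", []):
                    detected = symbol.get("property", {}).get("detectedBreak", {})
                    separator = BREAK_TEXT.get(detected.get("type"), "")
                    text = symbol.get("text", "")
                    # isPrefix means the break comes before the symbol.
                    if detected.get("isPrefix"):
                        pieces.append(separator + text)
                    else:
                        pieces.append(text + separator)
    return "".join(pieces)

# Hand-written sample page, not real service output.
page = {"blocks": [{"paragraphs": [{"words": [
    {"symbols": [
        {"text": "H"},
        {"text": "i", "property": {"detectedBreak": {"type": "SPACE"}}},
    ]},
    {"symbols": [
        {"text": "y"},
        {"text": "o", "property": {"detectedBreak": {"type": "LINE_BREAK"}}},
    ]},
]}]}]}
text = page_text(page)
```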

BlockType

Type of a block (text, image, etc.) as identified by OCR.

Enums
UNKNOWN Unknown block type.
TEXT Regular text block.
TABLE Table block.
PICTURE Image block.
RULER Horizontal/vertical line box.
BARCODE Barcode block.

SafeSearchAnnotation

Set of features pertaining to the image, computed by computer vision methods over safe-search verticals (for example, adult, spoof, medical, violence).

JSON representation
{
  "adult": enum(Likelihood),
  "spoof": enum(Likelihood),
  "medical": enum(Likelihood),
  "violence": enum(Likelihood),
}
Fields
adult

enum(Likelihood)

Represents the adult content likelihood for the image.

spoof

enum(Likelihood)

Spoof likelihood. The likelihood that a modification was made to the image's canonical version to make it appear funny or offensive.

medical

enum(Likelihood)

Likelihood that this is a medical image.

violence

enum(Likelihood)

Violence likelihood.

ImageProperties

Stores image properties, such as dominant colors.

JSON representation
{
  "dominantColors": {
    object(DominantColorsAnnotation)
  },
}
Fields
dominantColors

object(DominantColorsAnnotation)

If present, dominant colors completed successfully.

DominantColorsAnnotation

Set of dominant colors and their corresponding scores.

JSON representation
{
  "colors": [
    {
      object(ColorInfo)
    }
  ],
}
Fields
colors[]

object(ColorInfo)

RGB color values with their score and pixel fraction.

ColorInfo

Color information consists of RGB channels, score, and the fraction of the image that the color occupies.

JSON representation
{
  "color": {
    object(Color)
  },
  "score": number,
  "pixelFraction": number,
}
Fields
color

object(Color)

RGB components of the color.

score

number

Image-specific score for this color. Value in range [0, 1].

pixelFraction

number

The fraction of pixels the color occupies in the image. Value in range [0, 1].
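A short sketch of ranking the entries of a DominantColorsAnnotation by score. Whether score or pixelFraction is the better ranking key depends on the application; sorting by score here is only an example, and the helper name top_colors and the sample values are hypothetical.

```python
def top_colors(annotation, count=3):
    """Returns up to `count` ColorInfo dicts, highest score first."""
    colors = annotation.get("colors", [])
    return sorted(colors, key=lambda info: info.get("score", 0.0), reverse=True)[:count]

# Hand-written sample, not real service output.
annotation = {
    "colors": [
        {"score": 0.2, "pixelFraction": 0.5},
        {"score": 0.7, "pixelFraction": 0.1},
        {"score": 0.1, "pixelFraction": 0.4},
    ]
}
best = top_colors(annotation, count=2)
```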

Color

Represents a color in the RGBA color space. This representation is designed for simplicity of conversion to/from color representations in various languages over compactness; for example, the fields of this representation can be trivially provided to the constructor of "java.awt.Color" in Java; it can also be trivially provided to UIColor's "+colorWithRed:green:blue:alpha" method in iOS; and, with just a little work, it can be easily formatted into a CSS "rgba()" string in JavaScript, as well. Here are some examples:

Example (Java):

 import com.google.protobuf.FloatValue;
 import com.google.type.Color;

 // ...
 public static java.awt.Color fromProto(Color protocolor) {
   // Treat a missing alpha as fully opaque.
   float alpha = protocolor.hasAlpha()
       ? protocolor.getAlpha().getValue()
       : 1.0f;

   return new java.awt.Color(
       protocolor.getRed(),
       protocolor.getGreen(),
       protocolor.getBlue(),
       alpha);
 }

 public static Color toProto(java.awt.Color color) {
   float red = (float) color.getRed();
   float green = (float) color.getGreen();
   float blue = (float) color.getBlue();
   // java.awt.Color channels are 0-255; the proto expects fractions in [0, 1].
   float denominator = 255.0f;
   Color.Builder resultBuilder =
       Color
           .newBuilder()
           .setRed(red / denominator)
           .setGreen(green / denominator)
           .setBlue(blue / denominator);
   int alpha = color.getAlpha();
   // Only set alpha when the color is not fully opaque.
   if (alpha != 255) {
     resultBuilder.setAlpha(
         FloatValue
             .newBuilder()
             .setValue(((float) alpha) / denominator)
             .build());
   }
   return resultBuilder.build();
 }
 // ...

Example (iOS / Obj-C):

 // ...
 static UIColor* fromProto(Color* protocolor) {
    float red = [protocolor red];
    float green = [protocolor green];
    float blue = [protocolor blue];
    FloatValue* alpha_wrapper = [protocolor alpha];
    float alpha = 1.0;
    if (alpha_wrapper != nil) {
      alpha = [alpha_wrapper value];
    }
    return [UIColor colorWithRed:red green:green blue:blue alpha:alpha];
 }

 static Color* toProto(UIColor* color) {
     CGFloat red, green, blue, alpha;
     if (![color getRed:&red green:&green blue:&blue alpha:&alpha]) {
       return nil;
     }
     Color* result = [[Color alloc] init];
     [result setRed:red];
     [result setGreen:green];
     [result setBlue:blue];
     if (alpha <= 0.9999) {
       [result setAlpha:floatWrapperWithValue(alpha)];
     }
     [result autorelease];
     return result;
}
// ...

Example (JavaScript):

// ...

var protoToCssColor = function(rgb_color) {
   var redFrac = rgb_color.red || 0.0;
   var greenFrac = rgb_color.green || 0.0;
   var blueFrac = rgb_color.blue || 0.0;
   var red = Math.floor(redFrac * 255);
   var green = Math.floor(greenFrac * 255);
   var blue = Math.floor(blueFrac * 255);

   if (!('alpha' in rgb_color)) {
      return rgbToCssColor_(red, green, blue);
   }

   var alphaFrac = rgb_color.alpha.value || 0.0;
   var rgbParams = [red, green, blue].join(',');
   return ['rgba(', rgbParams, ',', alphaFrac, ')'].join('');
};

var rgbToCssColor_ = function(red, green, blue) {
  var rgbNumber = new Number((red << 16) | (green << 8) | blue);
  var hexString = rgbNumber.toString(16);
  var missingZeros = 6 - hexString.length;
  var resultBuilder = ['#'];
  for (var i = 0; i < missingZeros; i++) {
     resultBuilder.push('0');
  }
  resultBuilder.push(hexString);
  return resultBuilder.join('');
};

// ...

JSON representation
{
  "red": number,
  "green": number,
  "blue": number,
  "alpha": number,
}
Fields
red

number

The amount of red in the color as a value in the interval [0, 1].

green

number

The amount of green in the color as a value in the interval [0, 1].

blue

number

The amount of blue in the color as a value in the interval [0, 1].

alpha

number

The fraction of this color that should be applied to the pixel. That is, the final pixel color is defined by the equation:

pixel color = alpha * (this color) + (1.0 - alpha) * (background color)

This means that a value of 1.0 corresponds to a solid color, whereas a value of 0.0 corresponds to a completely transparent color. This uses a wrapper message rather than a simple float scalar so that it is possible to distinguish between a default value and the value being unset. If omitted, this color object is to be rendered as a solid color (as if the alpha value had been explicitly given with a value of 1.0).
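The compositing equation above can be sketched per channel as follows. This assumes alpha arrives as a plain number, as in the JSON representation, and that a missing alpha means fully opaque, as stated above; the helper names blend_channel and blend are hypothetical.

```python
def blend_channel(color, background, alpha=None):
    """Applies pixel = alpha * color + (1.0 - alpha) * background to one channel."""
    if alpha is None:
        alpha = 1.0  # omitted alpha is rendered as a solid color
    return alpha * color + (1.0 - alpha) * background

def blend(color, background):
    """Blends a Color dict over an opaque background Color dict."""
    alpha = color.get("alpha")
    return {
        channel: blend_channel(color.get(channel, 0.0), background.get(channel, 0.0), alpha)
        for channel in ("red", "green", "blue")
    }

# Half-transparent red over solid blue.
result = blend(
    {"red": 1.0, "green": 0.0, "blue": 0.0, "alpha": 0.5},
    {"red": 0.0, "green": 0.0, "blue": 1.0},
)
```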

CropHintsAnnotation

Set of crop hints that are used to generate new crops when serving images.

JSON representation
{
  "cropHints": [
    {
      object(CropHint)
    }
  ],
}
Fields
cropHints[]

object(CropHint)

Crop hint results.

CropHint

Single crop hint that is used to generate a new crop when serving an image.

JSON representation
{
  "boundingPoly": {
    object(BoundingPoly)
  },
  "confidence": number,
  "importanceFraction": number,
}
Fields
boundingPoly

object(BoundingPoly)

The bounding polygon for the crop region. The coordinates of the bounding box are in the original image's scale, as returned in ImageParams.

confidence

number

Confidence of this being a salient region. Range [0, 1].

importanceFraction

number

Fraction of importance of this salient region with respect to the original image.

WebDetection

Relevant information for the image from the Internet.

JSON representation
{
  "webEntities": [
    {
      object(WebEntity)
    }
  ],
  "fullMatchingImages": [
    {
      object(WebImage)
    }
  ],
  "partialMatchingImages": [
    {
      object(WebImage)
    }
  ],
  "pagesWithMatchingImages": [
    {
      object(WebPage)
    }
  ],
}
Fields
webEntities[]

object(WebEntity)

Deduced entities from similar images on the Internet.

fullMatchingImages[]

object(WebImage)

Fully matching images from the Internet. These are near-duplicates of the query image, most often a copy that differs only in size.

partialMatchingImages[]

object(WebImage)

Partially matching images from the Internet. These images are similar enough to the query image to share some key-point features. For example, an original image will likely have partial matches for its crops.

pagesWithMatchingImages[]

object(WebPage)

Web pages containing the matching images from the Internet.
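
As an illustration of how a client might read a WebDetection object, the sketch below collects the highest-scoring entity descriptions and the matching URLs. The helper name summarizeWebDetection and the limit parameter are hypothetical. Scores are compared only within a single response, since they are not comparable across image queries.

```javascript
// Hypothetical helper: condenses one WebDetection object into plain lists.
function summarizeWebDetection(webDetection, limit = 3) {
  const byScoreDescending = (a, b) => (b.score || 0) - (a.score || 0);
  const urls = (items) => (items || []).map((item) => item.url);

  return {
    // Copy before sorting so the original response is left untouched.
    topEntities: [...(webDetection.webEntities || [])]
      .sort(byScoreDescending)
      .slice(0, limit)
      .map((entity) => entity.description || entity.entityId),
    fullMatches: urls(webDetection.fullMatchingImages),
    partialMatches: urls(webDetection.partialMatchingImages),
    pages: urls(webDetection.pagesWithMatchingImages),
  };
}
```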

WebEntity

Entity deduced from similar images on the Internet.

JSON representation
{
  "entityId": string,
  "score": number,
  "description": string,
}
Fields
entityId

string

Opaque entity ID.

score

number

Overall relevancy score for the entity. Not normalized and not comparable across different image queries.

description

string

Canonical description of the entity, in English.

WebImage

Metadata for online images.

JSON representation
{
  "url": string,
  "score": number,
}
Fields
url

string

The result image URL.

score

number

Overall relevancy score for the image. Not normalized and not comparable across different image queries.

WebPage

Metadata for web pages.

JSON representation
{
  "url": string,
  "score": number,
}
Fields
url

string

The result web page URL.

score

number

Overall relevancy score for the web page. Not normalized and not comparable across different image queries.

Status

The Status type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by gRPC. The error model is designed to be:

  • Simple to use and understand for most users
  • Flexible enough to meet unexpected needs

Overview

The Status message contains three pieces of data: error code, error message, and error details. The error code should be an enum value of google.rpc.Code, but it may accept additional error codes if needed. The error message should be a developer-facing English message that helps developers understand and resolve the error. If a localized user-facing error message is needed, put the localized message in the error details or localize it in the client. The optional error details may contain arbitrary information about the error. There is a predefined set of error detail types in the package google.rpc which can be used for common error conditions.

Language mapping

The Status message is the logical representation of the error model, but it is not necessarily the actual wire format. When the Status message is exposed in different client libraries and different wire protocols, it can be mapped differently. For example, it will likely be mapped to some exceptions in Java, but more likely mapped to some error codes in C.
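
As one example of such a mapping, a JavaScript client could wrap a Status object in an Error subclass. The sketch below is a hypothetical mapping, not the behavior of any particular client library: the class name StatusError and the CODE_<n> fallback for codes outside google.rpc.Code are assumptions.

```javascript
// Names of the google.rpc.Code enum values, indexed by numeric code.
const RPC_CODE_NAMES = [
  'OK', 'CANCELLED', 'UNKNOWN', 'INVALID_ARGUMENT', 'DEADLINE_EXCEEDED',
  'NOT_FOUND', 'ALREADY_EXISTS', 'PERMISSION_DENIED', 'RESOURCE_EXHAUSTED',
  'FAILED_PRECONDITION', 'ABORTED', 'OUT_OF_RANGE', 'UNIMPLEMENTED',
  'INTERNAL', 'UNAVAILABLE', 'DATA_LOSS', 'UNAUTHENTICATED',
];

// Hypothetical exception type carrying the three parts of a Status message.
class StatusError extends Error {
  constructor(status) {
    super(status.message || '');
    this.name = 'StatusError';
    this.code = status.code || 0;
    // Codes outside the enum are kept visible rather than mapped to UNKNOWN.
    this.codeName = RPC_CODE_NAMES[this.code] || `CODE_${this.code}`;
    this.details = status.details || [];
  }
}
```

A caller could then write `throw new StatusError(status)` and branch on `err.codeName` in a catch block.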

Other uses

The error model and the Status message can be used in a variety of environments, either with or without APIs, to provide a consistent developer experience across different environments.

Example uses of this error model include:

  • Partial errors. If a service needs to return partial errors to the client, it may embed the Status in the normal response to indicate the partial errors.

  • Workflow errors. A typical workflow has multiple steps. Each step may have a Status message for error reporting purposes.

  • Batch operations. If a client uses batch requests and batch responses, the Status message should be used directly inside the batch response, one for each error sub-response.

  • Asynchronous operations. If an API call embeds asynchronous operation results in its response, the status of those operations should be represented directly using the Status message.

  • Logging. If some API errors are stored in logs, the Status message can be used directly after any stripping needed for security or privacy reasons.
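
The partial-error and batch cases above can be sketched for a batch annotation response as follows. This sketch assumes that each AnnotateImageResponse may carry an error field holding a Status, that a non-zero code marks a failed sub-response, and that responses are returned in request order; the helper name splitBatchResponses is hypothetical.

```javascript
// Hypothetical helper: separates sub-responses that carry an error Status
// from those that do not, remembering each one's position in the batch.
function splitBatchResponses(batchResponse) {
  const succeeded = [];
  const failed = [];
  (batchResponse.responses || []).forEach((response, index) => {
    // Assumption: an absent error, or an error with code 0 (OK), means success.
    if (response.error && response.error.code) {
      failed.push({ index, status: response.error });
    } else {
      succeeded.push({ index, response });
    }
  });
  return { succeeded, failed };
}
```

Keeping the index lets a caller relate each Status back to the request that produced it, for example to retry only the failed images.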

JSON representation
{
  "code": number,
  "message": string,
  "details": [
    {
      "@type": string,
      field1: ...,
      ...
    }
  ],
}
Fields
code

number

The status code, which should be an enum value of google.rpc.Code.

message

string

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.

details[]

object

A list of messages that carry the error details. There will be a common set of message types for APIs to use.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.
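
Because each entry in details[] identifies itself through "@type", a client can select the entries it understands and ignore the rest. The sketch below is one possible approach; the helper name detailsOfType is hypothetical, and matching on the last path segment of the type URI is an assumption made for brevity.

```javascript
// Hypothetical helper: returns the detail objects whose "@type" URI ends in
// the given type name (the part after the last "/").
function detailsOfType(status, typeName) {
  return (status.details || []).filter((detail) => {
    const typeUri = detail['@type'] || '';
    return typeUri.slice(typeUri.lastIndexOf('/') + 1) === typeName;
  });
}

// Using the example object shown above.
const exampleStatus = {
  code: 3,
  message: 'Example error.',
  details: [{ id: 1234, '@type': 'types.example.com/standard/id' }],
};
console.log(detailsOfType(exampleStatus, 'id')); // one entry, with id 1234
```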
