Response to an image annotation request.
JSON representation |
---|
{ "faceAnnotations": [ { object ( |
Fields | |
---|---|
faceAnnotations[] |
If present, face detection has completed successfully. |
landmarkAnnotations[] |
If present, landmark detection has completed successfully. |
logoAnnotations[] |
If present, logo detection has completed successfully. |
labelAnnotations[] |
If present, label detection has completed successfully. |
localizedObjectAnnotations[] |
If present, localized object detection has completed successfully. This will be sorted descending by confidence score. |
textAnnotations[] |
If present, text (OCR) detection has completed successfully. |
fullTextAnnotation |
If present, text (OCR) detection or document (OCR) text detection has completed successfully. This annotation provides the structural hierarchy for the OCR detected text. |
safeSearchAnnotation |
If present, safe-search annotation has completed successfully. |
imagePropertiesAnnotation |
If present, image properties were extracted successfully. |
cropHintsAnnotation |
If present, crop hints have completed successfully. |
webDetection |
If present, web detection has completed successfully. |
productSearchResults |
If present, product search has completed successfully. |
error |
If set, represents the error message for the operation. Note that filled-in image annotations are guaranteed to be correct, even when |
context |
If present, contextual information is needed to understand where this image comes from. |
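
Because each annotation field is only present when the corresponding feature completed, a client usually starts by checking which fields arrived. The following is a minimal sketch, not part of the API: `summarizeResponse` and `ANNOTATION_FIELDS` are hypothetical names, and it assumes that `error`, when set, carries a human-readable `message` string.

```javascript
// Field names taken from the Fields table above.
const ANNOTATION_FIELDS = [
  'faceAnnotations', 'landmarkAnnotations', 'logoAnnotations',
  'labelAnnotations', 'localizedObjectAnnotations', 'textAnnotations',
  'fullTextAnnotation', 'safeSearchAnnotation', 'imagePropertiesAnnotation',
  'cropHintsAnnotation', 'webDetection', 'productSearchResults',
];

// Hypothetical helper: reports which annotations are present and any error text.
function summarizeResponse(response) {
  const present = ANNOTATION_FIELDS.filter((field) => {
    const value = response[field];
    // Treat an empty array the same as an absent field.
    return Array.isArray(value) ? value.length > 0 : value != null;
  });
  // Assumption: `error` has a `message` string when it is set.
  const errorMessage = response.error ? response.error.message || '' : null;
  return { present, errorMessage };
}

const summary = summarizeResponse({
  labelAnnotations: [{ description: 'tower', score: 0.9 }],
  error: { code: 3, message: 'Bad image data.' },
});
console.log(summary.present, summary.errorMessage);
```

Annotations and `error` are checked independently here because the table above notes that both can appear in the same response.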
FaceAnnotation
A face annotation object contains the results of face detection.
JSON representation |
---|
{ "boundingPoly": { object ( |
Fields | |
---|---|
boundingPoly |
The bounding polygon around the face. The coordinates of the bounding box are
in the original image's scale. The bounding box is computed to "frame" the
face in accordance with human expectations. It is based on the landmarker
results. Note that one or more x and/or y coordinates may not be generated in
the |
fdBoundingPoly |
The
(face detection) prefix. |
landmarks[] |
Detected face landmarks. |
rollAngle |
Roll angle, which indicates the amount of clockwise/anti-clockwise rotation of the face relative to the image vertical about the axis perpendicular to the face. Range [-180,180]. |
panAngle |
Yaw angle, which indicates the leftward/rightward angle that the face is pointing relative to the vertical plane perpendicular to the image. Range [-180,180]. |
tiltAngle |
Pitch angle, which indicates the upwards/downwards angle that the face is pointing relative to the image's horizontal plane. Range [-180,180]. |
detectionConfidence |
Detection confidence. Range [0, 1]. |
landmarkingConfidence |
Face landmarking confidence. Range [0, 1]. |
joyLikelihood |
Joy likelihood. |
sorrowLikelihood |
Sorrow likelihood. |
angerLikelihood |
Anger likelihood. |
surpriseLikelihood |
Surprise likelihood. |
underExposedLikelihood |
Under-exposed likelihood. |
blurredLikelihood |
Blurred likelihood. |
headwearLikelihood |
Headwear likelihood. |
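
The three pose angles can be combined into a simple "is this face roughly frontal" check. This is a sketch only: `isRoughlyFrontal` is a hypothetical helper, the 15-degree default is an arbitrary illustrative threshold, and a missing angle is assumed to mean 0.

```javascript
// Hypothetical helper: true when roll, pan, and tilt are all within maxDegrees of 0.
function isRoughlyFrontal(face, maxDegrees = 15) {
  const angles = [face.rollAngle, face.panAngle, face.tiltAngle];
  // Assumption: an omitted angle is treated as 0 degrees.
  return angles.every((angle) => Math.abs(angle || 0) <= maxDegrees);
}

console.log(isRoughlyFrontal({ rollAngle: 2, panAngle: -10, tiltAngle: 5 }));
console.log(isRoughlyFrontal({ rollAngle: 0, panAngle: 40, tiltAngle: 0 }));
```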
Landmark
A face-specific landmark (for example, a face feature).
Type
Face landmark (feature) type. Left and right are defined from the vantage of the viewer of the image without considering mirror projections typical of photos. So, LEFT_EYE, typically, is the person's right eye.
Enums | |
---|---|
UNKNOWN_LANDMARK |
Unknown face landmark detected. Should not be filled. |
LEFT_EYE |
Left eye. |
RIGHT_EYE |
Right eye. |
LEFT_OF_LEFT_EYEBROW |
Left of left eyebrow. |
RIGHT_OF_LEFT_EYEBROW |
Right of left eyebrow. |
LEFT_OF_RIGHT_EYEBROW |
Left of right eyebrow. |
RIGHT_OF_RIGHT_EYEBROW |
Right of right eyebrow. |
MIDPOINT_BETWEEN_EYES |
Midpoint between eyes. |
NOSE_TIP |
Nose tip. |
UPPER_LIP |
Upper lip. |
LOWER_LIP |
Lower lip. |
MOUTH_LEFT |
Mouth left. |
MOUTH_RIGHT |
Mouth right. |
MOUTH_CENTER |
Mouth center. |
NOSE_BOTTOM_RIGHT |
Nose, bottom right. |
NOSE_BOTTOM_LEFT |
Nose, bottom left. |
NOSE_BOTTOM_CENTER |
Nose, bottom center. |
LEFT_EYE_TOP_BOUNDARY |
Left eye, top boundary. |
LEFT_EYE_RIGHT_CORNER |
Left eye, right corner. |
LEFT_EYE_BOTTOM_BOUNDARY |
Left eye, bottom boundary. |
LEFT_EYE_LEFT_CORNER |
Left eye, left corner. |
RIGHT_EYE_TOP_BOUNDARY |
Right eye, top boundary. |
RIGHT_EYE_RIGHT_CORNER |
Right eye, right corner. |
RIGHT_EYE_BOTTOM_BOUNDARY |
Right eye, bottom boundary. |
RIGHT_EYE_LEFT_CORNER |
Right eye, left corner. |
LEFT_EYEBROW_UPPER_MIDPOINT |
Left eyebrow, upper midpoint. |
RIGHT_EYEBROW_UPPER_MIDPOINT |
Right eyebrow, upper midpoint. |
LEFT_EAR_TRAGION |
Left ear tragion. |
RIGHT_EAR_TRAGION |
Right ear tragion. |
LEFT_EYE_PUPIL |
Left eye pupil. |
RIGHT_EYE_PUPIL |
Right eye pupil. |
FOREHEAD_GLABELLA |
Forehead glabella. |
CHIN_GNATHION |
Chin gnathion. |
CHIN_LEFT_GONION |
Chin left gonion. |
CHIN_RIGHT_GONION |
Chin right gonion. |
LEFT_CHEEK_CENTER |
Left cheek center. |
RIGHT_CHEEK_CENTER |
Right cheek center. |
Position
A 3D position in the image, used primarily for Face detection landmarks. A valid Position must have both x and y coordinates. The position coordinates are in the same scale as the original image.
JSON representation |
---|
{ "x": number, "y": number, "z": number } |
Fields | |
---|---|
x |
X coordinate. |
y |
Y coordinate. |
z |
Z coordinate (or depth). |
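
Since positions share the original image's scale, distances between landmarks come out in image units. The sketch below measures the distance between the two eye landmarks. It assumes each entry of `landmarks[]` has a `type` (one of the enum values above) and a `position`. `findLandmark` and `interocularDistance` are hypothetical names.

```javascript
// Hypothetical helper: returns the first landmark of the given type, or null.
function findLandmark(face, type) {
  return (face.landmarks || []).find((landmark) => landmark.type === type) || null;
}

// Distance between the eyes in the image plane (z is ignored).
function interocularDistance(face) {
  const left = findLandmark(face, 'LEFT_EYE');
  const right = findLandmark(face, 'RIGHT_EYE');
  if (!left || !right) return null;
  return Math.hypot(
    left.position.x - right.position.x,
    left.position.y - right.position.y);
}

const sampleFace = {
  landmarks: [
    { type: 'LEFT_EYE', position: { x: 100, y: 120, z: 0 } },
    { type: 'RIGHT_EYE', position: { x: 160, y: 200, z: 1 } },
  ],
};
console.log(interocularDistance(sampleFace));
```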
Likelihood
A bucketized representation of likelihood, which is intended to give clients highly stable results across model upgrades.
Enums | |
---|---|
UNKNOWN |
Unknown likelihood. |
VERY_UNLIKELY |
It is very unlikely. |
UNLIKELY |
It is unlikely. |
POSSIBLE |
It is possible. |
LIKELY |
It is likely. |
VERY_LIKELY |
It is very likely. |
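
Because the buckets are ordered, a threshold test is usually more convenient than comparing strings one by one. A minimal sketch, with `LIKELIHOOD_ORDER` and `likelihoodAtLeast` as hypothetical names:

```javascript
// Enum values in ascending order, as listed in the table above.
const LIKELIHOOD_ORDER = [
  'UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY',
];

// True when `value` is at or above `threshold`. Unrecognized values never pass.
function likelihoodAtLeast(value, threshold) {
  const thresholdRank = LIKELIHOOD_ORDER.indexOf(threshold);
  if (thresholdRank < 0) {
    throw new RangeError('Unknown threshold: ' + threshold);
  }
  return LIKELIHOOD_ORDER.indexOf(value) >= thresholdRank;
}

const face = { joyLikelihood: 'VERY_LIKELY', angerLikelihood: 'POSSIBLE' };
console.log(likelihoodAtLeast(face.joyLikelihood, 'LIKELY'));
console.log(likelihoodAtLeast(face.angerLikelihood, 'LIKELY'));
```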
EntityAnnotation
Set of detected entity features.
JSON representation |
---|
{ "mid": string, "locale": string, "description": string, "score": number, "confidence": number, "topicality": number, "boundingPoly": { object ( |
Fields | |
---|---|
mid |
Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API. |
locale |
The language code for the locale in which the entity textual |
description |
Entity textual description, expressed in its |
score |
Overall score of the result. Range [0, 1]. |
confidence |
Deprecated. Use |
topicality |
The relevancy of the ICA (Image Content Annotation) label to the image. For example, the relevancy of "tower" is likely higher to an image containing the detected "Eiffel Tower" than to an image containing a detected distant towering building, even though the confidence that there is a tower in each image may be the same. Range [0, 1]. |
boundingPoly |
Image region to which this entity belongs. Not produced for |
locations[] |
The location information for the detected entity. Multiple |
properties[] |
Some entities may have optional user-supplied |
LocationInfo
Detected entity location information.
JSON representation |
---|
{
"latLng": {
object ( |
Fields | |
---|---|
latLng |
lat/long location coordinates. |
Property
A Property consists of a user-supplied name/value pair.
JSON representation |
---|
{ "name": string, "value": string, "uint64Value": string } |
Fields | |
---|---|
name |
Name of the property. |
value |
Value of the property. |
uint64Value |
Value of numeric properties. |
LocalizedObjectAnnotation
Set of detected objects with bounding boxes.
JSON representation |
---|
{
"mid": string,
"languageCode": string,
"name": string,
"score": number,
"boundingPoly": {
object ( |
Fields | |
---|---|
mid |
Object ID that should align with EntityAnnotation mid. |
languageCode |
The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier. |
name |
Object name, expressed in its |
score |
Score of the result. Range [0, 1]. |
boundingPoly |
Image region to which this object belongs. This must be populated. |
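
To draw a detected object on the source image, the bounding polygon has to be converted to pixel coordinates. The sketch below assumes the polygon carries `normalizedVertices` with `x` and `y` in [0, 1], which this page does not show. `toPixelBox` is a hypothetical helper, and the image size comes from the caller.

```javascript
// Hypothetical helper: axis-aligned pixel box enclosing a normalized polygon.
function toPixelBox(boundingPoly, imageWidth, imageHeight) {
  // Assumption: vertices are normalized to [0, 1]; a missing coordinate means 0.
  const vertices = boundingPoly.normalizedVertices || [];
  if (vertices.length === 0) return null;
  const xs = vertices.map((v) => (v.x || 0) * imageWidth);
  const ys = vertices.map((v) => (v.y || 0) * imageHeight);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    left,
    top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}

const box = toPixelBox({
  normalizedVertices: [
    { x: 0.25, y: 0.5 }, { x: 0.75, y: 0.5 },
    { x: 0.75, y: 1 }, { x: 0.25, y: 1 },
  ],
}, 800, 600);
console.log(box);
```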
TextAnnotation
TextAnnotation contains a structured representation of OCR-extracted text. The hierarchy of an OCR-extracted text structure is: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component, starting from Page, may have its own properties. Properties describe detected languages, breaks, and so on. Refer to the TextAnnotation.TextProperty message definition below for more detail.
JSON representation |
---|
{
"pages": [
{
object ( |
Fields | |
---|---|
pages[] |
List of pages detected by OCR. |
text |
UTF-8 text detected on the pages. |
Page
Detected page from OCR.
JSON representation |
---|
{ "property": { object ( |
Fields | |
---|---|
property |
Additional information detected on the page. |
width |
Page width. For PDFs the unit is points. For images (including TIFFs) the unit is pixels. |
height |
Page height. For PDFs the unit is points. For images (including TIFFs) the unit is pixels. |
blocks[] |
List of blocks of text, images, etc. on this page. |
confidence |
Confidence of the OCR results on the page. Range [0, 1]. |
TextProperty
Additional information detected on the structural component.
JSON representation |
---|
{ "detectedLanguages": [ { object ( |
Fields | |
---|---|
detectedLanguages[] |
A list of detected languages together with confidence. |
detectedBreak |
Detected start or end of a text segment. |
DetectedLanguage
Detected language for a structural component.
JSON representation |
---|
{ "languageCode": string, "confidence": number } |
Fields | |
---|---|
languageCode |
The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier. |
confidence |
Confidence of detected language. Range [0, 1]. |
DetectedBreak
Detected start or end of a structural component.
JSON representation |
---|
{
"type": enum ( |
Fields | |
---|---|
type |
Detected break type. |
isPrefix |
True if break prepends the element. |
BreakType
Enum to denote the type of break found (new line, space, etc.).
Enums | |
---|---|
UNKNOWN |
Unknown break label type. |
SPACE |
Regular space. |
SURE_SPACE |
Sure space (very wide). |
EOL_SURE_SPACE |
Line-wrapping break. |
HYPHEN |
End-line hyphen that is not present in text; does not co-occur with SPACE, LEADER_SPACE, or LINE_BREAK. |
LINE_BREAK |
Line break that ends a paragraph. |
Block
Logical element on the page.
JSON representation |
---|
{ "property": { object ( |
Fields | |
---|---|
property |
Additional information detected for the block. |
boundingBox |
The bounding box for the block. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example, when the text is horizontal, vertex 0 is at the top-left of the box; when the box is rotated 180 degrees around the top-left corner, vertex 0 appears at the bottom-right, and the vertex order will still be (0, 1, 2, 3). |
paragraphs[] |
List of paragraphs in this block (if this block is of type text). |
blockType |
Detected block type (text, image, etc.) for this block. |
confidence |
Confidence of the OCR results on the block. Range [0, 1]. |
Paragraph
Structural unit of text representing a number of words in certain order.
JSON representation |
---|
{ "property": { object ( |
Fields | |
---|---|
property |
Additional information detected for the paragraph. |
boundingBox |
The bounding box for the paragraph. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example, when the text is horizontal, vertex 0 is at the top-left of the box; when the box is rotated 180 degrees around the top-left corner, vertex 0 appears at the bottom-right, and the vertex order will still be (0, 1, 2, 3). |
words[] |
List of all words in this paragraph. |
confidence |
Confidence of the OCR results for the paragraph. Range [0, 1]. |
Word
A word representation.
JSON representation |
---|
{ "property": { object ( |
Fields | |
---|---|
property |
Additional information detected for the word. |
boundingBox |
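The bounding box for the word. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example, when the text is horizontal, vertex 0 is at the top-left of the box; when the box is rotated 180 degrees around the top-left corner, vertex 0 appears at the bottom-right, and the vertex order will still be (0, 1, 2, 3). |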
symbols[] |
List of symbols in the word. The order of the symbols follows the natural reading order. |
confidence |
Confidence of the OCR results for the word. Range [0, 1]. |
Symbol
A single symbol representation.
JSON representation |
---|
{ "property": { object ( |
Fields | |
---|---|
property |
Additional information detected for the symbol. |
boundingBox |
The bounding box for the symbol. The vertices are in the order of top-left, top-right, bottom-right, bottom-left. When a rotation of the bounding box is detected, the rotation is represented as around the top-left corner as defined when the text is read in the 'natural' orientation. For example, when the text is horizontal, vertex 0 is at the top-left of the box; when the box is rotated 180 degrees around the top-left corner, vertex 0 appears at the bottom-right, and the vertex order will still be (0, 1, 2, 3). |
text |
The actual UTF-8 representation of the symbol. |
confidence |
Confidence of the OCR results for the symbol. Range [0, 1]. |
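
The Page -> Block -> Paragraph -> Word -> Symbol hierarchy can be walked to rebuild text from individual symbols and their detected breaks. The sketch below is one possible approach, not the service's own algorithm. `BREAK_TEXT`, `paragraphText`, and `paragraphTexts` are hypothetical names, and the mapping from break types to characters is an illustrative choice.

```javascript
// Assumed mapping from BreakType to the separator it contributes.
// HYPHEN maps to '' so that a word split across lines is rejoined.
const BREAK_TEXT = {
  SPACE: ' ',
  SURE_SPACE: ' ',
  EOL_SURE_SPACE: '\n',
  LINE_BREAK: '\n',
  HYPHEN: '',
};

// Concatenates the symbols of one paragraph, honoring detected breaks.
function paragraphText(paragraph) {
  let text = '';
  for (const word of paragraph.words || []) {
    for (const symbol of word.symbols || []) {
      const detectedBreak = symbol.property && symbol.property.detectedBreak;
      const separator = detectedBreak ? BREAK_TEXT[detectedBreak.type] || '' : '';
      const glyph = symbol.text || '';
      // isPrefix means the break comes before the symbol rather than after it.
      text += detectedBreak && detectedBreak.isPrefix
        ? separator + glyph
        : glyph + separator;
    }
  }
  return text;
}

// Returns one string per paragraph, in document order.
function paragraphTexts(fullTextAnnotation) {
  const texts = [];
  for (const page of fullTextAnnotation.pages || []) {
    for (const block of page.blocks || []) {
      for (const paragraph of block.paragraphs || []) {
        texts.push(paragraphText(paragraph));
      }
    }
  }
  return texts;
}

const sample = { pages: [{ blocks: [{ paragraphs: [{ words: [
  { symbols: [
    { text: 'H' },
    { text: 'i', property: { detectedBreak: { type: 'SPACE' } } },
  ] },
  { symbols: [
    { text: 'y' }, { text: 'o' },
    { text: 'u', property: { detectedBreak: { type: 'LINE_BREAK' } } },
  ] },
] }] }] }] };
console.log(JSON.stringify(paragraphTexts(sample)));
```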
BlockType
Type of a block (text, image, etc.) as identified by OCR.
Enums | |
---|---|
UNKNOWN |
Unknown block type. |
TEXT |
Regular text block. |
TABLE |
Table block. |
PICTURE |
Image block. |
RULER |
Horizontal/vertical line box. |
BARCODE |
Barcode block. |
SafeSearchAnnotation
Set of features pertaining to the image, computed by computer vision methods over safe-search verticals (for example, adult, spoof, medical, violence).
JSON representation |
---|
{ "adult": enum ( |
Fields | |
---|---|
adult |
Represents the adult content likelihood for the image. Adult content may contain elements such as nudity, pornographic images or cartoons, or sexual activities. |
spoof |
Spoof likelihood. The likelihood that a modification was made to the image's canonical version to make it appear funny or offensive. |
medical |
Likelihood that this is a medical image. |
violence |
Likelihood that this image contains violent content. Violent content may include death, serious harm, or injury to individuals or groups of individuals. |
racy |
Likelihood that the request image contains racy content. Racy content may include (but is not limited to) skimpy or sheer clothing, strategically covered nudity, lewd or provocative poses, or close-ups of sensitive body areas. |
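
A moderation pipeline often needs only the single most severe category. The sketch below finds it. `worstCategory` is a hypothetical helper, and the default category list is an illustrative policy choice rather than a recommendation.

```javascript
// Likelihood values in ascending order of severity.
const LEVELS = [
  'UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY',
];

// Hypothetical helper: the category with the highest likelihood among those checked.
function worstCategory(safeSearch, categories = ['adult', 'violence', 'racy']) {
  let worst = { category: null, likelihood: 'UNKNOWN' };
  for (const category of categories) {
    const likelihood = safeSearch[category] || 'UNKNOWN';
    if (LEVELS.indexOf(likelihood) > LEVELS.indexOf(worst.likelihood)) {
      worst = { category, likelihood };
    }
  }
  return worst;
}

const verdict = worstCategory({
  adult: 'UNLIKELY',
  spoof: 'VERY_LIKELY', // not in the default category list, so it is ignored
  medical: 'UNLIKELY',
  violence: 'LIKELY',
  racy: 'POSSIBLE',
});
console.log(verdict);
```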
ImageProperties
Stores image properties, such as dominant colors.
JSON representation |
---|
{
"dominantColors": {
object ( |
Fields | |
---|---|
dominantColors |
If present, dominant colors completed successfully. |
DominantColorsAnnotation
Set of dominant colors and their corresponding scores.
JSON representation |
---|
{
"colors": [
{
object ( |
Fields | |
---|---|
colors[] |
RGB color values with their score and pixel fraction. |
ColorInfo
Color information consists of RGB channels, a score, and the fraction of the image that the color occupies.
JSON representation |
---|
{
"color": {
object ( |
Fields | |
---|---|
color |
RGB components of the color. |
score |
Image-specific score for this color. Value in range [0, 1]. |
pixelFraction |
The fraction of pixels the color occupies in the image. Value in range [0, 1]. |
Color
Represents a color in the RGBA color space. This representation is designed for simplicity of conversion to/from color representations in various languages over compactness. For example, the fields of this representation can be trivially provided to the constructor of java.awt.Color in Java; it can also be trivially provided to UIColor's +colorWithRed:green:blue:alpha method in iOS; and, with just a little work, it can be easily formatted into a CSS rgba() string in JavaScript.
This reference page doesn't carry information about the absolute color space that should be used to interpret the RGB value (e.g. sRGB, Adobe RGB, DCI-P3, BT.2020, etc.). By default, applications should assume the sRGB color space.
When color equality needs to be decided, implementations, unless documented otherwise, treat two colors as equal if all their red, green, blue, and alpha values each differ by at most 1e-5.
Example (Java):
import com.google.protobuf.FloatValue;
import com.google.type.Color;
// ...
public static java.awt.Color fromProto(Color protocolor) {
  // An unset alpha means the color is fully opaque.
  float alpha = protocolor.hasAlpha()
      ? protocolor.getAlpha().getValue()
      : 1.0f;
  return new java.awt.Color(
      protocolor.getRed(),
      protocolor.getGreen(),
      protocolor.getBlue(),
      alpha);
}

public static Color toProto(java.awt.Color color) {
  // java.awt.Color reports channels as integers in [0, 255].
  float red = (float) color.getRed();
  float green = (float) color.getGreen();
  float blue = (float) color.getBlue();
  float denominator = 255.0f;
  Color.Builder resultBuilder =
      Color
          .newBuilder()
          .setRed(red / denominator)
          .setGreen(green / denominator)
          .setBlue(blue / denominator);
  int alpha = color.getAlpha();
  // Only set alpha when the color is not fully opaque.
  if (alpha != 255) {
    resultBuilder.setAlpha(
        FloatValue
            .newBuilder()
            .setValue(((float) alpha) / denominator)
            .build());
  }
  return resultBuilder.build();
}
// ...
Example (iOS / Obj-C):
// ...
static UIColor* fromProto(Color* protocolor) {
  float red = [protocolor red];
  float green = [protocolor green];
  float blue = [protocolor blue];
  FloatValue* alpha_wrapper = [protocolor alpha];
  float alpha = 1.0;
  if (alpha_wrapper != nil) {
    alpha = [alpha_wrapper value];
  }
  return [UIColor colorWithRed:red green:green blue:blue alpha:alpha];
}

static Color* toProto(UIColor* color) {
  CGFloat red, green, blue, alpha;
  if (![color getRed:&red green:&green blue:&blue alpha:&alpha]) {
    return nil;
  }
  Color* result = [[Color alloc] init];
  [result setRed:red];
  [result setGreen:green];
  [result setBlue:blue];
  if (alpha <= 0.9999) {
    [result setAlpha:floatWrapperWithValue(alpha)];
  }
  [result autorelease];
  return result;
}
// ...
Example (JavaScript):
// ...
var protoToCssColor = function(rgb_color) {
  var redFrac = rgb_color.red || 0.0;
  var greenFrac = rgb_color.green || 0.0;
  var blueFrac = rgb_color.blue || 0.0;
  var red = Math.floor(redFrac * 255);
  var green = Math.floor(greenFrac * 255);
  var blue = Math.floor(blueFrac * 255);

  if (!('alpha' in rgb_color)) {
    return rgbToCssColor(red, green, blue);
  }

  var alphaFrac = rgb_color.alpha.value || 0.0;
  var rgbParams = [red, green, blue].join(',');
  return ['rgba(', rgbParams, ',', alphaFrac, ')'].join('');
};

var rgbToCssColor = function(red, green, blue) {
  var rgbNumber = new Number((red << 16) | (green << 8) | blue);
  var hexString = rgbNumber.toString(16);
  var missingZeros = 6 - hexString.length;
  var resultBuilder = ['#'];
  for (var i = 0; i < missingZeros; i++) {
    resultBuilder.push('0');
  }
  resultBuilder.push(hexString);
  return resultBuilder.join('');
};
// ...
JSON representation |
---|
{ "red": number, "green": number, "blue": number, "alpha": number } |
Fields | |
---|---|
red |
The amount of red in the color as a value in the interval [0, 1]. |
green |
The amount of green in the color as a value in the interval [0, 1]. |
blue |
The amount of blue in the color as a value in the interval [0, 1]. |
alpha |
The fraction of this color that should be applied to the pixel. That is, the final pixel color is defined by the equation: pixel color = alpha * (this color) + (1.0 - alpha) * (background color).
This means that a value of 1.0 corresponds to a solid color, whereas a value of 0.0 corresponds to a completely transparent color. This uses a wrapper message rather than a simple float scalar so that it is possible to distinguish between a default value and the value being unset. If omitted, this color object is rendered as a solid color (as if the alpha value had been explicitly given a value of 1.0). |
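
The alpha semantics described above amount to standard "source over" blending: each output channel is alpha times the color plus (1 - alpha) times the background. The sketch below applies that to one color and one background. `compositeOver` is a hypothetical helper, and it assumes `alpha` arrives as a plain number, as in the JSON representation above.

```javascript
// Hypothetical helper: blends `color` over an opaque `background`.
function compositeOver(color, background) {
  // An omitted alpha means a solid color, as described in the table above.
  const alpha = color.alpha === undefined ? 1.0 : color.alpha;
  const mix = (fg, bg) => alpha * (fg || 0) + (1 - alpha) * (bg || 0);
  return {
    red: mix(color.red, background.red),
    green: mix(color.green, background.green),
    blue: mix(color.blue, background.blue),
  };
}

const blended = compositeOver(
  { red: 1, green: 0, blue: 0, alpha: 0.5 },
  { red: 0, green: 0, blue: 1 });
console.log(blended);
```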
CropHintsAnnotation
Set of crop hints that are used to generate new crops when serving images.
JSON representation |
---|
{
"cropHints": [
{
object ( |
Fields | |
---|---|
cropHints[] |
Crop hint results. |
CropHint
Single crop hint that is used to generate a new crop when serving an image.
JSON representation |
---|
{
"boundingPoly": {
object ( |
Fields | |
---|---|
boundingPoly |
The bounding polygon for the crop region. The coordinates of the bounding box are in the original image's scale. |
confidence |
Confidence of this being a salient region. Range [0, 1]. |
importanceFraction |
Fraction of importance of this salient region with respect to the original image. |
WebDetection
Relevant information for the image from the Internet.
JSON representation |
---|
{ "webEntities": [ { object ( |
Fields | |
---|---|
webEntities[] |
Deduced entities from similar images on the Internet. |
fullMatchingImages[] |
Fully matching images from the Internet. Can include resized copies of the query image. |
partialMatchingImages[] |
Partially matching images from the Internet. These images are similar enough to share some key-point features. For example, an original image will likely have partial matches for its crops. |
pagesWithMatchingImages[] |
Web pages containing the matching images from the Internet. |
visuallySimilarImages[] |
The visually similar image results. |
bestGuessLabels[] |
The service's best guess as to the topic of the request image. Inferred from similar images on the open web. |
WebEntity
Entity deduced from similar images on the Internet.
JSON representation |
---|
{ "entityId": string, "score": number, "description": string } |
Fields | |
---|---|
entityId |
Opaque entity ID. |
score |
Overall relevancy score for the entity. Not normalized and not comparable across different image queries. |
description |
Canonical description of the entity, in English. |
WebImage
Metadata for online images.
JSON representation |
---|
{ "url": string, "score": number } |
Fields | |
---|---|
url |
The result image URL. |
score |
(Deprecated) Overall relevancy score for the image. |
WebPage
Metadata for web pages.
Fields | |
---|---|
url |
The result web page URL. |
score |
(Deprecated) Overall relevancy score for the web page. |
pageTitle |
Title of the web page; may contain HTML markup. |
fullMatchingImages[] |
Fully matching images on the page. Can include resized copies of the query image. |
partialMatchingImages[] |
Partially matching images on the page. These images are similar enough to share some key-point features. For example, an original image will likely have partial matches for its crops. |
WebLabel
Label to provide extra metadata for the web detection.
JSON representation |
---|
{ "label": string, "languageCode": string } |
Fields | |
---|---|
label |
Label for extra metadata. |
languageCode |
The BCP-47 language code for |
ProductSearchResults
Results for a product search request.
JSON representation |
---|
{ "indexTime": string, "results": [ { object ( |
Fields | |
---|---|
indexTime |
Timestamp of the index which provided these results. Products added to the product set and products removed from the product set after this time are not reflected in the current results. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
results[] |
List of results, one for each product match. |
productGroupedResults[] |
List of results grouped by products detected in the query image. Each entry corresponds to one bounding polygon in the query image, and contains the matching products specific to that region. There may be duplicate product matches in the union of all the per-product results. |
Result
Information about a product.
JSON representation |
---|
{
"product": {
object ( |
Fields | |
---|---|
product |
The Product. |
score |
A confidence level on the match, ranging from 0 (no confidence) to 1 (full confidence). |
image |
The resource name of the image from the product that is the closest match to the query. |
GroupedResult
Information about the products similar to a single product in a query image.
JSON representation |
---|
{ "boundingPoly": { object ( |
Fields | |
---|---|
boundingPoly |
The bounding polygon around the product detected in the query image. |
results[] |
List of results, one for each product match. |
objectAnnotations[] |
List of generic predictions for the object in the bounding box. |
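
Since each grouped result corresponds to one detected region, a client may want only the top-scoring product for each region. A minimal sketch, with `bestMatchPerGroup` as a hypothetical helper and made-up resource names in the sample data:

```javascript
// Hypothetical helper: the highest-scoring result for each detected region.
function bestMatchPerGroup(productSearchResults) {
  const groups = productSearchResults.productGroupedResults || [];
  return groups.map((group) => {
    const best = (group.results || []).reduce(
      (top, result) =>
        top === null || (result.score || 0) > (top.score || 0) ? result : top,
      null);
    return { boundingPoly: group.boundingPoly, best };
  });
}

// Illustrative data only; the image names are invented.
const matches = bestMatchPerGroup({
  productGroupedResults: [
    { boundingPoly: {}, results: [
      { score: 0.4, image: 'image-a' },
      { score: 0.9, image: 'image-b' },
    ] },
    { boundingPoly: {}, results: [] },
  ],
});
console.log(matches.map((match) => match.best && match.best.image));
```

A region with no results yields `best: null`, so callers can tell "nothing matched" apart from a low-scoring match.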
ObjectAnnotation
Prediction for what the object in the bounding box is.
JSON representation |
---|
{ "mid": string, "languageCode": string, "name": string, "score": number } |
Fields | |
---|---|
mid |
Object ID that should align with EntityAnnotation mid. |
languageCode |
The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier. |
name |
Object name, expressed in its |
score |
Score of the result. Range [0, 1]. |
ImageAnnotationContext
If an image was produced from a file (e.g. a PDF), this message gives information about the source of that image.
JSON representation |
---|
{ "uri": string, "pageNumber": integer } |
Fields | |
---|---|
uri |
The URI of the file used to produce the image. |
pageNumber |
If the file was a PDF or TIFF, this field gives the page number within the file used to produce the image. |
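
When a batch of responses comes from multi-page files, the context can be used to regroup them by source file and page. A sketch under stated assumptions: `groupByContext` is a hypothetical helper, the URIs are invented, and a missing `pageNumber` is treated as 0.

```javascript
// Hypothetical helper: Map of source URI -> responses sorted by page number.
function groupByContext(responses) {
  const groups = new Map();
  for (const response of responses) {
    const context = response.context;
    if (!context || !context.uri) continue; // skip responses without a source file
    if (!groups.has(context.uri)) groups.set(context.uri, []);
    groups.get(context.uri).push({
      pageNumber: context.pageNumber || 0, // assumption: missing means 0
      response,
    });
  }
  for (const pages of groups.values()) {
    pages.sort((a, b) => a.pageNumber - b.pageNumber);
  }
  return groups;
}

// Illustrative data only; the URI is invented.
const grouped = groupByContext([
  { context: { uri: 'gs://example-bucket/doc.pdf', pageNumber: 2 } },
  { context: { uri: 'gs://example-bucket/doc.pdf', pageNumber: 1 } },
  { labelAnnotations: [] },
]);
console.log([...grouped.keys()]);
```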