- JSON representation
- Style
- TextAnchor
- TextSegment
- Color
- FontSize
- Page
- Dimension
- Layout
- Orientation
- DetectedLanguage
- Block
- Paragraph
- Line
- Token
- DetectedBreak
- Type
- VisualElement
- Table
- TableRow
- TableCell
- FormField
- Entity
- NormalizedValue
- Money
- Date
- DateTime
- TimeZone
- EntityRelation
- Translation
- ShardInfo
- Label
Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.
JSON representation | |
---|---|
{ "mimeType": string, "text": string, "textStyles": [ { object ( |
Fields | |
---|---|
mimeType | An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml. |
text | UTF-8 encoded text in reading order from the document. |
textStyles[] | Styles for the |
pages[] | Visual page layout for the |
entities[] | A list of entities detected on |
entityRelations[] | Relationship among |
translations[] | A list of translations on |
shardInfo | Information about the sharding if this document is a sharded part of a larger document. If the document is not sharded, this message is not specified. |
labels[] | |
error | Any error that occurred while processing this document. |
Union field source | Original source document from the user. source can be only one of the following: |
uri | Currently supports Google Cloud Storage URI of the form |
content | Inline document content, represented as a stream of bytes. In the JSON representation, this is a base64-encoded string. |
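Because source is a union field, a Document carries either inline content (base64 in JSON) or a uri, never both. The sketch below builds both shapes as plain JSON objects in Node.js and checks the union rule; the helper names inlineDocument, gcsDocument, and checkSource are hypothetical, and the gs:// URI is an illustrative placeholder.

```javascript
// Hypothetical helpers that build the JSON shape of a Document whose
// union field `source` is set to exactly one of `content` or `uri`.
function inlineDocument(bytes, mimeType) {
  // In JSON, the bytes field `content` travels as a base64-encoded string.
  return { mimeType: mimeType, content: Buffer.from(bytes).toString("base64") };
}

function gcsDocument(uri, mimeType) {
  return { mimeType: mimeType, uri: uri };
}

// Rejects objects that violate the union rule (both or neither source set).
function checkSource(doc) {
  const hasUri = typeof doc.uri === "string";
  const hasContent = typeof doc.content === "string";
  if (hasUri === hasContent) {
    throw new Error("exactly one of uri or content must be set");
  }
  return hasUri ? "uri" : "content";
}

const doc = inlineDocument(new TextEncoder().encode("hello"), "text/plain");
console.log(doc.content);      // "aGVsbG8="
console.log(checkSource(doc)); // "content"
```

Validating the union on the client is optional; it simply surfaces a malformed request before it is sent.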
Style
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
JSON representation | |
---|---|
{ "textAnchor": { object ( |
Fields | |
---|---|
textAnchor | Text anchor indexing into the |
color | Text color. |
backgroundColor | Text background color. |
fontWeight | Font weight. Possible values are normal, bold, bolder, and lighter. https://www.w3schools.com/cssref/pr_font_weight.asp |
textStyle | Text style. Possible values are normal, italic, and oblique. https://www.w3schools.com/cssref/pr_font_font-style.asp |
textDecoration | Text decoration. Follows the CSS standard. |
fontSize | Font size. |
TextAnchor
Text reference indexing into the Document.text.
JSON representation | |
---|---|
{ "textSegments": [ { object ( |
Fields | |
---|---|
textSegments[] | The text segments from the |
TextSegment
A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.
JSON representation | |
---|---|
{ "startIndex": string, "endIndex": string } |
Fields | |
---|---|
startIndex | |
endIndex | |
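To read the text a TextAnchor points at, slice Document.text by each segment and concatenate the pieces. This is a minimal sketch under assumptions this reference does not state: endIndex is exclusive, an omitted startIndex means 0 (proto3 JSON omits default values), and indices count Unicode code points. The function name anchorText is hypothetical.

```javascript
// Minimal sketch: resolve a TextAnchor against Document.text.
// Assumptions (not stated in this reference): endIndex is exclusive,
// an omitted startIndex means 0, and indices count Unicode code points.
function anchorText(documentText, textAnchor) {
  const chars = Array.from(documentText); // split by code point
  const segments = (textAnchor && textAnchor.textSegments) || [];
  return segments
    .map((seg) => {
      // int64 fields arrive as strings in the JSON representation.
      const start = Number(seg.startIndex || "0");
      const end = Number(seg.endIndex || "0");
      return chars.slice(start, end).join("");
    })
    .join("");
}

const text = "Invoice 42\nTotal: $10";
const anchor = {
  textSegments: [
    { startIndex: "0", endIndex: "7" },
    { startIndex: "11", endIndex: "16" },
  ],
};
console.log(anchorText(text, anchor)); // "InvoiceTotal"
```

Segments are joined without a separator here; a caller that wants visible boundaries could join with a space instead. Out-of-bounds indices are silently clipped by slice, so sharded documents need the ShardInfo offset handled separately.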
Color
Represents a color in the RGBA color space. This representation is designed for simplicity of conversion to and from color representations in various languages, rather than for compactness. For example, the fields of this representation can be trivially provided to the constructor of "java.awt.Color" in Java; they can also be trivially provided to UIColor's "+colorWithRed:green:blue:alpha" method in iOS; and, with just a little work, they can be formatted into a CSS "rgba()" string in JavaScript.
Note: this proto does not carry information about the absolute color space that should be used to interpret the RGB value (e.g. sRGB, Adobe RGB, DCI-P3, BT.2020, etc.). By default, applications SHOULD assume the sRGB color space.
Note: when color equality needs to be decided, implementations, unless documented otherwise, treat two colors as equal if their red, green, blue and alpha values each differ by at most 1e-5.
Example (Java):
import com.google.protobuf.FloatValue;
import com.google.type.Color;
// ...
public static java.awt.Color fromProto(Color protocolor) {
  // An unset alpha means the color is fully opaque.
  float alpha = protocolor.hasAlpha()
      ? protocolor.getAlpha().getValue()
      : 1.0f;
  return new java.awt.Color(
      protocolor.getRed(),
      protocolor.getGreen(),
      protocolor.getBlue(),
      alpha);
}
public static Color toProto(java.awt.Color color) {
  // java.awt.Color stores 0-255 integers; the proto uses floats in [0, 1].
  float red = (float) color.getRed();
  float green = (float) color.getGreen();
  float blue = (float) color.getBlue();
  float denominator = 255.0f;
  Color.Builder resultBuilder =
      Color
          .newBuilder()
          .setRed(red / denominator)
          .setGreen(green / denominator)
          .setBlue(blue / denominator);
  int alpha = color.getAlpha();
  // Only set alpha when the color is not fully opaque, so that an unset alpha keeps meaning 1.0.
  if (alpha != 255) {
    resultBuilder.setAlpha(
        FloatValue
            .newBuilder()
            .setValue(((float) alpha) / denominator)
            .build());
  }
  return resultBuilder.build();
}
// ...
Example (iOS / Obj-C):
// ...
static UIColor* fromProto(Color* protocolor) {
  float red = [protocolor red];
  float green = [protocolor green];
  float blue = [protocolor blue];
  FloatValue* alpha_wrapper = [protocolor alpha];
  // An unset alpha means the color is fully opaque.
  float alpha = 1.0;
  if (alpha_wrapper != nil) {
    alpha = [alpha_wrapper value];
  }
  return [UIColor colorWithRed:red green:green blue:blue alpha:alpha];
}
static Color* toProto(UIColor* color) {
  CGFloat red, green, blue, alpha;
  // Returns NO when the color is not in an RGB-compatible color space.
  if (![color getRed:&red green:&green blue:&blue alpha:&alpha]) {
    return nil;
  }
  Color* result = [[Color alloc] init];
  [result setRed:red];
  [result setGreen:green];
  [result setBlue:blue];
  if (alpha <= 0.9999) {
    // floatWrapperWithValue is a helper that is not shown here; it wraps a float in a FloatValue.
    [result setAlpha:floatWrapperWithValue(alpha)];
  }
  // Manual reference counting (non-ARC) code.
  [result autorelease];
  return result;
}
// ...
Example (JavaScript):
// ...
var protoToCssColor = function(rgb_color) {
  var redFrac = rgb_color.red || 0.0;
  var greenFrac = rgb_color.green || 0.0;
  var blueFrac = rgb_color.blue || 0.0;
  var red = Math.floor(redFrac * 255);
  var green = Math.floor(greenFrac * 255);
  var blue = Math.floor(blueFrac * 255);
  if (!('alpha' in rgb_color)) {
    return rgbToCssColor_(red, green, blue);
  }
  var alphaFrac = rgb_color.alpha.value || 0.0;
  var rgbParams = [red, green, blue].join(',');
  return ['rgba(', rgbParams, ',', alphaFrac, ')'].join('');
};
var rgbToCssColor_ = function(red, green, blue) {
  var rgbNumber = new Number((red << 16) | (green << 8) | blue);
  var hexString = rgbNumber.toString(16);
  var missingZeros = 6 - hexString.length;
  var resultBuilder = ['#'];
  for (var i = 0; i < missingZeros; i++) {
    resultBuilder.push('0');
  }
  resultBuilder.push(hexString);
  return resultBuilder.join('');
};
// ...
JSON representation | |
---|---|
{ "red": number, "green": number, "blue": number, "alpha": number } |
Fields | |
---|---|
red | The amount of red in the color as a value in the interval [0, 1]. |
green | The amount of green in the color as a value in the interval [0, 1]. |
blue | The amount of blue in the color as a value in the interval [0, 1]. |
alpha | The fraction of this color that should be applied to the pixel. That is, the final pixel color is defined by the equation: pixel color = alpha * (this color) + (1.0 - alpha) * (background color). This means that a value of 1.0 corresponds to a solid color, whereas a value of 0.0 corresponds to a completely transparent color. This uses a wrapper message rather than a simple float scalar so that it is possible to distinguish between a default value and the value being unset. If omitted, this color object is rendered as a solid color (as if the alpha value had been explicitly given with a value of 1.0). |
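The equality rule and the alpha default described above combine into a short comparison. This sketch assumes the JSON form shown above, where alpha is a plain number, and treats an omitted red, green, or blue as 0 (proto3 JSON omits zero values); the names colorComponents and colorsEqual are hypothetical.

```javascript
// Sketch of the equality rule described above: two colors are equal when
// red, green, blue and alpha each differ by at most 1e-5.
// An omitted component is treated as 0, except alpha, which defaults to 1.0 (solid).
const COLOR_TOLERANCE = 1e-5;

function colorComponents(color) {
  return [
    color.red || 0,
    color.green || 0,
    color.blue || 0,
    color.alpha === undefined ? 1.0 : color.alpha,
  ];
}

function colorsEqual(a, b) {
  const ca = colorComponents(a);
  const cb = colorComponents(b);
  return ca.every((value, i) => Math.abs(value - cb[i]) <= COLOR_TOLERANCE);
}

console.log(colorsEqual({ red: 1 }, { red: 1, alpha: 1.0 }));     // true
console.log(colorsEqual({ red: 0.5 }, { red: 0.5, alpha: 0.5 })); // false
```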
FontSize
Font size with unit.
JSON representation | |
---|---|
{ "size": number, "unit": string } |
Fields | |
---|---|
size | Font size for the text. |
unit | Unit for the font size. Follows CSS naming (in, px, pt, etc.). |
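Since Style mirrors CSS, one way to use it is to emit a CSS declaration string. This is an illustrative mapping, not something the API provides: styleToCss and toCssRgba are hypothetical names, Color components are rounded to 0-255, and FontSize.unit is assumed to already be a CSS unit such as "pt".

```javascript
// Sketch: turn a Style object into a CSS declaration string. Property names
// follow the CSS conventions the Style fields mirror; the mapping itself is an illustration.
function toCssRgba(color) {
  const c = (v) => Math.round((v || 0) * 255);
  const alpha = color.alpha === undefined ? 1 : color.alpha;
  return `rgba(${c(color.red)}, ${c(color.green)}, ${c(color.blue)}, ${alpha})`;
}

function styleToCss(style) {
  const decls = [];
  if (style.color) decls.push(`color: ${toCssRgba(style.color)}`);
  if (style.backgroundColor) decls.push(`background-color: ${toCssRgba(style.backgroundColor)}`);
  if (style.fontWeight) decls.push(`font-weight: ${style.fontWeight}`);
  // Style.textStyle corresponds to the CSS font-style property.
  if (style.textStyle) decls.push(`font-style: ${style.textStyle}`);
  if (style.textDecoration) decls.push(`text-decoration: ${style.textDecoration}`);
  if (style.fontSize) decls.push(`font-size: ${style.fontSize.size}${style.fontSize.unit}`);
  return decls.join("; ");
}

const css = styleToCss({
  color: { red: 1 },
  fontWeight: "bold",
  textStyle: "italic",
  fontSize: { size: 12, unit: "pt" },
});
console.log(css);
// "color: rgba(255, 0, 0, 1); font-weight: bold; font-style: italic; font-size: 12pt"
```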
Page
A page in a Document.
JSON representation | |
---|---|
{ "pageNumber": integer, "dimension": { object ( |
Fields | |
---|---|
pageNumber | 1-based index for current |
dimension | Physical dimension of the page. |
layout | |
detectedLanguages[] | A list of detected languages together with confidence. |
blocks[] | A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation. |
paragraphs[] | A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph. |
lines[] | A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line. |
tokens[] | A list of visually detected tokens on the page. |
visualElements[] | A list of detected non-text visual elements, such as checkboxes and signatures, on the page. |
tables[] | A list of visually detected tables on the page. |
formFields[] | A list of visually detected form fields on the page. |
Dimension
Dimension for the page.
JSON representation | |
---|---|
{ "width": number, "height": number, "unit": string } |
Fields | |
---|---|
width | Page width. |
height | Page height. |
unit | Dimension unit. |
Layout
Visual element describing a layout unit on a page.
JSON representation | |
---|---|
{ "textAnchor": { object ( |
Fields | |
---|---|
textAnchor | Text anchor indexing into the |
confidence | Confidence of the current |
boundingPoly | The bounding polygon for the |
orientation | Detected orientation for the |
Orientation
Detected human reading orientation.
Enums | |
---|---|
ORIENTATION_UNSPECIFIED | Unspecified orientation. |
PAGE_UP | Orientation is aligned with page up. |
PAGE_RIGHT | Orientation is aligned with page right. Turn the head 90 degrees clockwise from upright to read. |
PAGE_DOWN | Orientation is aligned with page down. Turn the head 180 degrees from upright to read. |
PAGE_LEFT | Orientation is aligned with page left. Turn the head 90 degrees counterclockwise from upright to read. |
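The enum descriptions translate directly into angles: the head turn needed to read, and the opposite rotation that would make a page image upright. This sketch assumes orientation arrives as the enum name string, as enums do in proto3 JSON; headTurnDegrees and uprightRotationClockwise are hypothetical names.

```javascript
// Clockwise head turn, in degrees, needed to read text with each orientation,
// taken from the enum descriptions above (PAGE_LEFT's 90 degrees
// counterclockwise is expressed as 270 clockwise).
const HEAD_TURN_CLOCKWISE = new Map([
  ["PAGE_UP", 0],
  ["PAGE_RIGHT", 90],
  ["PAGE_DOWN", 180],
  ["PAGE_LEFT", 270],
]);

// Returns null for ORIENTATION_UNSPECIFIED or an unknown value.
function headTurnDegrees(orientation) {
  return HEAD_TURN_CLOCKWISE.has(orientation) ? HEAD_TURN_CLOCKWISE.get(orientation) : null;
}

// Rotating the page image counterclockwise by the head-turn angle makes the
// text upright; this helper expresses that as a clockwise angle in [0, 360).
function uprightRotationClockwise(orientation) {
  const turn = headTurnDegrees(orientation);
  return turn === null ? null : (360 - turn) % 360;
}

console.log(uprightRotationClockwise("PAGE_RIGHT")); // 270
```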
DetectedLanguage
Detected language for a structural component.
JSON representation | |
---|---|
{ "languageCode": string, "confidence": number } |
Fields | |
---|---|
languageCode | The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier. |
confidence | Confidence of detected language. Range [0, 1]. |
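Many structural types carry a detectedLanguages[] list, so a common step is picking the most confident entry. A minimal sketch; topLanguage and its minConfidence parameter are hypothetical, and an entry without confidence is assumed to mean 0.

```javascript
// Sketch: pick the most confident entry from a detectedLanguages[] array,
// optionally ignoring entries below a caller-chosen threshold.
// An entry without confidence is treated as 0 (an assumption).
function topLanguage(detectedLanguages, minConfidence = 0) {
  let best = null;
  for (const lang of detectedLanguages || []) {
    const confidence = lang.confidence || 0;
    if (confidence < minConfidence) continue;
    if (best === null || confidence > (best.confidence || 0)) best = lang;
  }
  return best ? best.languageCode : null;
}

const langs = [
  { languageCode: "en-US", confidence: 0.7 },
  { languageCode: "sr-Latn", confidence: 0.9 },
];
console.log(topLanguage(langs)); // "sr-Latn"
```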
Block
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
JSON representation | |
---|---|
{ "layout": { object ( |
Fields | |
---|---|
layout | |
detectedLanguages[] | A list of detected languages together with confidence. |
Paragraph
A collection of lines that a human would perceive as a paragraph.
JSON representation | |
---|---|
{ "layout": { object ( |
Fields | |
---|---|
layout | |
detectedLanguages[] | A list of detected languages together with confidence. |
Line
A collection of tokens that a human would perceive as a line. A line does not cross column boundaries and can be horizontal, vertical, and so on.
JSON representation | |
---|---|
{ "layout": { object ( |
Fields | |
---|---|
layout | |
detectedLanguages[] | A list of detected languages together with confidence. |
Token
A detected token.
JSON representation | |
---|---|
{ "layout": { object ( |
Fields | |
---|---|
layout | |
detectedBreak | Detected break at the end of a |
detectedLanguages[] | A list of detected languages together with confidence. |
DetectedBreak
Detected break at the end of a Token.
JSON representation | |
---|---|
{ "type": enum ( |
Fields | |
---|---|
type | Detected break type. |
Type
Enum to denote the type of break found.
Enums | |
---|---|
TYPE_UNSPECIFIED | Unspecified break type. |
SPACE | A single whitespace. |
WIDE_SPACE | A wider whitespace. |
HYPHEN | A hyphen that indicates that a token has been split across lines. |
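Break types let a consumer rebuild readable text from tokens. In this sketch the separator choices are illustrative assumptions: WIDE_SPACE becomes a tab, and HYPHEN glues the two halves of a split word together, which assumes the hyphen character is not part of the token text. Tokens are also simplified to a hypothetical text property already resolved from each token's layout.

```javascript
// Hypothetical separator for each DetectedBreak type. WIDE_SPACE -> tab and
// HYPHEN -> "" (glue the halves of a split word) are illustrative choices,
// assuming the hyphen character is not included in the token text.
const BREAK_SEPARATOR = new Map([
  ["SPACE", " "],
  ["WIDE_SPACE", "\t"],
  ["HYPHEN", ""],
]);

// tokens: [{ text, detectedBreak?: { type } }] with text already resolved
// from each token's layout.textAnchor (a simplification for this sketch).
function joinTokens(tokens) {
  let out = "";
  for (const token of tokens) {
    out += token.text;
    const type = token.detectedBreak && token.detectedBreak.type;
    // TYPE_UNSPECIFIED or no break at all adds nothing.
    out += BREAK_SEPARATOR.get(type) ?? "";
  }
  return out;
}

const tokens = [
  { text: "Total", detectedBreak: { type: "SPACE" } },
  { text: "amo", detectedBreak: { type: "HYPHEN" } },
  { text: "unt", detectedBreak: { type: "WIDE_SPACE" } },
  { text: "$10" },
];
console.log(JSON.stringify(joinTokens(tokens))); // "Total amount\t$10"
```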
VisualElement
Detected non-text visual elements, such as checkboxes and signatures, on the page.
JSON representation | |
---|---|
{ "layout": { object ( |
Fields | |
---|---|
layout | |
type | Type of the |
detectedLanguages[] | A list of detected languages together with confidence. |
Table
A table representation similar to HTML table structure.
JSON representation | |
---|---|
{ "layout": { object ( |
Fields | |
---|---|
layout | |
headerRows[] | Header rows of the table. |
bodyRows[] | Body rows of the table. |
detectedLanguages[] | A list of detected languages together with confidence. |
TableRow
A row of table cells.
JSON representation | |
---|---|
{ "cells": [ { object ( |
Fields | |
---|---|
cells[] | Cells that make up this row. |
TableCell
A cell representation inside the table.
JSON representation | |
---|---|
{ "layout": { object ( |
Fields | |
---|---|
layout | |
rowSpan | How many rows this cell spans. |
colSpan | How many columns this cell spans. |
detectedLanguages[] | A list of detected languages together with confidence. |
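Because the table model follows HTML, rowSpan and colSpan can be expanded into a rectangular grid with the usual HTML placement rule: each cell takes the next free column in its row, skipping slots occupied by cells spanning down from earlier rows. The sketch treats header rows and body rows as one sequence and assumes each cell carries a hypothetical text property already resolved from its layout.

```javascript
// Sketch: expand table rows into a rectangular grid, honoring rowSpan and
// colSpan the way HTML tables do. Each cell is assumed to carry a `text`
// property already resolved from its layout (hypothetical simplification).
function tableToGrid(table) {
  const rows = [...(table.headerRows || []), ...(table.bodyRows || [])];
  const grid = [];
  rows.forEach((row, r) => {
    grid[r] = grid[r] || [];
    let c = 0;
    for (const cell of row.cells || []) {
      // Skip slots already filled by a cell spanning down from an earlier row.
      while (grid[r][c] !== undefined) c++;
      const rowSpan = cell.rowSpan || 1;
      const colSpan = cell.colSpan || 1;
      for (let dr = 0; dr < rowSpan; dr++) {
        grid[r + dr] = grid[r + dr] || [];
        for (let dc = 0; dc < colSpan; dc++) {
          grid[r + dr][c + dc] = cell.text;
        }
      }
      c += colSpan;
    }
  });
  return grid;
}

const table = {
  headerRows: [{ cells: [{ text: "Item", rowSpan: 2 }, { text: "Price", colSpan: 2 }] }],
  bodyRows: [
    { cells: [{ text: "USD" }, { text: "EUR" }] },
    { cells: [{ text: "Pen" }, { text: "1" }, { text: "0.9" }] },
  ],
};
console.log(tableToGrid(table));
// [["Item","Price","Price"],["Item","USD","EUR"],["Pen","1","0.9"]]
```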
FormField
A form field detected on the page.
JSON representation | |
---|---|
{ "fieldName": { object ( |
Fields | |
---|---|
fieldName | |
fieldValue | |
nameDetectedLanguages[] | A list of detected languages for name together with confidence. |
valueDetectedLanguages[] | A list of detected languages for value together with confidence. |
valueType | If the value is non-textual, this field represents the type. Current valid values are: blank (this indicates that fieldValue is normal text), "unfilled_checkbox", and "filled_checkbox". |
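The valueType values listed above map onto a simple classification. A minimal sketch with a hypothetical formFieldKind function; a missing valueType is treated like a blank one, and any value outside the documented list is reported as "unknown" rather than guessed.

```javascript
// Sketch: classify a FormField by valueType. A blank or missing valueType
// means fieldValue holds normal text; the two checkbox values come from
// the list above. Other values are reported as "unknown".
function formFieldKind(field) {
  const valueType = field.valueType || "";
  if (valueType === "") return "text";
  if (valueType === "filled_checkbox") return "checked";
  if (valueType === "unfilled_checkbox") return "unchecked";
  return "unknown";
}

console.log(formFieldKind({ valueType: "filled_checkbox" })); // "checked"
```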
Entity
A phrase in the text that is a known entity type, such as a person, an organization, or a location.
JSON representation | |
---|---|
{ "textAnchor": { object ( |
Fields | |
---|---|
textAnchor | Provenance of the entity. Text anchor indexing into the |
type | Entity type from a schema e.g. |
mentionText | Text value in the document e.g. |
mentionId | Deprecated. Use |
confidence | Optional. Confidence of the detected schema entity. Range [0, 1]. |
normalizedValue | Optional. Normalized entity value. Absent if the extracted value could not be converted or the type (e.g. address) is not supported for certain parsers. This field is also only populated for certain supported document types. |
redacted | Optional. Whether the entity will be redacted for de-identification purposes. |
NormalizedValue
Parsed and normalized entity value.
JSON representation | |
---|---|
{ "text": string, // Union field |
Fields | |
---|---|
text | Required. Normalized entity value stored as a string. This field is populated for supported document types (e.g. Invoice). For some entity types, one of the respective 'structured_value' fields may also be populated. |
Union field structured_value | Structured entity value. Must match entity type defined in schema if known. If this field is present, the 'text' field is still populated. structured_value can be only one of the following: |
moneyValue | Money value. See also: https://github.com/googleapis/googleapis/blob/master/google/type/money.proto |
dateValue | Date value. Includes year, month, day. See also: https://github.com/googleapis/googleapis/blob/master/google/type/date.proto |
datetimeValue | DateTime value. Includes date, time, and timezone. See also: https://github.com/googleapis/googleapis/blob/master/google/type/datetime.proto |
Money
Represents an amount of money with its currency type.
JSON representation | |
---|---|
{ "currencyCode": string, "units": string, "nanos": integer } |
Fields | |
---|---|
currencyCode | The 3-letter currency code defined in ISO 4217. |
units | The whole units of the amount. For example if |
nanos | Number of nano (10^-9) units of the amount. The value must be between -999,999,999 and +999,999,999 inclusive. If |
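Because units is an int64 carried as a JSON string, converting it through a JavaScript Number can lose precision for large amounts. The sketch below formats a Money value as an exact decimal string with BigInt. It assumes units and nanos share a sign (or one is zero), as google/type/money.proto requires; that rule is truncated in the table above. The name moneyToDecimal is hypothetical.

```javascript
// Sketch: format a Money value as an exact decimal string. units is an int64
// carried as a string in JSON, so BigInt avoids precision loss. Assumes units
// and nanos share a sign (or one is zero), per google/type/money.proto.
function moneyToDecimal(money) {
  const total = BigInt(money.units || "0") * 1000000000n + BigInt(money.nanos || 0);
  const negative = total < 0n;
  const abs = negative ? -total : total;
  const whole = abs / 1000000000n;
  // Pad the fractional nanos to 9 digits, then drop trailing zeros.
  const frac = (abs % 1000000000n).toString().padStart(9, "0").replace(/0+$/, "");
  return (negative ? "-" : "") + whole.toString() + (frac ? "." + frac : "");
}

console.log(moneyToDecimal({ currencyCode: "USD", units: "-1", nanos: -750000000 })); // "-1.75"
```

The currencyCode is left untouched; pairing the decimal string with it is up to the caller.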
Date
Represents a whole or partial calendar date, e.g. a birthday. The time of day and time zone are either specified elsewhere or are not significant. The date is relative to the Proleptic Gregorian Calendar. This can represent:
- A full date, with non-zero year, month and day values
- A month and day value, with a zero year, e.g. an anniversary
- A year on its own, with zero month and day values
- A year and month value, with a zero day, e.g. a credit card expiration date
Related types are google.type.TimeOfDay and google.protobuf.Timestamp.
JSON representation | |
---|---|
{ "year": integer, "month": integer, "day": integer } |
Fields | |
---|---|
year | Year of date. Must be from 1 to 9999, or 0 if specifying a date without a year. |
month | Month of year. Must be from 1 to 12, or 0 if specifying a year without a month and day. |
day | Day of month. Must be from 1 to 31 and valid for the year and month, or 0 if specifying a year by itself or a year and month where the day is not significant. |
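The four allowed shapes and the range rules above can be checked together. This sketch uses proleptic Gregorian leap years; allowing February 29 when year is 0 (for an anniversary) is an assumption, as is rejecting shapes not in the list above, such as a month with no year or day. The names daysInMonth and dateKind are hypothetical.

```javascript
// Sketch: classify and validate a (possibly partial) Date following the rules above.
// Days-in-month use the proleptic Gregorian calendar; when year is 0,
// February is allowed 29 days (e.g. a Feb 29 anniversary), an assumption.
function daysInMonth(year, month) {
  if (month === 2) {
    const leap = year === 0 || (year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0));
    return leap ? 29 : 28;
  }
  return [4, 6, 9, 11].includes(month) ? 30 : 31;
}

// Returns "full", "month-day", "year", "year-month", or null if invalid.
function dateKind({ year = 0, month = 0, day = 0 }) {
  if (year < 0 || year > 9999 || month < 0 || month > 12 || day < 0 || day > 31) return null;
  if (year === 0 && month === 0) return null; // none of the listed shapes
  if (month === 0) return day === 0 ? "year" : null;
  if (day === 0) return year === 0 ? null : "year-month";
  if (day > daysInMonth(year, month)) return null;
  return year === 0 ? "month-day" : "full";
}

console.log(dateKind({ month: 12, day: 25 })); // "month-day"
```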
DateTime
Represents civil time in one of a few possible ways:
- When utcOffset is set and timeZone is unset: a civil time on a calendar day with a particular offset from UTC.
- When timeZone is set and utcOffset is unset: a civil time on a calendar day in a particular time zone.
- When neither timeZone nor utcOffset is set: a civil time on a calendar day in local time.
The date is relative to the Proleptic Gregorian Calendar.
If year is 0, the DateTime is considered not to have a specific year. month and day must have valid, non-zero values.
This type is more flexible than some applications may want. Make sure to document and validate your application's limitations.
JSON representation | |
---|---|
{ "year": integer, "month": integer, "day": integer, "hours": integer, "minutes": integer, "seconds": integer, "nanos": integer, // Union field |
Fields | |
---|---|
year | Optional. Year of date. Must be from 1 to 9999, or 0 if specifying a datetime without a year. |
month | Required. Month of year. Must be from 1 to 12. |
day | Required. Day of month. Must be from 1 to 31 and valid for the year and month. |
hours | Required. Hours of day in 24-hour format. Should be from 0 to 23. An API may choose to allow the value "24:00:00" for scenarios like business closing time. |
minutes | Required. Minutes of the hour. Must be from 0 to 59. |
seconds | Required. Seconds of the minute. Must normally be from 0 to 59. An API may allow the value 60 if it allows leap seconds. |
nanos | Required. Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999. |
Union field time_offset | Optional. Specifies either the UTC offset or the time zone of the DateTime. Choose carefully between them, considering that time zone data may change in the future (for example, a country modifies its DST start/end dates, and future DateTimes in the affected range had already been stored). If omitted, the DateTime is considered to be in local time. time_offset can be only one of the following: |
utcOffset | UTC offset. Must be whole seconds, between -18 hours and +18 hours. For example, a UTC offset of -4:00 would be represented as { seconds: -14400 }. A duration in seconds with up to nine fractional digits, terminated by ' |
timeZone | Time zone. |
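When a DateTime carries a utcOffset, it pins down an exact instant; with a timeZone it needs a time zone database, and with neither it is local time. The sketch handles only the utcOffset case and assumes utcOffset uses the standard protobuf JSON form for durations, such as "-14400s". It builds the civil time with setUTCFullYear rather than Date.UTC, because Date.UTC maps years 0 to 99 onto 1900 to 1999. The name dateTimeToUtcMillis is hypothetical.

```javascript
// Sketch: convert a DateTime that carries a utcOffset into a UTC instant.
// Assumes utcOffset uses the protobuf JSON Duration form, e.g. "-14400s".
// DateTimes with a timeZone need a time zone database and are not handled;
// DateTimes with neither field are local time and return null too.
function dateTimeToUtcMillis(dt) {
  if (typeof dt.utcOffset !== "string") return null;
  const match = /^(-?\d+(?:\.\d+)?)s$/.exec(dt.utcOffset);
  if (!match) throw new Error(`unexpected utcOffset: ${dt.utcOffset}`);
  const offsetSeconds = Number(match[1]);
  if (!Number.isInteger(offsetSeconds) || Math.abs(offsetSeconds) > 18 * 3600) {
    throw new Error("utcOffset must be whole seconds within +/-18 hours");
  }
  if (!dt.year) throw new Error("a DateTime without a year is not a specific instant");
  // Treat the civil fields as if they were UTC, then remove the offset.
  const d = new Date(0);
  d.setUTCFullYear(dt.year, dt.month - 1, dt.day);
  d.setUTCHours(dt.hours || 0, dt.minutes || 0, dt.seconds || 0,
    Math.floor((dt.nanos || 0) / 1e6));
  return d.getTime() - offsetSeconds * 1000;
}

const ms = dateTimeToUtcMillis({ year: 2024, month: 3, day: 1, hours: 9, utcOffset: "-14400s" });
console.log(new Date(ms).toISOString()); // "2024-03-01T13:00:00.000Z"
```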
TimeZone
Represents a time zone from the IANA Time Zone Database.
JSON representation | |
---|---|
{ "id": string, "version": string } |
Fields | |
---|---|
id | IANA Time Zone Database time zone, e.g. "America/New_York". |
version | Optional. IANA Time Zone Database version number, e.g. "2019a". |
EntityRelation
Relationship between Entities.
JSON representation | |
---|---|
{ "subjectId": string, "objectId": string, "relation": string } |
Fields | |
---|---|
subjectId | Subject entity id. |
objectId | Object entity id. |
relation | Relationship description. |
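A flat entityRelations[] list is easier to query once indexed by subject. A minimal sketch, assuming subjectId and objectId hold entity ids as described above; relationsBySubject is a hypothetical name, and the ids and relation string in the example are placeholders.

```javascript
// Sketch: index EntityRelation records by subjectId so that all relations
// of one entity can be looked up directly.
function relationsBySubject(entityRelations) {
  const index = new Map();
  for (const rel of entityRelations || []) {
    if (!index.has(rel.subjectId)) index.set(rel.subjectId, []);
    index.get(rel.subjectId).push({ relation: rel.relation, objectId: rel.objectId });
  }
  return index;
}

// Placeholder ids and relation names, for illustration only.
const index = relationsBySubject([
  { subjectId: "1", objectId: "2", relation: "example_relation" },
  { subjectId: "1", objectId: "3", relation: "example_relation" },
]);
console.log(index.get("1").length); // 2
```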
Translation
A translation of the text segment.
JSON representation | |
---|---|
{ "textAnchor": { object ( |
Fields | |
---|---|
textAnchor | Provenance of the translation. Text anchor indexing into the |
languageCode | The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier. |
translatedText | Text translated into the target language. |
ShardInfo
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
JSON representation | |
---|---|
{ "shardIndex": string, "shardCount": string, "textOffset": string } |
Fields | |
---|---|
shardIndex | The 0-based index of this shard. |
shardCount | Total number of shards. |
textOffset | The index of the first character in |
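For sharded documents, text segment indices can fall outside a shard's own text. This sketch assumes that TextSegment indices are positions in the full, unsharded text and that a shard's Document.text begins at shardInfo.textOffset; that reading follows the TextSegment note about out-of-bounds indices but is not spelled out here. The name localSegment is hypothetical, and Number() is used on the int64 strings on the assumption that offsets stay below 2^53.

```javascript
// Sketch, assuming TextSegment indices are positions in the full (unsharded)
// text and that this shard's Document.text begins at shardInfo.textOffset.
// Returns the part of the segment inside this shard, plus a flag saying
// whether the segment continues into another shard.
function localSegment(segment, shardInfo, shardTextLength) {
  const offset = Number((shardInfo && shardInfo.textOffset) || "0");
  const start = Number(segment.startIndex || "0") - offset;
  const end = Number(segment.endIndex || "0") - offset;
  // Clip to [0, shardTextLength] and keep end >= start.
  const localStart = Math.max(0, Math.min(start, shardTextLength));
  const localEnd = Math.max(localStart, Math.min(end, shardTextLength));
  return { start: localStart, end: localEnd, crossesShard: start < 0 || end > shardTextLength };
}

const shard = { shardIndex: "1", shardCount: "3", textOffset: "1000" };
console.log(localSegment({ startIndex: "1490", endIndex: "1510" }, shard, 500));
// { start: 490, end: 500, crossesShard: true }
```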
Label
Label attaches schema information and/or other metadata to segments within a Document. Multiple Labels on a single field can denote either different labels, different instances of the same label created at different times, or some combination of both.
JSON representation | |
---|---|
{ "name": string, "confidence": number, "automlModel": string } |
Fields | |
---|---|
name | Name of the label. When the label is generated from an AutoML Text Classification model, this field represents the name of the category. |
confidence | Confidence score between 0 and 1 for label assignment. |
automlModel | The AutoML model that generated the label. This field stores the full resource name of the AutoML model. Format: |