Document

Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.

JSON representation
{
  "mimeType": string,
  "text": string,
  "textStyles": [
    {
      object (Style)
    }
  ],
  "pages": [
    {
      object (Page)
    }
  ],
  "entities": [
    {
      object (Entity)
    }
  ],
  "entityRelations": [
    {
      object (EntityRelation)
    }
  ],
  "shardInfo": {
    object (ShardInfo)
  },
  "error": {
    object (Status)
  },

  // Union field source can be only one of the following:
  "uri": string,
  "content": string
  // End of list of possible types for union field source.
}
Fields
mimeType

string

An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml.

text

string

UTF-8 encoded text in reading order from the document.

textStyles[]

object (Style)

Styles for the Document.text.

pages[]

object (Page)

Visual page layout for the Document.

entities[]

object (Entity)

A list of entities detected on Document.text. For document shards, entities in this list may cross shard boundaries.

entityRelations[]

object (EntityRelation)

Relationship among Document.entities.

shardInfo

object (ShardInfo)

Information about the sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified.

error

object (Status)

Any error that occurred while processing this document.

Union field source. Original source document from the user. source can be only one of the following:
uri

string

Currently supports Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported. See Google Cloud Storage Request URIs for more info.

content

string (bytes format)

Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.

A base64-encoded string.
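As an illustration of the source union, the following Python sketch builds a Document payload with either uri or content set, never both. The helper name make_document and the sample values are hypothetical; only the field names and the base64 rule come from the reference above.

```python
import base64
import json


def make_document(mime_type, *, uri=None, content=None):
    """Build a Document dict with exactly one member of the `source` union set."""
    if (uri is None) == (content is None):
        raise ValueError("set exactly one of uri or content")
    doc = {"mimeType": mime_type}
    if uri is not None:
        doc["uri"] = uri
    else:
        # bytes fields are carried as base64-encoded strings in JSON.
        doc["content"] = base64.b64encode(content).decode("ascii")
    return doc


# Sample inputs (made up for illustration).
inline_doc = make_document("application/pdf", content=b"%PDF-1.4 sample bytes")
gcs_doc = make_document("application/pdf", uri="gs://bucket_name/object_name")
print(json.dumps(gcs_doc))
```

Rejecting the case where both or neither member is set keeps the payload consistent with the "can be only one of the following" rule.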

Style

Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

JSON representation
{
  "textAnchor": {
    object (TextAnchor)
  },
  "color": {
    object (Color)
  },
  "backgroundColor": {
    object (Color)
  },
  "fontWeight": string,
  "textStyle": string,
  "textDecoration": string,
  "fontSize": {
    object (FontSize)
  }
}
Fields
textAnchor

object (TextAnchor)

Text anchor indexing into the Document.text.

color

object (Color)

Text color.

backgroundColor

object (Color)

Text background color.

fontWeight

string

Font weight. Possible values are normal, bold, bolder, and lighter. https://www.w3schools.com/cssref/pr_font_weight.asp

textStyle

string

Text style. Possible values are normal, italic, and oblique. https://www.w3schools.com/cssref/pr_font_font-style.asp

textDecoration

string

Text decoration. Follows CSS standard. https://www.w3schools.com/cssref/pr_text_text-decoration.asp

fontSize

object (FontSize)

Font size.
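Because Style follows CSS conventions, its fields can be mapped to CSS declarations. The Python sketch below shows one possible mapping; the function name style_to_css and the choice of CSS property for each field are assumptions made for illustration, and the color fields are left out.

```python
def style_to_css(style):
    """Map a Style dict to a CSS declaration string (illustrative mapping only)."""
    declarations = []
    if "fontWeight" in style:
        declarations.append(f"font-weight: {style['fontWeight']}")
    if "textStyle" in style:
        # Assumed to correspond to the CSS font-style property.
        declarations.append(f"font-style: {style['textStyle']}")
    if "textDecoration" in style:
        declarations.append(f"text-decoration: {style['textDecoration']}")
    font_size = style.get("fontSize")
    if font_size:
        # FontSize carries a numeric size and a CSS-style unit name.
        declarations.append(f"font-size: {font_size['size']}{font_size['unit']}")
    return "; ".join(declarations)


css = style_to_css(
    {"fontWeight": "bold", "textStyle": "italic", "fontSize": {"size": 12, "unit": "pt"}}
)
print(css)
```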

TextAnchor

Text reference indexing into the Document.text.

JSON representation
{
  "textSegments": [
    {
      object (TextSegment)
    }
  ]
}
Fields
textSegments[]

object (TextSegment)

The text segments from the Document.text.

TextSegment

A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.

JSON representation
{
  "startIndex": string,
  "endIndex": string
}
Fields
startIndex

string (int64 format)

TextSegment start UTF-8 char index in the Document.text.

endIndex

string (int64 format)

TextSegment half open end UTF-8 char index in the Document.text.
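A TextAnchor resolves to text by slicing Document.text with each half-open [startIndex, endIndex) segment. The Python sketch below assumes the indices count characters of the decoded text and that an omitted startIndex means 0; the function name anchor_text and the sample text are hypothetical.

```python
def anchor_text(document_text, text_anchor):
    """Concatenate the slices of document_text named by a TextAnchor dict."""
    parts = []
    for segment in text_anchor.get("textSegments", []):
        # int64 fields arrive as JSON strings, so convert before slicing.
        start = int(segment.get("startIndex", 0))
        end = int(segment["endIndex"])
        # Python slicing clamps out-of-range indices, so a segment that runs
        # past the end of this shard yields only the locally available part.
        parts.append(document_text[start:end])
    return "".join(parts)


text = "Invoice 42\nTotal: 10 USD"
anchor = {
    "textSegments": [
        {"startIndex": "0", "endIndex": "7"},
        {"startIndex": "18", "endIndex": "20"},
    ]
}
print(anchor_text(text, anchor))
```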

Color

Represents a color in the RGBA color space. This representation is designed for simplicity of conversion to/from color representations in various languages over compactness; for example, the fields of this representation can be trivially provided to the constructor of "java.awt.Color" in Java; it can also be trivially provided to UIColor's "+colorWithRed:green:blue:alpha" method in iOS; and, with just a little work, it can be easily formatted into a CSS "rgba()" string in JavaScript, as well.

Note: this proto does not carry information about the absolute color space that should be used to interpret the RGB value (e.g. sRGB, Adobe RGB, DCI-P3, BT.2020, etc.). By default, applications SHOULD assume the sRGB color space.

Example (Java):

 import com.google.protobuf.FloatValue;
 import com.google.type.Color;

 // ...
 public static java.awt.Color fromProto(Color protocolor) {
   // An unset alpha means a solid color, so default to 1.0.
   float alpha = protocolor.hasAlpha()
       ? protocolor.getAlpha().getValue()
       : 1.0f;

   return new java.awt.Color(
       protocolor.getRed(),
       protocolor.getGreen(),
       protocolor.getBlue(),
       alpha);
 }

 public static Color toProto(java.awt.Color color) {
   float red = (float) color.getRed();
   float green = (float) color.getGreen();
   float blue = (float) color.getBlue();
   // java.awt.Color channels are in [0, 255]; the proto uses [0, 1].
   float denominator = 255.0f;
   Color.Builder resultBuilder =
       Color
           .newBuilder()
           .setRed(red / denominator)
           .setGreen(green / denominator)
           .setBlue(blue / denominator);
   int alpha = color.getAlpha();
   // Only set alpha when the color is not fully opaque.
   if (alpha != 255) {
     resultBuilder.setAlpha(
         FloatValue
             .newBuilder()
             .setValue(((float) alpha) / denominator)
             .build());
   }
   return resultBuilder.build();
 }
 // ...

Example (iOS / Obj-C):

 // ...
 static UIColor* fromProto(Color* protocolor) {
    float red = [protocolor red];
    float green = [protocolor green];
    float blue = [protocolor blue];
    FloatValue* alpha_wrapper = [protocolor alpha];
    float alpha = 1.0;
    if (alpha_wrapper != nil) {
      alpha = [alpha_wrapper value];
    }
    return [UIColor colorWithRed:red green:green blue:blue alpha:alpha];
 }

 static Color* toProto(UIColor* color) {
    CGFloat red, green, blue, alpha;
    if (![color getRed:&red green:&green blue:&blue alpha:&alpha]) {
      return nil;
    }
    Color* result = [[Color alloc] init];
    [result setRed:red];
    [result setGreen:green];
    [result setBlue:blue];
    if (alpha <= 0.9999) {
      [result setAlpha:floatWrapperWithValue(alpha)];
    }
    [result autorelease];
    return result;
 }
 // ...

Example (JavaScript):

// ...

var protoToCssColor = function(rgb_color) {
   var redFrac = rgb_color.red || 0.0;
   var greenFrac = rgb_color.green || 0.0;
   var blueFrac = rgb_color.blue || 0.0;
   var red = Math.floor(redFrac * 255);
   var green = Math.floor(greenFrac * 255);
   var blue = Math.floor(blueFrac * 255);

   if (!('alpha' in rgb_color)) {
      return rgbToCssColor_(red, green, blue);
   }

   var alphaFrac = rgb_color.alpha.value || 0.0;
   var rgbParams = [red, green, blue].join(',');
   return ['rgba(', rgbParams, ',', alphaFrac, ')'].join('');
};

var rgbToCssColor_ = function(red, green, blue) {
  var rgbNumber = new Number((red << 16) | (green << 8) | blue);
  var hexString = rgbNumber.toString(16);
  var missingZeros = 6 - hexString.length;
  var resultBuilder = ['#'];
  for (var i = 0; i < missingZeros; i++) {
     resultBuilder.push('0');
  }
  resultBuilder.push(hexString);
  return resultBuilder.join('');
};

// ...
JSON representation
{
  "red": number,
  "green": number,
  "blue": number,
  "alpha": number
}
Fields
red

number

The amount of red in the color as a value in the interval [0, 1].

green

number

The amount of green in the color as a value in the interval [0, 1].

blue

number

The amount of blue in the color as a value in the interval [0, 1].

alpha

number

The fraction of this color that should be applied to the pixel. That is, the final pixel color is defined by the equation:

pixel color = alpha * (this color) + (1.0 - alpha) * (background color)

This means that a value of 1.0 corresponds to a solid color, whereas a value of 0.0 corresponds to a completely transparent color. This uses a wrapper message rather than a simple float scalar so that it is possible to distinguish between a default value and the value being unset. If omitted, this color object is to be rendered as a solid color (as if the alpha value had been explicitly given with a value of 1.0).
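The blending equation above can be worked through per channel. The Python sketch below applies it to the red, green, and blue fields and treats an omitted alpha as 1.0, as described; the function name composite and the sample colors are hypothetical, and the background is assumed to be opaque.

```python
def composite(color, background):
    """Blend a Color dict over an opaque background, channel by channel."""
    # An omitted alpha is rendered as a solid color (alpha = 1.0).
    alpha = color.get("alpha", 1.0)
    return tuple(
        alpha * color.get(channel, 0.0) + (1.0 - alpha) * background.get(channel, 0.0)
        for channel in ("red", "green", "blue")
    )


blue_background = {"red": 0.0, "green": 0.0, "blue": 1.0}
half_red = {"red": 1.0, "alpha": 0.5}
print(composite(half_red, blue_background))  # (0.5, 0.0, 0.5)
```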

FontSize

Font size with unit.

JSON representation
{
  "size": number,
  "unit": string
}
Fields
size

number

Font size for the text.

unit

string

Unit for the font size. Follows CSS naming (in, px, pt, etc.).

Page

A page in a Document.

JSON representation
{
  "pageNumber": integer,
  "dimension": {
    object (Dimension)
  },
  "layout": {
    object (Layout)
  },
  "detectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ],
  "blocks": [
    {
      object (Block)
    }
  ],
  "paragraphs": [
    {
      object (Paragraph)
    }
  ],
  "lines": [
    {
      object (Line)
    }
  ],
  "tokens": [
    {
      object (Token)
    }
  ],
  "visualElements": [
    {
      object (VisualElement)
    }
  ],
  "tables": [
    {
      object (Table)
    }
  ],
  "formFields": [
    {
      object (FormField)
    }
  ]
}
Fields
pageNumber

integer

1-based index for current Page in a parent Document. Useful when a page is taken out of a Document for individual processing.

dimension

object (Dimension)

Physical dimension of the page.

layout

object (Layout)

Layout for the page.

detectedLanguages[]

object (DetectedLanguage)

A list of detected languages together with confidence.

blocks[]

object (Block)

A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

paragraphs[]

object (Paragraph)

A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.

lines[]

object (Line)

A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.

tokens[]

object (Token)

A list of visually detected tokens on the page.

visualElements[]

object (VisualElement)

A list of detected non-text visual elements on the page, such as checkboxes and signatures.

tables[]

object (Table)

A list of visually detected tables on the page.

formFields[]

object (FormField)

A list of visually detected form fields on the page.

Dimension

Dimension for the page.

JSON representation
{
  "width": number,
  "height": number,
  "unit": string
}
Fields
width

number

Page width.

height

number

Page height.

unit

string

Dimension unit.

Layout

Visual element describing a layout unit on a page.

JSON representation
{
  "textAnchor": {
    object (TextAnchor)
  },
  "confidence": number,
  "boundingPoly": {
    object (BoundingPoly)
  },
  "orientation": enum (Orientation)
}
Fields
textAnchor

object (TextAnchor)

Text anchor indexing into the Document.text.

confidence

number

Confidence of the current Layout within the context of the object this layout is for. For example, the confidence can apply to a single token, a table, or a visual element, depending on context. Range [0, 1].

boundingPoly

object (BoundingPoly)

The bounding polygon for the Layout.

orientation

enum (Orientation)

Detected orientation for the Layout.

DetectedLanguage

Detected language for a structural component.

JSON representation
{
  "languageCode": string,
  "confidence": number
}
Fields
languageCode

string

The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.

confidence

number

Confidence of detected language. Range [0, 1].
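Since each detectedLanguages[] entry pairs a language code with a confidence, a consumer will often want the most likely language. The Python sketch below shows one way to pick it; the function name top_language, the min_confidence parameter, and the sample values are assumptions for illustration.

```python
def top_language(detected_languages, min_confidence=0.0):
    """Return the languageCode with the highest confidence, or None."""
    candidates = [
        lang
        for lang in detected_languages
        if lang.get("confidence", 0.0) >= min_confidence
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda lang: lang.get("confidence", 0.0))
    return best["languageCode"]


languages = [
    {"languageCode": "en-US", "confidence": 0.82},
    {"languageCode": "sr-Latn", "confidence": 0.11},
]
print(top_language(languages))
```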

Block

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

JSON representation
{
  "layout": {
    object (Layout)
  },
  "detectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ]
}
Fields
layout

object (Layout)

Layout for Block.

detectedLanguages[]

object (DetectedLanguage)

A list of detected languages together with confidence.

Paragraph

A collection of lines that a human would perceive as a paragraph.

JSON representation
{
  "layout": {
    object (Layout)
  },
  "detectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ]
}
Fields
layout

object (Layout)

Layout for Paragraph.

detectedLanguages[]

object (DetectedLanguage)

A list of detected languages together with confidence.

Line

A collection of tokens that a human would perceive as a line. A line does not cross column boundaries and can be horizontal, vertical, and so on.

JSON representation
{
  "layout": {
    object (Layout)
  },
  "detectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ]
}
Fields
layout

object (Layout)

Layout for Line.

detectedLanguages[]

object (DetectedLanguage)

A list of detected languages together with confidence.

Token

A detected token.

JSON representation
{
  "layout": {
    object (Layout)
  },
  "detectedBreak": {
    object (DetectedBreak)
  },
  "detectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ]
}
Fields
layout

object (Layout)

Layout for Token.

detectedBreak

object (DetectedBreak)

Detected break at the end of a Token.

detectedLanguages[]

object (DetectedLanguage)

A list of detected languages together with confidence.

DetectedBreak

Detected break at the end of a Token.

JSON representation
{
  "type": enum (Type)
}
Fields
type

enum (Type)

Detected break type.

VisualElement

Detected non-text visual elements on the page, such as checkboxes and signatures.

JSON representation
{
  "layout": {
    object (Layout)
  },
  "type": string,
  "detectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ]
}
Fields
layout

object (Layout)

Layout for VisualElement.

type

string

Type of the VisualElement.

detectedLanguages[]

object (DetectedLanguage)

A list of detected languages together with confidence.

Table

A table representation similar to HTML table structure.

JSON representation
{
  "layout": {
    object (Layout)
  },
  "headerRows": [
    {
      object (TableRow)
    }
  ],
  "bodyRows": [
    {
      object (TableRow)
    }
  ],
  "detectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ]
}
Fields
layout

object (Layout)

Layout for Table.

headerRows[]

object (TableRow)

Header rows of the table.

bodyRows[]

object (TableRow)

Body rows of the table.

detectedLanguages[]

object (DetectedLanguage)

A list of detected languages together with confidence.

TableRow

A row of table cells.

JSON representation
{
  "cells": [
    {
      object (TableCell)
    }
  ]
}
Fields
cells[]

object (TableCell)

Cells that make up this row.

TableCell

A cell representation inside the table.

JSON representation
{
  "layout": {
    object (Layout)
  },
  "rowSpan": integer,
  "colSpan": integer,
  "detectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ]
}
Fields
layout

object (Layout)

Layout for TableCell.

rowSpan

integer

How many rows this cell spans.

colSpan

integer

How many columns this cell spans.

detectedLanguages[]

object (DetectedLanguage)

A list of detected languages together with confidence.

FormField

A form field detected on the page.

JSON representation
{
  "fieldName": {
    object (Layout)
  },
  "fieldValue": {
    object (Layout)
  },
  "nameDetectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ],
  "valueDetectedLanguages": [
    {
      object (DetectedLanguage)
    }
  ]
}
Fields
fieldName

object (Layout)

Layout for the FormField name, for example Address, Email, Grand total, or Phone number.

fieldValue

object (Layout)

Layout for the FormField value.

nameDetectedLanguages[]

object (DetectedLanguage)

A list of detected languages for name together with confidence.

valueDetectedLanguages[]

object (DetectedLanguage)

A list of detected languages for value together with confidence.
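Form fields are name/value pairs whose text lives in Document.text, so they can be collected into a dictionary. The Python sketch below does this for one page; the function names and the sample text are hypothetical, and when a name repeats the last value wins, which is an arbitrary choice.

```python
def layout_text(document_text, layout):
    """Resolve a Layout's textAnchor against the document text."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    return "".join(
        document_text[int(s.get("startIndex", 0)):int(s["endIndex"])]
        for s in segments
    )


def form_fields_to_dict(document_text, page):
    """Map each FormField name on a page to its value text."""
    result = {}
    for field in page.get("formFields", []):
        name = layout_text(document_text, field.get("fieldName", {})).strip()
        value = layout_text(document_text, field.get("fieldValue", {})).strip()
        result[name] = value
    return result


def _layout(start, end):
    """Build a sample Layout covering text[start:end]."""
    segment = {"startIndex": str(start), "endIndex": str(end)}
    return {"textAnchor": {"textSegments": [segment]}}


text = "Email: a@b.com\n"
page = {"formFields": [{"fieldName": _layout(0, 5), "fieldValue": _layout(7, 14)}]}
print(form_fields_to_dict(text, page))
```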

Entity

A phrase in the text that is a known entity type, such as a person, an organization, or location.

JSON representation
{
  "textAnchor": {
    object (TextAnchor)
  },
  "type": string,
  "mentionText": string,
  "mentionId": string,
  "confidence": number
}
Fields
textAnchor

object (TextAnchor)

Provenance of the entity. Text anchor indexing into the Document.text.

type

string

Entity type from a schema, for example Address.

mentionText

string

Text value in the document, for example 1600 Amphitheatre Pkwy.

mentionId

string

Deprecated. Use id field instead.

confidence

number

Optional. Confidence of detected Schema entity. Range [0, 1].
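A common way to consume entities[] is to drop low-confidence detections and group the rest by type. The Python sketch below does that; the function name entities_by_type, the 0.5 threshold, and the sample entities are assumptions. Entities without a confidence are kept here because the field is optional, which is a design choice rather than documented behavior.

```python
from collections import defaultdict


def entities_by_type(entities, min_confidence=0.5):
    """Group entity mention texts by type, skipping low-confidence entities."""
    grouped = defaultdict(list)
    for entity in entities:
        confidence = entity.get("confidence")
        # confidence is optional; only filter when it is actually present.
        if confidence is not None and confidence < min_confidence:
            continue
        grouped[entity.get("type", "")].append(entity.get("mentionText", ""))
    return dict(grouped)


# Sample entities (made up for illustration).
entities = [
    {"type": "Address", "mentionText": "1600 Amphitheatre Pkwy", "confidence": 0.9},
    {"type": "Address", "mentionText": "unknown", "confidence": 0.2},
    {"type": "Person", "mentionText": "Ada"},
]
print(entities_by_type(entities))
```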

EntityRelation

Relationship between Entities.

JSON representation
{
  "subjectId": string,
  "objectId": string,
  "relation": string
}
Fields
subjectId

string

Subject entity id.

objectId

string

Object entity id.

relation

string

Relationship description.

ShardInfo

For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.

JSON representation
{
  "shardIndex": string,
  "shardCount": string,
  "textOffset": string
}
Fields
shardIndex

string (int64 format)

The 0-based index of this shard.

shardCount

string (int64 format)

Total number of shards.

textOffset

string (int64 format)

The index of the first character of Document.text within the global text of the overall document.
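The three ShardInfo fields are enough to put shards back in order and to translate a shard-local text index into a global one. The Python sketch below assumes that shard texts do not overlap, so that concatenating them in shardIndex order reproduces the global text; the function names and sample shards are hypothetical.

```python
def merge_shard_text(shards):
    """Concatenate Document.text from all shards in shardIndex order."""
    ordered = sorted(
        shards,
        # int64 fields arrive as JSON strings; a missing value is treated as 0.
        key=lambda doc: int(doc.get("shardInfo", {}).get("shardIndex", 0)),
    )
    return "".join(doc.get("text", "") for doc in ordered)


def to_global_index(shard, local_index):
    """Convert an index into this shard's text to an index into the global text."""
    offset = int(shard.get("shardInfo", {}).get("textOffset", 0))
    return offset + int(local_index)


shards = [
    {"text": "world", "shardInfo": {"shardIndex": "1", "shardCount": "2", "textOffset": "6"}},
    {"text": "Hello ", "shardInfo": {"shardIndex": "0", "shardCount": "2", "textOffset": "0"}},
]
merged = merge_shard_text(shards)
print(merged, to_global_index(shards[0], "0"))
```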