Index
DocumentProcessorService (interface)
BatchProcessMetadata (message)
BatchProcessMetadata.IndividualProcessStatus (message)
BatchProcessMetadata.State (enum)
BatchProcessRequest (message)
BatchProcessRequest.BatchInputConfig (message)
BatchProcessRequest.BatchOutputConfig (message)
BatchProcessResponse (message)
BoundingPoly (message)
Document (message)
Document.Entity (message)
Document.Entity.NormalizedValue (message)
Document.EntityRelation (message)
Document.Page (message)
Document.Page.Block (message)
Document.Page.DetectedLanguage (message)
Document.Page.Dimension (message)
Document.Page.FormField (message)
Document.Page.Image (message)
Document.Page.Layout (message)
Document.Page.Layout.Orientation (enum)
Document.Page.Line (message)
Document.Page.Matrix (message)
Document.Page.Paragraph (message)
Document.Page.Table (message)
Document.Page.Table.TableCell (message)
Document.Page.Table.TableRow (message)
Document.Page.Token (message)
Document.Page.Token.DetectedBreak (message)
Document.Page.Token.DetectedBreak.Type (enum)
Document.Page.VisualElement (message)
Document.PageAnchor (message)
Document.PageAnchor.PageRef (message)
Document.PageAnchor.PageRef.LayoutType (enum)
Document.Provenance (message)
Document.Provenance.OperationType (enum)
Document.Provenance.Parent (message)
Document.Revision (message)
Document.Revision.HumanReview (message)
Document.ShardInfo (message)
Document.Style (message)
Document.Style.FontSize (message)
Document.TextAnchor (message)
Document.TextAnchor.TextSegment (message)
Document.TextChange (message)
Document.Translation (message)
NormalizedVertex (message)
ProcessRequest (message)
ProcessResponse (message)
ReviewDocumentOperationMetadata (message)
ReviewDocumentOperationMetadata.State (enum)
ReviewDocumentRequest (message)
ReviewDocumentResponse (message)
Vertex (message)
DocumentProcessorService
Service to call Cloud DocumentAI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.
BatchProcessDocuments | |
---|---|
Long-running operation (LRO) endpoint to batch process many documents. The output is written to Cloud Storage as JSON in the [Document] format.
|
ProcessDocument | |
---|---|
Processes a single document.
|
ReviewDocument | |
---|---|
Sends a document for Human Review. The input document should already have been processed by the specified processor.
|
BatchProcessMetadata
The long-running operation metadata for the batch process method.
Fields | |
---|---|
state |
The state of the current batch processing. |
state_message |
A message providing more details about the current state of processing. For example, the error message if the operation failed. |
create_time |
The creation time of the operation. |
update_time |
The last update time of the operation. |
individual_process_statuses[] |
The list of response details of each document. |
IndividualProcessStatus
The status of each individual document in the batch process.
Fields | |
---|---|
input_gcs_source |
The source of the document, the same as the [input_gcs_source] field in the request when the batch process started. The batch process starts by taking a snapshot of that document, since a user can move or change the document during the process. |
status |
The status of the processing of the document. |
output_gcs_destination |
The output_gcs_destination (in the request as 'output_gcs_destination') of the processed document if it was successful, otherwise empty. |
human_review_operation |
The name of the operation triggered by the processed document. If the human review process is not triggered, this field will be empty. It has the same response type and metadata as the long running operation returned by ReviewDocument method. |
State
Possible states of the batch processing operation.
Enums | |
---|---|
STATE_UNSPECIFIED |
The default value. This value is used if the state is omitted. |
WAITING |
Request operation is waiting for scheduling. |
RUNNING |
Request is being processed. |
SUCCEEDED |
The batch processing completed successfully. |
CANCELLING |
The batch processing is being cancelled. |
CANCELLED |
The batch processing was cancelled. |
FAILED |
The batch processing has failed. |
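When polling a batch operation, a caller usually needs to know whether the reported state can still change. The sketch below uses the state names from the table above; treating SUCCEEDED, CANCELLED, and FAILED as the terminal states is an assumption inferred from their descriptions, and the `is_terminal` helper is hypothetical rather than part of the API.

```python
# State names come from the enum table; the grouping into "terminal"
# states is an assumption based on the descriptions, not an API guarantee.
TERMINAL_STATES = {"SUCCEEDED", "CANCELLED", "FAILED"}


def is_terminal(state: str) -> bool:
    """Return True when the batch operation is not expected to change state again."""
    return state in TERMINAL_STATES


if __name__ == "__main__":
    for name in ("WAITING", "RUNNING", "CANCELLING", "SUCCEEDED"):
        print(name, is_terminal(name))
```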
BatchProcessRequest
Request message for batch process document method.
Fields | |
---|---|
name |
Required. The processor resource name. Authorization requires the following IAM permission on the specified resource
|
input_configs[] |
The input config for each single document in the batch process. |
output_config |
The overall output config for batch process. |
BatchInputConfig
The message for input config in batch process.
Fields | |
---|---|
gcs_source |
The Cloud Storage location as the source of the document. |
mime_type |
Mimetype of the input. If the input is a raw document, the supported mimetypes are application/pdf, image/tiff, and image/gif. If the input is a [Document] proto, the type should be application/json. |
BatchOutputConfig
The message for output config in batch process.
Fields | |
---|---|
gcs_destination |
The output Cloud Storage directory to put the processed documents. |
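The three messages above nest as follows: one request carries a processor name, a list of input configs, and a single output config. A minimal sketch of that shape as a plain Python dictionary is shown here. It mirrors the proto field names listed above and is not a client library call; the `build_batch_request` helper and the example resource and bucket names are hypothetical.

```python
# MIME types listed in the BatchInputConfig.mime_type description.
SUPPORTED_MIME_TYPES = {
    "application/pdf",
    "image/tiff",
    "image/gif",
    "application/json",  # input is a serialized Document
}


def build_batch_request(processor_name, sources, output_dir):
    """Build a dict shaped like BatchProcessRequest.

    sources is an iterable of (gcs_source, mime_type) pairs.
    """
    input_configs = []
    for gcs_source, mime_type in sources:
        if mime_type not in SUPPORTED_MIME_TYPES:
            raise ValueError(f"unsupported mime_type: {mime_type}")
        input_configs.append({"gcs_source": gcs_source, "mime_type": mime_type})
    return {
        "name": processor_name,
        "input_configs": input_configs,
        "output_config": {"gcs_destination": output_dir},
    }


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    request = build_batch_request(
        "projects/my-project/locations/us/processors/my-processor",
        [("gs://my-bucket/in/a.pdf", "application/pdf")],
        "gs://my-bucket/out/",
    )
    print(request)
```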
BatchProcessResponse
Response message for batch process document method.
BoundingPoly
A bounding polygon for the detected image annotation.
Fields | |
---|---|
vertices[] |
The bounding polygon vertices. |
normalized_vertices[] |
The bounding polygon normalized vertices. |
Document
Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.
Fields | ||
---|---|---|
mime_type |
An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml. |
|
text |
UTF-8 encoded text in reading order from the document. |
|
text_styles[] |
Styles for the |
|
pages[] |
Visual page layout for the |
|
entities[] |
A list of entities detected on |
|
entity_relations[] |
Relationship among |
|
translations[] |
A list of translations on |
|
text_changes[] |
A list of text corrections made to [Document.text]. This is usually used for annotating corrections to OCR mistakes. Text changes for a given revision may not overlap with each other. |
|
shard_info |
Information about the sharding if this document is sharded part of a larger document. If the document is not sharded, this message is not specified. |
|
error |
Any error that occurred while processing this document. |
|
revisions[] |
Revision history of this document. |
|
Union field source . Original source document from the user. source can be only one of the following: |
||
uri |
Currently supports Google Cloud Storage URI of the form |
|
content |
Inline document content, represented as a stream of bytes. Note: As with all |
Entity
A phrase in the text that is a known entity type, such as a person, an organization, or location.
Fields | |
---|---|
text_anchor |
Provenance of the entity. Text anchor indexing into the |
type |
Entity type from a schema e.g. |
mention_text |
Text value in the document e.g. |
mention_id |
Deprecated. Use |
confidence |
Optional. Confidence of detected Schema entity. Range [0, 1]. |
page_anchor |
Optional. Represents the provenance of this entity with respect to the location on the page where it was found. |
id |
Optional. Canonical id. This will be a unique value in the entity list for this document. |
normalized_value |
Optional. Normalized entity value. Absent if the extracted value could not be converted or the type (e.g. address) is not supported for certain parsers. This field is also only populated for certain supported document types. |
properties[] |
Optional. Entities can be nested to form a hierarchical data structure representing the content in the document. |
provenance |
Optional. The history of this annotation. |
redacted |
Optional. Whether the entity will be redacted for de-identification purposes. |
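Because `properties[]` lets entities nest, consumers often flatten the hierarchy before using it. The following is a minimal sketch that assumes entities have been loaded as plain dictionaries keyed by the field names above (for example from the JSON output); the `flatten_entities` helper and the slash-separated path convention are illustrative choices, not part of the API.

```python
def flatten_entities(entities, prefix=""):
    """Yield (type_path, mention_text, confidence) for each entity, depth first.

    Nested entities found under "properties" are reported with their
    parent's type prepended, separated by "/".
    """
    for entity in entities:
        path = prefix + entity.get("type", "")
        yield path, entity.get("mention_text", ""), entity.get("confidence")
        # Recurse into child entities, extending the path.
        yield from flatten_entities(entity.get("properties", []), prefix=path + "/")


if __name__ == "__main__":
    # Hypothetical entity data for illustration.
    sample = [
        {
            "type": "line_item",
            "mention_text": "2 x Widget 10.00",
            "confidence": 0.9,
            "properties": [
                {"type": "quantity", "mention_text": "2", "confidence": 0.95},
                {"type": "amount", "mention_text": "10.00", "confidence": 0.8},
            ],
        }
    ]
    for row in flatten_entities(sample):
        print(row)
```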
NormalizedValue
Parsed and normalized entity value.
Fields | ||
---|---|---|
text |
Required. Normalized entity value stored as a string. This field is populated for supported document types (e.g. Invoice). For some entity types, one of the respective 'structured_value' fields may also be populated.
|
|
Union field structured_value . Structured entity value. Must match entity type defined in schema if known. If this field is present, the 'text' field is still populated. structured_value can be only one of the following: |
||
money_value |
Money value. See also: https://github.com/googleapis/googleapis/blob/master/google/type/money.proto |
|
date_value |
Date value. Includes year, month, day. See also: https://github.com/googleapis/googleapis/blob/master/google/type/date.proto |
|
datetime_value |
DateTime value. Includes date, time, and timezone. See also: https://github.com/googleapis/googleapis/blob/master/google/type/datetime.proto |
|
address_value |
Postal address. See also: https://github.com/googleapis/googleapis/blob/master/google/type/postal_address.proto |
EntityRelation
Relationship between Entities.
Fields | |
---|---|
subject_id |
Subject entity id. |
object_id |
Object entity id. |
relation |
Relationship description. |
Page
A page in a Document.
Fields | |
---|---|
page_number |
1-based index for current |
image |
Rendered image for this page. This image is preprocessed to remove any skew, rotation, and distortions such that the annotation bounding boxes can be upright and axis-aligned. |
transforms[] |
Transformation matrices that were applied to the original document image to produce |
dimension |
Physical dimension of the page. |
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
blocks[] |
A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation. |
paragraphs[] |
A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph. |
lines[] |
A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line. |
tokens[] |
A list of visually detected tokens on the page. |
visual_elements[] |
A list of detected non-text visual elements e.g. checkbox, signature etc. on the page. |
tables[] |
A list of visually detected tables on the page. |
form_fields[] |
A list of visually detected form fields on the page. |
Block
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
Fields | |
---|---|
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
provenance |
The history of this annotation. |
DetectedLanguage
Detected language for a structural component.
Fields | |
---|---|
language_code |
The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see www.unicode.org/reports/tr35/#Unicode_locale_identifier. |
confidence |
Confidence of detected language. Range [0, 1]. |
Dimension
Dimension for the page.
Fields | |
---|---|
width |
Page width. |
height |
Page height. |
unit |
Dimension unit. |
FormField
A form field detected on the page.
Fields | |
---|---|
field_name |
|
field_value |
|
name_detected_languages[] |
A list of detected languages for name together with confidence. |
value_detected_languages[] |
A list of detected languages for value together with confidence. |
value_type |
If the value is non-textual, this field represents the type. Current valid values are: - blank (this indicates the field_value is normal text) - "unfilled_checkbox" - "filled_checkbox" |
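The `value_type` field distinguishes ordinary text values from checkboxes. A small sketch of how a consumer might interpret the three documented values follows; the `checkbox_state` helper is hypothetical, and returning `None` for the blank (normal text) case is a design choice made for this example.

```python
def checkbox_state(value_type):
    """Map FormField.value_type to True (filled), False (unfilled), or None (normal text)."""
    if value_type == "filled_checkbox":
        return True
    if value_type == "unfilled_checkbox":
        return False
    # Blank value_type means field_value is normal text, not a checkbox.
    return None


if __name__ == "__main__":
    for value in ("filled_checkbox", "unfilled_checkbox", ""):
        print(repr(value), checkbox_state(value))
```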
Image
Rendered image contents for this page.
Fields | |
---|---|
content |
Raw byte content of the image. |
mime_type |
Encoding mime type for the image. |
width |
Width of the image in pixels. |
height |
Height of the image in pixels. |
Layout
Visual element describing a layout unit on a page.
Fields | |
---|---|
text_anchor |
Text anchor indexing into the |
confidence |
Confidence of the current |
bounding_poly |
The bounding polygon for the |
orientation |
Detected orientation for the |
Orientation
Detected human reading orientation.
Enums | |
---|---|
ORIENTATION_UNSPECIFIED |
Unspecified orientation. |
PAGE_UP |
Orientation is aligned with page up. |
PAGE_RIGHT |
Orientation is aligned with page right. Turn the head 90 degrees clockwise from upright to read. |
PAGE_DOWN |
Orientation is aligned with page down. Turn the head 180 degrees from upright to read. |
PAGE_LEFT |
Orientation is aligned with page left. Turn the head 90 degrees counterclockwise from upright to read. |
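The head-turn descriptions above translate directly into clockwise angles. The sketch below records that mapping; the `reading_rotation` helper is hypothetical, and expressing PAGE_LEFT as 270 degrees clockwise (equivalent to 90 degrees counterclockwise) is a convention chosen for this example.

```python
# Clockwise head turn, in degrees, needed to read text in each orientation.
CLOCKWISE_DEGREES = {
    "PAGE_UP": 0,
    "PAGE_RIGHT": 90,
    "PAGE_DOWN": 180,
    "PAGE_LEFT": 270,  # same as 90 degrees counterclockwise
}


def reading_rotation(orientation):
    """Return the clockwise angle for an orientation name, or None if unspecified."""
    return CLOCKWISE_DEGREES.get(orientation)


if __name__ == "__main__":
    print(reading_rotation("PAGE_RIGHT"))
```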
Line
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
Fields | |
---|---|
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
provenance |
The history of this annotation. |
Matrix
Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.
Fields | |
---|---|
rows |
Number of rows in the matrix. |
cols |
Number of columns in the matrix. |
type |
This encodes information about what data type the matrix uses. For example, 0 (CV_8U) is an unsigned 8-bit image. For the full list of OpenCV primitive data types, please refer to https://docs.opencv.org/4.3.0/d1/d1b/group__core__hal__interface.html |
data |
The matrix data. |
Paragraph
A collection of lines that a human would perceive as a paragraph.
Fields | |
---|---|
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
provenance |
The history of this annotation. |
Table
A table representation similar to HTML table structure.
Fields | |
---|---|
layout |
|
header_rows[] |
Header rows of the table. |
body_rows[] |
Body rows of the table. |
detected_languages[] |
A list of detected languages together with confidence. |
TableCell
A cell representation inside the table.
Fields | |
---|---|
layout |
|
row_span |
How many rows this cell spans. |
col_span |
How many columns this cell spans. |
detected_languages[] |
A list of detected languages together with confidence. |
TableRow
A row of table cells.
Fields | |
---|---|
cells[] |
Cells that make up this row. |
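Since cells carry `row_span` and `col_span`, turning rows into a rectangular grid works the same way as for an HTML table. The following is a minimal sketch under stated assumptions: each cell is a plain dictionary, and the `text` key is a hypothetical stand-in for the cell text that would normally be resolved through the cell's layout. Spanning cells are repeated in every grid position they cover.

```python
def expand_rows(rows):
    """Expand rows of cells with row_span/col_span into a rectangular grid of strings."""
    grid = {}  # (row, col) -> cell text
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip positions already occupied by a cell spanning down from above.
            while (r, c) in grid:
                c += 1
            row_span = max(1, cell.get("row_span", 1))
            col_span = max(1, cell.get("col_span", 1))
            for dr in range(row_span):
                for dc in range(col_span):
                    grid[(r + dr, c + dc)] = cell["text"]
            c += col_span
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    header = [[{"text": "Item", "col_span": 2}]]
    body = [[{"text": "Widget"}, {"text": "10.00"}]]
    for line in expand_rows(header + body):
        print(line)
```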
Token
A detected token.
Fields | |
---|---|
layout |
|
detected_break |
Detected break at the end of a |
detected_languages[] |
A list of detected languages together with confidence. |
provenance |
The history of this annotation. |
DetectedBreak
Detected break at the end of a Token.
Fields | |
---|---|
type |
Detected break type. |
Type
Enum to denote the type of break found.
Enums | |
---|---|
TYPE_UNSPECIFIED |
Unspecified break type. |
SPACE |
A single whitespace. |
WIDE_SPACE |
A wider whitespace. |
HYPHEN |
A hyphen that indicates that a token has been split across lines. |
VisualElement
Detected non-text visual elements e.g. checkbox, signature etc. on the page.
Fields | |
---|---|
layout |
|
type |
Type of the |
detected_languages[] |
A list of detected languages together with confidence. |
PageAnchor
Referencing the visual context of the entity in the Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons, and optionally reference specific layout element types.
Fields | |
---|---|
page_refs[] |
One or more references to visual page elements |
PageRef
Represents a weak reference to a page element within a document.
Fields | |
---|---|
page |
Required. Index into the |
layout_type |
Optional. The type of the layout element that is being referenced if any. |
layout_id |
Optional. Deprecated. Use |
bounding_poly |
Optional. Identifies the bounding polygon of a layout element on the page. |
LayoutType
The type of layout that is being referenced.
Enums | |
---|---|
LAYOUT_TYPE_UNSPECIFIED |
Layout Unspecified. |
BLOCK |
References a Page.blocks element. |
PARAGRAPH |
References a Page.paragraphs element. |
LINE |
References a Page.lines element. |
TOKEN |
References a Page.tokens element. |
VISUAL_ELEMENT |
References a Page.visual_elements element. |
TABLE |
References a Page.tables element. |
FORM_FIELD |
References a Page.form_fields element. |
Provenance
Structure to identify provenance relationships between annotations in different revisions.
Fields | |
---|---|
revision |
The index of the revision that produced this element. |
id |
The Id of this operation. Needs to be unique within the scope of the revision. |
parents[] |
References to the original elements that are replaced. |
type |
The type of provenance operation. |
OperationType
If a processor or agent does an explicit operation on existing elements.
Enums | |
---|---|
OPERATION_TYPE_UNSPECIFIED |
Operation type unspecified. |
ADD |
Add an element. Implicit if no parents are set for the provenance. |
REMOVE |
The element is removed. No parents should be set. |
REPLACE |
Explicitly replaces the element(s) identified by parents . |
EVAL_REQUESTED |
Element is requested for human review. |
EVAL_APPROVED |
Element is reviewed and approved at human review; confidence will be set to 1.0. |
Parent
Structure for referencing parent provenances. When an element replaces one or more other elements, parent references identify the elements that are replaced.
Fields | |
---|---|
revision |
The index of the [Document.revisions] identifying the parent revision. |
id |
The id of the parent provenance. |
Revision
Contains past or forward revisions of this document.
Fields | ||
---|---|---|
id |
Id of the revision. Unique within the context of the document. |
|
parent[] |
The revisions that this revision is based on. This can include one or more parents (when documents are merged). This field represents the index into the |
|
create_time |
The time that the revision was created. |
|
human_review |
Human Review information of this revision. |
|
Union field source . Who or what made the change. source can be only one of the following: |
||
agent |
If the change was made by a person specify the name or id of that person. |
|
processor |
If the annotation was made by processor identify the processor by its resource name. |
HumanReview
Human Review information of the document.
Fields | |
---|---|
state |
Human review state. e.g. |
state_message |
A message providing more details about the current state of processing. For example, the rejection reason when the state is |
ShardInfo
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
Fields | |
---|---|
shard_index |
The 0-based index of this shard. |
shard_count |
Total number of shards. |
text_offset |
The index of the first character in |
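When a large document arrives as several shards, `shard_index` and `shard_count` are enough to put the pieces back in order and to detect a missing shard. The sketch below assumes each shard has been loaded as a dictionary with `text` and `shard_info` keys, and that concatenating shard texts in index order reproduces the full document text; that second point is an assumption, and the `merge_shard_texts` helper is hypothetical.

```python
def merge_shard_texts(shards):
    """Concatenate shard texts in shard_index order after checking none are missing."""
    if not shards:
        return ""
    # int() tolerates JSON encodings where 64-bit integers arrive as strings.
    ordered = sorted(shards, key=lambda s: int(s["shard_info"]["shard_index"]))
    expected = int(ordered[0]["shard_info"]["shard_count"])
    indices = [int(s["shard_info"]["shard_index"]) for s in ordered]
    if indices != list(range(expected)):
        raise ValueError(f"expected shards 0..{expected - 1}, got {indices}")
    return "".join(s["text"] for s in ordered)


if __name__ == "__main__":
    parts = [
        {"text": "world", "shard_info": {"shard_index": 1, "shard_count": 2}},
        {"text": "hello ", "shard_info": {"shard_index": 0, "shard_count": 2}},
    ]
    print(merge_shard_texts(parts))
```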
Style
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
Fields | |
---|---|
text_anchor |
Text anchor indexing into the |
color |
Text color. |
background_color |
Text background color. |
font_weight |
Font weight. Possible values are normal, bold, bolder, and lighter. https://www.w3schools.com/cssref/pr_font_weight.asp |
text_style |
Text style. Possible values are normal, italic, and oblique. https://www.w3schools.com/cssref/pr_font_font-style.asp |
text_decoration |
Text decoration. Follows CSS standard. |
font_size |
Font size. |
FontSize
Font size with unit.
Fields | |
---|---|
size |
Font size for the text. |
unit |
Unit for the font size. Follows CSS naming (in, px, pt, etc.). |
TextAnchor
Text reference indexing into the Document.text.
Fields | |
---|---|
text_segments[] |
The text segments from the |
content |
Contains the content of the text span so that users do not have to look it up in the text_segments. |
TextSegment
A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.
Fields | |
---|---|
start_index |
|
end_index |
|
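A text anchor is resolved by slicing Document.text with each segment's indices and joining the pieces. The sketch below makes two assumptions that the field table does not state: that `start_index` is inclusive and `end_index` exclusive, and that the indices can be applied directly to a Python string. Segments are assumed to be plain dictionaries, and the `anchor_text` helper is hypothetical. Python slicing clamps out-of-range indices, which suits the out-of-bounds case mentioned above.

```python
def anchor_text(document_text, text_segments):
    """Return the text covered by a list of text segments.

    Assumes half-open ranges [start_index, end_index); a missing index
    is treated as 0, matching proto3 default values.
    """
    parts = []
    for segment in text_segments:
        start = int(segment.get("start_index", 0))
        end = int(segment.get("end_index", 0))
        # Slices that run past the end of this shard's text are clamped.
        parts.append(document_text[start:end])
    return "".join(parts)


if __name__ == "__main__":
    text = "Invoice 42"
    segments = [{"start_index": 0, "end_index": 7}, {"start_index": 8, "end_index": 10}]
    print(anchor_text(text, segments))
```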
TextChange
This message is used for text changes, also known as OCR corrections.
Fields | |
---|---|
text_anchor |
Provenance of the correction. Text anchor indexing into the |
changed_text |
The text that replaces the text identified in the |
provenance[] |
The history of this annotation. |
Translation
A translation of the text segment.
Fields | |
---|---|
text_anchor |
Provenance of the translation. Text anchor indexing into the |
language_code |
The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see www.unicode.org/reports/tr35/#Unicode_locale_identifier. |
translated_text |
Text translated into the target language. |
provenance[] |
The history of this annotation. |
NormalizedVertex
A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
Fields | |
---|---|
x |
X coordinate. |
y |
Y coordinate. |
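Since normalized coordinates range from 0 to 1 relative to the image, multiplying by the image width and height gives pixel positions. A minimal sketch, assuming vertices are plain dictionaries with `x` and `y` keys and that the width and height come from the page's Image message; the `to_pixels` helper is hypothetical.

```python
def to_pixels(normalized_vertices, image_width, image_height):
    """Convert normalized vertices (0..1) to integer pixel coordinates."""
    return [
        (
            round(vertex.get("x", 0.0) * image_width),
            round(vertex.get("y", 0.0) * image_height),
        )
        for vertex in normalized_vertices
    ]


if __name__ == "__main__":
    corners = [{"x": 0.0, "y": 0.0}, {"x": 0.5, "y": 0.0}, {"x": 0.5, "y": 0.25}, {"x": 0.0, "y": 0.25}]
    print(to_pixels(corners, 200, 400))
```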
ProcessRequest
Request message for the process document method.
Fields | |
---|---|
name |
Required. The processor resource name. Authorization requires the following IAM permission on the specified resource
|
document |
The document payload. The [content] and [mime_type] fields must be set. |
skip_human_review |
Whether the Human Review feature should be skipped for this request. Defaults to false. |
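A request for a single document therefore needs the processor name plus a document whose `content` and `mime_type` are set. The sketch below builds that shape as a JSON-ready dictionary. It relies on the standard proto3 JSON mapping, in which bytes fields are base64-encoded strings; the `build_process_request` helper and the placeholder processor name are hypothetical, and this is not a client library call.

```python
import base64


def build_process_request(processor_name, raw_bytes, mime_type, skip_human_review=False):
    """Build a JSON-ready dict shaped like ProcessRequest."""
    return {
        "name": processor_name,
        "document": {
            # bytes fields are base64 strings in proto3 JSON.
            "content": base64.b64encode(raw_bytes).decode("ascii"),
            "mime_type": mime_type,
        },
        "skip_human_review": skip_human_review,
    }


if __name__ == "__main__":
    # Placeholder name and payload for illustration only.
    request = build_process_request("projects/p/locations/us/processors/x", b"%PDF", "application/pdf")
    print(request)
```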
ProcessResponse
Response message for the process document method.
Fields | |
---|---|
document |
The document payload, will populate fields based on the processor's behavior. |
human_review_operation |
The name of the operation triggered by the processed document. If the human review process is not triggered, this field will be empty. It has the same response type and metadata as the long running operation returned by ReviewDocument method. |
ReviewDocumentOperationMetadata
The long-running operation metadata for the review document method.
Fields | |
---|---|
state |
Used only when Operation.done is false. |
state_message |
A message providing more details about the current state of processing. For example, the error message if the operation failed. |
create_time |
The creation time of the operation. |
update_time |
The last update time of the operation. |
State
State of the long-running operation.
Enums | |
---|---|
STATE_UNSPECIFIED |
Unspecified state. |
RUNNING |
Operation is still running. |
CANCELLING |
Operation is being cancelled. |
SUCCEEDED |
Operation succeeded. |
FAILED |
Operation failed. |
CANCELLED |
Operation is cancelled. |
ReviewDocumentRequest
Request message for review document method.
Fields | |
---|---|
human_review_config |
Required. The resource name of the HumanReviewConfig that the document will be reviewed with. Authorization requires the following IAM permission on the specified resource
|
document |
The document that needs human review. |
ReviewDocumentResponse
Response message for review document method.
Fields | |
---|---|
gcs_destination |
The Cloud Storage URI for the human reviewed document. |
Vertex
A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.
Fields | |
---|---|
x |
X coordinate. |
y |
Y coordinate. |