Package google.cloud.documentai.v1beta2

DocumentUnderstandingService

Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.

BatchProcessDocuments

rpc BatchProcessDocuments(BatchProcessDocumentsRequest) returns (Operation)

Long-running operation (LRO) endpoint that batch processes many documents. The output is written to Cloud Storage as JSON in the Document format.

ProcessDocument

rpc ProcessDocument(ProcessDocumentRequest) returns (Document)

Processes a single document.

AutoMlParams

Parameters to control AutoML model prediction behavior.

Fields
model

string

Resource name of the AutoML model.

Format: projects/{project-id}/locations/{location-id}/models/{model-id}.
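The resource name format above can be assembled and checked with a few lines of plain Python. This is a minimal sketch: the helper names `automl_model_name` and `parse_automl_model_name` are hypothetical, and the regular expression only mirrors the documented format string; it does not check that the model exists.

```python
import re

# Hypothetical helpers: the pattern only mirrors the documented format
# projects/{project-id}/locations/{location-id}/models/{model-id}.
_MODEL_RE = re.compile(r"projects/([^/]+)/locations/([^/]+)/models/([^/]+)")


def automl_model_name(project_id: str, location_id: str, model_id: str) -> str:
    """Build a model resource name in the documented format."""
    return f"projects/{project_id}/locations/{location_id}/models/{model_id}"


def parse_automl_model_name(name: str) -> dict:
    """Split a model resource name into its parts, or raise ValueError."""
    match = _MODEL_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a model resource name: {name!r}")
    project_id, location_id, model_id = match.groups()
    return {
        "project_id": project_id,
        "location_id": location_id,
        "model_id": model_id,
    }
```

The same approach applies to the `parent` fields, which use the shorter `projects/{project-id}/locations/{location-id}` form.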

BatchProcessDocumentsRequest

Request to batch process documents as an asynchronous operation. The output is written to Cloud Storage as JSON in the Document format.

Fields
requests[]

ProcessDocumentRequest

Required. Individual requests for each document.

parent

string

Target project and location to make a call.

Format: projects/{project-id}/locations/{location-id}.

If no location is specified, a region will be chosen automatically.

BatchProcessDocumentsResponse

Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.

Fields
responses[]

ProcessDocumentResponse

Responses for each individual document.

BoundingPoly

A bounding polygon for the detected image annotation.

Fields
vertices[]

Vertex

The bounding polygon vertices.

normalized_vertices[]

NormalizedVertex

The bounding polygon normalized vertices.

Document

Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.

Fields
mime_type

string

An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml.

text

string

UTF-8 encoded text in reading order from the document.

text_styles[]

Style

Styles for the Document.text.

pages[]

Page

Visual page layout for the Document.

entities[]

Entity

A list of entities detected on Document.text. For document shards, entities in this list may cross shard boundaries.

entity_relations[]

EntityRelation

Relationship among Document.entities.

translations[]

Translation

A list of translations on Document.text. For document shards, translations in this list may cross shard boundaries.

shard_info

ShardInfo

Information about the sharding if this document is a sharded part of a larger document. If the document is not sharded, this message is not specified.

labels[]

Label

Labels for this document.

error

Status

Any error that occurred while processing this document.

Union field source. Original source document from the user. source can be only one of the following:
uri

string

Currently supports Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported. See Google Cloud Storage Request URIs for more info.

content

bytes

Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.
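The base64 note for the JSON representation can be illustrated with the standard library. This is a sketch, not client-library code: the helper names are hypothetical, and the `mimeType` key assumes the usual proto3 JSON mapping, which renders field names in lowerCamelCase.

```python
import base64
import json


def inline_document_json(raw: bytes, mime_type: str) -> str:
    """Serialize a Document-like dict whose content field is base64 text."""
    document = {
        "mimeType": mime_type,  # lowerCamelCase key, assuming proto3 JSON mapping
        "content": base64.b64encode(raw).decode("ascii"),
    }
    return json.dumps(document)


def content_from_json(payload: str) -> bytes:
    """Recover the original bytes from the JSON representation."""
    return base64.b64decode(json.loads(payload)["content"])
```

Round-tripping `content_from_json(inline_document_json(data, "application/pdf"))` returns the original bytes unchanged.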

Entity

A phrase in the text that is a known entity type, such as a person, an organization, or location.

Fields
text_anchor

TextAnchor

Provenance of the entity. Text anchor indexing into the Document.text.

type

string

Entity type from a schema e.g. Address.

mention_text

string

Text value in the document e.g. 1600 Amphitheatre Pkwy.

mention_id

string

Deprecated. Use id field instead.

confidence

float

Optional. Confidence of detected Schema entity. Range [0, 1].

normalized_value

NormalizedValue

Optional. Normalized entity value. Absent if the extracted value could not be converted or the type (e.g. address) is not supported for certain parsers. This field is also only populated for certain supported document types.

redacted

bool

Optional. Whether the entity will be redacted for de-identification purposes.

NormalizedValue

Parsed and normalized entity value.

Fields
text

string

Required. Normalized entity value stored as a string. This field is populated for supported document types (e.g. Invoice). For some entity types, one of the respective 'structured_value' fields may also be populated.

  • Money/Currency type (money_value) is in the ISO 4217 text format.
  • Date type (date_value) is in the ISO 8601 text format.
  • Datetime type (datetime_value) is in the ISO 8601 text format.
Union field structured_value. Structured entity value. Must match entity type defined in schema if known. If this field is present, the 'text' field is still populated. structured_value can be only one of the following:
money_value

Money

Money value. See also:

https://github.com/googleapis/googleapis/blob/master/google/type/money.proto

date_value

Date

Date value. Includes year, month, day. See also:

https://github.com/googleapis/googleapis/blob/master/google/type/date.proto

datetime_value

DateTime

DateTime value. Includes date, time, and timezone. See also:

https://github.com/googleapis/googleapis/blob/master/google/type/datetime.proto

EntityRelation

Relationship between Entities.

Fields
subject_id

string

Subject entity id.

object_id

string

Object entity id.

relation

string

Relationship description.

Label

Label attaches schema information and/or other metadata to segments within a Document. Multiple Labels on a single field can denote either different labels, different instances of the same label created at different times, or some combination of both.

Fields
name

string

Name of the label.

When the label is generated from an AutoML Text Classification model, this field represents the name of the category.

confidence

float

Confidence score between 0 and 1 for label assignment.

automl_model

string

Set when the label is generated by an AutoML model. This field stores the full resource name of the AutoML model.

Format: projects/{project-id}/locations/{location-id}/models/{model-id}

Page

A page in a Document.

Fields
page_number

int32

1-based index for current Page in a parent Document. Useful when a page is taken out of a Document for individual processing.

dimension

Dimension

Physical dimension of the page.

layout

Layout

Layout for the page.

detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

blocks[]

Block

A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

paragraphs[]

Paragraph

A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.

lines[]

Line

A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.

tokens[]

Token

A list of visually detected tokens on the page.

visual_elements[]

VisualElement

A list of detected non-text visual elements e.g. checkbox, signature etc. on the page.

tables[]

Table

A list of visually detected tables on the page.

form_fields[]

FormField

A list of visually detected form fields on the page.

Block

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

Fields
layout

Layout

Layout for Block.

detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

DetectedLanguage

Detected language for a structural component.

Fields
language_code

string

The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.

confidence

float

Confidence of detected language. Range [0, 1].

Dimension

Dimension for the page.

Fields
width

float

Page width.

height

float

Page height.

unit

string

Dimension unit.

FormField

A form field detected on the page.

Fields
field_name

Layout

Layout for the FormField name. e.g. Address, Email, Grand total, Phone number, etc.

field_value

Layout

Layout for the FormField value.

name_detected_languages[]

DetectedLanguage

A list of detected languages for name together with confidence.

value_detected_languages[]

DetectedLanguage

A list of detected languages for value together with confidence.

value_type

string

If the value is non-textual, this field represents the type. Current valid values are:

  • blank (this indicates the field_value is normal text)
  • "unfilled_checkbox"
  • "filled_checkbox"

Layout

Visual element describing a layout unit on a page.

Fields
text_anchor

TextAnchor

Text anchor indexing into the Document.text.

confidence

float

Confidence of the current Layout within context of the object this layout is for. e.g. confidence can be for a single token, a table, a visual element, etc. depending on context. Range [0, 1].

bounding_poly

BoundingPoly

The bounding polygon for the Layout.

orientation

Orientation

Detected orientation for the Layout.

Orientation

Detected human reading orientation.

Enums
ORIENTATION_UNSPECIFIED Unspecified orientation.
PAGE_UP Orientation is aligned with page up.
PAGE_RIGHT Orientation is aligned with page right. Turn the head 90 degrees clockwise from upright to read.
PAGE_DOWN Orientation is aligned with page down. Turn the head 180 degrees from upright to read.
PAGE_LEFT Orientation is aligned with page left. Turn the head 90 degrees counterclockwise from upright to read.
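The enum descriptions above translate directly into a clockwise rotation angle. The sketch below is one way to express that mapping; `head_turn_degrees` is a hypothetical helper, and expressing the counterclockwise case as 270 degrees clockwise is a choice made here, not something the API prescribes.

```python
from typing import Optional

# Clockwise head turn, in degrees from upright, taken from the enum
# descriptions. PAGE_LEFT is 90 degrees counterclockwise, i.e. 270 clockwise.
_CLOCKWISE_DEGREES = {
    "PAGE_UP": 0,
    "PAGE_RIGHT": 90,
    "PAGE_DOWN": 180,
    "PAGE_LEFT": 270,
}


def head_turn_degrees(orientation: str) -> Optional[int]:
    """Return the clockwise turn needed to read, or None if unspecified."""
    if orientation == "ORIENTATION_UNSPECIFIED":
        return None
    try:
        return _CLOCKWISE_DEGREES[orientation]
    except KeyError:
        raise ValueError(f"unknown orientation: {orientation!r}") from None
```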

Line

A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.

Fields
layout

Layout

Layout for Line.

detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

Paragraph

A collection of lines that a human would perceive as a paragraph.

Fields
layout

Layout

Layout for Paragraph.

detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

Table

A table representation similar to HTML table structure.

Fields
layout

Layout

Layout for Table.

header_rows[]

TableRow

Header rows of the table.

body_rows[]

TableRow

Body rows of the table.

detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

TableCell

A cell representation inside the table.

Fields
layout

Layout

Layout for TableCell.

row_span

int32

How many rows this cell spans.

col_span

int32

How many columns this cell spans.

detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

TableRow

A row of table cells.

Fields
cells[]

TableCell

Cells that make up this row.

Token

A detected token.

Fields
layout

Layout

Layout for Token.

detected_break

DetectedBreak

Detected break at the end of a Token.

detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

DetectedBreak

Detected break at the end of a Token.

Fields
type

Type

Detected break type.

Type

Enum to denote the type of break found.

Enums
TYPE_UNSPECIFIED Unspecified break type.
SPACE A single whitespace.
WIDE_SPACE A wider whitespace.
HYPHEN A hyphen that indicates that a token has been split across lines.

VisualElement

Detected non-text visual elements e.g. checkbox, signature etc. on the page.

Fields
layout

Layout

Layout for VisualElement.

type

string

Type of the VisualElement.

detected_languages[]

DetectedLanguage

A list of detected languages together with confidence.

ShardInfo

For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.

Fields
shard_index

int64

The 0-based index of this shard.

shard_count

int64

Total number of shards.

text_offset

int64

The index of the first character in Document.text in the overall document global text.
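As an illustration of how the three ShardInfo fields fit together, the sketch below reassembles the global text from a set of shard dicts loaded from JSON. Several things here are assumptions: the helper name `merge_shard_text` is hypothetical, keys follow the proto3 JSON mapping (lowerCamelCase names, int64 values as strings, zero values possibly omitted), and `text_offset` is compared against Python string lengths, which may not match the service's indexing unit for non-ASCII text. The offset check can be switched off for that reason.

```python
def _info(shard: dict, key: str) -> int:
    """Read an int64 ShardInfo field; proto3 JSON may omit zero values."""
    return int(shard.get("shardInfo", {}).get(key, 0))


def merge_shard_text(shards: list, check_offsets: bool = True) -> str:
    """Concatenate Document.text from all shards in shard_index order."""
    if not shards:
        raise ValueError("no shards given")
    ordered = sorted(shards, key=lambda s: _info(s, "shardIndex"))
    shard_count = _info(ordered[0], "shardCount")
    indices = [_info(s, "shardIndex") for s in ordered]
    if indices != list(range(shard_count)):
        raise ValueError(f"expected shards 0..{shard_count - 1}, got {indices}")
    pieces = []
    length = 0
    for shard in ordered:
        # Assumption: text_offset counts the same units as len() on a str.
        if check_offsets and _info(shard, "textOffset") != length:
            raise ValueError("text_offset does not match preceding text length")
        text = shard.get("text", "")
        pieces.append(text)
        length += len(text)
    return "".join(pieces)
```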

Style

Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

Fields
text_anchor

TextAnchor

Text anchor indexing into the Document.text.

color

Color

Text color.

background_color

Color

Text background color.

font_weight

string

Font weight. Possible values are normal, bold, bolder, and lighter. https://www.w3schools.com/cssref/pr_font_weight.asp

text_style

string

Text style. Possible values are normal, italic, and oblique. https://www.w3schools.com/cssref/pr_font_font-style.asp

text_decoration

string

Text decoration. Follows CSS standard. https://www.w3schools.com/cssref/pr_text_text-decoration.asp

font_size

FontSize

Font size.

FontSize

Font size with unit.

Fields
size

float

Font size for the text.

unit

string

Unit for the font size. Follows CSS naming (in, px, pt, etc.).

TextAnchor

Text reference indexing into the Document.text.

Fields
text_segments[]

TextSegment

The text segments from the Document.text.

TextSegment

A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.

Fields
start_index

int64

TextSegment start UTF-8 char index in the Document.text.

end_index

int64

TextSegment half open end UTF-8 char index in the Document.text.
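A text anchor is resolved by slicing Document.text with each half-open segment and joining the results. The sketch below assumes a document loaded from JSON with lowerCamelCase keys and int64 indices encoded as strings (the proto3 JSON mapping), and it assumes the indices line up with Python string offsets, which may not hold for non-ASCII text. `anchor_text` is a hypothetical helper name.

```python
def anchor_text(document_text: str, text_anchor: dict) -> str:
    """Return the text covered by a TextAnchor's segments, in order."""
    parts = []
    for segment in text_anchor.get("textSegments", []):
        # A start index of 0 may be omitted in JSON, hence the defaults.
        start = int(segment.get("startIndex", 0))
        end = int(segment.get("endIndex", 0))
        # Slicing clamps out-of-range indices, so a segment that extends
        # into another shard yields only the part present in this text.
        parts.append(document_text[start:end])
    return "".join(parts)
```

For example, with the text `"Invoice total: 42 USD"`, a single segment with `startIndex` 15 and `endIndex` 17 resolves to `"42"`.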

Translation

A translation of the text segment.

Fields
text_anchor

TextAnchor

Provenance of the translation. Text anchor indexing into the Document.text. There can only be a single TextAnchor.text_segments element. If the start and end index of the text segment are the same, the text change is inserted before that index.

language_code

string

The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.

translated_text

string

Text translated into the target language.

EntityExtractionParams

Parameters to control entity extraction behavior.

Fields
enabled

bool

Whether to enable entity extraction.

model_version

string

Model version of the entity extraction. Default is "builtin/stable". Specify "builtin/latest" for the latest model.

FormExtractionParams

Parameters to control form extraction behavior.

Fields
enabled

bool

Whether to enable form extraction.

key_value_pair_hints[]

KeyValuePairHint

Reserved for future use.

model_version

string

Model version of the form extraction system. Default is "builtin/stable". Specify "builtin/latest" for the latest model. For custom form models, specify: "custom/{model_name}". Model name format is "bucket_name/path/to/modeldir" corresponding to "gs://bucket_name/path/to/modeldir" where annotated examples are stored.
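The mapping from a Cloud Storage model directory to a custom `model_version` string can be sketched as follows. The helper name `custom_form_model_version` is hypothetical, and stripping a trailing slash is an assumption made here for convenience.

```python
def custom_form_model_version(gcs_uri: str) -> str:
    """Turn gs://bucket_name/path/to/modeldir into custom/{model_name}."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    # Assumption: a trailing slash on the directory is not significant.
    model_name = gcs_uri[len(prefix):].rstrip("/")
    if not model_name:
        raise ValueError("URI has no bucket or path")
    return f"custom/{model_name}"
```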

GcsDestination

The Google Cloud Storage location where the output file will be written to.

Fields
uri

string

GcsSource

The Google Cloud Storage location where the input file will be read from.

Fields
uri

string

InputConfig

The desired input location and metadata.

Fields
mime_type

string

Required. MIME type of the input. Currently supported MIME types are application/pdf, image/tiff, and image/gif. In addition, the application/json type is supported for requests with the ProcessDocumentRequest.automl_params field set. The JSON file needs to be in Document format.

Union field source. Required. source can be only one of the following:
gcs_source

GcsSource

The Google Cloud Storage location to read the input from. This must be a single file.

contents

bytes

Inline content, represented as a stream of bytes. Note: As with all bytes fields, protocol buffer messages use a pure binary representation, whereas JSON representations use base64.

This field only works for the synchronous ProcessDocument method.

KeyValuePairHint

Reserved for future use.

Fields
key

string

The key text for the hint.

value_types[]

string

Type of the value. This is case-insensitive, and could be one of: ADDRESS, LOCATION, ORGANIZATION, PERSON, PHONE_NUMBER, ID, NUMBER, EMAIL, PRICE, TERMS, DATE, NAME. Types not in this list will be ignored.

NormalizedVertex

A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.

Fields
x

float

X coordinate.

y

float

Y coordinate.
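Because normalized coordinates range from 0 to 1 relative to the original image, converting them to image-scale coordinates is a multiplication by the image width and height. The sketch below shows this under stated assumptions: `to_pixel_vertices` is a hypothetical helper, rounding to the nearest integer is a choice made here, and missing keys are treated as 0 because JSON serialization may omit zero values.

```python
def to_pixel_vertices(normalized_vertices: list, width: int, height: int) -> list:
    """Scale NormalizedVertex dicts to integer Vertex-style dicts."""
    return [
        {
            "x": round(vertex.get("x", 0.0) * width),
            "y": round(vertex.get("y", 0.0) * height),
        }
        for vertex in normalized_vertices
    ]
```

For an 800 by 600 image, the normalized vertex `{"x": 0.25, "y": 0.5}` maps to `{"x": 200, "y": 300}`.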

OcrParams

Parameters to control Optical Character Recognition (OCR) behavior.

Fields
language_hints[]

string

List of languages to use for OCR. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting language_hints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Document processing returns an error if one or more of the specified languages is not one of the supported languages.

OperationMetadata

Contains metadata for the BatchProcessDocuments operation.

Fields
state

State

The state of the current batch processing.

state_message

string

A message providing more details about the current state of processing.

create_time

Timestamp

The creation time of the operation.

update_time

Timestamp

The last update time of the operation.

State

Enums
STATE_UNSPECIFIED The default value. This value is used if the state is omitted.
ACCEPTED Request is received.
WAITING Request operation is waiting for scheduling.
RUNNING Request is being processed.
SUCCEEDED The batch processing completed successfully.
CANCELLED The batch processing was cancelled.
FAILED The batch processing has failed.
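When polling a batch operation, a caller typically needs to know whether the state can still change. The sketch below treats SUCCEEDED, CANCELLED, and FAILED as final; that grouping is inferred from the enum descriptions rather than stated by the API, and `is_finished` is a hypothetical helper operating on metadata loaded from JSON.

```python
# Assumption: these three states are final; the others are still in progress.
TERMINAL_STATES = frozenset({"SUCCEEDED", "CANCELLED", "FAILED"})


def is_finished(metadata: dict) -> bool:
    """Return True if the OperationMetadata state is one of the final states."""
    # An omitted state is treated as STATE_UNSPECIFIED, per the enum default.
    return metadata.get("state", "STATE_UNSPECIFIED") in TERMINAL_STATES
```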

OutputConfig

The desired output location and metadata.

Fields
pages_per_shard

int32

The maximum number of pages to include in each output Document shard JSON on Google Cloud Storage.

The valid range is [1, 100]. If not specified, the default value is 20.

For example, for one PDF file with 100 pages, 100 parsed pages will be produced. If pages_per_shard = 20, then 5 Document shard JSON files, each containing 20 parsed pages, will be written under the prefix OutputConfig.gcs_destination.uri with the suffix pages-x-to-y.json, where x and y are 1-indexed page numbers.

Example Google Cloud Storage outputs with 157 pages and pages_per_shard = 50:

pages-001-to-050.json
pages-051-to-100.json
pages-101-to-150.json
pages-151-to-157.json
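The shard naming arithmetic can be reproduced with a short loop. This is a sketch for predicting output names, not service code: `shard_file_suffixes` is a hypothetical helper, and the three-digit zero padding is inferred from the 157-page example rather than documented.

```python
def shard_file_suffixes(total_pages: int, pages_per_shard: int = 20, pad: int = 3) -> list:
    """List the pages-x-to-y.json suffixes expected for a document."""
    if not 1 <= pages_per_shard <= 100:
        raise ValueError("pages_per_shard must be in [1, 100]")
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    names = []
    # Page numbers are 1-indexed; the last shard may hold fewer pages.
    for first in range(1, total_pages + 1, pages_per_shard):
        last = min(first + pages_per_shard - 1, total_pages)
        # Assumption: page numbers are zero-padded to `pad` digits.
        names.append(f"pages-{first:0{pad}d}-to-{last:0{pad}d}.json")
    return names
```

Calling `shard_file_suffixes(157, 50)` yields the four names listed above, and `shard_file_suffixes(100, 20)` yields five names.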

gcs_destination

GcsDestination

The Google Cloud Storage location to write the output to.

ProcessDocumentRequest

Request to process one document.

Fields
parent

string

Target project and location to make a call.

Format: projects/{project-id}/locations/{location-id}.

If no location is specified, a region will be chosen automatically. This field is only populated when used in the ProcessDocument method.

input_config

InputConfig

Required. Information about the input file.

output_config

OutputConfig

The desired output location. This field is only needed in BatchProcessDocumentsRequest.

document_type

string

Specifies a known document type for deeper structure detection. Valid values are currently "general" and "invoice". If not provided, "general" is used as default. If any other value is given, the request is rejected.

table_extraction_params

TableExtractionParams

Controls table extraction behavior. If not specified, the system will decide reasonable defaults.

form_extraction_params

FormExtractionParams

Controls form extraction behavior. If not specified, the system will decide reasonable defaults.

entity_extraction_params

EntityExtractionParams

Controls entity extraction behavior. If not specified, the system will decide reasonable defaults.

ocr_params

OcrParams

Controls OCR behavior. If not specified, the system will decide reasonable defaults.

automl_params

AutoMlParams

Controls AutoML model prediction behavior. AutoMlParams cannot be used together with other Params.
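To show how the fields above combine, the sketch below assembles a ProcessDocumentRequest as a plain dict suitable for JSON serialization. It is illustrative only: `build_process_request` is a hypothetical helper, keys use the lowerCamelCase names of the proto3 JSON mapping, and the local validation simply mirrors the documented MIME types and document types. The AutoML path (application/json input with automl_params) is left out for brevity.

```python
SUPPORTED_MIME_TYPES = {"application/pdf", "image/tiff", "image/gif"}
DOCUMENT_TYPES = {"general", "invoice"}


def build_process_request(
    parent: str,
    gcs_uri: str,
    mime_type: str,
    document_type: str = "general",
    enable_tables: bool = False,
    enable_forms: bool = False,
) -> dict:
    """Assemble a request body dict for a single document read from GCS."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type!r}")
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unsupported document type: {document_type!r}")
    request = {
        "parent": parent,
        "inputConfig": {"gcsSource": {"uri": gcs_uri}, "mimeType": mime_type},
        "documentType": document_type,
    }
    # Params messages are only added when requested; otherwise the service
    # chooses its own defaults, as described above.
    if enable_tables:
        request["tableExtractionParams"] = {"enabled": True}
    if enable_forms:
        request["formExtractionParams"] = {"enabled": True}
    return request
```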

ProcessDocumentResponse

Response to a single document processing request.

Fields
input_config

InputConfig

Information about the input file. This is the same as the corresponding input config in the request.

output_config

OutputConfig

The output location of the parsed responses. The responses are written to this location as JSON-serialized Document objects.

TableBoundHint

A hint for a table bounding box on the page for table parsing.

Fields
page_number

int32

Optional. Page number for multi-paged inputs this hint applies to. If not provided, this hint will apply to all pages by default. This value is 1-based.

bounding_box

BoundingPoly

Bounding box hint for a table on this page. The coordinates must be normalized to [0,1] and the bounding box must be an axis-aligned rectangle.
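The two constraints on the hint (normalized coordinates and an axis-aligned rectangle) are easy to enforce when the hint is built from four edges. The sketch below does that; `table_bound_hint` is a hypothetical helper, the lowerCamelCase keys assume the proto3 JSON mapping, and the vertex order (starting at the top-left corner, with y increasing downward) is an assumption rather than a documented requirement.

```python
from typing import Optional


def table_bound_hint(
    left: float,
    top: float,
    right: float,
    bottom: float,
    page_number: Optional[int] = None,
) -> dict:
    """Build a TableBoundHint dict from normalized rectangle edges."""
    for value in (left, top, right, bottom):
        if not 0.0 <= value <= 1.0:
            raise ValueError("coordinates must be normalized to [0, 1]")
    if left >= right or top >= bottom:
        raise ValueError("rectangle must have positive width and height")
    hint = {
        "boundingBox": {
            # Four corners sharing only two x and two y values, so the
            # polygon is an axis-aligned rectangle by construction.
            "normalizedVertices": [
                {"x": left, "y": top},
                {"x": right, "y": top},
                {"x": right, "y": bottom},
                {"x": left, "y": bottom},
            ]
        }
    }
    if page_number is not None:
        if page_number < 1:
            raise ValueError("page_number is 1-based")
        hint["pageNumber"] = page_number
    return hint
```

Omitting `page_number` leaves the field unset, so the hint applies to all pages.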

TableExtractionParams

Parameters to control table extraction behavior.

Fields
enabled

bool

Whether to enable table extraction.

table_bound_hints[]

TableBoundHint

Optional. Table bounding box hints that can be provided for complex cases in which the algorithm cannot locate the table(s).

header_hints[]

string

Optional. Reserved for future use.

model_version

string

Model version of the table extraction system. Default is "builtin/stable". Specify "builtin/latest" for the latest model.

Vertex

A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.

Fields
x

int32

X coordinate.

y

int32

Y coordinate.