Package Classes (2.34.0)

Summary of entries of Classes for documentai.

Classes

DocumentProcessorServiceAsyncClient

Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.

DocumentProcessorServiceClient

Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.
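As a minimal sketch of how the synchronous client is typically used, the following assumes the google-cloud-documentai package is installed and application default credentials are configured. The helper names processor_name and process_file, and all argument values, are illustrative and not part of the library.

```python
def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full resource name of a processor."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def process_file(project_id, location, processor_id, path, mime_type="application/pdf"):
    """Send one local file to a processor and return the extracted text.

    Needs network access and credentials. The library imports are kept local
    so that this module can be loaded without the package installed.
    """
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    # Requests go to a regional endpoint such as "us-documentai.googleapis.com".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )
    with open(path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type=mime_type)

    request = documentai.ProcessRequest(
        name=processor_name(project_id, location, processor_id),
        raw_document=raw_document,
    )
    response = client.process_document(request=request)
    return response.document.text
```

A call might look like process_file("my-project", "us", "my-processor-id", "invoice.pdf"), where every value is a placeholder.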

ListEvaluationsAsyncPager

A pager for iterating through list_evaluations requests.

This class thinly wraps an initial ListEvaluationsResponse object, and provides an __aiter__ method to iterate through its evaluations field.

If there are more pages, the __aiter__ method will make additional ListEvaluations requests and continue to iterate through the evaluations field on the corresponding responses.

All the usual ListEvaluationsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
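The paging behavior described above can be illustrated with a small pure-Python sketch. PageSketch, AsyncPagerSketch, fetch_page, and the fake page data are hypothetical stand-ins, not the library's classes; the sketch only mirrors the documented behavior (iterate the evaluations field, fetch more pages on demand, keep only the most recent response).

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageSketch:
    """Stand-in for a ListEvaluationsResponse (hypothetical)."""
    evaluations: List[str] = field(default_factory=list)
    next_page_token: str = ""


# Two fake pages keyed by page token; "" stands for the first request.
PAGES = {
    "": PageSketch(["eval-1", "eval-2"], next_page_token="t2"),
    "t2": PageSketch(["eval-3"], next_page_token=""),
}


async def fetch_page(token: str) -> PageSketch:
    """Pretend to issue a ListEvaluations request."""
    await asyncio.sleep(0)
    return PAGES[token]


class AsyncPagerSketch:
    def __init__(self, first: PageSketch):
        self._response = first  # only the most recent response is retained

    def __getattr__(self, name):
        # Attribute lookup is delegated to the most recent response.
        return getattr(self._response, name)

    async def __aiter__(self):
        while True:
            for item in self._response.evaluations:
                yield item
            if not self._response.next_page_token:
                return
            self._response = await fetch_page(self._response.next_page_token)


async def collect():
    pager = AsyncPagerSketch(await fetch_page(""))
    return [item async for item in pager], pager.next_page_token


items, last_token = asyncio.run(collect())
```

After iteration, next_page_token is read from the last page fetched, which is why attribute lookups on a pager reflect only the most recent response.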

ListEvaluationsPager

A pager for iterating through list_evaluations requests.

This class thinly wraps an initial ListEvaluationsResponse object, and provides an __iter__ method to iterate through its evaluations field.

If there are more pages, the __iter__ method will make additional ListEvaluations requests and continue to iterate through the evaluations field on the corresponding responses.

All the usual ListEvaluationsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
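From the caller's side, the pager is usually just iterated. The sketch below assumes a DocumentProcessorServiceClient whose list_evaluations method returns such a pager; evaluation_names and StubClient are hypothetical names, and the stub exists only so the example runs offline without credentials.

```python
from types import SimpleNamespace


def evaluation_names(client, processor_version_name):
    """Collect the resource names of all evaluations for a processor version.

    `client` is expected to be a DocumentProcessorServiceClient. The pager it
    returns requests further pages transparently while it is iterated.
    """
    pager = client.list_evaluations(parent=processor_version_name)
    return [evaluation.name for evaluation in pager]


class StubClient:
    """Offline stand-in (hypothetical) that yields two fake evaluations."""

    def list_evaluations(self, parent):
        return iter(
            SimpleNamespace(name=f"{parent}/evaluations/{i}") for i in (1, 2)
        )


parent = "projects/p/locations/us/processors/x/processorVersions/v"
names = evaluation_names(StubClient(), parent)
```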

ListProcessorTypesAsyncPager

A pager for iterating through list_processor_types requests.

This class thinly wraps an initial ListProcessorTypesResponse object, and provides an __aiter__ method to iterate through its processor_types field.

If there are more pages, the __aiter__ method will make additional ListProcessorTypes requests and continue to iterate through the processor_types field on the corresponding responses.

All the usual ListProcessorTypesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorTypesPager

A pager for iterating through list_processor_types requests.

This class thinly wraps an initial ListProcessorTypesResponse object, and provides an __iter__ method to iterate through its processor_types field.

If there are more pages, the __iter__ method will make additional ListProcessorTypes requests and continue to iterate through the processor_types field on the corresponding responses.

All the usual ListProcessorTypesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorVersionsAsyncPager

A pager for iterating through list_processor_versions requests.

This class thinly wraps an initial ListProcessorVersionsResponse object, and provides an __aiter__ method to iterate through its processor_versions field.

If there are more pages, the __aiter__ method will make additional ListProcessorVersions requests and continue to iterate through the processor_versions field on the corresponding responses.

All the usual ListProcessorVersionsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorVersionsPager

A pager for iterating through list_processor_versions requests.

This class thinly wraps an initial ListProcessorVersionsResponse object, and provides an __iter__ method to iterate through its processor_versions field.

If there are more pages, the __iter__ method will make additional ListProcessorVersions requests and continue to iterate through the processor_versions field on the corresponding responses.

All the usual ListProcessorVersionsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorsAsyncPager

A pager for iterating through list_processors requests.

This class thinly wraps an initial ListProcessorsResponse object, and provides an __aiter__ method to iterate through its processors field.

If there are more pages, the __aiter__ method will make additional ListProcessors requests and continue to iterate through the processors field on the corresponding responses.

All the usual ListProcessorsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorsPager

A pager for iterating through list_processors requests.

This class thinly wraps an initial ListProcessorsResponse object, and provides an __iter__ method to iterate through its processors field.

If there are more pages, the __iter__ method will make additional ListProcessors requests and continue to iterate through the processors field on the corresponding responses.

All the usual ListProcessorsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

Barcode

Encodes the detailed information of a barcode.

BatchDocumentsInputConfig

The common config to specify a set of documents used as input.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
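The oneof rule can be sketched without the library. OneofSketch below is a hypothetical stand-in, not a proto-plus class, and the member names gcs_prefix and gcs_documents are assumed from the GcsPrefix and GcsDocuments messages listed on this page.

```python
class OneofSketch:
    """Hypothetical stand-in that mimics oneof behavior."""

    _members = ("gcs_prefix", "gcs_documents")

    def __init__(self):
        for member in self._members:
            object.__setattr__(self, member, None)

    def __setattr__(self, name, value):
        if name not in self._members:
            raise AttributeError(name)
        # Setting one member clears every other member of the oneof.
        for member in self._members:
            object.__setattr__(self, member, None)
        object.__setattr__(self, name, value)

    def which(self):
        """Return the name of the member currently set, or None."""
        return next(
            (m for m in self._members if getattr(self, m) is not None), None
        )


config = OneofSketch()
config.gcs_prefix = "gs://bucket/input/"
config.gcs_documents = ["gs://bucket/a.pdf"]  # clears gcs_prefix
```

After the second assignment only gcs_documents remains set, which is the behavior the description above attributes to real oneof fields.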

BatchProcessMetadata

The long-running operation metadata for BatchProcessDocuments.

IndividualProcessStatus

The status of each individual document in the batch process.

State

Possible states of the batch processing operation.

BatchProcessRequest

Request message for BatchProcessDocuments.

LabelsEntry

The abstract base class for a message.

BatchProcessResponse

Response message for BatchProcessDocuments.

BoundingPoly

A bounding polygon for the detected image annotation.

CommonOperationMetadata

The common metadata for long running operations.

State

State of the long-running operation.

CreateProcessorRequest

Request message for the CreateProcessor method. Notice this request is sent to a regionalized backend service. If the ProcessorType isn't available in that region, the creation fails.

DeleteProcessorMetadata

The long-running operation metadata for the DeleteProcessor method.

DeleteProcessorRequest

Request message for the DeleteProcessor method.

DeleteProcessorVersionMetadata

The long-running operation metadata for the DeleteProcessorVersion method.

DeleteProcessorVersionRequest

Request message for the DeleteProcessorVersion method.

DeployProcessorVersionMetadata

The long-running operation metadata for the DeployProcessorVersion method.

DeployProcessorVersionRequest

Request message for the DeployProcessorVersion method.

DeployProcessorVersionResponse

Response message for the DeployProcessorVersion method.

DisableProcessorMetadata

The long-running operation metadata for the DisableProcessor method.

DisableProcessorRequest

Request message for the DisableProcessor method.

DisableProcessorResponse

Response message for the DisableProcessor method. Intentionally empty proto for adding fields in the future.

Document

Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

ChunkedDocument

Represents the chunks that the document is divided into.

Chunk

Represents a chunk.

ChunkPageFooter

Represents the page footer associated with the chunk.

ChunkPageHeader

Represents the page header associated with the chunk.

ChunkPageSpan

Represents where the chunk starts and ends in the document.

DocumentLayout

Represents the parsed layout of a document as a collection of blocks that the document is divided into.

DocumentLayoutBlock

Represents a block. A block could be one of the various types (text, table, list) supported.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

LayoutListBlock

Represents a list type block.

LayoutListEntry

Represents an entry in the list.

LayoutPageSpan

Represents where the block starts and ends in the document.

LayoutTableBlock

Represents a table type block.

LayoutTableCell

Represents a cell in a table row.

LayoutTableRow

Represents a row in a table.

LayoutTextBlock

Represents a text type block.

Entity

An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.

NormalizedValue

Parsed and normalized entity value.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

EntityRelation

Relationship between Entities.

Page

A page in a Document.

Block

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

DetectedBarcode

A detected barcode.

DetectedLanguage

Detected language for a structural component.

Dimension

Dimension for the page.

FormField

A form field detected on the page.

Image

Rendered image contents for this page.

ImageQualityScores

Image quality scores for the page image.

DetectedDefect

Image Quality Defects

Layout

Visual element describing a layout unit on a page.

Orientation

Detected human reading orientation.

Line

A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.

Matrix

Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.

Paragraph

A collection of lines that a human would perceive as a paragraph.

Symbol

A detected symbol.

Table

A table representation similar to HTML table structure.

TableCell

A cell representation inside the table.

TableRow

A row of table cells.

Token

A detected token.

DetectedBreak

Detected break at the end of a Token.

Type

Enum to denote the type of break found.

StyleInfo

Font and other text style attributes.

VisualElement

Detected non-text visual elements on the page, e.g. checkbox, signature, etc.

PageAnchor

Referencing the visual context of the entity in the Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.

PageRef

Represents a weak reference to a page element within a document.

LayoutType

The type of layout that is being referenced.

Provenance

Structure to identify provenance relationships between annotations in different revisions.

OperationType

If a processor or agent does an explicit operation on existing elements.

Parent

The parent element the current element is based on. Used for referencing/aligning, removal and replacement operations.

Revision

Contains past or forward revisions of this document.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

HumanReview

Human Review information of the document.

ShardInfo

For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.

Style

Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

FontSize

Font size with unit.

TextAnchor

Text reference indexing into the Document.text.

TextSegment

A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.
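A short sketch of resolving text segments against Document.text follows. SegmentSketch and anchor_text are hypothetical names; the start_index and end_index fields are assumed to match the TextSegment message, and the sketch deliberately ignores ShardInfo.text_offset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SegmentSketch:
    """Stand-in for a TextSegment with start and end character offsets."""
    start_index: int
    end_index: int


def anchor_text(text: str, segments: List[SegmentSketch]) -> Tuple[str, bool]:
    """Join the text covered by the segments.

    Returns the joined text and a flag that is True when any segment reaches
    past the end of `text`, which suggests the content continues in another
    shard.
    """
    parts = []
    truncated = False
    for seg in segments:
        if seg.end_index > len(text):
            truncated = True
        # Python slicing clamps out-of-range indices instead of raising.
        parts.append(text[seg.start_index:seg.end_index])
    return "".join(parts), truncated


text = "Invoice 42\nTotal: 10 USD\n"
value, truncated = anchor_text(text, [SegmentSketch(0, 7), SegmentSketch(18, 40)])
```

Here the second segment ends at index 40 while the text has only 25 characters, so the flag is set and the slice simply stops at the end of the available text.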

TextChange

This message is used for text changes, also known as OCR corrections.

DocumentOutputConfig

Config that controls the output of documents. All documents will be written as a JSON file.

GcsOutputConfig

The configuration used when outputting documents.

ShardingConfig

The sharding config for the output document.

DocumentSchema

The schema defines the output of the processed document by a processor.

EntityType

EntityType is the wrapper of a label of the corresponding model with detailed attributes and limitations for entity-based processors. Multiple types can also compose a dependency tree to represent nested types.

EnumValues

Defines a list of enum values.

Property

Defines properties that can be part of the entity type.

OccurrenceType

Types of occurrences of the entity type in the document. This represents the number of instances, not mentions, of an entity. For example, a bank statement might only have one account_number, but this account number can be mentioned in several places on the document. In this case, the account_number is considered a REQUIRED_ONCE entity type. If, on the other hand, we expect a bank statement to contain the status of multiple different accounts for the customers, the occurrence type is set to REQUIRED_MULTIPLE.
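The distinction between instances and mentions can be made concrete with a small sketch. OccurrenceSketch and satisfies are hypothetical; REQUIRED_ONCE and REQUIRED_MULTIPLE come from the description above, while the OPTIONAL_ONCE and OPTIONAL_MULTIPLE members and all numeric bounds are an assumed reading of the enum rather than something stated here.

```python
from enum import Enum


class OccurrenceSketch(Enum):
    """(minimum, maximum) number of instances; None means unbounded."""
    OPTIONAL_ONCE = (0, 1)
    OPTIONAL_MULTIPLE = (0, None)
    REQUIRED_ONCE = (1, 1)
    REQUIRED_MULTIPLE = (1, None)


def satisfies(occurrence: OccurrenceSketch, instance_count: int) -> bool:
    """Check whether a count of distinct instances fits the occurrence type."""
    low, high = occurrence.value
    return instance_count >= low and (high is None or instance_count <= high)


# One account number mentioned three times is still a single instance.
mentions = ["1234", "1234", "1234"]
instances = len(set(mentions))
ok_once = satisfies(OccurrenceSketch.REQUIRED_ONCE, instances)
```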

Metadata

Metadata for global schema behavior.

EnableProcessorMetadata

The long-running operation metadata for the EnableProcessor method.

EnableProcessorRequest

Request message for the EnableProcessor method.

EnableProcessorResponse

Response message for the EnableProcessor method. Intentionally empty proto for adding fields in the future.

EvaluateProcessorVersionMetadata

Metadata of the EvaluateProcessorVersion method.

EvaluateProcessorVersionRequest

Evaluates the given ProcessorVersion against the supplied documents.

EvaluateProcessorVersionResponse

Response of the EvaluateProcessorVersion method.

Evaluation

An evaluation of a ProcessorVersion's performance.

ConfidenceLevelMetrics

Evaluations metrics, at a specific confidence level.

Counters

Evaluation counters for the documents that were used.

EntityMetricsEntry

The abstract base class for a message.

Metrics

Evaluation metrics, either in aggregate or about a specific entity.

MultiConfidenceMetrics

Metrics across multiple confidence levels.

MetricsType

A type that determines how metrics should be interpreted.

EvaluationReference

Gives a short summary of an evaluation, and links to the evaluation itself.

FetchProcessorTypesRequest

Request message for the FetchProcessorTypes method. Some processor types may require the project be added to an allowlist.

FetchProcessorTypesResponse

Response message for the FetchProcessorTypes method.

GcsDocument

Specifies a document stored on Cloud Storage.

GcsDocuments

Specifies a set of documents on Cloud Storage.

GcsPrefix

Specifies all documents on Cloud Storage with a common prefix.

GetEvaluationRequest

Retrieves a specific Evaluation.

GetProcessorRequest

Request message for the GetProcessor method.

GetProcessorTypeRequest

Request message for the GetProcessorType method.

GetProcessorVersionRequest

Request message for the GetProcessorVersion method.

HumanReviewStatus

The status of human review on a processed document.

State

The final state of human review on a processed document.

ListEvaluationsRequest

Retrieves a list of evaluations for a given ProcessorVersion.

ListEvaluationsResponse

The response from ListEvaluations.

ListProcessorTypesRequest

Request message for the ListProcessorTypes method. Some processor types may require the project be added to an allowlist.

ListProcessorTypesResponse

Response message for the ListProcessorTypes method.

ListProcessorVersionsRequest

Request message for listing all processor versions that belong to a processor.

ListProcessorVersionsResponse

Response message for the ListProcessorVersions method.

ListProcessorsRequest

Request message for listing all processors that belong to a project.

ListProcessorsResponse

Response message for the ListProcessors method.

NormalizedVertex

A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
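Because normalized coordinates range from 0 to 1, converting them to pixel positions only requires the image dimensions. The sketch below uses plain (x, y) tuples instead of NormalizedVertex objects; to_pixels and the sample values are illustrative.

```python
def to_pixels(normalized_vertices, width, height):
    """Scale (x, y) pairs in the range 0..1 to integer pixel coordinates."""
    return [(round(x * width), round(y * height)) for x, y in normalized_vertices]


# A rectangular bounding polygon on a 1000 x 800 pixel page image.
box = [(0.1, 0.2), (0.5, 0.2), (0.5, 0.4), (0.1, 0.4)]
pixels = to_pixels(box, width=1000, height=800)
```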

OcrConfig

Config for Document OCR.

Hints

Hints for OCR Engine

PremiumFeatures

Configurations for premium OCR features.

ProcessOptions

Options for Process API

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

IndividualPageSelector

A list of individual page numbers.

LayoutConfig

Serving config for layout parser processor.

ChunkingConfig

Serving config for chunking.

ProcessRequest

Request message for the ProcessDocument method.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

LabelsEntry

The abstract base class for a message.

ProcessResponse

Response message for the ProcessDocument method.

Processor

The first-class citizen for Document AI. Each processor defines how to extract structural information from a document.

State

The possible states of the processor.

ProcessorType

A processor type is responsible for performing a certain document understanding task on a certain type of document.

LocationInfo

The location information about where the processor is available.

ProcessorVersion

A processor version is an implementation of a processor. Each processor can have multiple versions, pretrained by Google internally or uptrained by the customer. A processor can only have one default version at a time. Its document-processing behavior is defined by that version.

DeprecationInfo

Information about the upcoming deprecation of this processor version.

GenAiModelInfo

Information about Generative AI model-based processor versions.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

CustomGenAiModelInfo

Information for a custom Generative AI model created by the user. These are created with Create New Version in either the Call foundation model or Fine tuning tabs.

CustomModelType

The type of custom model created by the user.

FoundationGenAiModelInfo

Information for a pretrained Google-managed foundation model.

ModelType

The possible model types of the processor version.

State

The possible states of the processor version.

ProcessorVersionAlias

Contains the alias and the aliased resource name of processor version.

RawDocument

Payload message of raw document content (bytes).

ReviewDocumentOperationMetadata

The long-running operation metadata for the ReviewDocument method.

ReviewDocumentRequest

Request message for the ReviewDocument method.

Priority

The priority level of the human review task.

ReviewDocumentResponse

Response message for the ReviewDocument method.

State

Possible states of the review operation.

SetDefaultProcessorVersionMetadata

The long-running operation metadata for the SetDefaultProcessorVersion method.

SetDefaultProcessorVersionRequest

Request message for the SetDefaultProcessorVersion method.

SetDefaultProcessorVersionResponse

Response message for the SetDefaultProcessorVersion method.

TrainProcessorVersionMetadata

The metadata that represents a processor version being created.

DatasetValidation

The dataset validation information. This includes any and all errors with documents and the dataset.

TrainProcessorVersionRequest

Request message for the TrainProcessorVersion method.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

CustomDocumentExtractionOptions

Options to control the training of the Custom Document Extraction (CDE) Processor.

TrainingMethod

Training Method for CDE. TRAINING_METHOD_UNSPECIFIED will fall back to MODEL_BASED.

FoundationModelTuningOptions

Options to control foundation model tuning of the processor.

InputData

The input data used to train a new ProcessorVersion.

TrainProcessorVersionResponse

The response for TrainProcessorVersion.

UndeployProcessorVersionMetadata

The long-running operation metadata for the UndeployProcessorVersion method.

UndeployProcessorVersionRequest

Request message for the UndeployProcessorVersion method.

UndeployProcessorVersionResponse

Response message for the UndeployProcessorVersion method.

Vertex

A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.

DocumentUnderstandingServiceAsyncClient

Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.

DocumentUnderstandingServiceClient

Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.

AutoMlParams

Parameters to control AutoML model prediction behavior.

Barcode

Encodes the detailed information of a barcode.

BatchProcessDocumentsRequest

Request to batch process documents as an asynchronous operation. The output is written to Cloud Storage as JSON in the [Document] format.

BatchProcessDocumentsResponse

Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.

BoundingPoly

A bounding polygon for the detected image annotation.

Document

Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

Entity

An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.

NormalizedValue

Parsed and normalized entity value.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

EntityRelation

Relationship between Entities.

Label

Label attaches schema information and/or other metadata to segments within a Document. Multiple Labels on a single field can denote either different labels, different instances of the same label created at different times, or some combination of both.

Page

A page in a Document.

Block

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

DetectedBarcode

A detected barcode.

DetectedLanguage

Detected language for a structural component.

Dimension

Dimension for the page.

FormField

A form field detected on the page.

Image

Rendered image contents for this page.

ImageQualityScores

Image quality scores for the page image.

DetectedDefect

Image Quality Defects

Layout

Visual element describing a layout unit on a page.

Orientation

Detected human reading orientation.

Line

A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.

Matrix

Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.

Paragraph

A collection of lines that a human would perceive as a paragraph.

Symbol

A detected symbol.

Table

A table representation similar to HTML table structure.

TableCell

A cell representation inside the table.

TableRow

A row of table cells.

Token

A detected token.

DetectedBreak

Detected break at the end of a Token.

Type

Enum to denote the type of break found.

StyleInfo

Font and other text style attributes.

VisualElement

Detected non-text visual elements on the page, e.g. checkbox, signature, etc.

PageAnchor

Referencing the visual context of the entity in the Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.

PageRef

Represents a weak reference to a page element within a document.

LayoutType

The type of layout that is being referenced.

Provenance

Structure to identify provenance relationships between annotations in different revisions.

OperationType

If a processor or agent does an explicit operation on existing elements.

Parent

The parent element the current element is based on. Used for referencing/aligning, removal and replacement operations.

Revision

Contains past or forward revisions of this document.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

HumanReview

Human Review information of the document.

ShardInfo

For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.

Style

Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

FontSize

Font size with unit.

TextAnchor

Text reference indexing into the Document.text.

TextSegment

A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.

TextChange

This message is used for text changes, also known as OCR corrections.

EntityExtractionParams

Parameters to control entity extraction behavior.

FormExtractionParams

Parameters to control form extraction behavior.

GcsDestination

The Google Cloud Storage location where the output file will be written to.

GcsSource

The Google Cloud Storage location where the input file will be read from.

InputConfig

The desired input location and metadata.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

KeyValuePairHint

Reserved for future use.

NormalizedVertex

A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.

OcrParams

Parameters to control Optical Character Recognition (OCR) behavior.

OperationMetadata

Contains metadata for the BatchProcessDocuments operation.

State

OutputConfig

The desired output location and metadata.

ProcessDocumentRequest

Request to process one document.

ProcessDocumentResponse

Response to a single document processing request.

TableBoundHint

A hint for a table bounding box on the page for table parsing.

TableExtractionParams

Parameters to control table extraction behavior.

Vertex

A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.

DocumentProcessorServiceAsyncClient

Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.

DocumentProcessorServiceClient

Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.

ListEvaluationsAsyncPager

A pager for iterating through list_evaluations requests.

This class thinly wraps an initial ListEvaluationsResponse object, and provides an __aiter__ method to iterate through its evaluations field.

If there are more pages, the __aiter__ method will make additional ListEvaluations requests and continue to iterate through the evaluations field on the corresponding responses.

All the usual ListEvaluationsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListEvaluationsPager

A pager for iterating through list_evaluations requests.

This class thinly wraps an initial ListEvaluationsResponse object, and provides an __iter__ method to iterate through its evaluations field.

If there are more pages, the __iter__ method will make additional ListEvaluations requests and continue to iterate through the evaluations field on the corresponding responses.

All the usual ListEvaluationsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorTypesAsyncPager

A pager for iterating through list_processor_types requests.

This class thinly wraps an initial ListProcessorTypesResponse object, and provides an __aiter__ method to iterate through its processor_types field.

If there are more pages, the __aiter__ method will make additional ListProcessorTypes requests and continue to iterate through the processor_types field on the corresponding responses.

All the usual ListProcessorTypesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorTypesPager

A pager for iterating through list_processor_types requests.

This class thinly wraps an initial ListProcessorTypesResponse object, and provides an __iter__ method to iterate through its processor_types field.

If there are more pages, the __iter__ method will make additional ListProcessorTypes requests and continue to iterate through the processor_types field on the corresponding responses.

All the usual ListProcessorTypesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorVersionsAsyncPager

A pager for iterating through list_processor_versions requests.

This class thinly wraps an initial ListProcessorVersionsResponse object, and provides an __aiter__ method to iterate through its processor_versions field.

If there are more pages, the __aiter__ method will make additional ListProcessorVersions requests and continue to iterate through the processor_versions field on the corresponding responses.

All the usual ListProcessorVersionsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorVersionsPager

A pager for iterating through list_processor_versions requests.

This class thinly wraps an initial ListProcessorVersionsResponse object, and provides an __iter__ method to iterate through its processor_versions field.

If there are more pages, the __iter__ method will make additional ListProcessorVersions requests and continue to iterate through the processor_versions field on the corresponding responses.

All the usual ListProcessorVersionsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorsAsyncPager

A pager for iterating through list_processors requests.

This class thinly wraps an initial ListProcessorsResponse object, and provides an __aiter__ method to iterate through its processors field.

If there are more pages, the __aiter__ method will make additional ListProcessors requests and continue to iterate through the processors field on the corresponding responses.

All the usual ListProcessorsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListProcessorsPager

A pager for iterating through list_processors requests.

This class thinly wraps an initial ListProcessorsResponse object, and provides an __iter__ method to iterate through its processors field.

If there are more pages, the __iter__ method will make additional ListProcessors requests and continue to iterate through the processors field on the corresponding responses.

All the usual ListProcessorsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

DocumentServiceAsyncClient

Service to call Cloud Document AI to manage a document collection (dataset).

DocumentServiceClient

Service to call Cloud Document AI to manage a document collection (dataset).

ListDocumentsAsyncPager

A pager for iterating through list_documents requests.

This class thinly wraps an initial ListDocumentsResponse object, and provides an __aiter__ method to iterate through its document_metadata field.

If there are more pages, the __aiter__ method will make additional ListDocuments requests and continue to iterate through the document_metadata field on the corresponding responses.

All the usual ListDocumentsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListDocumentsPager

A pager for iterating through list_documents requests.

This class thinly wraps an initial ListDocumentsResponse object, and provides an __iter__ method to iterate through its document_metadata field.

If there are more pages, the __iter__ method will make additional ListDocuments requests and continue to iterate through the document_metadata field on the corresponding responses.

All the usual ListDocumentsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

Barcode

Encodes the detailed information of a barcode.

BatchDatasetDocuments

Dataset documents that the batch operation will be applied to.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.
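
The clearing behavior of mutually exclusive fields can be sketched with a plain Python class. `OneofSketch` is a hypothetical stand-in, not a proto-plus message, and the member names are used purely for illustration.

```python
from typing import Any, Dict, Optional, Tuple


class OneofSketch:
    """Hypothetical message with a single group of mutually exclusive fields."""

    _MEMBERS: Tuple[str, ...] = ("individual_document_ids", "filter")

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def set(self, name: str, value: Any) -> None:
        if name not in self._MEMBERS:
            raise AttributeError(name)
        # Setting one member clears every other member of the group.
        self._values = {name: value}

    def get(self, name: str) -> Optional[Any]:
        return self._values.get(name)

    def which(self) -> Optional[str]:
        """Return the name of the member that is currently set, if any."""
        return next(iter(self._values), None)


msg = OneofSketch()
msg.set("filter", "example-filter")
msg.set("individual_document_ids", ["doc-1"])  # clears "filter"
```

After the second call only `individual_document_ids` holds a value, so code that reads a oneof should check which member is set rather than assume one.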

IndividualDocumentIds

List of individual DocumentIds.

BatchDeleteDocumentsMetadata

IndividualBatchDeleteStatus

The status of each individual document in the batch delete process.

BatchDeleteDocumentsRequest

BatchDeleteDocumentsResponse

Response of the delete documents operation.

BatchDocumentsInputConfig

The common config to specify a set of documents used as input.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

BatchProcessMetadata

The long-running operation metadata for BatchProcessDocuments.

IndividualProcessStatus

The status of each individual document in the batch process.

State

Possible states of the batch processing operation.

BatchProcessRequest

Request message for BatchProcessDocuments.

BatchInputConfig

The message for input config in batch process.

BatchOutputConfig

The output configuration in the BatchProcessDocuments method.

LabelsEntry

The abstract base class for a message.

BatchProcessResponse

Response message for BatchProcessDocuments.

BoundingPoly

A bounding polygon for the detected image annotation.

CommonOperationMetadata

The common metadata for long-running operations.

State

State of the long-running operation.

CreateProcessorRequest

Request message for the CreateProcessor method. Note that this request is sent to a regionalized backend service. If the ProcessorType isn't available in that region, the creation fails.

Dataset

A singleton resource under a Processor which configures a collection of documents.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

DocumentWarehouseConfig

Configuration specific to the Document AI Warehouse-based implementation.

GCSManagedConfig

Configuration specific to the Cloud Storage-based implementation.

SpannerIndexingConfig

Configuration specific to spanner-based indexing.

State

Different states of a dataset.

UnmanagedDatasetConfig

Configuration specific to an unmanaged dataset.

DatasetSchema

Dataset Schema.

DatasetSplitType

Documents belonging to a dataset will be split into different groups referred to as splits: train, test.

DeleteProcessorMetadata

The long-running operation metadata for the DeleteProcessor method.

DeleteProcessorRequest

Request message for the DeleteProcessor method.

DeleteProcessorVersionMetadata

The long-running operation metadata for the DeleteProcessorVersion method.

DeleteProcessorVersionRequest

Request message for the DeleteProcessorVersion method.

DeployProcessorVersionMetadata

The long-running operation metadata for the DeployProcessorVersion method.

DeployProcessorVersionRequest

Request message for the DeployProcessorVersion method.

DeployProcessorVersionResponse

Response message for the DeployProcessorVersion method.

DisableProcessorMetadata

The long-running operation metadata for the DisableProcessor method.

DisableProcessorRequest

Request message for the DisableProcessor method.

DisableProcessorResponse

Response message for the DisableProcessor method. Intentionally empty proto for adding fields in the future.

Document

Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

ChunkedDocument

Represents the chunks that the document is divided into.

Chunk

Represents a chunk.

ChunkPageFooter

Represents the page footer associated with the chunk.

ChunkPageHeader

Represents the page header associated with the chunk.

ChunkPageSpan

Represents where the chunk starts and ends in the document.

DocumentLayout

Represents the parsed layout of a document as a collection of blocks that the document is divided into.

DocumentLayoutBlock

Represents a block. A block can be one of the supported types (text, table, or list).

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

LayoutListBlock

Represents a list type block.

LayoutListEntry

Represents an entry in the list.

LayoutPageSpan

Represents where the block starts and ends in the document.

LayoutTableBlock

Represents a table type block.

LayoutTableCell

Represents a cell in a table row.

LayoutTableRow

Represents a row in a table.

LayoutTextBlock

Represents a text type block.

Entity

An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.

NormalizedValue

Parsed and normalized entity value.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

EntityRelation

Relationship between Entities.

Page

A page in a Document.

Block

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

DetectedBarcode

A detected barcode.

DetectedLanguage

Detected language for a structural component.

Dimension

Dimension for the page.

FormField

A form field detected on the page.

Image

Rendered image contents for this page.

ImageQualityScores

Image quality scores for the page image.

DetectedDefect

Image Quality Defects

Layout

Visual element describing a layout unit on a page.

Orientation

Detected human reading orientation.

Line

A collection of tokens that a human would perceive as a line. A line does not cross column boundaries and can be horizontal, vertical, and so on.

Matrix

Representation of a transformation matrix, intended to be compatible with the OpenCV format for image manipulation.

Paragraph

A collection of lines that a human would perceive as a paragraph.

Symbol

A detected symbol.

Table

A table representation similar to HTML table structure.

TableCell

A cell representation inside the table.

TableRow

A row of table cells.

Token

A detected token.

DetectedBreak

Detected break at the end of a Token.

Type

Enum to denote the type of break found.

StyleInfo

Font and other text style attributes.

VisualElement

Detected non-text visual elements on the page, such as a checkbox or signature.

PageAnchor

References the visual context of the entity in Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons, and optionally reference specific layout element types.

PageRef

Represents a weak reference to a page element within a document.

LayoutType

The type of layout that is being referenced.

Provenance

Structure to identify provenance relationships between annotations in different revisions.

OperationType

Indicates whether a processor or agent performs an explicit operation on existing elements.

Parent

The parent element the current element is based on. Used for referencing/aligning, removal and replacement operations.

Revision

Contains past or forward revisions of this document.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

HumanReview

Human Review information of the document.

ShardInfo

For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.

Style

Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

FontSize

Font size with unit.

TextAnchor

Text reference indexing into the Document.text.

TextSegment

A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.
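
A minimal sketch of resolving segments against the text of a single shard, assuming each segment carries a half-open `start_index`/`end_index` pair. `SegmentSketch` and `anchor_text` are hypothetical helpers, and the sample text is invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentSketch:
    """Stand-in for a text segment: start is inclusive, end is exclusive."""
    start_index: int
    end_index: int


def anchor_text(text: str, segments: List[SegmentSketch]) -> str:
    """Concatenate the parts of `text` covered by the given segments.

    Python slicing clamps out-of-range indices, so a segment that runs past
    the end of this shard's text yields only the part that is present here.
    """
    return "".join(text[s.start_index:s.end_index] for s in segments)


shard_text = "Invoice 42\nTotal: 10 USD"
total_line = anchor_text(shard_text, [SegmentSketch(11, 24)])
overflow = anchor_text(shard_text, [SegmentSketch(18, 500)])  # end is out of bounds
```

The sketch does not stitch shards together; recovering the remainder of an out-of-bounds segment would require the text of the following shard.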

TextChange

This message is used for text changes, also known as OCR corrections.

DocumentId

Document Identifier.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

GCSManagedDocumentId

Identifies a document uniquely within the scope of a dataset in the user-managed Cloud Storage option.

UnmanagedDocumentId

Identifies a document uniquely within the scope of a dataset in the unmanaged option.

DocumentLabelingState

Describes the labeling status of a document.

DocumentMetadata

Metadata about a document.

DocumentOutputConfig

Config that controls the output of documents. All documents will be written as a JSON file.

GcsOutputConfig

The configuration used when outputting documents.

ShardingConfig

The sharding config for the output document.

DocumentPageRange

Range of pages present in a document.

DocumentSchema

The schema defines the output that a processor produces for a processed document.

EntityType

EntityType is the wrapper of a label of the corresponding model with detailed attributes and limitations for entity-based processors. Multiple types can also compose a dependency tree to represent nested types.

EnumValues

Defines a list of enum values.

Property

Defines properties that can be part of the entity type.

OccurrenceType

Types of occurrences of the entity type in the document. This represents the number of instances, not mentions, of an entity. For example, a bank statement might only have one account_number, but this account number can be mentioned in several places on the document. In this case, the account_number is considered a REQUIRED_ONCE entity type. If, on the other hand, we expect a bank statement to contain the status of multiple different accounts for the customers, the occurrence type is set to REQUIRED_MULTIPLE.
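
One way to read the occurrence types is as bounds on the number of instances per document. The sketch below encodes that reading; `OccurrenceSketch` and `instance_count_ok` are hypothetical, and the rule "required means at least one, once means at most one" is an assumption rather than the service's documented validation logic.

```python
from enum import Enum


class OccurrenceSketch(Enum):
    """Illustrative copy of the occurrence type names."""
    OPTIONAL_ONCE = "OPTIONAL_ONCE"
    OPTIONAL_MULTIPLE = "OPTIONAL_MULTIPLE"
    REQUIRED_ONCE = "REQUIRED_ONCE"
    REQUIRED_MULTIPLE = "REQUIRED_MULTIPLE"


def instance_count_ok(occurrence: OccurrenceSketch, instances: int) -> bool:
    """Check a count of instances (not mentions) against an occurrence type."""
    required = occurrence in (
        OccurrenceSketch.REQUIRED_ONCE,
        OccurrenceSketch.REQUIRED_MULTIPLE,
    )
    single = occurrence in (
        OccurrenceSketch.OPTIONAL_ONCE,
        OccurrenceSketch.REQUIRED_ONCE,
    )
    if required and instances < 1:
        return False
    if single and instances > 1:
        return False
    return True


# A bank statement with one account_number, however often it is mentioned.
one_account = instance_count_ok(OccurrenceSketch.REQUIRED_ONCE, 1)
# A statement that reports three different accounts.
three_accounts = instance_count_ok(OccurrenceSketch.REQUIRED_MULTIPLE, 3)
```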

Metadata

Metadata for global schema behavior.

EnableProcessorMetadata

The long-running operation metadata for the EnableProcessor method.

EnableProcessorRequest

Request message for the EnableProcessor method.

EnableProcessorResponse

Response message for the EnableProcessor method. Intentionally empty proto for adding fields in the future.

EntityTypeMetadata

Metadata about an entity type.

EvaluateProcessorVersionMetadata

Metadata of the EvaluateProcessorVersion method.

EvaluateProcessorVersionRequest

Evaluates the given ProcessorVersion against the supplied documents.

EvaluateProcessorVersionResponse

Response of the EvaluateProcessorVersion method.

Evaluation

An evaluation of a ProcessorVersion's performance.

ConfidenceLevelMetrics

Evaluation metrics at a specific confidence level.

Counters

Evaluation counters for the documents that were used.

EntityMetricsEntry

The abstract base class for a message.

Metrics

Evaluation metrics, either in aggregate or about a specific entity.

MultiConfidenceMetrics

Metrics across multiple confidence levels.

MetricsType

A type that determines how metrics should be interpreted.

EvaluationReference

Gives a short summary of an evaluation, and links to the evaluation itself.

FetchProcessorTypesRequest

Request message for the FetchProcessorTypes method. Some processor types may require the project be added to an allowlist.

FetchProcessorTypesResponse

Response message for the FetchProcessorTypes method.

FieldExtractionMetadata

Metadata for how this field value is extracted.

GcsDocument

Specifies a document stored on Cloud Storage.

GcsDocuments

Specifies a set of documents on Cloud Storage.

GcsPrefix

Specifies all documents on Cloud Storage with a common prefix.

GetDatasetSchemaRequest

Request for GetDatasetSchema.

GetDocumentRequest

GetDocumentResponse

GetEvaluationRequest

Retrieves a specific Evaluation.

GetProcessorRequest

Request message for the GetProcessor method.

GetProcessorTypeRequest

Request message for the GetProcessorType method.

GetProcessorVersionRequest

Request message for the GetProcessorVersion method.

HumanReviewStatus

The status of human review on a processed document.

State

The final state of human review on a processed document.

ImportDocumentsMetadata

Metadata of the import document operation.

ImportConfigValidationResult

The validation status of each import config. Status is set to an error if there are no documents to import in the import_config, or OK if the operation will try to proceed with at least one document.

IndividualImportStatus

The status of each individual document in the import process.

ImportDocumentsRequest

BatchDocumentsImportConfig

Config for importing documents. Each batch can have its own dataset split type.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

AutoSplitConfig

The config for auto-split.

ImportDocumentsResponse

Response of the import document operation.

ImportProcessorVersionMetadata

The long-running operation metadata for the ImportProcessorVersion method.

ImportProcessorVersionRequest

The request message for the ImportProcessorVersion method.

The Document AI Service Agent (https://cloud.google.com/iam/docs/service-agents) of the destination project must have the Document AI Editor role (https://cloud.google.com/document-ai/docs/access-control/iam-roles) on the source project.

The destination project is specified as part of the parent field. The source project is specified as part of the source or external_processor_version_source field.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

ExternalProcessorVersionSource

The external source processor version.

ImportProcessorVersionResponse

The response message for the ImportProcessorVersion method.

ListDocumentsRequest

ListDocumentsResponse

ListEvaluationsRequest

Retrieves a list of evaluations for a given ProcessorVersion.

ListEvaluationsResponse

The response from ListEvaluations.

ListProcessorTypesRequest

Request message for the ListProcessorTypes method. Some processor types may require the project be added to an allowlist.

ListProcessorTypesResponse

Response message for the ListProcessorTypes method.

ListProcessorVersionsRequest

Request message for listing all processor versions that belong to a processor.

ListProcessorVersionsResponse

Response message for the ListProcessorVersions method.

ListProcessorsRequest

Request message for listing all processors that belong to a project.

ListProcessorsResponse

Response message for the ListProcessors method.

NormalizedVertex

A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
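
Because normalized coordinates range from 0 to 1 relative to the original image, converting them to pixel positions is a multiplication by the image size. `to_pixels` below is a hypothetical helper operating on plain `(x, y)` tuples, and the image dimensions are invented.

```python
from typing import List, Tuple


def to_pixels(
    normalized: List[Tuple[float, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Scale normalized (x, y) pairs in [0, 1] to integer pixel coordinates."""
    return [(round(x * width), round(y * height)) for x, y in normalized]


# Four corners of a bounding polygon on an 800 x 600 page image.
box = to_pixels(
    [(0.25, 0.5), (0.75, 0.5), (0.75, 1.0), (0.25, 1.0)],
    width=800,
    height=600,
)
```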

OcrConfig

Config for Document OCR.

Hints

Hints for the OCR engine.

PremiumFeatures

Configurations for premium OCR features.

ProcessOptions

Options for the Process API.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

IndividualPageSelector

A list of individual page numbers.

LayoutConfig

Serving config for layout parser processor.

ChunkingConfig

Serving config for chunking.

ProcessRequest

Request message for the ProcessDocument method.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

LabelsEntry

The abstract base class for a message.

ProcessResponse

Response message for the ProcessDocument method.

Processor

The first-class citizen for Document AI. Each processor defines how to extract structural information from a document.

State

The possible states of the processor.

ProcessorType

A processor type is responsible for performing a certain document understanding task on a certain type of document.

LocationInfo

The location information about where the processor is available.

ProcessorVersion

A processor version is an implementation of a processor. Each processor can have multiple versions, pretrained by Google internally or uptrained by the customer. A processor can only have one default version at a time. Its document-processing behavior is defined by that version.

DeprecationInfo

Information about the upcoming deprecation of this processor version.

GenAiModelInfo

Information about Generative AI model-based processor versions.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

CustomGenAiModelInfo

Information for a custom Generative AI model created by the user. These are created with Create New Version in either the Call foundation model or Fine tuning tabs.

CustomModelType

The type of custom model created by the user.

FoundationGenAiModelInfo

Information for a pretrained Google-managed foundation model.

ModelType

The possible model types of the processor version.

State

The possible states of the processor version.

ProcessorVersionAlias

Contains the alias and the aliased resource name of a processor version.

PropertyMetadata

Metadata about a property.

RawDocument

Payload message of raw document content (bytes).

ReviewDocumentOperationMetadata

The long-running operation metadata for the ReviewDocument method.

State

State of the long-running operation.

ReviewDocumentRequest

Request message for the ReviewDocument method.

Priority

The priority level of the human review task.

ReviewDocumentResponse

Response message for the ReviewDocument method.

State

Possible states of the review operation.

RevisionRef

The revision reference specifies which revision on the document to read.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

RevisionCase

Some predefined revision cases.

SetDefaultProcessorVersionMetadata

The long-running operation metadata for the SetDefaultProcessorVersion method.

SetDefaultProcessorVersionRequest

Request message for the SetDefaultProcessorVersion method.

SetDefaultProcessorVersionResponse

Response message for the SetDefaultProcessorVersion method.

SummaryOptions

Metadata for document summarization.

Format

The Format enum.

Length

The Length enum.

TrainProcessorVersionMetadata

The metadata that represents a processor version being created.

DatasetValidation

The dataset validation information. This includes any and all errors with documents and the dataset.

TrainProcessorVersionRequest

Request message for the TrainProcessorVersion method.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

CustomDocumentExtractionOptions

Options to control the training of the Custom Document Extraction (CDE) Processor.

TrainingMethod

Training Method for CDE. TRAINING_METHOD_UNSPECIFIED will fall back to MODEL_BASED.

FoundationModelTuningOptions

Options to control foundation model tuning of the processor.

InputData

The input data used to train a new ProcessorVersion.

TrainProcessorVersionResponse

The response for TrainProcessorVersion.

UndeployProcessorVersionMetadata

The long-running operation metadata for the UndeployProcessorVersion method.

UndeployProcessorVersionRequest

Request message for the UndeployProcessorVersion method.

UndeployProcessorVersionResponse

Response message for the UndeployProcessorVersion method.

UpdateDatasetOperationMetadata

UpdateDatasetRequest

UpdateDatasetSchemaRequest

Request for UpdateDatasetSchema.

Vertex

A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.

Modules

pagers

API documentation for documentai_v1.services.document_processor_service.pagers module.

pagers

API documentation for documentai_v1beta3.services.document_processor_service.pagers module.

pagers

API documentation for documentai_v1beta3.services.document_service.pagers module.