Summary of entries of Classes for documentai.
Classes
DocumentProcessorServiceAsyncClient
Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.
DocumentProcessorServiceClient
Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.
ListEvaluationsAsyncPager
A pager for iterating through list_evaluations requests.
This class thinly wraps an initial ListEvaluationsResponse object, and provides an __aiter__ method to iterate through its evaluations field.
If there are more pages, the __aiter__ method will make additional ListEvaluations requests and continue to iterate through the evaluations field on the corresponding responses.
All the usual ListEvaluationsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
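The asynchronous paging behavior described above can be illustrated with a small stand-alone sketch. It does not use the real client: `FakeListResponse`, `SketchAsyncPager`, `fetch_page`, and the page tokens are hypothetical stand-ins, and only the `evaluations` and `next_page_token` field names are borrowed from the response message.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, Awaitable, Callable, List


@dataclass
class FakeListResponse:
    """Hypothetical stand-in for a ListEvaluationsResponse-like message."""
    evaluations: List[str] = field(default_factory=list)
    next_page_token: str = ""


class SketchAsyncPager:
    """Wraps an initial response and awaits further pages on demand."""

    def __init__(
        self,
        fetch: Callable[[str], Awaitable[FakeListResponse]],
        first: FakeListResponse,
    ) -> None:
        self._fetch = fetch
        self._response = first  # only the most recent response is kept

    def __aiter__(self) -> AsyncIterator[str]:
        async def generator() -> AsyncIterator[str]:
            response = self._response
            while True:
                for item in response.evaluations:
                    yield item
                if not response.next_page_token:
                    break
                # More pages exist: issue another (fake) list request.
                response = await self._fetch(response.next_page_token)
                self._response = response

        return generator()


# Fake backend: two pages keyed by page token (assumed values).
PAGES = {
    "": FakeListResponse(["eval-1", "eval-2"], "t1"),
    "t1": FakeListResponse(["eval-3"], ""),
}


async def fetch_page(token: str) -> FakeListResponse:
    await asyncio.sleep(0)  # pretend to do network I/O
    return PAGES[token]


async def collect() -> List[str]:
    first = await fetch_page("")
    pager = SketchAsyncPager(fetch_page, first)
    return [item async for item in pager]


async_items = asyncio.run(collect())
```

The caller only writes `async for`; the extra request for the second page happens inside the pager.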
ListEvaluationsPager
A pager for iterating through list_evaluations requests.
This class thinly wraps an initial ListEvaluationsResponse object, and provides an __iter__ method to iterate through its evaluations field.
If there are more pages, the __iter__ method will make additional ListEvaluations requests and continue to iterate through the evaluations field on the corresponding responses.
All the usual ListEvaluationsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
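As a rough illustration of the synchronous behavior described above, including attribute lookup on the most recent response, here is a minimal sketch in plain Python. `FakeListResponse`, `SketchPager`, `fetch_page`, and the tokens are hypothetical; the real pager is generated code and may differ in detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class FakeListResponse:
    """Hypothetical stand-in for a ListEvaluationsResponse-like message."""
    evaluations: List[str] = field(default_factory=list)
    next_page_token: str = ""


class SketchPager:
    """Wraps an initial response and fetches further pages on demand."""

    def __init__(
        self,
        fetch: Callable[[str], FakeListResponse],
        first: FakeListResponse,
    ) -> None:
        self._fetch = fetch
        self._response = first  # only the most recent response is kept

    def __getattr__(self, name: str):
        # Unknown public attributes are looked up on the latest response.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._response, name)

    @property
    def pages(self) -> Iterator[FakeListResponse]:
        yield self._response
        while self._response.next_page_token:
            # More pages exist: issue another (fake) list request.
            self._response = self._fetch(self._response.next_page_token)
            yield self._response

    def __iter__(self) -> Iterator[str]:
        for page in self.pages:
            yield from page.evaluations


# Fake backend: two pages keyed by page token (assumed values).
PAGES = {
    "": FakeListResponse(["eval-1", "eval-2"], "t1"),
    "t1": FakeListResponse(["eval-3"], ""),
}


def fetch_page(token: str) -> FakeListResponse:
    return PAGES[token]


pager = SketchPager(fetch_page, fetch_page(""))
items = list(pager)
```

After the loop finishes, `pager.next_page_token` reflects the last page fetched, which is why it is empty here.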
ListProcessorTypesAsyncPager
A pager for iterating through list_processor_types requests.
This class thinly wraps an initial ListProcessorTypesResponse object, and provides an __aiter__ method to iterate through its processor_types field.
If there are more pages, the __aiter__ method will make additional ListProcessorTypes requests and continue to iterate through the processor_types field on the corresponding responses.
All the usual ListProcessorTypesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorTypesPager
A pager for iterating through list_processor_types requests.
This class thinly wraps an initial ListProcessorTypesResponse object, and provides an __iter__ method to iterate through its processor_types field.
If there are more pages, the __iter__ method will make additional ListProcessorTypes requests and continue to iterate through the processor_types field on the corresponding responses.
All the usual ListProcessorTypesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorVersionsAsyncPager
A pager for iterating through list_processor_versions requests.
This class thinly wraps an initial ListProcessorVersionsResponse object, and provides an __aiter__ method to iterate through its processor_versions field.
If there are more pages, the __aiter__ method will make additional ListProcessorVersions requests and continue to iterate through the processor_versions field on the corresponding responses.
All the usual ListProcessorVersionsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorVersionsPager
A pager for iterating through list_processor_versions requests.
This class thinly wraps an initial ListProcessorVersionsResponse object, and provides an __iter__ method to iterate through its processor_versions field.
If there are more pages, the __iter__ method will make additional ListProcessorVersions requests and continue to iterate through the processor_versions field on the corresponding responses.
All the usual ListProcessorVersionsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorsAsyncPager
A pager for iterating through list_processors requests.
This class thinly wraps an initial ListProcessorsResponse object, and provides an __aiter__ method to iterate through its processors field.
If there are more pages, the __aiter__ method will make additional ListProcessors requests and continue to iterate through the processors field on the corresponding responses.
All the usual ListProcessorsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorsPager
A pager for iterating through list_processors requests.
This class thinly wraps an initial ListProcessorsResponse object, and provides an __iter__ method to iterate through its processors field.
If there are more pages, the __iter__ method will make additional ListProcessors requests and continue to iterate through the processors field on the corresponding responses.
All the usual ListProcessorsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
Barcode
Encodes the detailed information of a barcode.
BatchDocumentsInputConfig
The common config to specify a set of documents used as input.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
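The "setting one member clears the others" rule can be modeled without the real library. The sketch below is a simplified plain-Python stand-in, not the proto-plus implementation: `OneofSketch` and `which_oneof` are hypothetical, the member names `gcs_prefix` and `gcs_documents` are assumed from the GcsPrefix and GcsDocuments classes in this listing, and the `gs://` paths are made up. Unset members read as `None` here, which is a simplification.

```python
from typing import Any, Optional


class OneofSketch:
    """Models a single oneof group: at most one member is set at a time."""

    _MEMBERS = ("gcs_prefix", "gcs_documents")

    def __init__(self, **kwargs: Any) -> None:
        # Bypass our own __setattr__ to create the backing store.
        object.__setattr__(self, "_values", {})
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __setattr__(self, name: str, value: Any) -> None:
        if name not in self._MEMBERS:
            raise AttributeError(name)
        # Setting any member clears every other member of the oneof.
        self._values.clear()
        self._values[name] = value

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails.
        if name in type(self)._MEMBERS:
            return self._values.get(name)
        raise AttributeError(name)

    def which_oneof(self) -> Optional[str]:
        """Return the name of the member currently set, if any."""
        return next(iter(self._values), None)


config = OneofSketch(gcs_prefix="gs://example-bucket/input/")
# Assigning the other member silently replaces the first one.
config.gcs_documents = ["gs://example-bucket/input/a.pdf"]
```

In practice this means that code building such a message should pick one input source up front rather than setting both and expecting them to be merged.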
BatchProcessMetadata
The long-running operation metadata for BatchProcessDocuments.
IndividualProcessStatus
The status of each individual document in the batch process.
State
Possible states of the batch processing operation.
BatchProcessRequest
Request message for BatchProcessDocuments.
LabelsEntry
The abstract base class for a message.
BatchProcessResponse
Response message for BatchProcessDocuments.
BoundingPoly
A bounding polygon for the detected image annotation.
CommonOperationMetadata
The common metadata for long running operations.
State
State of the long-running operation.
CreateProcessorRequest
Request message for the CreateProcessor method. Notice this request is sent to a regionalized backend service. If the ProcessorType isn't available in that region, the creation fails.
DeleteProcessorMetadata
The long-running operation metadata for the DeleteProcessor method.
DeleteProcessorRequest
Request message for the DeleteProcessor method.
DeleteProcessorVersionMetadata
The long-running operation metadata for the DeleteProcessorVersion method.
DeleteProcessorVersionRequest
Request message for the DeleteProcessorVersion method.
DeployProcessorVersionMetadata
The long-running operation metadata for the DeployProcessorVersion method.
DeployProcessorVersionRequest
Request message for the DeployProcessorVersion method.
DeployProcessorVersionResponse
Response message for the DeployProcessorVersion method.
DisableProcessorMetadata
The long-running operation metadata for the DisableProcessor method.
DisableProcessorRequest
Request message for the DisableProcessor method.
DisableProcessorResponse
Response message for the DisableProcessor method. Intentionally empty proto for adding fields in future.
Document
Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
ChunkedDocument
Represents the chunks that the document is divided into.
Chunk
Represents a chunk.
ChunkPageFooter
Represents the page footer associated with the chunk.
ChunkPageHeader
Represents the page header associated with the chunk.
ChunkPageSpan
Represents where the chunk starts and ends in the document.
DocumentLayout
Represents the parsed layout of a document as a collection of blocks that the document is divided into.
DocumentLayoutBlock
Represents a block. A block could be one of the various types (text, table, list) supported.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
LayoutListBlock
Represents a list type block.
LayoutListEntry
Represents an entry in the list.
LayoutPageSpan
Represents where the block starts and ends in the document.
LayoutTableBlock
Represents a table type block.
LayoutTableCell
Represents a cell in a table row.
LayoutTableRow
Represents a row in a table.
LayoutTextBlock
Represents a text type block.
Entity
An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.
NormalizedValue
Parsed and normalized entity value.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
EntityRelation
Relationship between Entities.
Page
A page in a Document.
Block
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
DetectedBarcode
A detected barcode.
DetectedLanguage
Detected language for a structural component.
Dimension
Dimension for the page.
FormField
A form field detected on the page.
Image
Rendered image contents for this page.
ImageQualityScores
Image quality scores for the page image.
DetectedDefect
Image Quality Defects
Layout
Visual element describing a layout unit on a page.
Orientation
Detected human reading orientation.
Line
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
Matrix
Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.
Paragraph
A collection of lines that a human would perceive as a paragraph.
Symbol
A detected symbol.
Table
A table representation similar to HTML table structure.
TableCell
A cell representation inside the table.
TableRow
A row of table cells.
Token
A detected token.
DetectedBreak
Detected break at the end of a Token.
Type
Enum to denote the type of break found.
StyleInfo
Font and other text style attributes.
VisualElement
Detected non-text visual elements e.g. checkbox, signature etc. on the page.
PageAnchor
Referencing the visual context of the entity in the Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.
PageRef
Represents a weak reference to a page element within a document.
LayoutType
The type of layout that is being referenced.
Provenance
Structure to identify provenance relationships between annotations in different revisions.
OperationType
If a processor or agent does an explicit operation on existing elements.
Parent
The parent element the current element is based on. Used for referencing/aligning, removal and replacement operations.
Revision
Contains past or forward revisions of this document.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
HumanReview
Human Review information of the document.
ShardInfo
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
Style
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
FontSize
Font size with unit.
TextAnchor
Text reference indexing into the Document.text.
TextSegment
A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.
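To make the indexing concrete, the sketch below resolves segments against a document's text and clamps indices that run past the end of the current shard. It uses plain dataclasses rather than the real message types: `Segment`, `segment_text`, and `anchor_text` are hypothetical helpers, the `start_index` and `end_index` names are assumed from the TextSegment message, and the sample text is made up. Shard offsets are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for a TextSegment (half-open index range)."""
    start_index: int
    end_index: int


def segment_text(text: str, segment: Segment) -> str:
    """Return the part of the segment that lies inside this text."""
    start = max(segment.start_index, 0)
    # An end index past len(text) is clamped: the remainder would live
    # in another shard, which this sketch does not try to load.
    end = min(segment.end_index, len(text))
    return text[start:end] if start < end else ""


def anchor_text(text: str, segments: List[Segment]) -> str:
    """Concatenate all segments of an anchor, in order."""
    return "".join(segment_text(text, s) for s in segments)


document_text = "Invoice 42\nTotal: 10 EUR"

first_word = segment_text(document_text, Segment(0, 7))
clamped = segment_text(document_text, Segment(21, 40))
joined = anchor_text(document_text, [Segment(0, 7), Segment(7, 10)])
```

Clamping rather than raising mirrors the note that out-of-bounds indices are expected for sharded documents and are not an error by themselves.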
TextChange
This message is used for text changes, also known as OCR corrections.
DocumentOutputConfig
Config that controls the output of documents. All documents will be written as a JSON file.
GcsOutputConfig
The configuration used when outputting documents.
ShardingConfig
The sharding config for the output document.
DocumentSchema
The schema defines the output of the processed document by a processor.
EntityType
EntityType is the wrapper of a label of the corresponding model with detailed attributes and limitations for entity-based processors. Multiple types can also compose a dependency tree to represent nested types.
EnumValues
Defines a list of enum values.
Property
Defines properties that can be part of the entity type.
OccurrenceType
Types of occurrences of the entity type in the document. This represents the number of instances, not mentions, of an entity. For example, a bank statement might only have one account_number, but this account number can be mentioned in several places on the document. In this case, the account_number is considered a REQUIRED_ONCE entity type. If, on the other hand, we expect a bank statement to contain the status of multiple different accounts for the customers, the occurrence type is set to REQUIRED_MULTIPLE.
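The instance-versus-mention distinction above can be sketched as a small check. Everything here is illustrative: `OccurrenceSketch` models only the two values named in the description, treating distinct values as instances is an assumption, reading REQUIRED_MULTIPLE as "one or more instances" is an assumption, and the account numbers are made up.

```python
from enum import Enum
from typing import Iterable


class OccurrenceSketch(Enum):
    """Only the two occurrence types named in the text are modeled."""
    REQUIRED_ONCE = "REQUIRED_ONCE"
    REQUIRED_MULTIPLE = "REQUIRED_MULTIPLE"


def count_instances(mentions: Iterable[str]) -> int:
    """Count instances, assuming equal values are the same instance."""
    return len(set(mentions))


def satisfies(occurrence: OccurrenceSketch, mentions: Iterable[str]) -> bool:
    """Check the instance count against the occurrence type."""
    instances = count_instances(mentions)
    if occurrence is OccurrenceSketch.REQUIRED_ONCE:
        return instances == 1
    # REQUIRED_MULTIPLE: assumed to mean at least one instance.
    return instances >= 1


# One account number mentioned three times: a single instance.
statement_mentions = ["DE89-3704", "DE89-3704", "DE89-3704"]
# Two different accounts on the same statement: two instances.
multi_account_mentions = ["DE89-3704", "FR76-3000"]

single_ok = satisfies(OccurrenceSketch.REQUIRED_ONCE, statement_mentions)
multi_as_once = satisfies(OccurrenceSketch.REQUIRED_ONCE, multi_account_mentions)
```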
Metadata
Metadata for global schema behavior.
EnableProcessorMetadata
The long-running operation metadata for the EnableProcessor method.
EnableProcessorRequest
Request message for the EnableProcessor method.
EnableProcessorResponse
Response message for the EnableProcessor method. Intentionally empty proto for adding fields in future.
EvaluateProcessorVersionMetadata
Metadata of the EvaluateProcessorVersion method.
EvaluateProcessorVersionRequest
Evaluates the given ProcessorVersion against the supplied documents.
EvaluateProcessorVersionResponse
Response of the EvaluateProcessorVersion method.
Evaluation
An evaluation of a ProcessorVersion's performance.
ConfidenceLevelMetrics
Evaluations metrics, at a specific confidence level.
Counters
Evaluation counters for the documents that were used.
EntityMetricsEntry
The abstract base class for a message.
Metrics
Evaluation metrics, either in aggregate or about a specific entity.
MultiConfidenceMetrics
Metrics across multiple confidence levels.
MetricsType
A type that determines how metrics should be interpreted.
EvaluationReference
Gives a short summary of an evaluation, and links to the evaluation itself.
FetchProcessorTypesRequest
Request message for the FetchProcessorTypes method. Some processor types may require the project be added to an allowlist.
FetchProcessorTypesResponse
Response message for the FetchProcessorTypes method.
GcsDocument
Specifies a document stored on Cloud Storage.
GcsDocuments
Specifies a set of documents on Cloud Storage.
GcsPrefix
Specifies all documents on Cloud Storage with a common prefix.
GetEvaluationRequest
Retrieves a specific Evaluation.
GetProcessorRequest
Request message for the GetProcessor method.
GetProcessorTypeRequest
Request message for the GetProcessorType method.
GetProcessorVersionRequest
Request message for the GetProcessorVersion method.
HumanReviewStatus
The status of human review on a processed document.
State
The final state of human review on a processed document.
ListEvaluationsRequest
Retrieves a list of evaluations for a given ProcessorVersion.
ListEvaluationsResponse
The response from ListEvaluations.
ListProcessorTypesRequest
Request message for the ListProcessorTypes method. Some processor types may require the project be added to an allowlist.
ListProcessorTypesResponse
Response message for the ListProcessorTypes method.
ListProcessorVersionsRequest
Request message for listing all processor versions that belong to a processor.
ListProcessorVersionsResponse
Response message for the ListProcessorVersions method.
ListProcessorsRequest
Request message for listing all processors that belong to a project.
ListProcessorsResponse
Response message for the ListProcessors method.
NormalizedVertex
A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
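Because normalized coordinates range from 0 to 1 relative to the original image, converting them to pixel positions is a multiplication by the page size. The sketch below shows this with plain dataclasses: `NormalizedPoint` and `to_pixels` are hypothetical, the `x` and `y` names are assumed from the message, and the page size is an invented example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NormalizedPoint:
    """Hypothetical stand-in for a NormalizedVertex (x and y in [0, 1])."""
    x: float
    y: float


def to_pixels(
    vertices: List[NormalizedPoint], width: float, height: float
) -> List[Tuple[int, int]]:
    """Scale normalized vertices to integer pixel coordinates."""
    return [(round(v.x * width), round(v.y * height)) for v in vertices]


# A rectangular bounding polygon on a 1000 x 2000 pixel page (assumed size).
polygon = [
    NormalizedPoint(0.1, 0.2),
    NormalizedPoint(0.5, 0.2),
    NormalizedPoint(0.5, 0.4),
    NormalizedPoint(0.1, 0.4),
]
pixel_polygon = to_pixels(polygon, width=1000, height=2000)
```

A Vertex, by contrast, is already in the scale of the original image and needs no such conversion.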
OcrConfig
Config for Document OCR.
Hints
Hints for OCR Engine
PremiumFeatures
Configurations for premium OCR features.
ProcessOptions
Options for Process API
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
IndividualPageSelector
A list of individual page numbers.
LayoutConfig
Serving config for layout parser processor.
ChunkingConfig
Serving config for chunking.
ProcessRequest
Request message for the ProcessDocument method.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
LabelsEntry
The abstract base class for a message.
ProcessResponse
Response message for the ProcessDocument method.
Processor
The first-class citizen for Document AI. Each processor defines how to extract structural information from a document.
State
The possible states of the processor.
ProcessorType
A processor type is responsible for performing a certain document understanding task on a certain type of document.
LocationInfo
The location information about where the processor is available.
ProcessorVersion
A processor version is an implementation of a processor. Each processor can have multiple versions, pretrained by Google internally or uptrained by the customer. A processor can only have one default version at a time. Its document-processing behavior is defined by that version.
DeprecationInfo
Information about the upcoming deprecation of this processor version.
ModelType
The possible model types of the processor version.
State
The possible states of the processor version.
ProcessorVersionAlias
Contains the alias and the aliased resource name of a processor version.
RawDocument
Payload message of raw document content (bytes).
ReviewDocumentOperationMetadata
The long-running operation metadata for the ReviewDocument method.
ReviewDocumentRequest
Request message for the ReviewDocument method.
Priority
The priority level of the human review task.
ReviewDocumentResponse
Response message for the ReviewDocument method.
State
Possible states of the review operation.
SetDefaultProcessorVersionMetadata
The long-running operation metadata for the SetDefaultProcessorVersion method.
SetDefaultProcessorVersionRequest
Request message for the SetDefaultProcessorVersion method.
SetDefaultProcessorVersionResponse
Response message for the SetDefaultProcessorVersion method.
TrainProcessorVersionMetadata
The metadata that represents a processor version being created.
DatasetValidation
The dataset validation information. This includes any and all errors with documents and the dataset.
TrainProcessorVersionRequest
Request message for the TrainProcessorVersion method.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
CustomDocumentExtractionOptions
Options to control the training of the Custom Document Extraction (CDE) Processor.
TrainingMethod
Training Method for CDE. TRAINING_METHOD_UNSPECIFIED will fall back to MODEL_BASED.
FoundationModelTuningOptions
Options to control foundation model tuning of the processor.
InputData
The input data used to train a new ProcessorVersion.
TrainProcessorVersionResponse
The response for TrainProcessorVersion.
UndeployProcessorVersionMetadata
The long-running operation metadata for the UndeployProcessorVersion method.
UndeployProcessorVersionRequest
Request message for the UndeployProcessorVersion method.
UndeployProcessorVersionResponse
Response message for the UndeployProcessorVersion method.
Vertex
A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.
DocumentUnderstandingServiceAsyncClient
Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.
DocumentUnderstandingServiceClient
Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.
AutoMlParams
Parameters to control AutoML model prediction behavior.
Barcode
Encodes the detailed information of a barcode.
BatchProcessDocumentsRequest
Request to batch process documents as an asynchronous operation. The output is written to Cloud Storage as JSON in the Document format.
BatchProcessDocumentsResponse
Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.
BoundingPoly
A bounding polygon for the detected image annotation.
Document
Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
Entity
An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.
NormalizedValue
Parsed and normalized entity value.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
EntityRelation
Relationship between Entities.
Label
Label attaches schema information and/or other metadata to segments within a Document. Multiple Labels on a single field can denote either different labels, different instances of the same label created at different times, or some combination of both.
Page
A page in a Document.
Block
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
DetectedBarcode
A detected barcode.
DetectedLanguage
Detected language for a structural component.
Dimension
Dimension for the page.
FormField
A form field detected on the page.
Image
Rendered image contents for this page.
ImageQualityScores
Image quality scores for the page image.
DetectedDefect
Image Quality Defects
Layout
Visual element describing a layout unit on a page.
Orientation
Detected human reading orientation.
Line
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
Matrix
Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.
Paragraph
A collection of lines that a human would perceive as a paragraph.
Symbol
A detected symbol.
Table
A table representation similar to HTML table structure.
TableCell
A cell representation inside the table.
TableRow
A row of table cells.
Token
A detected token.
DetectedBreak
Detected break at the end of a Token.
Type
Enum to denote the type of break found.
StyleInfo
Font and other text style attributes.
VisualElement
Detected non-text visual elements e.g. checkbox, signature etc. on the page.
PageAnchor
Referencing the visual context of the entity in the Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.
PageRef
Represents a weak reference to a page element within a document.
LayoutType
The type of layout that is being referenced.
Provenance
Structure to identify provenance relationships between annotations in different revisions.
OperationType
If a processor or agent does an explicit operation on existing elements.
Parent
The parent element the current element is based on. Used for referencing/aligning, removal and replacement operations.
Revision
Contains past or forward revisions of this document.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
HumanReview
Human Review information of the document.
ShardInfo
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
Style
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
FontSize
Font size with unit.
TextAnchor
Text reference indexing into the Document.text.
TextSegment
A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.
TextChange
This message is used for text changes, also known as OCR corrections.
EntityExtractionParams
Parameters to control entity extraction behavior.
FormExtractionParams
Parameters to control form extraction behavior.
GcsDestination
The Google Cloud Storage location where the output file will be written to.
GcsSource
The Google Cloud Storage location where the input file will be read from.
InputConfig
The desired input location and metadata.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
For details on oneof fields, see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
KeyValuePairHint
Reserved for future use.
NormalizedVertex
A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
OcrParams
Parameters to control Optical Character Recognition (OCR) behavior.
OperationMetadata
Contains metadata for the BatchProcessDocuments operation.
State
OutputConfig
The desired output location and metadata.
ProcessDocumentRequest
Request to process one document.
ProcessDocumentResponse
Response to a single document processing request.
TableBoundHint
A hint for a table bounding box on the page for table parsing.
TableExtractionParams
Parameters to control table extraction behavior.
Vertex
A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.
DocumentProcessorServiceAsyncClient
Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.
DocumentProcessorServiceClient
Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.
ListEvaluationsAsyncPager
A pager for iterating through list_evaluations requests.
This class thinly wraps an initial ListEvaluationsResponse object, and provides an __aiter__ method to iterate through its evaluations field.
If there are more pages, the __aiter__ method will make additional ListEvaluations requests and continue to iterate through the evaluations field on the corresponding responses.
All the usual ListEvaluationsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListEvaluationsPager
A pager for iterating through list_evaluations requests.
This class thinly wraps an initial ListEvaluationsResponse object, and provides an __iter__ method to iterate through its evaluations field.
If there are more pages, the __iter__ method will make additional ListEvaluations requests and continue to iterate through the evaluations field on the corresponding responses.
All the usual ListEvaluationsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorTypesAsyncPager
A pager for iterating through list_processor_types requests.
This class thinly wraps an initial ListProcessorTypesResponse object, and provides an __aiter__ method to iterate through its processor_types field.
If there are more pages, the __aiter__ method will make additional ListProcessorTypes requests and continue to iterate through the processor_types field on the corresponding responses.
All the usual ListProcessorTypesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorTypesPager
A pager for iterating through list_processor_types requests.
This class thinly wraps an initial ListProcessorTypesResponse object, and provides an __iter__ method to iterate through its processor_types field.
If there are more pages, the __iter__ method will make additional ListProcessorTypes requests and continue to iterate through the processor_types field on the corresponding responses.
All the usual ListProcessorTypesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorVersionsAsyncPager
A pager for iterating through list_processor_versions requests.
This class thinly wraps an initial ListProcessorVersionsResponse object, and provides an __aiter__ method to iterate through its processor_versions field.
If there are more pages, the __aiter__ method will make additional ListProcessorVersions requests and continue to iterate through the processor_versions field on the corresponding responses.
All the usual ListProcessorVersionsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorVersionsPager
A pager for iterating through list_processor_versions requests.
This class thinly wraps an initial ListProcessorVersionsResponse object, and provides an __iter__ method to iterate through its processor_versions field.
If there are more pages, the __iter__ method will make additional ListProcessorVersions requests and continue to iterate through the processor_versions field on the corresponding responses.
All the usual ListProcessorVersionsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorsAsyncPager
A pager for iterating through list_processors requests.
This class thinly wraps an initial ListProcessorsResponse object, and provides an __aiter__ method to iterate through its processors field.
If there are more pages, the __aiter__ method will make additional ListProcessors requests and continue to iterate through the processors field on the corresponding responses.
All the usual ListProcessorsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListProcessorsPager
A pager for iterating through list_processors requests.
This class thinly wraps an initial ListProcessorsResponse object, and provides an __iter__ method to iterate through its processors field.
If there are more pages, the __iter__ method will make additional ListProcessors requests and continue to iterate through the processors field on the corresponding responses.
All the usual ListProcessorsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
DocumentServiceAsyncClient
Service to call Cloud DocumentAI to manage document collection (dataset).
DocumentServiceClient
Service to call Cloud DocumentAI to manage document collection (dataset).
ListDocumentsAsyncPager
A pager for iterating through list_documents requests.
This class thinly wraps an initial ListDocumentsResponse object, and provides an __aiter__ method to iterate through its document_metadata field.
If there are more pages, the __aiter__ method will make additional ListDocuments requests and continue to iterate through the document_metadata field on the corresponding responses.
All the usual ListDocumentsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
ListDocumentsPager
A pager for iterating through list_documents requests.
This class thinly wraps an initial ListDocumentsResponse object, and provides an __iter__ method to iterate through its document_metadata field.
If there are more pages, the __iter__ method will make additional ListDocuments requests and continue to iterate through the document_metadata field on the corresponding responses.
All the usual ListDocumentsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
Barcode
Encodes the detailed information of a barcode.
BatchDatasetDocuments
Dataset documents that the batch operation will be applied to.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
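The oneof rule (setting one member clears the others) can be illustrated with a small stand-in class. This is a sketch of the semantics only, not the proto-plus implementation: OneofSketch is hypothetical, and the member names individual_document_ids and filter are assumed for the example. Real messages return a field's default value for an unset member, whereas this sketch returns None.

```python
class OneofSketch:
    """Hypothetical stand-in for a message with a single oneof group."""

    # Assumed member names of the oneof group, for illustration only.
    _MEMBERS = ("individual_document_ids", "filter")

    def __init__(self):
        # Bypass __setattr__ so the bookkeeping fields can be created.
        object.__setattr__(self, "_set_member", None)
        object.__setattr__(self, "_value", None)

    def __setattr__(self, name, value):
        if name not in self._MEMBERS:
            raise AttributeError(name)
        # Storing a single (name, value) pair means any earlier member is cleared.
        object.__setattr__(self, "_set_member", name)
        object.__setattr__(self, "_value", value)

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the oneof members.
        if name not in type(self)._MEMBERS:
            raise AttributeError(name)
        return self._value if self._set_member == name else None


msg = OneofSketch()
msg.filter = "some filter expression"
msg.individual_document_ids = ["doc-1", "doc-2"]  # clears `filter`
```

After the second assignment, msg.filter reads as unset and msg.individual_document_ids holds the list, which is the "at most one member set" behavior the paragraph describes.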
IndividualDocumentIds
List of individual DocumentIds.
BatchDeleteDocumentsMetadata
IndividualBatchDeleteStatus
The status of each individual document in the batch delete process.
BatchDeleteDocumentsRequest
BatchDeleteDocumentsResponse
Response of the delete documents operation.
BatchDocumentsInputConfig
The common config to specify a set of documents used as input.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
BatchProcessMetadata
The long-running operation metadata for BatchProcessDocuments.
IndividualProcessStatus
The status of each individual document in the batch process.
State
Possible states of the batch processing operation.
BatchProcessRequest
Request message for BatchProcessDocuments.
BatchInputConfig
The message for input config in batch process.
BatchOutputConfig
The output configuration in the BatchProcessDocuments method.
LabelsEntry
The abstract base class for a message.
BatchProcessResponse
Response message for BatchProcessDocuments.
BoundingPoly
A bounding polygon for the detected image annotation.
CommonOperationMetadata
The common metadata for long running operations.
State
State of the long-running operation.
CreateProcessorRequest
Request message for the CreateProcessor method. Notice this request is sent to a regionalized backend service. If the ProcessorType isn't available in that region, the creation fails.
Dataset
A singleton resource under a Processor which configures a collection of documents.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
DocumentWarehouseConfig
Configuration specific to the Document AI Warehouse-based implementation.
GCSManagedConfig
Configuration specific to the Cloud Storage-based implementation.
SpannerIndexingConfig
Configuration specific to spanner-based indexing.
State
Different states of a dataset.
UnmanagedDatasetConfig
Configuration specific to an unmanaged dataset.
DatasetSchema
Dataset Schema.
DatasetSplitType
Documents belonging to a dataset will be split into different groups referred to as splits: train, test.
DeleteProcessorMetadata
The long-running operation metadata for the DeleteProcessor method.
DeleteProcessorRequest
Request message for the DeleteProcessor method.
DeleteProcessorVersionMetadata
The long-running operation metadata for the DeleteProcessorVersion method.
DeleteProcessorVersionRequest
Request message for the DeleteProcessorVersion method.
DeployProcessorVersionMetadata
The long-running operation metadata for the DeployProcessorVersion method.
DeployProcessorVersionRequest
Request message for the DeployProcessorVersion method.
DeployProcessorVersionResponse
Response message for the DeployProcessorVersion method.
DisableProcessorMetadata
The long-running operation metadata for the DisableProcessor method.
DisableProcessorRequest
Request message for the DisableProcessor method.
DisableProcessorResponse
Response message for the DisableProcessor method. Intentionally empty proto for adding fields in future.
Document
Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
ChunkedDocument
Represents the chunks that the document is divided into.
Chunk
Represents a chunk.
ChunkPageFooter
Represents the page footer associated with the chunk.
ChunkPageHeader
Represents the page header associated with the chunk.
ChunkPageSpan
Represents where the chunk starts and ends in the document.
DocumentLayout
Represents the parsed layout of a document as a collection of blocks that the document is divided into.
DocumentLayoutBlock
Represents a block. A block could be one of the various types (text, table, list) supported.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
LayoutListBlock
Represents a list type block.
LayoutListEntry
Represents an entry in the list.
LayoutPageSpan
Represents where the block starts and ends in the document.
LayoutTableBlock
Represents a table type block.
LayoutTableCell
Represents a cell in a table row.
LayoutTableRow
Represents a row in a table.
LayoutTextBlock
Represents a text type block.
Entity
An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.
NormalizedValue
Parsed and normalized entity value.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
EntityRelation
Relationship between Entities.
Page
A page in a Document.
Block
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
DetectedBarcode
A detected barcode.
DetectedLanguage
Detected language for a structural component.
Dimension
Dimension for the page.
FormField
A form field detected on the page.
Image
Rendered image contents for this page.
ImageQualityScores
Image quality scores for the page image.
DetectedDefect
Image Quality Defects
Layout
Visual element describing a layout unit on a page.
Orientation
Detected human reading orientation.
Line
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
Matrix
Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.
Paragraph
A collection of lines that a human would perceive as a paragraph.
Symbol
A detected symbol.
Table
A table representation similar to HTML table structure.
TableCell
A cell representation inside the table.
TableRow
A row of table cells.
Token
A detected token.
DetectedBreak
Detected break at the end of a Token.
Type
Enum to denote the type of break found.
StyleInfo
Font and other text style attributes.
VisualElement
Detected non-text visual elements on the page, e.g. a checkbox or a signature.
PageAnchor
Referencing the visual context of the entity in the Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.
PageRef
Represents a weak reference to a page element within a document.
LayoutType
The type of layout that is being referenced.
Provenance
Structure to identify provenance relationships between annotations in different revisions.
OperationType
If a processor or agent does an explicit operation on existing elements.
Parent
The parent element the current element is based on. Used for referencing/aligning, removal and replacement operations.
Revision
Contains past or forward revisions of this document.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
HumanReview
Human Review information of the document.
ShardInfo
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
Style
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
FontSize
Font size with unit.
TextAnchor
Text reference indexing into the Document.text.
TextSegment
A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.
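A minimal sketch of reading a segment whose indices may run past the end of the shard's text. The helper read_segment is hypothetical, and the sketch assumes the start and end indices can be applied directly to a Python string holding Document.text; how indices relate to ShardInfo.text_offset across shards is not modeled here.

```python
from typing import Tuple


def read_segment(text: str, start_index: int, end_index: int) -> Tuple[str, bool]:
    """Return the in-bounds part of a segment and whether it spills past this text.

    Hypothetical helper: indices outside the text are clamped instead of
    raising, since an out-of-bounds segment may continue in another shard.
    """
    lo = max(0, min(start_index, len(text)))
    hi = max(lo, min(end_index, len(text)))
    spills_over = start_index < 0 or end_index > len(text)
    return text[lo:hi], spills_over


# Sample text standing in for Document.text (made-up content).
sample_text = "Invoice 42\nTotal: 10 USD"

inside = read_segment(sample_text, 11, 16)    # fully inside the text
overflow = read_segment(sample_text, 18, 40)  # end index past the text
```

Here inside is ("Total", False) and overflow is ("10 USD", True); the flag signals that the remaining characters would have to come from a later shard.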
TextChange
This message is used for text changes, also known as OCR corrections.
DocumentId
Document Identifier.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
GCSManagedDocumentId
Identifies a document uniquely within the scope of a dataset in the user-managed Cloud Storage option.
UnmanagedDocumentId
Identifies a document uniquely within the scope of a dataset in the unmanaged option.
DocumentLabelingState
Describes the labeling status of a document.
DocumentMetadata
Metadata about a document.
DocumentOutputConfig
Config that controls the output of documents. All documents will be written as a JSON file.
GcsOutputConfig
The configuration used when outputting documents.
ShardingConfig
The sharding config for the output document.
DocumentPageRange
Range of pages present in a document.
DocumentSchema
The schema defines the output of the processed document by a processor.
EntityType
EntityType is the wrapper of a label of the corresponding model with detailed attributes and limitations for entity-based processors. Multiple types can also compose a dependency tree to represent nested types.
EnumValues
Defines a list of enum values.
Property
Defines properties that can be part of the entity type.
OccurrenceType
Types of occurrences of the entity type in the document. This represents the number of instances, not mentions, of an entity. For example, a bank statement might only have one account_number, but this account number can be mentioned in several places on the document. In this case, the account_number is considered a REQUIRED_ONCE entity type. If, on the other hand, we expect a bank statement to contain the status of multiple different accounts for the customers, the occurrence type is set to REQUIRED_MULTIPLE.
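The distinction between instances and mentions can be sketched as follows. This is an illustration only: count_instances and satisfies are hypothetical helpers, mentions with identical values are assumed to refer to the same instance, and REQUIRED_MULTIPLE is assumed to mean "one or more instances". Only the two occurrence types named in the text are handled.

```python
from typing import List


def count_instances(mentions: List[str]) -> int:
    """Count distinct instances; repeated mentions of one value count once.

    Assumption for this sketch: two mentions with the same value refer to
    the same underlying entity instance.
    """
    return len(set(mentions))


def satisfies(occurrence_type: str, mentions: List[str]) -> bool:
    """Check a list of mentions against an occurrence type (as a plain string)."""
    instances = count_instances(mentions)
    if occurrence_type == "REQUIRED_ONCE":
        return instances == 1
    if occurrence_type == "REQUIRED_MULTIPLE":
        return instances >= 1  # assumed meaning: one or more instances
    raise ValueError(f"occurrence type not covered by this sketch: {occurrence_type}")


# One account number mentioned three times on a statement: a single instance.
single_account = ["123-456", "123-456", "123-456"]
# A statement covering two different accounts: two instances.
two_accounts = ["123-456", "999-000", "123-456"]
```

With these inputs, single_account satisfies REQUIRED_ONCE even though it has three mentions, while two_accounts does not and fits REQUIRED_MULTIPLE instead.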
Metadata
Metadata for global schema behavior.
EnableProcessorMetadata
The long-running operation metadata for the EnableProcessor method.
EnableProcessorRequest
Request message for the EnableProcessor method.
EnableProcessorResponse
Response message for the EnableProcessor method. Intentionally empty proto for adding fields in future.
EntityTypeMetadata
Metadata about an entity type.
EvaluateProcessorVersionMetadata
Metadata of the EvaluateProcessorVersion method.
EvaluateProcessorVersionRequest
Evaluates the given ProcessorVersion against the supplied documents.
EvaluateProcessorVersionResponse
Response of the EvaluateProcessorVersion method.
Evaluation
An evaluation of a ProcessorVersion's performance.
ConfidenceLevelMetrics
Evaluation metrics at a specific confidence level.
Counters
Evaluation counters for the documents that were used.
EntityMetricsEntry
The abstract base class for a message.
Metrics
Evaluation metrics, either in aggregate or about a specific entity.
MultiConfidenceMetrics
Metrics across multiple confidence levels.
MetricsType
A type that determines how metrics should be interpreted.
EvaluationReference
Gives a short summary of an evaluation, and links to the evaluation itself.
FetchProcessorTypesRequest
Request message for the FetchProcessorTypes method. Some processor types may require the project be added to an allowlist.
FetchProcessorTypesResponse
Response message for the FetchProcessorTypes method.
FieldExtractionMetadata
Metadata for how this field value is extracted.
GcsDocument
Specifies a document stored on Cloud Storage.
GcsDocuments
Specifies a set of documents on Cloud Storage.
GcsPrefix
Specifies all documents on Cloud Storage with a common prefix.
GetDatasetSchemaRequest
Request for GetDatasetSchema.
GetDocumentRequest
GetDocumentResponse
GetEvaluationRequest
Retrieves a specific Evaluation.
GetProcessorRequest
Request message for the GetProcessor method.
GetProcessorTypeRequest
Request message for the GetProcessorType method.
GetProcessorVersionRequest
Request message for the GetProcessorVersion method.
HumanReviewStatus
The status of human review on a processed document.
State
The final state of human review on a processed document.
ImportDocumentsMetadata
Metadata of the import document operation.
ImportConfigValidationResult
The validation status of each import config. Status is set to an error if there are no documents to import in the import_config, or OK if the operation will try to proceed with at least one document.
IndividualImportStatus
The status of each individual document in the import process.
ImportDocumentsRequest
BatchDocumentsImportConfig
Config for importing documents. Each batch can have its own dataset split type.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
AutoSplitConfig
The config for auto-split.
ImportDocumentsResponse
Response of the import document operation.
ImportProcessorVersionMetadata
The long-running operation metadata for the ImportProcessorVersion method.
ImportProcessorVersionRequest
The request message for the ImportProcessorVersion method.
The Document AI Service Agent (https://cloud.google.com/iam/docs/service-agents) of the destination project must have the Document AI Editor role (https://cloud.google.com/document-ai/docs/access-control/iam-roles) on the source project.
The destination project is specified as part of the parent field. The source project is specified as part of the source or external_processor_version_source field.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
ExternalProcessorVersionSource
The external source processor version.
ImportProcessorVersionResponse
The response message for the ImportProcessorVersion method.
ListDocumentsRequest
ListDocumentsResponse
ListEvaluationsRequest
Retrieves a list of evaluations for a given ProcessorVersion.
ListEvaluationsResponse
The response from ListEvaluations.
ListProcessorTypesRequest
Request message for the ListProcessorTypes method. Some processor types may require the project be added to an allowlist.
ListProcessorTypesResponse
Response message for the ListProcessorTypes method.
ListProcessorVersionsRequest
Request message for listing all processor versions that belong to a processor.
ListProcessorVersionsResponse
Response message for the ListProcessorVersions method.
ListProcessorsRequest
Request message for listing all processors that belong to a project.
ListProcessorsResponse
Response message for the ListProcessors method.
NormalizedVertex
A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
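Because normalized coordinates range from 0 to 1 relative to the original image, mapping them to pixel positions is a multiplication by the image dimensions. The sketch below assumes vertices are given as plain (x, y) tuples and that the image width and height in pixels are known; to_pixel_vertices is a hypothetical helper, not part of the library.

```python
from typing import List, Tuple


def to_pixel_vertices(
    normalized: List[Tuple[float, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Scale normalized (0 to 1) vertices to pixel coordinates of a width x height image."""
    return [(round(x * width), round(y * height)) for x, y in normalized]


# A made-up triangle on a 1000 x 800 pixel image.
pixels = to_pixel_vertices([(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)], 1000, 800)
```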
OcrConfig
Config for Document OCR.
Hints
Hints for OCR Engine
PremiumFeatures
Configurations for premium OCR features.
ProcessOptions
Options for Process API
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
IndividualPageSelector
A list of individual page numbers.
LayoutConfig
Serving config for layout parser processor.
ChunkingConfig
Serving config for chunking.
ProcessRequest
Request message for the ProcessDocument method.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
LabelsEntry
The abstract base class for a message.
ProcessResponse
Response message for the ProcessDocument method.
Processor
The first-class citizen for Document AI. Each processor defines how to extract structural information from a document.
State
The possible states of the processor.
ProcessorType
A processor type is responsible for performing a certain document understanding task on a certain type of document.
LocationInfo
The location information about where the processor is available.
ProcessorVersion
A processor version is an implementation of a processor. Each processor can have multiple versions, pretrained by Google internally or uptrained by the customer. A processor can only have one default version at a time. Its document-processing behavior is defined by that version.
DeprecationInfo
Information about the upcoming deprecation of this processor version.
ModelType
The possible model types of the processor version.
State
The possible states of the processor version.
ProcessorVersionAlias
Contains the alias and the aliased resource name of processor version.
PropertyMetadata
Metadata about a property.
RawDocument
Payload message of raw document content (bytes).
ReviewDocumentOperationMetadata
The long-running operation metadata for the ReviewDocument method.
State
State of the long-running operation.
ReviewDocumentRequest
Request message for the ReviewDocument method.
Priority
The priority level of the human review task.
ReviewDocumentResponse
Response message for the ReviewDocument method.
State
Possible states of the review operation.
RevisionRef
The revision reference specifies which revision on the document to read.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
RevisionCase
Some predefined revision cases.
SetDefaultProcessorVersionMetadata
The long-running operation metadata for the SetDefaultProcessorVersion method.
SetDefaultProcessorVersionRequest
Request message for the SetDefaultProcessorVersion method.
SetDefaultProcessorVersionResponse
Response message for the SetDefaultProcessorVersion method.
SummaryOptions
Metadata for document summarization.
Format
The Format enum.
Length
The Length enum.
TrainProcessorVersionMetadata
The metadata that represents a processor version being created.
DatasetValidation
The dataset validation information. This includes any and all errors with documents and the dataset.
TrainProcessorVersionRequest
Request message for the TrainProcessorVersion method.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
CustomDocumentExtractionOptions
Options to control the training of the Custom Document Extraction (CDE) Processor.
TrainingMethod
Training Method for CDE. TRAINING_METHOD_UNSPECIFIED will fall back to MODEL_BASED.
FoundationModelTuningOptions
Options to control foundation model tuning of the processor.
InputData
The input data used to train a new ProcessorVersion.
TrainProcessorVersionResponse
The response for TrainProcessorVersion.
UndeployProcessorVersionMetadata
The long-running operation metadata for the UndeployProcessorVersion method.
UndeployProcessorVersionRequest
Request message for the UndeployProcessorVersion method.
UndeployProcessorVersionResponse
Response message for the UndeployProcessorVersion method.
UpdateDatasetOperationMetadata
UpdateDatasetRequest
UpdateDatasetSchemaRequest
Request for UpdateDatasetSchema.
Vertex
A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.
Modules
pagers
API documentation for documentai_v1.services.document_processor_service.pagers module.
pagers
API documentation for documentai_v1beta3.services.document_processor_service.pagers module.
pagers
API documentation for documentai_v1beta3.services.document_service.pagers module.