Index
DocumentProcessorService (interface)
DocumentService (interface)
Barcode (message)
BatchDatasetDocuments (message)
BatchDatasetDocuments.IndividualDocumentIds (message)
BatchDeleteDocumentsMetadata (message)
BatchDeleteDocumentsMetadata.IndividualBatchDeleteStatus (message)
BatchDeleteDocumentsRequest (message)
BatchDeleteDocumentsResponse (message)
BatchDocumentsInputConfig (message)
BatchProcessMetadata (message)
BatchProcessMetadata.IndividualProcessStatus (message)
BatchProcessMetadata.State (enum)
BatchProcessRequest (message)
BatchProcessRequest.BatchInputConfig (message) (deprecated)
BatchProcessRequest.BatchOutputConfig (message) (deprecated)
BatchProcessResponse (message)
BoundingPoly (message)
CommonOperationMetadata (message)
CommonOperationMetadata.State (enum)
CreateProcessorRequest (message)
Dataset (message)
Dataset.DocumentWarehouseConfig (message)
Dataset.GCSManagedConfig (message)
Dataset.SpannerIndexingConfig (message)
Dataset.State (enum)
Dataset.UnmanagedDatasetConfig (message)
DatasetSchema (message)
DatasetSplitType (enum)
DeleteProcessorMetadata (message)
DeleteProcessorRequest (message)
DeleteProcessorVersionMetadata (message)
DeleteProcessorVersionRequest (message)
DeployProcessorVersionMetadata (message)
DeployProcessorVersionRequest (message)
DeployProcessorVersionResponse (message)
DisableProcessorMetadata (message)
DisableProcessorRequest (message)
DisableProcessorResponse (message)
Document (message)
Document.Entity (message)
Document.Entity.NormalizedValue (message)
Document.EntityRelation (message)
Document.Page (message)
Document.Page.Block (message)
Document.Page.DetectedBarcode (message)
Document.Page.DetectedLanguage (message)
Document.Page.Dimension (message)
Document.Page.FormField (message)
Document.Page.Image (message)
Document.Page.ImageQualityScores (message)
Document.Page.ImageQualityScores.DetectedDefect (message)
Document.Page.Layout (message)
Document.Page.Layout.Orientation (enum)
Document.Page.Line (message)
Document.Page.Matrix (message)
Document.Page.Paragraph (message)
Document.Page.Symbol (message)
Document.Page.Table (message)
Document.Page.Table.TableCell (message)
Document.Page.Table.TableRow (message)
Document.Page.Token (message)
Document.Page.Token.DetectedBreak (message)
Document.Page.Token.DetectedBreak.Type (enum)
Document.Page.Token.StyleInfo (message)
Document.Page.VisualElement (message)
Document.PageAnchor (message)
Document.PageAnchor.PageRef (message)
Document.PageAnchor.PageRef.LayoutType (enum)
Document.Provenance (message)
Document.Provenance.OperationType (enum)
Document.Provenance.Parent (message)
Document.Revision (message)
Document.Revision.HumanReview (message)
Document.ShardInfo (message)
Document.Style (message)
Document.Style.FontSize (message)
Document.TextAnchor (message)
Document.TextAnchor.TextSegment (message)
Document.TextChange (message)
DocumentId (message)
DocumentId.GCSManagedDocumentId (message)
DocumentId.UnmanagedDocumentId (message)
DocumentLabelingState (enum)
DocumentMetadata (message)
DocumentOutputConfig (message)
DocumentOutputConfig.GcsOutputConfig (message)
DocumentOutputConfig.GcsOutputConfig.ShardingConfig (message)
DocumentPageRange (message)
DocumentSchema (message)
DocumentSchema.EntityType (message)
DocumentSchema.EntityType.EnumValues (message)
DocumentSchema.EntityType.Property (message)
DocumentSchema.EntityType.Property.OccurrenceType (enum)
DocumentSchema.Metadata (message)
EnableProcessorMetadata (message)
EnableProcessorRequest (message)
EnableProcessorResponse (message)
EntityTypeMetadata (message)
EvaluateProcessorVersionMetadata (message)
EvaluateProcessorVersionRequest (message)
EvaluateProcessorVersionResponse (message)
Evaluation (message)
Evaluation.ConfidenceLevelMetrics (message)
Evaluation.Counters (message)
Evaluation.Metrics (message)
Evaluation.MultiConfidenceMetrics (message)
Evaluation.MultiConfidenceMetrics.MetricsType (enum)
EvaluationReference (message)
FetchProcessorTypesRequest (message)
FetchProcessorTypesResponse (message)
FieldExtractionMetadata (message)
GcsDocument (message)
GcsDocuments (message)
GcsPrefix (message)
GetDatasetSchemaRequest (message)
GetDocumentRequest (message)
GetDocumentResponse (message)
GetEvaluationRequest (message)
GetProcessorRequest (message)
GetProcessorTypeRequest (message)
GetProcessorVersionRequest (message)
HumanReviewStatus (message)
HumanReviewStatus.State (enum)
ImportDocumentsMetadata (message)
ImportDocumentsMetadata.ImportConfigValidationResult (message)
ImportDocumentsMetadata.IndividualImportStatus (message)
ImportDocumentsRequest (message)
ImportDocumentsRequest.BatchDocumentsImportConfig (message)
ImportDocumentsRequest.BatchDocumentsImportConfig.AutoSplitConfig (message)
ImportDocumentsResponse (message)
ImportProcessorVersionMetadata (message)
ImportProcessorVersionRequest (message)
ImportProcessorVersionRequest.ExternalProcessorVersionSource (message)
ImportProcessorVersionResponse (message)
ListDocumentsRequest (message)
ListDocumentsResponse (message)
ListEvaluationsRequest (message)
ListEvaluationsResponse (message)
ListProcessorTypesRequest (message)
ListProcessorTypesResponse (message)
ListProcessorVersionsRequest (message)
ListProcessorVersionsResponse (message)
ListProcessorsRequest (message)
ListProcessorsResponse (message)
NormalizedVertex (message)
OcrConfig (message)
OcrConfig.Hints (message)
OcrConfig.PremiumFeatures (message)
ProcessOptions (message)
ProcessOptions.IndividualPageSelector (message)
ProcessRequest (message)
ProcessResponse (message)
Processor (message)
Processor.State (enum)
ProcessorType (message)
ProcessorType.LocationInfo (message)
ProcessorVersion (message)
ProcessorVersion.DeprecationInfo (message)
ProcessorVersion.State (enum)
ProcessorVersionAlias (message)
PropertyMetadata (message)
RawDocument (message)
ReviewDocumentOperationMetadata (message)
ReviewDocumentOperationMetadata.State (enum)
ReviewDocumentRequest (message)
ReviewDocumentRequest.Priority (enum)
ReviewDocumentResponse (message)
ReviewDocumentResponse.State (enum)
RevisionRef (message)
RevisionRef.RevisionCase (enum)
SetDefaultProcessorVersionMetadata (message)
SetDefaultProcessorVersionRequest (message)
SetDefaultProcessorVersionResponse (message)
SummaryOptions (message)
SummaryOptions.Format (enum)
SummaryOptions.Length (enum)
TrainProcessorVersionMetadata (message)
TrainProcessorVersionMetadata.DatasetValidation (message)
TrainProcessorVersionRequest (message)
TrainProcessorVersionRequest.CustomDocumentExtractionOptions (message)
TrainProcessorVersionRequest.CustomDocumentExtractionOptions.TrainingMethod (enum)
TrainProcessorVersionRequest.InputData (message)
TrainProcessorVersionResponse (message)
UndeployProcessorVersionMetadata (message)
UndeployProcessorVersionRequest (message)
UndeployProcessorVersionResponse (message)
UpdateDatasetOperationMetadata (message)
UpdateDatasetRequest (message)
UpdateDatasetSchemaRequest (message)
Vertex (message)
DocumentProcessorService
Service to call Document AI to process documents according to the processor's definition. Processors are built using state-of-the-art Google AI such as natural language, computer vision, and translation to extract structured information from unstructured or semi-structured documents.
BatchProcessDocuments |
---|
LRO endpoint to batch process many documents. The output is written to Cloud Storage as JSON in the [Document] format.
|
CreateProcessor |
---|
Creates a processor from the
|
DeleteProcessor |
---|
Deletes the processor, unloads all deployed model artifacts if it was enabled and then deletes all artifacts associated with this processor.
|
DeleteProcessorVersion |
---|
Deletes the processor version, all artifacts under the processor version will be deleted.
|
DeployProcessorVersion |
---|
Deploys the processor version.
|
DisableProcessor |
---|
Disables a processor.
|
EnableProcessor |
---|
Enables a processor.
|
EvaluateProcessorVersion |
---|
Evaluates a ProcessorVersion against annotated documents, producing an Evaluation.
|
FetchProcessorTypes |
---|
Fetches processor types. Note that we don't use
|
GetEvaluation |
---|
Retrieves a specific evaluation.
|
GetProcessor |
---|
Gets details of a processor.
|
GetProcessorType |
---|
Gets details of a processor type.
|
GetProcessorVersion |
---|
Gets details of a processor version.
|
ImportProcessorVersion |
---|
Imports a processor version from a source processor version.
|
ListEvaluations |
---|
Retrieves a set of evaluations for a given processor version.
|
ListProcessorTypes |
---|
Lists the processor types that exist.
|
ListProcessorVersions |
---|
Lists all versions of a processor.
|
ListProcessors |
---|
Lists all processors that belong to this project.
|
ProcessDocument |
---|
Processes a single document.
|
ReviewDocument |
---|
Sends a document for human review. The input document should already have been processed by the specified processor.
|
SetDefaultProcessorVersion |
---|
Sets the default (active) version of a
|
TrainProcessorVersion |
---|
Trains a new processor version. Operation metadata is returned as
|
UndeployProcessorVersion |
---|
Undeploys the processor version.
|
DocumentService
Service to call Document AI to manage a document collection (dataset).
BatchDeleteDocuments |
---|
Deletes a set of documents.
|
GetDatasetSchema |
---|
Gets the
|
GetDocument |
---|
Returns relevant fields present in the requested document.
|
ImportDocuments |
---|
Imports documents into a dataset.
|
ListDocuments |
---|
Returns a list of documents present in the dataset.
|
UpdateDataset |
---|
Updates metadata associated with a dataset.
|
UpdateDatasetSchema |
---|
Updates a
|
Barcode
Encodes the detailed information of a barcode.
Fields | |
---|---|
format |
Format of a barcode. The supported formats are:
|
value_format |
Value format describes the format of the value that a barcode encodes. The supported formats are:
|
raw_value |
Raw value encoded in the barcode. For example: |
BatchDatasetDocuments
Dataset documents that the batch operation will be applied to.
Fields | |
---|---|
Union field
|
|
individual_document_ids |
Document identifiers. |
filter |
A filter matching the documents. Follows the same format and restrictions as [google.cloud.documentai.master.ListDocumentsRequest.filter]. |
IndividualDocumentIds
List of individual DocumentIds.
Fields | |
---|---|
document_ids[] |
Required. List of Document IDs indicating where the actual documents are stored. |
BatchDeleteDocumentsMetadata
Fields | |
---|---|
common_metadata |
The basic metadata of the long-running operation. |
individual_batch_delete_statuses[] |
The list of response details of each document. |
total_document_count |
Total number of documents being deleted from the dataset. |
error_document_count |
Total number of documents that failed to be deleted in storage. |
IndividualBatchDeleteStatus
The status of each individual document in the batch delete process.
Fields | |
---|---|
document_id |
The ID of the document. |
status |
The status of deleting the document in storage. |
BatchDeleteDocumentsRequest
Fields | |
---|---|
dataset |
Required. The dataset resource name. Format: projects/{project}/locations/{location}/processors/{processor}/dataset |
dataset_documents |
Required. Dataset documents input. If given |
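The `dataset` field above takes a resource name in the documented form `projects/{project}/locations/{location}/processors/{processor}/dataset`. A minimal sketch for building and validating such names, assuming only that each path component is a non-empty string without slashes (the helper names and the example IDs are hypothetical):

```python
import re

# Mirrors the documented format:
# projects/{project}/locations/{location}/processors/{processor}/dataset
_DATASET_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor>[^/]+)/dataset$"
)


def dataset_name(project: str, location: str, processor: str) -> str:
    """Builds a dataset resource name (hypothetical helper, not a client-library API)."""
    for part in (project, location, processor):
        # Assumption: components are non-empty and never contain '/'.
        if not part or "/" in part:
            raise ValueError(f"invalid resource name component: {part!r}")
    return f"projects/{project}/locations/{location}/processors/{processor}/dataset"


def parse_dataset_name(name: str) -> dict:
    """Splits a dataset resource name into its project, location and processor IDs."""
    match = _DATASET_NAME.match(name)
    if match is None:
        raise ValueError(f"not a dataset resource name: {name!r}")
    return match.groupdict()
```

For example, `dataset_name("my-project", "us", "abc123")` yields `projects/my-project/locations/us/processors/abc123/dataset`, and `parse_dataset_name` reverses it.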
BatchDeleteDocumentsResponse
This type has no fields.
Response of the delete documents operation.
BatchDocumentsInputConfig
The common config to specify a set of documents used as input.
Fields | |
---|---|
Union field source . The source. source can be only one of the following: |
|
gcs_prefix |
The set of documents that match the specified Cloud Storage |
gcs_documents |
The set of documents individually specified on Cloud Storage. |
BatchProcessMetadata
The long-running operation metadata for BatchProcessDocuments.
Fields | |
---|---|
state |
The state of the current batch processing. |
state_message |
A message providing more details about the current state of processing. For example, the error message if the operation failed. |
create_time |
The creation time of the operation. |
update_time |
The last update time of the operation. |
individual_process_statuses[] |
The list of response details of each document. |
IndividualProcessStatus
The status of each individual document in the batch process.
Fields | |
---|---|
input_gcs_source |
The source of the document, same as the |
status |
The status of processing the document. |
output_gcs_destination |
The Cloud Storage output destination (in the request as |
human_review_operation |
The name of the operation triggered by the processed document. If the human review process isn't triggered, this field will be empty. It has the same response type and metadata as the long-running operation returned by the |
human_review_status |
The status of human review on the processed document. |
State
Possible states of the batch processing operation.
Enums | |
---|---|
STATE_UNSPECIFIED |
The default value. This value is used if the state is omitted. |
WAITING |
Request operation is waiting for scheduling. |
RUNNING |
Request is being processed. |
SUCCEEDED |
The batch processing completed successfully. |
CANCELLING |
The batch processing is being cancelled. |
CANCELLED |
The batch processing was cancelled. |
FAILED |
The batch processing has failed. |
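Of the states above, only SUCCEEDED, CANCELLED and FAILED describe a finished operation; CANCELLING means cancellation is still in progress. A minimal sketch of that classification, assuming the state arrives as its enum name (as proto3 JSON serializes enums); the Python enum below is a hypothetical local mirror and does not assume the proto's numeric values:

```python
from enum import Enum


class BatchProcessState(Enum):
    """Hypothetical local mirror of BatchProcessMetadata.State (names only)."""

    STATE_UNSPECIFIED = "STATE_UNSPECIFIED"
    WAITING = "WAITING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    CANCELLING = "CANCELLING"
    CANCELLED = "CANCELLED"
    FAILED = "FAILED"


# Once the operation reaches one of these states it will not change again.
TERMINAL_STATES = frozenset(
    {BatchProcessState.SUCCEEDED, BatchProcessState.CANCELLED, BatchProcessState.FAILED}
)


def is_terminal(state: BatchProcessState) -> bool:
    """CANCELLING is deliberately excluded: cancellation has not finished yet."""
    return state in TERMINAL_STATES


def describe(state_name: str) -> str:
    """Maps a state name to a coarse status label."""
    state = BatchProcessState(state_name)  # raises ValueError for unknown names
    if state is BatchProcessState.STATE_UNSPECIFIED:
        return "unknown"
    if state is BatchProcessState.SUCCEEDED:
        return "done"
    if is_terminal(state):
        return "stopped"
    return "in progress"
```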
BatchProcessRequest
Request message for BatchProcessDocuments.
Fields | |
---|---|
name |
Required. The resource name of Authorization requires one or more of the following IAM permissions on the specified resource
|
input_configs[] |
The input config for each single document in the batch process. |
output_config |
The overall output config for batch process. |
input_documents |
The input documents for the |
document_output_config |
The output configuration for the |
skip_human_review |
Whether human review should be skipped for this request. Defaults to |
process_options |
Inference-time options for the process API |
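The non-deprecated way to describe a batch is `input_documents` (a BatchDocumentsInputConfig, whose `source` union holds either `gcs_prefix` or `gcs_documents`) plus `document_output_config`. The sketch below builds such a request body as JSON. It is an assumption-laden illustration: the camelCase keys follow the proto3 JSON mapping, and the nested names (`gcsUriPrefix`, `documents`, `gcsUri`, `mimeType`, `gcsOutputConfig`) are assumed from the GcsPrefix, GcsDocuments/GcsDocument and DocumentOutputConfig messages, whose fields are not shown in this section. The helper itself is hypothetical.

```python
import json
from typing import Optional, Sequence, Tuple


def build_batch_process_body(
    output_uri: str,
    gcs_prefix: Optional[str] = None,
    gcs_documents: Optional[Sequence[Tuple[str, str]]] = None,
    skip_human_review: bool = False,
) -> str:
    """Returns a JSON body for BatchProcessDocuments (field names assumed, see lead-in)."""
    # input_documents.source is a union: exactly one member may be set.
    if (gcs_prefix is None) == (gcs_documents is None):
        raise ValueError("set exactly one of gcs_prefix or gcs_documents")
    if gcs_prefix is not None:
        source = {"gcsPrefix": {"gcsUriPrefix": gcs_prefix}}
    else:
        source = {
            "gcsDocuments": {
                "documents": [{"gcsUri": uri, "mimeType": mime} for uri, mime in gcs_documents]
            }
        }
    body = {
        "inputDocuments": source,
        "documentOutputConfig": {"gcsOutputConfig": {"gcsUri": output_uri}},
        "skipHumanReview": skip_human_review,
    }
    return json.dumps(body)
```

Enforcing the union in the helper catches the common mistake of setting both sources before the request is ever sent.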
BatchInputConfig
The message for input config in batch process.
Fields | |
---|---|
gcs_source |
The Cloud Storage location as the source of the document. |
mime_type |
An IANA published media type (MIME type) of the input. If the input is a raw document, refer to supported file types for the list of media types. If the input is a |
BatchOutputConfig
The output configuration in the BatchProcessDocuments method.
Fields | |
---|---|
gcs_destination |
The output Cloud Storage directory to put the processed documents. |
BatchProcessResponse
This type has no fields.
Response message for BatchProcessDocuments.
BoundingPoly
A bounding polygon for the detected image annotation.
Fields | |
---|---|
vertices[] |
The bounding polygon vertices. |
normalized_vertices[] |
The bounding polygon normalized vertices. |
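Normalized vertices express a point as fractions of the image width and height, so drawing a polygon on the rendered page image (Page.Image carries `width` and `height` in pixels) means scaling them. A minimal sketch, assuming coordinates in [0, 1] and that proto3 JSON may omit a coordinate equal to zero; the dataclass and helpers are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NormalizedVertex:
    x: float  # fraction of the image width (assumed range [0, 1])
    y: float  # fraction of the image height (assumed range [0, 1])


def from_json(vertex: Dict[str, float]) -> NormalizedVertex:
    # A zero coordinate may be left out of the JSON, so default to 0.0.
    return NormalizedVertex(vertex.get("x", 0.0), vertex.get("y", 0.0))


def to_pixels(vertices: List[NormalizedVertex], width: int, height: int) -> List[Tuple[int, int]]:
    """Scales normalized vertices to pixel coordinates, clamped to the image bounds."""
    points = []
    for v in vertices:
        px = min(max(round(v.x * width), 0), width)
        py = min(max(round(v.y * height), 0), height)
        points.append((px, py))
    return points


def bounding_box(points: List[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Axis-aligned box (left, top, right, bottom) enclosing the polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)
```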
CommonOperationMetadata
The common metadata for long-running operations.
Fields | |
---|---|
state |
The state of the operation. |
state_message |
A message providing more details about the current state of processing. |
resource |
A related resource to this operation. |
create_time |
The creation time of the operation. |
update_time |
The last update time of the operation. |
State
State of the long-running operation.
Enums | |
---|---|
STATE_UNSPECIFIED |
Unspecified state. |
RUNNING |
Operation is still running. |
CANCELLING |
Operation is being cancelled. |
SUCCEEDED |
Operation succeeded. |
FAILED |
Operation failed. |
CANCELLED |
Operation is cancelled. |
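Because the CreateProcessor, DeleteProcessor, DeployProcessorVersion and similar methods are long-running, clients typically poll until CommonOperationMetadata.State becomes SUCCEEDED, FAILED or CANCELLED. A minimal polling sketch with exponential backoff, assuming a caller-supplied `get_state` function (hypothetical, for example one that re-fetches the operation and reads `common_metadata.state` by name); `sleep` and `clock` are injectable so the loop can be tested without waiting:

```python
import time
from typing import Callable

# Terminal values of CommonOperationMetadata.State, per the table above.
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_operation(
    get_state: Callable[[], str],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Polls get_state() with exponential backoff until a terminal state or timeout."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        state = get_state()
        if state in TERMINAL:
            return state
        # Give up rather than sleep past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"operation still {state} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, up to a cap
```

Capping the delay keeps long operations from being polled too rarely, while doubling keeps short ones from being polled too often.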
CreateProcessorRequest
Request message for the CreateProcessor method. Note that this request is sent to a regionalized backend service. If the ProcessorType isn't available in that region, the creation fails.
Fields | |
---|---|
parent |
Required. The parent (project and location) under which to create the processor. Format: |
processor |
Required. The processor to be created, requires Authorization requires the following IAM permission on the specified resource
|
Dataset
A singleton resource under a Processor that configures a collection of documents.
Fields | |
---|---|
name |
Dataset resource name. Format: |
state |
Required. State of the dataset. Ignored when updating dataset. |
Union field
|
|
gcs_managed_config |
Optional. User-managed Cloud Storage dataset configuration. Use this configuration if the dataset documents are stored under a user-managed Cloud Storage location. |
document_warehouse_config |
Optional. Document AI Warehouse-based dataset configuration. |
unmanaged_dataset_config |
Optional. Unmanaged dataset configuration. Use this configuration if the dataset documents are managed by the document service internally (not user-managed). |
Union field
|
|
spanner_indexing_config |
Optional. A lightweight indexing source with low latency and high reliability, but lacking advanced features like CMEK and content-based search. |
DocumentWarehouseConfig
Configuration specific to the Document AI Warehouse-based implementation.
Fields | |
---|---|
collection |
Output only. The collection in Document AI Warehouse associated with the dataset. |
schema |
Output only. The schema in Document AI Warehouse associated with the dataset. |
GCSManagedConfig
Configuration specific to the Cloud Storage-based implementation.
Fields | |
---|---|
gcs_prefix |
Required. The Cloud Storage URI (a directory) where the documents belonging to the dataset must be stored. |
SpannerIndexingConfig
This type has no fields.
Configuration specific to Spanner-based indexing.
State
Different states of a dataset.
Enums | |
---|---|
STATE_UNSPECIFIED |
Default unspecified enum, should not be used. |
UNINITIALIZED |
Dataset has not been initialized. |
INITIALIZING |
Dataset is being initialized. |
INITIALIZED |
Dataset has been initialized. |
UnmanagedDatasetConfig
This type has no fields.
Configuration specific to an unmanaged dataset.
DatasetSchema
Dataset Schema.
Fields | |
---|---|
name |
Dataset schema resource name. Format: |
document_schema |
Optional. Schema of the dataset. |
DatasetSplitType
Documents belonging to a dataset will be split into different groups referred to as splits: train, test.
Enums | |
---|---|
DATASET_SPLIT_TYPE_UNSPECIFIED |
Default value if the enum is not set. |
DATASET_SPLIT_TRAIN |
Identifies the train documents. |
DATASET_SPLIT_TEST |
Identifies the test documents. |
DATASET_SPLIT_UNASSIGNED |
Identifies the unassigned documents. |
DeleteProcessorMetadata
The long-running operation metadata for the DeleteProcessor method.
Fields | |
---|---|
common_metadata |
The basic metadata of the long-running operation. |
DeleteProcessorRequest
Request message for the DeleteProcessor method.
Fields | |
---|---|
name |
Required. The processor resource name to be deleted. Authorization requires the following IAM permission on the specified resource
|
DeleteProcessorVersionMetadata
The long-running operation metadata for the DeleteProcessorVersion method.
Fields | |
---|---|
common_metadata |
The basic metadata of the long-running operation. |
DeleteProcessorVersionRequest
Request message for the DeleteProcessorVersion method.
Fields | |
---|---|
name |
Required. The processor version resource name to be deleted. Authorization requires the following IAM permission on the specified resource
|
DeployProcessorVersionMetadata
The long-running operation metadata for the DeployProcessorVersion method.
Fields | |
---|---|
common_metadata |
The basic metadata of the long-running operation. |
DeployProcessorVersionRequest
Request message for the DeployProcessorVersion method.
Fields | |
---|---|
name |
Required. The processor version resource name to be deployed. Authorization requires the following IAM permission on the specified resource
|
DeployProcessorVersionResponse
This type has no fields.
Response message for the DeployProcessorVersion method.
DisableProcessorMetadata
The long-running operation metadata for the DisableProcessor method.
Fields | |
---|---|
common_metadata |
The basic metadata of the long-running operation. |
DisableProcessorRequest
Request message for the DisableProcessor method.
Fields | |
---|---|
name |
Required. The processor resource name to be disabled. Authorization requires the following IAM permission on the specified resource
|
DisableProcessorResponse
This type has no fields.
Response message for the DisableProcessor method. Intentionally empty; reserved for fields added in the future.
Document
Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.
Fields | |
---|---|
mime_type |
An IANA published media type (MIME type). |
text |
Optional. UTF-8 encoded text in reading order from the document. |
text_styles[] |
Styles for the |
pages[] |
Visual page layout for the |
entities[] |
A list of entities detected on |
entity_relations[] |
Placeholder. Relationship among |
text_changes[] |
Placeholder. A list of text corrections made to |
shard_info |
Information about the sharding if this document is sharded part of a larger document. If the document is not sharded, this message is not specified. |
error |
Any error that occurred while processing this document. |
revisions[] |
Placeholder. Revision history of this document. |
Union field source . Original source document from the user. source can be only one of the following: |
|
uri |
Optional. Currently supports Google Cloud Storage URI of the form |
content |
Optional. Inline document content, represented as a stream of bytes. Note: As with all |
Entity
An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or location.
Fields | |
---|---|
text_anchor |
Optional. Provenance of the entity. Text anchor indexing into the |
type |
Required. Entity type from a schema e.g. |
mention_text |
Optional. Text value of the entity e.g. |
mention_id |
Optional. Deprecated. Use |
confidence |
Optional. Confidence of detected Schema entity. Range |
page_anchor |
Optional. Represents the provenance of this entity with respect to the location on the page where it was found. |
id |
Optional. Canonical id. This will be a unique value in the entity list for this document. |
normalized_value |
Optional. Normalized entity value. Absent if the extracted value could not be converted or the type (e.g. address) is not supported for certain parsers. This field is also only populated for certain supported document types. |
properties[] |
Optional. Entities can be nested to form a hierarchical data structure representing the content in the document. |
provenance |
Optional. The history of this annotation. |
redacted |
Optional. Whether the entity will be redacted for de-identification purposes. |
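Since `properties[]` lets entities nest, a flat list of (type path, text) pairs is often easier to export or compare. A minimal depth-first sketch over the JSON form of a Document, assuming proto3 JSON key names (`type`, `mentionText`, `properties`); the entity type names in the usage example are hypothetical:

```python
from typing import Dict, Iterator, List, Tuple


def flatten_entities(entities: List[Dict], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yields (dotted_type_path, mention_text) for every entity, depth first."""
    for entity in entities:
        path = f"{prefix}{entity['type']}"
        yield path, entity.get("mentionText", "")
        # Child entities live in the repeated `properties` field.
        yield from flatten_entities(entity.get("properties", []), prefix=path + ".")
```

For a hypothetical `line_item` entity with `description` and `amount` properties, this yields `line_item`, `line_item.description` and `line_item.amount`, each with its mention text.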
NormalizedValue
Parsed and normalized entity value.
Fields | |
---|---|
text |
Optional. An optional field to store a normalized string. For some entity types, one of respective Below are sample formats mapped to structured values.
|
Union field structured_value . An optional structured entity value. Must match entity type defined in schema if known. If this field is present, the text field could also be populated. structured_value can be only one of the following: |
|
money_value |
Money value. See also: https://github.com/googleapis/googleapis/blob/master/google/type/money.proto |
date_value |
Date value. Includes year, month, day. See also: https://github.com/googleapis/googleapis/blob/master/google/type/date.proto |
datetime_value |
DateTime value. Includes date, time, and timezone. See also: https://github.com/googleapis/googleapis/blob/master/google/type/datetime.proto |
address_value |
Postal address. See also: https://github.com/googleapis/googleapis/blob/master/google/type/postal_address.proto |
boolean_value |
Boolean value. Can be used for entities with binary values, or for checkboxes. |
integer_value |
Integer value. |
float_value |
Float value. |
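Only one member of `structured_value` is set at a time, so consumers usually dispatch on whichever key is present and fall back to `text`. A minimal sketch over the JSON form, assuming proto3 JSON keys (`moneyValue`, `dateValue`, and so on) and the google.type.Money layout of `currencyCode`, `units` (int64, which JSON may encode as a string) and `nanos` (billionths of a unit). Datetime and address values are not handled here and fall through to `text`:

```python
import datetime
from decimal import Decimal
from typing import Any, Dict


def structured_value(normalized: Dict[str, Any]) -> Any:
    """Converts one member of the structured_value union to a Python value."""
    if "moneyValue" in normalized:
        money = normalized["moneyValue"]
        # Money = units + nanos * 1e-9; Decimal avoids float rounding.
        amount = Decimal(money.get("units", "0")) + Decimal(money.get("nanos", 0)) / Decimal(10**9)
        return money.get("currencyCode", ""), amount
    if "dateValue" in normalized:
        d = normalized["dateValue"]
        # Partial dates (a zero year, month or day) raise ValueError here.
        return datetime.date(d["year"], d["month"], d["day"])
    if "booleanValue" in normalized:
        return bool(normalized["booleanValue"])
    if "integerValue" in normalized:
        return int(normalized["integerValue"])
    if "floatValue" in normalized:
        return float(normalized["floatValue"])
    # Other or absent structured values: use the normalized string, if any.
    return normalized.get("text")
```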
EntityRelation
Relationship between Entities.
Fields | |
---|---|
subject_id |
Subject entity id. |
object_id |
Object entity id. |
relation |
Relationship description. |
Page
A page in a Document.
Fields | |
---|---|
page_number |
1-based index for current |
image |
Rendered image for this page. This image is preprocessed to remove any skew, rotation, and distortions such that the annotation bounding boxes can be upright and axis-aligned. |
transforms[] |
Transformation matrices that were applied to the original document image to produce |
dimension |
Physical dimension of the page. |
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
blocks[] |
A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation. |
paragraphs[] |
A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph. |
lines[] |
A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line. |
tokens[] |
A list of visually detected tokens on the page. |
visual_elements[] |
A list of detected non-text visual elements on the page, such as checkboxes and signatures. |
tables[] |
A list of visually detected tables on the page. |
form_fields[] |
A list of visually detected form fields on the page. |
symbols[] |
A list of visually detected symbols on the page. |
detected_barcodes[] |
A list of detected barcodes. |
image_quality_scores |
Image quality scores. |
provenance |
The history of this page. |
Block
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
Fields | |
---|---|
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
provenance |
The history of this annotation. |
DetectedBarcode
A detected barcode.
Fields | |
---|---|
layout |
|
barcode |
Detailed barcode information of the |
DetectedLanguage
Detected language for a structural component.
Fields | |
---|---|
language_code |
The BCP-47 language code, such as |
confidence |
Confidence of detected language. Range |
Dimension
Dimension for the page.
Fields | |
---|---|
width |
Page width. |
height |
Page height. |
unit |
Dimension unit. |
FormField
A form field detected on the page.
Fields | |
---|---|
field_name |
|
field_value |
|
name_detected_languages[] |
A list of detected languages for name together with confidence. |
value_detected_languages[] |
A list of detected languages for value together with confidence. |
value_type |
If the value is non-textual, this field represents the type. Current valid values are:
|
corrected_key_text |
Created for Labeling UI to export key text. If corrections were made to the text identified by the |
corrected_value_text |
Created for Labeling UI to export value text. If corrections were made to the text identified by the |
provenance |
The history of this annotation. |
Image
Rendered image contents for this page.
Fields | |
---|---|
content |
Raw byte content of the image. |
mime_type |
Encoding media type (MIME type) for the image. |
width |
Width of the image in pixels. |
height |
Height of the image in pixels. |
ImageQualityScores
Image quality scores for the page image.
Fields | |
---|---|
quality_score |
The overall quality score. Range |
detected_defects[] |
A list of detected defects. |
DetectedDefect
Image quality defects.
Fields | |
---|---|
type |
Name of the defect type. Supported values are:
|
confidence |
Confidence of detected defect. Range |
Layout
Visual element describing a layout unit on a page.
Fields | |
---|---|
text_anchor |
Text anchor indexing into the |
confidence |
Confidence of the current |
bounding_poly |
The bounding polygon for the |
orientation |
Detected orientation for the |
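A Layout does not store its text directly; its `text_anchor` points into `Document.text`. A minimal sketch that resolves an anchor, assuming the JSON form of TextAnchor (a `textSegments` list whose entries carry `startIndex` and `endIndex`, documented with TextAnchor.TextSegment elsewhere on this page), half-open `[start, end)` indices, and that proto3 JSON may omit a zero `startIndex` and encode int64 indices as strings:

```python
from typing import Dict, List


def anchor_text(document_text: str, text_anchor: Dict) -> str:
    """Concatenates the document text covered by each segment of a text anchor."""
    parts: List[str] = []
    for segment in text_anchor.get("textSegments", []):
        # Assumed half-open [start, end); a missing startIndex means 0.
        start = int(segment.get("startIndex", 0))
        end = int(segment["endIndex"])
        parts.append(document_text[start:end])
    return "".join(parts)
```

The same helper applies to any element that embeds a Layout: blocks, paragraphs, lines, tokens, table cells and form fields.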
Orientation
Detected human reading orientation.
Enums | |
---|---|
ORIENTATION_UNSPECIFIED |
Unspecified orientation. |
PAGE_UP |
Orientation is aligned with page up. |
PAGE_RIGHT |
Orientation is aligned with page right. Turn the head 90 degrees clockwise from upright to read. |
PAGE_DOWN |
Orientation is aligned with page down. Turn the head 180 degrees from upright to read. |
PAGE_LEFT |
Orientation is aligned with page left. Turn the head 90 degrees counterclockwise from upright to read. |
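Each orientation above says how a reader would turn their head; rotating the image the opposite way makes the text upright instead. Reading the table that way gives PAGE_RIGHT a 90-degree counterclockwise image rotation and PAGE_LEFT a 90-degree clockwise one (270 counterclockwise). A minimal sketch of that mapping (our interpretation, not an API), plus a grid rotation to show what "counterclockwise" means here:

```python
from typing import List

# Counterclockwise image rotation (degrees) that makes text with the given
# orientation upright; the opposite of the head turn described in the table.
_UPRIGHT_CCW_DEGREES = {
    "PAGE_UP": 0,
    "PAGE_RIGHT": 90,   # reader turns head clockwise, so turn the image counterclockwise
    "PAGE_DOWN": 180,
    "PAGE_LEFT": 270,   # reader turns head counterclockwise, so turn the image clockwise
}


def upright_rotation(orientation: str) -> int:
    if orientation not in _UPRIGHT_CCW_DEGREES:
        # ORIENTATION_UNSPECIFIED carries no usable information.
        raise ValueError(f"cannot derive a rotation from {orientation!r}")
    return _UPRIGHT_CCW_DEGREES[orientation]


def rotate_ccw(grid: List[List[int]], degrees: int) -> List[List[int]]:
    """Rotates a row-major 2D grid counterclockwise by a multiple of 90 degrees."""
    if degrees % 90:
        raise ValueError("degrees must be a multiple of 90")
    for _ in range((degrees // 90) % 4):
        # Transposing and then reversing the row order is a 90-degree CCW turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return [list(row) for row in grid]
```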
Line
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
Fields | |
---|---|
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
provenance |
The history of this annotation. |
Matrix
Representation for a transformation matrix, intended to be compatible with the OpenCV format for image manipulation.
Fields | |
---|---|
rows |
Number of rows in the matrix. |
cols |
Number of columns in the matrix. |
type |
This encodes information about what data type the matrix uses. For example, 0 (CV_8U) is an unsigned 8-bit image. For the full list of OpenCV primitive data types, please refer to https://docs.opencv.org/4.3.0/d1/d1b/group__core__hal__interface.html |
data |
The matrix data. |
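The `type` field follows OpenCV's type codes, in which the low three bits give the element depth (CV_8U = 0 through CV_64F = 6) and the higher bits give the channel count minus one. A minimal sketch that unpacks `data` with only the standard library, assuming (not confirmed by this page) a single channel, row-major order and little-endian bytes:

```python
import struct
from typing import List

# OpenCV depth codes mapped to struct format characters:
# CV_8U, CV_8S, CV_16U, CV_16S, CV_32S, CV_32F, CV_64F.
_DEPTH_FORMAT = {0: "B", 1: "b", 2: "H", 3: "h", 4: "i", 5: "f", 6: "d"}


def decode_matrix(rows: int, cols: int, cv_type: int, data: bytes) -> List[list]:
    """Unpacks single-channel, row-major, little-endian matrix bytes into nested lists."""
    depth = cv_type & 7            # low 3 bits: element depth
    channels = (cv_type >> 3) + 1  # remaining bits: channel count minus one
    if channels != 1 or depth not in _DEPTH_FORMAT:
        raise ValueError(f"unsupported matrix type {cv_type}")
    fmt = f"<{rows * cols}{_DEPTH_FORMAT[depth]}"
    if struct.calcsize(fmt) != len(data):
        raise ValueError("data length does not match rows * cols * element size")
    values = struct.unpack(fmt, data)
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]
```

For a 2 x 3 CV_64F matrix (type 6), `decode_matrix(2, 3, 6, data)` returns two rows of three floats.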
Paragraph
A collection of lines that a human would perceive as a paragraph.
Fields | |
---|---|
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
provenance |
The history of this annotation. |
Symbol
A detected symbol.
Fields | |
---|---|
layout |
|
detected_languages[] |
A list of detected languages together with confidence. |
Table
A table representation similar to HTML table structure.
Fields | |
---|---|
layout |
|
header_rows[] |
Header rows of the table. |
body_rows[] |
Body rows of the table. |
detected_languages[] |
A list of detected languages together with confidence. |
provenance |
The history of this table. |
TableCell
A cell representation inside the table.
Fields | |
---|---|
layout |
|
row_span |
How many rows this cell spans. |
col_span |
How many columns this cell spans. |
detected_languages[] |
A list of detected languages together with confidence. |
TableRow
A row of table cells.
Fields | |
---|---|
cells[] |
Cells that make up this row. |
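Because a TableCell can span several rows and columns, the cells of `header_rows` and `body_rows` do not map one-to-one onto grid positions. A minimal sketch of the usual HTML-style placement, where each cell takes the next column not already claimed by a cell spanning down from above. For brevity each cell is a hypothetical dict with a `text` key (real cells carry a Layout whose text must be resolved from the document) plus optional `rowSpan`/`colSpan`, with a missing or zero span treated as 1:

```python
from typing import Dict, List, Optional, Tuple


def expand_rows(rows: List[List[Dict]]) -> List[List[Optional[str]]]:
    """Places cells into a rectangular grid, repeating each cell's text over its span."""
    grid: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip columns already occupied by a row-spanning cell from above.
            while (r, c) in grid:
                c += 1
            row_span = max(int(cell.get("rowSpan", 1)), 1)
            col_span = max(int(cell.get("colSpan", 1)), 1)
            for dr in range(row_span):
                for dc in range(col_span):
                    grid[(r + dr, c + dc)] = cell["text"]
            c += col_span
    if not grid:
        return []
    height = max(k[0] for k in grid) + 1
    width = max(k[1] for k in grid) + 1
    # Positions no cell covers (ragged tables) come back as None.
    return [[grid.get((r, c)) for c in range(width)] for r in range(height)]
```

Passing `header_rows + body_rows` together keeps spans that cross from the header into the body aligned.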
Token
A detected token.
Fields | |
---|---|
layout |