Client for Google Cloud Documentai API
class google.cloud.documentai_v1beta2.AutoMlParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Parameters to control AutoML model prediction behavior.
model()
Resource name of the AutoML model.
Format: projects/{project-id}/locations/{location-id}/models/{model-id}.
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
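The resource-name format documented for model can be assembled with ordinary string formatting. This is a minimal sketch only: the helper name automl_model_path and the example IDs are hypothetical and are not part of the library.

```python
def automl_model_path(project_id: str, location_id: str, model_id: str) -> str:
    """Build a name in the documented projects/.../locations/.../models/... format."""
    return f"projects/{project_id}/locations/{location_id}/models/{model_id}"


# Placeholder IDs, for illustration only.
model_name = automl_model_path("my-project", "us-central1", "TCN123")
print(model_name)
```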
class google.cloud.documentai_v1beta2.BatchProcessDocumentsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Request to batch process documents as an asynchronous operation. The output is written to Cloud Storage as JSON in the [Document] format.
requests()
Required. Individual requests for each document.
Type
Sequence[ProcessDocumentRequest]
parent()
Target project and location to make a call.
Format: projects/{project-id}/locations/{location-id}.
If no location is specified, a region will be chosen automatically.
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
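A parent string in the full documented format can be checked before it is sent. The sketch below is an assumption-laden illustration: parse_parent is a hypothetical helper, and it only accepts the full projects/{project-id}/locations/{location-id} form, not the case where the location is left out.

```python
import re

# Matches only the full form documented for the parent field.
_PARENT_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)$"
)


def parse_parent(parent: str) -> dict:
    """Split a parent string into its project and location components."""
    match = _PARENT_PATTERN.match(parent)
    if match is None:
        raise ValueError(f"not in projects/.../locations/... form: {parent!r}")
    return match.groupdict()


parts = parse_parent("projects/my-project/locations/us")
print(parts)
```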
class google.cloud.documentai_v1beta2.BatchProcessDocumentsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.
responses()
Responses for each individual document.
Type
Sequence[ProcessDocumentResponse]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.BoundingPoly(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A bounding polygon for the detected image annotation.
vertices()
The bounding polygon vertices.
Type
Sequence[Vertex]
normalized_vertices()
The bounding polygon normalized vertices.
Type
Sequence[NormalizedVertex]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
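The relationship between vertices and normalized_vertices can be sketched as a scaling step. This assumes, which the text above does not state, that normalized coordinates are fractions of the page width and height in the range [0, 1]. The function name denormalize and the plain (x, y) tuples are stand-ins, not library types.

```python
from typing import List, Tuple


def denormalize(
    normalized: List[Tuple[float, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Scale (x, y) fractions to integer coordinates on a width x height page."""
    return [(round(x * width), round(y * height)) for x, y in normalized]


# A rectangle covering the top-left part of a hypothetical 1000 x 800 page.
normalized = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.25), (0.0, 0.25)]
pixels = denormalize(normalized, width=1000, height=800)
print(pixels)
```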
class google.cloud.documentai_v1beta2.Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.
uri()
Currently supports Google Cloud Storage URIs of the form gs://bucket_name/object_name. Object versioning is not supported. See Google Cloud Storage Request URIs for more info.
Type
content()
Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protobuffers use a pure binary representation, whereas JSON representations use base64.
Type
mime_type()
An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml.
Type
text()
UTF-8 encoded text in reading order from the document.
Type
text_styles()
Styles for the [Document.text][google.cloud.documentai.v1beta2.Document.text].
Type
Sequence[Style]
pages()
Visual page layout for the [Document][google.cloud.documentai.v1beta2.Document].
Type
Sequence[Page]
entities()
A list of entities detected on [Document.text][google.cloud.documentai.v1beta2.Document.text]. For document shards, entities in this list may cross shard boundaries.
Type
Sequence[Entity]
entity_relations()
Relationship among [Document.entities][google.cloud.documentai.v1beta2.Document.entities].
Type
Sequence[EntityRelation]
shard_info()
Information about the sharding, if this document is one shard of a larger document. If the document is not sharded, this message is not specified.
Type
ShardInfo
labels()
[Label][google.cloud.documentai.v1beta2.Document.Label]s for this document.
Type
Sequence[Label]
error()
Any error that occurred while processing this document.
Type
Status
class Entity(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A phrase in the text that is a known entity type, such as a person, an organization, or location.
text_anchor()
Provenance of the entity. Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta2.Document.text].
Type
TextAnchor
type()
Entity type from a schema, e.g. Address.
Type
mention_text()
Text value in the document, e.g. 1600 Amphitheatre Pkwy.
Type
mention_id()
Deprecated. Use the id field instead.
Type
confidence()
Optional. Confidence of detected Schema entity. Range [0, 1].
Type
page_anchor()
Optional. Represents the provenance of this entity with respect to the location on the page where it was found.
Type
PageAnchor
id()
Optional. Canonical id. This will be a unique value in the entity list for this document.
Type
bounding_poly_for_demo_frontend()
Optional. Temporary field to store the bounding poly for short-term POCs. Used by the frontend only. Do not use before you talk to ybo@ and lukasr@.
Type
BoundingPoly
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
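Because Entity.confidence is documented as a value in [0, 1], a common first step is to drop low-confidence entities. The sketch below uses plain dictionaries as stand-ins for Entity messages; the sample values, the 0.5 threshold, and the helper name confident_entities are all illustrative assumptions.

```python
# Stand-in records that mirror the Entity attribute names documented above.
entities = [
    {"type": "Address", "mention_text": "1600 Amphitheatre Pkwy", "confidence": 0.93},
    {"type": "Email", "mention_text": "someone@example.com", "confidence": 0.41},
]


def confident_entities(entities, threshold=0.5):
    """Keep only entities whose confidence is at or above the threshold."""
    return [e for e in entities if e["confidence"] >= threshold]


kept = confident_entities(entities)
for entity in kept:
    print(entity["type"], entity["mention_text"])
```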
class EntityRelation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Relationship between [Entities][google.cloud.documentai.v1beta2.Document.Entity].
subject_id()
Subject entity id.
Type
object_id()
Object entity id.
Type
relation()
Relationship description.
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
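An EntityRelation refers to entities by id, so reading a relation usually means looking both ids up in the entity list. A minimal sketch with dictionaries standing in for the messages; the sample entities and the relation string "works_for" are invented for illustration.

```python
# Stand-ins for Document.entities and Document.entity_relations.
entities = [
    {"id": "0", "mention_text": "Jane Doe"},
    {"id": "1", "mention_text": "Acme Corp"},
]
relations = [{"subject_id": "0", "object_id": "1", "relation": "works_for"}]

# Index entities by their id so each relation can be resolved in O(1).
by_id = {entity["id"]: entity for entity in entities}


def resolve(relation):
    """Return (subject text, relation description, object text)."""
    subject = by_id[relation["subject_id"]]["mention_text"]
    obj = by_id[relation["object_id"]]["mention_text"]
    return (subject, relation["relation"], obj)


triples = [resolve(r) for r in relations]
print(triples)
```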
class Label(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Label attaches schema information and/or other metadata to segments within a [Document][google.cloud.documentai.v1beta2.Document]. Multiple [Label][google.cloud.documentai.v1beta2.Document.Label]s on a single field can denote either different labels, different instances of the same label created at different times, or some combination of both.
automl_model()
Label is generated by an AutoML model. This field stores the full resource name of the AutoML model.
Format: projects/{project-id}/locations/{location-id}/models/{model-id}
Type
name()
Name of the label. When the label is generated from an AutoML Text Classification model, this field represents the name of the category.
Type
confidence()
Confidence score between 0 and 1 for label assignment.
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A page in a [Document][google.cloud.documentai.v1beta2.Document].
page_number()
1-based index for current [Page][google.cloud.documentai.v1beta2.Document.Page] in a parent [Document][google.cloud.documentai.v1beta2.Document]. Useful when a page is taken out of a [Document][google.cloud.documentai.v1beta2.Document] for individual processing.
Type
dimension()
Physical dimension of the page.
Type
Dimension
layout()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for the page.
Type
Layout
detected_languages()
A list of detected languages together with confidence.
Type
Sequence[DetectedLanguage]
blocks()
A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
Type
Sequence[Block]
paragraphs()
A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.
Type
Sequence[Paragraph]
lines()
A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.
Type
Sequence[Line]
tokens()
A list of visually detected tokens on the page.
Type
Sequence[Token]
visual_elements()
A list of detected non-text visual elements e.g. checkbox, signature etc. on the page.
Type
Sequence[VisualElement]
tables()
A list of visually detected tables on the page.
Type
Sequence[Table]
form_fields()
A list of visually detected form fields on the page.
Type
Sequence[FormField]
class Block(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
layout()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Block][google.cloud.documentai.v1beta2.Document.Page.Block].
Type
Layout
detected_languages()
A list of detected languages together with confidence.
Type
Sequence[DetectedLanguage]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class DetectedLanguage(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Detected language for a structural component.
language_code()
The BCP-47 language code, such as “en-US” or “sr-Latn”. For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.
Type
confidence()
Confidence of detected language. Range [0, 1].
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
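Since each DetectedLanguage pairs a language_code with a confidence, the most likely language of a component is simply the entry with the highest confidence. A minimal sketch using dictionaries as stand-ins; the sample confidences and the helper name dominant_language are assumptions.

```python
# Stand-ins for a detected_languages list.
detected = [
    {"language_code": "en-US", "confidence": 0.88},
    {"language_code": "sr-Latn", "confidence": 0.07},
]


def dominant_language(detected):
    """Return the language_code with the highest confidence, or None if empty."""
    if not detected:
        return None
    return max(detected, key=lambda d: d["confidence"])["language_code"]


print(dominant_language(detected))
```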
class Dimension(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Dimension for the page.
width()
Page width.
Type
height()
Page height.
Type
unit()
Dimension unit.
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class FormField(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A form field detected on the page.
field_name()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta2.Document.Page.FormField] name, e.g. Address, Email, Grand total, Phone number, etc.
Type
Layout
field_value()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta2.Document.Page.FormField] value.
Type
Layout
name_detected_languages()
A list of detected languages for name together with confidence.
Type
Sequence[DetectedLanguage]
value_detected_languages()
A list of detected languages for value together with confidence.
Type
Sequence[DetectedLanguage]
value_type()
If the value is non-textual, this field represents the type. Current valid values are:
blank (this indicates the field_value is normal text)
“unfilled_checkbox”
“filled_checkbox”.
Type
corrected_key_text()
An internal field, created for Labeling UI to export key text.
Type
corrected_value_text()
An internal field, created for Labeling UI to export value text.
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
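The three documented value_type values (blank, "unfilled_checkbox", "filled_checkbox") map naturally onto an optional boolean. The helper below is a hypothetical sketch, not a library function; raising on unknown strings is a design choice made here so that any value outside the documented list is noticed rather than silently ignored.

```python
from typing import Optional


def checkbox_state(value_type: str) -> Optional[bool]:
    """Map a FormField.value_type string to a checkbox state.

    Returns True for a filled checkbox, False for an unfilled one, and
    None when value_type is blank (the field value is normal text).
    """
    if value_type == "filled_checkbox":
        return True
    if value_type == "unfilled_checkbox":
        return False
    if value_type == "":
        return None
    # Any other value is outside the list documented above.
    raise ValueError(f"unrecognized value_type: {value_type!r}")


print(checkbox_state("filled_checkbox"))
```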
class Layout(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Visual element describing a layout unit on a page.
text_anchor()
Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta2.Document.text].
Type
TextAnchor
confidence()
Confidence of the current [Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] within context of the object this layout is for. e.g. confidence can be for a single token, a table, a visual element, etc. depending on context. Range [0, 1].
Type
bounding_poly()
The bounding polygon for the [Layout][google.cloud.documentai.v1beta2.Document.Page.Layout].
Type
BoundingPoly
orientation()
Detected orientation for the [Layout][google.cloud.documentai.v1beta2.Document.Page.Layout].
Type
Orientation
id()
Optional. This is the identifier used by referencing [PageAnchor][google.cloud.documentai.v1beta2.Document.PageAnchor]s.
Type
class Orientation(value)
Detected human reading orientation.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class Line(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
layout()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Line][google.cloud.documentai.v1beta2.Document.Page.Line].
Type
Layout
detected_languages()
A list of detected languages together with confidence.
Type
Sequence[DetectedLanguage]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class Paragraph(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A collection of lines that a human would perceive as a paragraph.
layout()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Paragraph][google.cloud.documentai.v1beta2.Document.Page.Paragraph].
Type
Layout
detected_languages()
A list of detected languages together with confidence.
Type
Sequence[DetectedLanguage]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class Table(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A table representation similar to HTML table structure.
layout()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Table][google.cloud.documentai.v1beta2.Document.Page.Table].
Type
Layout
header_rows()
Header rows of the table.
Type
Sequence[TableRow]
body_rows()
Body rows of the table.
Type
Sequence[TableRow]
detected_languages()
A list of detected languages together with confidence.
Type
Sequence[DetectedLanguage]
class TableCell(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A cell representation inside the table.
layout()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [TableCell][google.cloud.documentai.v1beta2.Document.Page.Table.TableCell].
Type
Layout
row_span()
How many rows this cell spans.
Type
col_span()
How many columns this cell spans.
Type
detected_languages()
A list of detected languages together with confidence.
Type
Sequence[DetectedLanguage]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class TableRow(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A row of table cells.
cells()
Cells that make up this row.
Type
Sequence[TableCell]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class Token(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A detected token.
layout()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Token][google.cloud.documentai.v1beta2.Document.Page.Token].
Type
Layout
detected_break()
Detected break at the end of a [Token][google.cloud.documentai.v1beta2.Document.Page.Token].
Type
DetectedBreak
detected_languages()
A list of detected languages together with confidence.
Type
Sequence[DetectedLanguage]
class DetectedBreak(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Detected break at the end of a [Token][google.cloud.documentai.v1beta2.Document.Page.Token].
type()
Detected break type.
Type
Type
class Type(value)
Enum to denote the type of break found.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class VisualElement(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Detected non-text visual elements e.g. checkbox, signature etc. on the page.
layout()
[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [VisualElement][google.cloud.documentai.v1beta2.Document.Page.VisualElement].
Type
Layout
type()
Type of the [VisualElement][google.cloud.documentai.v1beta2.Document.Page.VisualElement].
Type
detected_languages()
A list of detected languages together with confidence.
Type
Sequence[DetectedLanguage]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class PageAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Referencing elements in [Document.pages][google.cloud.documentai.v1beta2.Document.pages].
page_refs()
One or more references to visual page elements.
Type
Sequence[PageRef]
class PageRef(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Represents a weak reference to a page element within a document.
page()
Required. Index into the [Document.pages][google.cloud.documentai.v1beta2.Document.pages] element.
Type
layout_type()
Optional. The type of the layout element that is being referenced. If not specified the whole page is assumed to be referenced.
Type
LayoutType
layout_id()
Optional. The [Page.Layout.id][google.cloud.documentai.v1beta2.Document.Page.Layout.id] on the page that this element references. If [LayoutRef.type][] is specified this id must also be specified.
Type
class LayoutType(value)
The type of layout that is being referenced.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class ShardInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
shard_index()
The 0-based index of this shard.
Type
shard_count()
Total number of shards.
Type
text_offset()
The index of the first character in [Document.text][google.cloud.documentai.v1beta2.Document.text] in the overall document global text.
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
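The ShardInfo fields can be used to put shards back in order and to translate a shard-local character position into a position in the overall text. The sketch below uses dictionaries as stand-ins and assumes, beyond what the text above states, that shard texts are contiguous and non-overlapping; the helper names are hypothetical.

```python
# Stand-ins for two document shards, deliberately listed out of order.
shards = [
    {"shard_index": 1, "shard_count": 2, "text_offset": 6, "text": "world"},
    {"shard_index": 0, "shard_count": 2, "text_offset": 0, "text": "Hello "},
]


def merge_shard_text(shards):
    """Concatenate shard texts in shard_index order."""
    if not shards:
        raise ValueError("no shards given")
    ordered = sorted(shards, key=lambda s: s["shard_index"])
    if len(ordered) != ordered[0]["shard_count"]:
        raise ValueError("some shards are missing")
    return "".join(s["text"] for s in ordered)


def to_global_index(shard, local_index):
    """Translate an index into a shard's text to an index into the full text."""
    return shard["text_offset"] + local_index


full_text = merge_shard_text(shards)
print(full_text)
```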
class Style(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
text_anchor()
Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta2.Document.text].
Type
TextAnchor
color()
Text color.
Type
Color
background_color()
Text background color.
Type
Color
font_weight()
Font weight. Possible values are normal, bold, bolder, and lighter. https://www.w3schools.com/cssref/pr_font_weight.asp
Type
text_style()
Text style. Possible values are normal, italic, and oblique. https://www.w3schools.com/cssref/pr_font_font-style.asp
Type
text_decoration()
Text decoration. Follows CSS standard. https://www.w3schools.com/cssref/pr_text_text-decoration.asp
Type
font_size()
Font size.
Type
FontSize
class FontSize(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Font size with unit.
size()
Font size for the text.
Type
unit()
Unit for the font size. Follows CSS naming (in, px, pt, etc.).
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class TextAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Text reference indexing into the [Document.text][google.cloud.documentai.v1beta2.Document.text].
text_segments()
The text segments from the [Document.text][google.cloud.documentai.v1beta2.Document.text].
Type
Sequence[TextSegment]
class TextSegment(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A text segment in the [Document.text][google.cloud.documentai.v1beta2.Document.text]. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See [ShardInfo.text_offset][google.cloud.documentai.v1beta2.Document.ShardInfo.text_offset].
start_index()
[TextSegment][google.cloud.documentai.v1beta2.Document.TextAnchor.TextSegment] start UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta2.Document.text].
Type
end_index()
[TextSegment][google.cloud.documentai.v1beta2.Document.TextAnchor.TextSegment] half-open end UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta2.Document.text].
Type
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
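Because start_index is inclusive and end_index is half-open, a TextSegment corresponds directly to a Python slice. The sketch below uses dictionaries as stand-ins and assumes the indices can be applied as Python string indices, which may not hold for non-ASCII text given the "UTF-8 char index" wording above. Python slicing clamps out-of-range indices, so a segment that runs past the end of a shard yields only the part present in this text.

```python
text = "Invoice total: 42.00 USD"

# Stand-ins for TextAnchor.text_segments.
segments = [
    {"start_index": 0, "end_index": 7},
    {"start_index": 15, "end_index": 20},
]


def anchored_pieces(text, segments):
    """Return the substring covered by each [start_index, end_index) segment."""
    return [text[s["start_index"]:s["end_index"]] for s in segments]


pieces = anchored_pieces(text, segments)
print(pieces)
```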
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
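The note on Document.content says bytes fields are raw binary in protocol buffers but base64 in JSON. The round trip can be sketched with the standard library alone. The dictionary keys below mirror the attribute names on this page and are not guaranteed to match the field naming the service uses in its JSON output; the sample bytes are placeholders.

```python
import base64
import json

raw = b"%PDF-1.4 example bytes"

# Encoding: JSON cannot carry raw bytes, so the content is base64 text.
as_json = json.dumps(
    {
        "mime_type": "application/pdf",
        "content": base64.b64encode(raw).decode("ascii"),
    }
)

# Decoding: reverse the base64 step to recover the original bytes.
decoded = base64.b64decode(json.loads(as_json)["content"])
print(decoded == raw)
```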
class google.cloud.documentai_v1beta2.DocumentUnderstandingServiceClient(*, credentials: google.auth.credentials.Credentials = None, transport: Union[str, google.cloud.documentai_v1beta2.services.document_understanding_service.transports.base.DocumentUnderstandingServiceTransport] = None, client_options: google.api_core.client_options.ClientOptions = None)
Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.
Instantiate the document understanding service client.
Parameters
credentials (Optional[google.auth.credentials.Credentials]) – The authorization credentials to attach to requests. These credentials identify the application to the service; if none are specified, the client will attempt to ascertain the credentials from the environment.
transport (Union[str, DocumentUnderstandingServiceTransport]) – The transport to use. If set to None, a transport is chosen automatically.
client_options (ClientOptions) – Custom options for the client. (1) The api_endpoint property can be used to override the default endpoint provided by the client. (2) If the transport argument is None, client_options can be used to create a mutual TLS transport. If client_cert_source is provided, mutual TLS transport will be created with the given api_endpoint or the default mTLS endpoint, and the client SSL credentials obtained from client_cert_source.
Raises
google.auth.exceptions.MutualTlsChannelError – If mutual TLS transport creation failed for any reason.
batch_process_documents(request: Optional[google.cloud.documentai_v1beta2.types.document_understanding.BatchProcessDocumentsRequest] = None, *, requests: Optional[Sequence[google.cloud.documentai_v1beta2.types.document_understanding.ProcessDocumentRequest]] = None, retry: google.api_core.retry.Retry = <_MethodDefault._DEFAULT_VALUE:
LRO endpoint to batch process many documents. The output is written to Cloud Storage as JSON in the [Document] format.
Parameters
request (BatchProcessDocumentsRequest) – The request object. Request to batch process documents as an asynchronous operation. The output is written to Cloud Storage as JSON in the [Document] format.
requests (Sequence[~.document_understanding.ProcessDocumentRequest]) – Required. Individual requests for each document. This corresponds to the requests field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
Returns
An object representing a long-running operation.
The result type for the operation will be BatchProcessDocumentsResponse: Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.
Return type
Operation
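As a rough illustration of the request shape described above, the sketch below assembles a BatchProcessDocumentsRequest-style mapping from plain Python dicts. The field names come from the message classes documented on this page; the helper name `build_batch_request`, the project ID, and the bucket paths are hypothetical. The message constructors listed here accept a `mapping` argument, so a dict of this shape could presumably be handed to `BatchProcessDocumentsRequest`, but that step is deliberately not shown.

```python
from typing import Dict, List


def build_batch_request(parent: str, uris: List[str], output_prefix: str) -> Dict:
    """Assemble a BatchProcessDocumentsRequest-shaped dict (hypothetical helper)."""
    requests = []
    for uri in uris:
        requests.append(
            {
                # Each entry mirrors the fields of ProcessDocumentRequest.
                "input_config": {
                    "gcs_source": {"uri": uri},
                    "mime_type": "application/pdf",
                },
                "output_config": {
                    "gcs_destination": {"uri": output_prefix},
                    "pages_per_shard": 20,
                },
            }
        )
    return {"parent": parent, "requests": requests}


# Hypothetical project, location, and bucket names.
batch_request = build_batch_request(
    parent="projects/my-project/locations/us",
    uris=["gs://my-bucket/in/a.pdf", "gs://my-bucket/in/b.pdf"],
    output_prefix="gs://my-bucket/out/",
)
```

Keeping the request as plain data makes it easy to inspect or log before it is converted into the message type.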
classmethod from_service_account_file(filename: str, *args, **kwargs)
Creates an instance of this client using the provided credentials file.
Parameters
filename (str) – The path to the service account private key json file.
args – Additional arguments to pass to the constructor.
kwargs – Additional arguments to pass to the constructor.
Returns
The constructed client.
Return type
DocumentUnderstandingServiceClient
classmethod from_service_account_json(filename: str, *args, **kwargs)
Creates an instance of this client using the provided credentials file.
Parameters
filename (str) – The path to the service account private key json file.
args – Additional arguments to pass to the constructor.
kwargs – Additional arguments to pass to the constructor.
Returns
The constructed client.
Return type
DocumentUnderstandingServiceClient
process_document(request: Optional[google.cloud.documentai_v1beta2.types.document_understanding.ProcessDocumentRequest] = None, *, retry: google.api_core.retry.Retry = <_MethodDefault._DEFAULT_VALUE:
Processes a single document.
Parameters
request (ProcessDocumentRequest) – The request object. Request to process one document.
retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.
timeout (float) – The timeout for this request.
metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.
Returns
Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.
Return type
Document
class google.cloud.documentai_v1beta2.EntityExtractionParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Parameters to control entity extraction behavior.
enabled()
Whether to enable entity extraction.
Type
bool
model_version()
Model version of the entity extraction. Default is “builtin/stable”. Specify “builtin/latest” for the latest model.
Type
str
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.FormExtractionParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Parameters to control form extraction behavior.
enabled()
Whether to enable form extraction.
Type
bool
key_value_pair_hints()
User can provide pairs of (key text, value type) to improve the parsing result.
For example, if a document has a field called “Date” that holds a date value and a field called “Amount” that may hold either a currency value (e.g., “$500.00”) or a simple number value (e.g., “20”), you could use the following hints: [ {“key”: “Date”, “value_types”: [ “DATE” ]}, {“key”: “Amount”, “value_types”: [ “PRICE”, “NUMBER” ]} ]
If the value type is unknown, but you want to provide hints for the keys, you can leave the value_types field blank. e.g. {“key”: “Date”, “value_types”: []}
Type
Sequence[KeyValuePairHint]
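The hints from the example above might be written in Python roughly as follows. This is a sketch using plain dicts that mirror the KeyValuePairHint and FormExtractionParams fields; the "Reference" key is a hypothetical field added only to show an empty value_types list.

```python
# Hints for the "Date" and "Amount" fields, as plain dicts mirroring KeyValuePairHint.
key_value_pair_hints = [
    {"key": "Date", "value_types": ["DATE"]},
    {"key": "Amount", "value_types": ["PRICE", "NUMBER"]},
    # Value type unknown: provide the key and leave value_types empty.
    {"key": "Reference", "value_types": []},  # hypothetical key
]

# A FormExtractionParams-shaped dict carrying the hints.
form_extraction_params = {
    "enabled": True,
    "key_value_pair_hints": key_value_pair_hints,
}
```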
model_version()
Model version of the form extraction system. Default is “builtin/stable”. Specify “builtin/latest” for the latest model. For custom form models, specify: “custom/{model_name}”. Model name format is “bucket_name/path/to/modeldir” corresponding to “gs://bucket_name/path/to/modeldir” where annotated examples are stored.
Type
str
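The mapping from a Cloud Storage model directory to a custom model version string can be sketched as below. The function name `custom_form_model_version` is hypothetical, and stripping a trailing slash is an assumption of this sketch rather than documented behavior.

```python
def custom_form_model_version(gcs_uri: str) -> str:
    """Turn gs://bucket_name/path/to/modeldir into custom/bucket_name/path/to/modeldir."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    # Assumption: a trailing slash is not part of the model name.
    model_name = gcs_uri[len(prefix):].rstrip("/")
    if not model_name:
        raise ValueError("URI has no bucket or path")
    return f"custom/{model_name}"


model_version = custom_form_model_version("gs://bucket_name/path/to/modeldir")
```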
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.GcsDestination(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The Google Cloud Storage location where the output file will be written to.
uri()
Type
str
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.GcsSource(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The Google Cloud Storage location where the input file will be read from.
uri()
Type
str
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.InputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The desired input location and metadata.
gcs_source()
The Google Cloud Storage location to read the input from. This must be a single file.
Type
GcsSource
contents()
Content in bytes, represented as a stream of bytes. Note: As with all bytes fields, proto buffer messages use a pure binary representation, whereas JSON representations use base64.
This field only works for the synchronous ProcessDocument method.
Type
bytes
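The difference between the binary and JSON representations of a bytes field can be illustrated with the standard library alone. In this sketch the dicts merely mirror the InputConfig fields, and the byte string is placeholder content, not a real PDF.

```python
import base64
import json

raw = b"%PDF-1.4 example bytes"  # placeholder content, not a real PDF

# Binary (protobuf-style) use: the bytes go in unchanged.
input_config = {"contents": raw, "mime_type": "application/pdf"}

# JSON use: raw bytes are not JSON-serializable, so they travel as base64 text.
json_input_config = {
    "contents": base64.b64encode(raw).decode("ascii"),
    "mime_type": "application/pdf",
}
payload = json.dumps(json_input_config)

# Decoding the JSON form recovers the original bytes.
decoded = base64.b64decode(json.loads(payload)["contents"])
```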
mime_type()
Required. MIME type of the input. Currently supported MIME types are application/pdf, image/tiff, and image/gif. In addition, application/json is supported for requests with the ProcessDocumentRequest.automl_params field set. The JSON file needs to be in Document format.
Type
str
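A client-side pre-check of the MIME type rule above could look like the following sketch. The function and constant names are hypothetical, and the service remains the authority on what it accepts; this only mirrors the list given in the description.

```python
# MIME types listed in the mime_type description.
SUPPORTED_MIME_TYPES = {"application/pdf", "image/tiff", "image/gif"}
# Listed as supported only when automl_params is set on the request.
AUTOML_ONLY_MIME_TYPES = {"application/json"}


def is_supported_mime_type(mime_type: str, uses_automl_params: bool = False) -> bool:
    """Return True if the documented rules allow this MIME type."""
    if mime_type in SUPPORTED_MIME_TYPES:
        return True
    return uses_automl_params and mime_type in AUTOML_ONLY_MIME_TYPES
```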
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.KeyValuePairHint(mapping=None, *, ignore_unknown_fields=False, **kwargs)
User-provided hint for key value pair.
key()
The key text for the hint.
Type
str
value_types()
Type of the value. This is case-insensitive, and could be one of: ADDRESS, LOCATION, ORGANIZATION, PERSON, PHONE_NUMBER, ID, NUMBER, EMAIL, PRICE, TERMS, DATE, NAME. Types not in this list will be ignored.
Type
Sequence[str]
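The "case-insensitive, unknown types ignored" rule can be mimicked locally with a small filter. This is only a sketch of the documented behavior: the function name is hypothetical, and returning upper-cased names is a choice made here, not something the service is stated to do.

```python
# Value types named in the value_types description.
KNOWN_VALUE_TYPES = {
    "ADDRESS", "LOCATION", "ORGANIZATION", "PERSON", "PHONE_NUMBER", "ID",
    "NUMBER", "EMAIL", "PRICE", "TERMS", "DATE", "NAME",
}


def effective_value_types(value_types):
    """Return the upper-cased types that appear in the documented list."""
    return [t.upper() for t in value_types if t.upper() in KNOWN_VALUE_TYPES]


# "COLOR" is not in the list, so it is dropped.
kept = effective_value_types(["date", "Price", "COLOR"])
```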
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.NormalizedVertex(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
x()
X coordinate.
Type
float
y()
Y coordinate.
Type
float
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
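Because normalized vertex coordinates range from 0 to 1 relative to the original image, mapping one back to pixels is a multiplication by the image size. The sketch below assumes the image width and height are known to the caller; the function name and the choice to round to the nearest pixel are assumptions.

```python
from typing import Tuple


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Scale a normalized (x, y) point to pixel coordinates."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    # Assumption: round to the nearest whole pixel.
    return round(x * width), round(y * height)


pixel_point = to_pixels(0.25, 0.5, width=1000, height=800)
```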
class google.cloud.documentai_v1beta2.OcrParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Parameters to control Optical Character Recognition (OCR) behavior.
language_hints()
List of languages to use for OCR. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting language_hints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Document processing returns an error if one or more of the specified languages is not one of the supported languages.
Type
Sequence[str]
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.OperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Contains metadata for the BatchProcessDocuments operation.
state()
The state of the current batch processing.
Type
State
state_message()
A message providing more details about the current state of processing.
Type
str
create_time()
The creation time of the operation.
Type
Timestamp
update_time()
The last update time of the operation.
Type
Timestamp
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.OutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The desired output location and metadata.
gcs_destination()
The Google Cloud Storage location to write the output to.
Type
GcsDestination
pages_per_shard()
The maximum number of pages to include in each output Document shard JSON on Google Cloud Storage.
The valid range is [1, 100]. If not specified, the default value is 20.
For example, for one PDF file with 100 pages, 100 parsed pages will be produced. If pages_per_shard = 20, then 5 Document shard JSON files, each containing 20 parsed pages, will be written under the prefix OutputConfig.gcs_destination.uri and suffix pages-x-to-y.json, where x and y are 1-indexed page numbers.
Example GCS outputs with 157 pages and pages_per_shard = 50:
pages-001-to-050.json
pages-051-to-100.json
pages-101-to-150.json
pages-151-to-157.json
Type
int
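The sharding arithmetic can be reproduced with a few lines of Python. This is a sketch: the function name is hypothetical, and the three-digit zero padding is inferred from the example file names rather than stated as a rule.

```python
from typing import List


def shard_file_names(total_pages: int, pages_per_shard: int = 20) -> List[str]:
    """List the pages-x-to-y.json suffixes expected for a document."""
    if not 1 <= pages_per_shard <= 100:
        raise ValueError("pages_per_shard must be in [1, 100]")
    names = []
    # Page numbers are 1-indexed; the last shard may be shorter than the rest.
    for first in range(1, total_pages + 1, pages_per_shard):
        last = min(first + pages_per_shard - 1, total_pages)
        # Assumption: three-digit zero padding, as in the documented example.
        names.append(f"pages-{first:03d}-to-{last:03d}.json")
    return names


shards = shard_file_names(157, pages_per_shard=50)
```

With the default of 20 pages per shard, a 100-page input yields five names, matching the description.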
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.ProcessDocumentRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Request to process one document.
parent()
Target project and location to make a call.
Format: projects/{project-id}/locations/{location-id}.
If no location is specified, a region will be chosen automatically. This field is only populated when used in the ProcessDocument method.
Type
str
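A small helper for the fully specified parent format might look like this. The function name and the example IDs are hypothetical, and the sketch covers only the form with both a project and a location.

```python
def parent_path(project_id: str, location_id: str) -> str:
    """Build a parent string of the form projects/{project-id}/locations/{location-id}."""
    if not project_id or not location_id:
        raise ValueError("both project_id and location_id are required in this sketch")
    return f"projects/{project_id}/locations/{location_id}"


parent = parent_path("my-project", "us")  # hypothetical IDs
```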
input_config()
Required. Information about the input file.
Type
InputConfig
output_config()
Optional. The desired output location. This field is only needed in BatchProcessDocumentsRequest.
Type
OutputConfig
document_type()
Specifies a known document type for deeper structure detection. Valid values are currently “general” and “invoice”. If not provided, “general” is used as default. If any other value is given, the request is rejected.
Type
str
table_extraction_params()
Controls table extraction behavior. If not specified, the system will decide reasonable defaults.
Type
TableExtractionParams
form_extraction_params()
Controls form extraction behavior. If not specified, the system will decide reasonable defaults.
Type
FormExtractionParams
entity_extraction_params()
Controls entity extraction behavior. If not specified, the system will decide reasonable defaults.
Type
EntityExtractionParams
ocr_params()
Controls OCR behavior. If not specified, the system will decide reasonable defaults.
Type
OcrParams
automl_params()
Controls AutoML model prediction behavior. AutoMlParams cannot be used together with other Params.
Type
AutoMlParams
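The restriction on combining AutoMlParams with other parameters can be checked before a request is sent. The sketch below works on a ProcessDocumentRequest-shaped dict and assumes that "other Params" means the four other *_params fields listed for this class; the function name and the model resource name are hypothetical.

```python
# Assumption: these are the "other Params" that must not accompany automl_params.
OTHER_PARAM_FIELDS = (
    "table_extraction_params",
    "form_extraction_params",
    "entity_extraction_params",
    "ocr_params",
)


def conflicting_params(request: dict) -> list:
    """Return the *_params fields that are set alongside automl_params."""
    if not request.get("automl_params"):
        return []
    return [name for name in OTHER_PARAM_FIELDS if request.get(name)]


conflicts = conflicting_params(
    {
        "automl_params": {"model": "projects/p/locations/l/models/m"},
        "ocr_params": {"language_hints": ["en"]},
    }
)
```

An empty result means the request does not mix AutoML with the other parameter groups under this interpretation.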
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.ProcessDocumentResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Response to a single document processing request.
input_config()
Information about the input file. This is the same as the corresponding input config in the request.
Type
InputConfig
output_config()
The output location of the parsed responses. The responses are written to this location as JSON-serialized Document objects.
Type
OutputConfig
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.TableBoundHint(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A hint for a table bounding box on the page for table parsing.
page_number()
Optional. Page number, for multi-page inputs, that this hint applies to. If not provided, this hint will apply to all pages by default. This value is 1-based.
Type
int
bounding_box()
Bounding box hint for a table on this page. The coordinates must be normalized to [0,1] and the bounding box must be an axis-aligned rectangle.
Type
BoundingPoly
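An axis-aligned, normalized bounding box can be built from its two corners. The sketch below returns a TableBoundHint-shaped dict; it assumes that BoundingPoly carries its points in a `normalized_vertices` list of x/y pairs, which is not shown in this section, and the function name is hypothetical.

```python
from typing import Optional


def table_bound_hint(x_min: float, y_min: float, x_max: float, y_max: float,
                     page_number: Optional[int] = None) -> dict:
    """Build a TableBoundHint-shaped dict for an axis-aligned rectangle."""
    if not all(0.0 <= c <= 1.0 for c in (x_min, y_min, x_max, y_max)):
        raise ValueError("coordinates must be normalized to [0, 1]")
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("rectangle must have positive width and height")
    hint = {
        "bounding_box": {
            # Assumption: BoundingPoly exposes a normalized_vertices field.
            "normalized_vertices": [
                {"x": x_min, "y": y_min},  # top-left
                {"x": x_max, "y": y_min},  # top-right
                {"x": x_max, "y": y_max},  # bottom-right
                {"x": x_min, "y": y_max},  # bottom-left
            ]
        }
    }
    if page_number is not None:
        if page_number < 1:
            raise ValueError("page_number is 1-based")
        hint["page_number"] = page_number
    return hint


hint = table_bound_hint(0.1, 0.2, 0.9, 0.8, page_number=1)
```

Leaving `page_number` out keeps the field unset, which the description says applies the hint to all pages.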
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.TableExtractionParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Parameters to control table extraction behavior.
enabled()
Whether to enable table extraction.
Type
bool
table_bound_hints()
Optional. Table bounding box hints that can be provided for complex cases in which the algorithm cannot locate the table(s).
Type
Sequence[TableBoundHint]
header_hints()
Optional. Table header hints. The extraction will bias towards producing these terms as table headers, which may improve accuracy.
Type
Sequence[str]
model_version()
Model version of the table extraction system. Default is “builtin/stable”. Specify “builtin/latest” for the latest model.
Type
str
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
class google.cloud.documentai_v1beta2.Vertex(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.
x()
X coordinate.
Type
int
y()
Y coordinate.
Type
int
__delattr__(key)
Delete the value on the given field.
This is generally equivalent to setting a falsy value.
__eq__(other)
Return True if the messages are equal, False otherwise.
__ne__(other)
Return True if the messages are unequal, False otherwise.
__setattr__(key, value)
Set the value on the given field.
For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.