Wrappers for Document AI Document type.
Classes
Document
Document(
shards: typing.List[google.cloud.documentai_v1.types.document.Document],
gcs_bucket_name: typing.Optional[str] = None,
gcs_prefix: typing.Optional[str] = None,
gcs_uri: typing.Optional[str] = None,
gcs_input_uri: typing.Optional[str] = None,
)
Represents a wrapped Document.
This class hides the complexity of working with the Document protobuf response returned by the BatchProcessDocuments or ProcessDocument methods, and provides convenience methods for searching and extracting information within the Document.
Functions
_apply_text_offset
_apply_text_offset(
documentai_object: typing.Union[typing.Dict[str, typing.Dict], typing.List],
text_offset: int,
) -> None
Applies a text offset to all text_segments in documentai_object.
Parameters | |
---|---|
Name | Description |
documentai_object | object: Required. Document AI object to apply |
text_offset | int: Required. Text offset to apply. From |
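
The recursive walk described above can be sketched on plain dictionaries. This is a minimal illustration, not the library's implementation: the name apply_text_offset_sketch and the assumed segment shape (a dict with start_index and end_index keys) are hypothetical.

```python
from typing import Any, Dict, List, Union


def apply_text_offset_sketch(
    documentai_object: Union[Dict[str, Any], List[Any]], text_offset: int
) -> None:
    """Shift every text segment found in a nested dict/list, in place."""
    if isinstance(documentai_object, dict):
        for key, value in documentai_object.items():
            if key == "text_segments":
                # Assumed segment shape: {"start_index": ..., "end_index": ...}.
                for segment in value:
                    segment["start_index"] = int(segment.get("start_index", 0)) + text_offset
                    segment["end_index"] = int(segment.get("end_index", 0)) + text_offset
            else:
                # Scalars fall through both branches and are left untouched.
                apply_text_offset_sketch(value, text_offset)
    elif isinstance(documentai_object, list):
        for item in documentai_object:
            apply_text_offset_sketch(item, text_offset)


# Usage: shift a layout's text anchor by 100 characters.
layout = {"text_anchor": {"text_segments": [{"start_index": 0, "end_index": 5}]}}
apply_text_offset_sketch(layout, 100)
```

An offset like this is typically needed when several shards are combined and each shard's indices are relative to its own text; that motivation is an assumption here, since the source description is truncated.
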
_bigquery_column_name
_bigquery_column_name(input_string: str) -> str
Converts a string into a valid BigQuery column name. See https://cloud.google.com/bigquery/docs/schemas#column_names
Parameter | |
---|---|
Name | Description |
input_string | str: Required. The string to convert. |
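
As a rough illustration of this kind of conversion, the sketch below normalizes a label into a conservative column name. The function name bigquery_column_name_sketch and its exact rules (lowercase, underscores for disallowed characters, underscore prefix for a leading digit) are assumptions for illustration; the library's actual character mapping may differ.

```python
import re


def bigquery_column_name_sketch(input_string: str) -> str:
    """Return a lowercase name containing only letters, digits, and underscores."""
    # Replace each run of disallowed characters with a single underscore.
    name = re.sub(r"[^0-9a-zA-Z_]+", "_", input_string).strip("_").lower()
    # Assumption: names should not start with a digit, so prefix an underscore.
    if name and name[0].isdigit():
        name = "_" + name
    return name


print(bigquery_column_name_sketch("Invoice Date"))  # invoice_date
print(bigquery_column_name_sketch("Total (USD)"))   # total_usd
```
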
_dict_to_bigquery
_dict_to_bigquery(
dic: typing.Dict[str, typing.Union[str, typing.List[str]]],
dataset_name: str,
table_name: str,
project_id: typing.Optional[str],
) -> google.cloud.bigquery.job.load.LoadJob
Loads a dictionary into a BigQuery table.
Parameters | |
---|---|
Name | Description |
dic | Dict[str, Union[str, List[str]]]: Required. The dictionary to insert. |
dataset_name | str: Required. Name of the BigQuery dataset. |
table_name | str: Required. Name of the BigQuery table. |
project_id | Optional[str]: Optional. Project ID containing the BigQuery table. If not passed, falls back to the default inferred from the environment. |
Returns | |
---|---|
Type | Description |
bigquery.job.LoadJob | The BigQuery LoadJob for adding the dictionary. |
_entities_from_shards
_entities_from_shards(
shards: typing.List[google.cloud.documentai_v1.types.document.Document],
) -> typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]
Returns a list of Entities and Properties from a list of documentai.Document shards.
Parameter | |
---|---|
Name | Description |
shards | List[google.cloud.documentai.Document]: Required. List of document shards. |
Returns | |
---|---|
Type | Description |
List[Entity] | A list of Entities. |
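
The flattening of entities and their properties across shards can be sketched with stand-in types. EntitySketch, ShardSketch, and entities_from_shards_sketch are hypothetical names introduced for illustration; the real function operates on documentai.Document shards and returns the toolbox's Entity wrappers, and may order or post-process the results differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntitySketch:
    """Hypothetical stand-in for a Document AI entity."""
    type_: str
    mention_text: str
    properties: List["EntitySketch"] = field(default_factory=list)


@dataclass
class ShardSketch:
    """Hypothetical stand-in for one documentai.Document shard."""
    entities: List[EntitySketch] = field(default_factory=list)


def entities_from_shards_sketch(shards: List[ShardSketch]) -> List[EntitySketch]:
    """Collect each entity, followed by its properties, across all shards."""
    result: List[EntitySketch] = []
    for shard in shards:
        for entity in shard.entities:
            result.append(entity)
            # Properties are nested entities; list them right after their parent.
            result.extend(entity.properties)
    return result


shards = [
    ShardSketch([EntitySketch("line_item", "2 widgets", [EntitySketch("line_item/quantity", "2")])]),
    ShardSketch([EntitySketch("total_amount", "40.00")]),
]
flat = entities_from_shards_sketch(shards)
print([e.type_ for e in flat])  # ['line_item', 'line_item/quantity', 'total_amount']
```
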
_get_batch_process_metadata
_get_batch_process_metadata(
operation_name: str,
location: typing.Optional[str] = None,
timeout: typing.Optional[float] = None,
) -> google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata
Gets the BatchProcessMetadata from a batch_process_documents() long-running operation.
Parameters | |
---|---|
Name | Description |
operation_name | str: Required. The fully qualified operation name for a |
location | str: Optional. The location of the processor used for |
timeout | float: Optional. Default None. Time in seconds to wait for the operation to complete. If None, waits indefinitely. |
Returns | |
---|---|
Type | Description |
documentai.BatchProcessMetadata | Metadata from batch process. |
_get_shards
_get_shards(
gcs_bucket_name: str, gcs_prefix: str
) -> typing.List[google.cloud.documentai_v1.types.document.Document]
Returns a list of documentai.Document shards from a Cloud Storage folder.
Parameters | |
---|---|
Name | Description |
gcs_bucket_name | str: Required. The name of the Cloud Storage bucket. Format: |
gcs_prefix | str: Required. The prefix of the JSON files in the target_folder. Format: |
Returns | |
---|---|
Type | Description |
List[google.cloud.documentai.Document] | A list of documentai.Documents. |
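
The prefix-based selection of JSON shard files can be illustrated without a Cloud Storage client. In this sketch a plain dictionary of object name to JSON text stands in for the bucket listing; get_shards_sketch, the example object names, and the choice to sort by name are all assumptions, and the parsed shards are plain dictionaries rather than documentai.Document instances.

```python
import json
from typing import Any, Dict, List


def get_shards_sketch(blobs: Dict[str, str], gcs_prefix: str) -> List[Dict[str, Any]]:
    """Parse every JSON blob whose name starts with gcs_prefix."""
    shards: List[Dict[str, Any]] = []
    # Sorting by name keeps the result deterministic (an assumption of this sketch).
    for name in sorted(blobs):
        if name.startswith(gcs_prefix) and name.endswith(".json"):
            shards.append(json.loads(blobs[name]))
    return shards


# Hypothetical bucket contents: two shard files and one unrelated object.
bucket = {
    "output/123/0/doc-0.json": '{"text": "page one"}',
    "output/123/0/doc-1.json": '{"text": "page two"}',
    "output/123/0/notes.txt": "not a shard",
}
print(len(get_shards_sketch(bucket, "output/123/0")))  # 2
```
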
_insert_into_dictionary_with_list
_insert_into_dictionary_with_list(
dic: typing.Dict[str, typing.Union[str, typing.List[str]]], key: str, value: str
) -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Inserts a value into a dictionary whose values can be strings or lists of strings.
Parameters | |
---|---|
Name | Description |
dic | Dict[str, Union[str, List[str]]]: Required. The dictionary to insert into. |
key | str: Required. The key to be created or inserted into. |
value | str: Required. The value to be inserted. |
Returns | |
---|---|
Type | Description |
Dict[str, Union[str, List[str]]] | The dictionary after adding the key-value pair. |
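
One plausible reading of this behavior is that a first value is stored as a plain string and a repeated key is promoted to a list. The sketch below shows that pattern; insert_with_list_sketch is a hypothetical name, and the promotion rule is an assumption inferred from the Union[str, List[str]] value type rather than confirmed by the source.

```python
from typing import Dict, List, Union

ValueOrList = Union[str, List[str]]


def insert_with_list_sketch(
    dic: Dict[str, ValueOrList], key: str, value: str
) -> Dict[str, ValueOrList]:
    """Insert value under key, promoting the entry to a list on repeated keys."""
    if key not in dic:
        dic[key] = value
        return dic
    existing = dic[key]
    if not isinstance(existing, list):
        # Second value for this key: wrap the first one in a list.
        existing = [existing]
    existing.append(value)
    dic[key] = existing
    return dic


row: Dict[str, ValueOrList] = {}
insert_with_list_sketch(row, "supplier", "Acme")
insert_with_list_sketch(row, "line_item", "2 widgets")
insert_with_list_sketch(row, "line_item", "1 gadget")
print(row)  # {'supplier': 'Acme', 'line_item': ['2 widgets', '1 gadget']}
```
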
_pages_from_shards
_pages_from_shards(
shards: typing.List[google.cloud.documentai_v1.types.document.Document],
) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]
Returns a list of Pages from a list of documentai.Document shards.
Parameter | |
---|---|
Name | Description |
shards | List[google.cloud.documentai.Document]: Required. List of document shards. |
Returns | |
---|---|
Type | Description |
List[Page] | A list of Pages. |