Summary of entries of Methods for documentai-toolbox.
google.cloud.documentai_toolbox.utilities.gcs_utilities._get_client_info
_get_client_info(
module: typing.Optional[str] = None,
) -> google.api_core.gapic_v1.client_info.ClientInfo
Returns a custom user agent header.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities._get_client_info
google.cloud.documentai_toolbox.utilities.gcs_utilities._get_storage_client
_get_storage_client(
module: typing.Optional[str] = None,
) -> google.cloud.storage.client.Client
Returns a Storage client with a custom user agent header.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities._get_storage_client
google.cloud.documentai_toolbox.utilities.gcs_utilities.create_batches
create_batches(
gcs_bucket_name: str, gcs_prefix: str, batch_size: int = 1000
) -> typing.List[
google.cloud.documentai_v1.types.document_io.BatchDocumentsInputConfig
]
Creates batches of documents in Cloud Storage to process with batch_process_documents().
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.create_batches
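The grouping implied by the batch_size parameter of create_batches can be sketched in plain Python. This is an illustration only, not the library's implementation: chunk_uris is a hypothetical helper, the bucket and file names are made up, and the real create_batches returns BatchDocumentsInputConfig objects rather than lists of strings.

```python
from typing import List


def chunk_uris(uris: List[str], batch_size: int = 1000) -> List[List[str]]:
    """Split a list of URIs into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [uris[i : i + batch_size] for i in range(0, len(uris), batch_size)]


# Hypothetical bucket and file names, used only for illustration.
example_uris = [f"gs://example-bucket/input/doc-{n}.pdf" for n in range(5)]
batches = chunk_uris(example_uris, batch_size=2)
print([len(batch) for batch in batches])
```

Five URIs with a batch size of 2 yield two full groups and one partial group.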
google.cloud.documentai_toolbox.utilities.gcs_utilities.create_gcs_uri
create_gcs_uri(gcs_bucket_name: str, gcs_prefix: str) -> str
Creates a Cloud Storage URI from the bucket_name and prefix.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.create_gcs_uri
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blob
get_blob(
gcs_uri: str, module: typing.Optional[str] = "get-bytes"
) -> google.cloud.storage.blob.Blob
Returns a blob from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blob
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blobs
get_blobs(
gcs_uri: typing.Optional[str] = None,
gcs_bucket_name: typing.Optional[str] = None,
gcs_prefix: typing.Optional[str] = "/",
module: typing.Optional[str] = "get-bytes",
) -> typing.List[google.cloud.storage.blob.Blob]
Returns a list of blobs from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blobs
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_bytes
get_bytes(gcs_bucket_name: str, gcs_prefix: str) -> typing.List[bytes]
Returns a list of bytes objects, one per JSON file, from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_bytes
google.cloud.documentai_toolbox.utilities.gcs_utilities.list_gcs_document_tree
list_gcs_document_tree(
gcs_bucket_name: str, gcs_prefix: str
) -> typing.Dict[str, typing.List[str]]
Returns a dictionary that maps each path in a Cloud Storage folder to the list of files it contains.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.list_gcs_document_tree
google.cloud.documentai_toolbox.utilities.gcs_utilities.print_gcs_document_tree
print_gcs_document_tree(
gcs_bucket_name: str, gcs_prefix: str, files_to_display: int = 4
) -> None
Prints a tree of filenames in a Cloud Storage folder.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.print_gcs_document_tree
google.cloud.documentai_toolbox.utilities.gcs_utilities.split_gcs_uri
split_gcs_uri(gcs_uri: str) -> typing.Tuple[str, str]
Splits a Cloud Storage URI into the bucket_name and prefix.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.split_gcs_uri
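The round trip between a Cloud Storage URI and its (bucket, prefix) parts, as described for split_gcs_uri and create_gcs_uri, might look like the sketch below. The helpers split_uri and join_uri are hypothetical stand-ins written for this example; the regular expression and the error handling are assumptions, not the library's code.

```python
import re
from typing import Tuple


def split_uri(gcs_uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/prefix' into (bucket, prefix)."""
    match = re.match(r"gs://([^/]+)/?(.*)", gcs_uri)
    if match is None:
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    return match.group(1), match.group(2)


def join_uri(bucket_name: str, prefix: str) -> str:
    """Build 'gs://bucket/prefix' from its two parts."""
    return f"gs://{bucket_name}/{prefix}"


# Hypothetical URI, used only for illustration.
bucket, prefix = split_uri("gs://example-bucket/output/123/0")
print(bucket, prefix)
print(join_uri(bucket, prefix))
```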
google.cloud.documentai_toolbox.utilities.gcs_utilities.upload_file
upload_file(
gcs_output_directory: str,
file_name: str,
file_content: str,
content_type: str = "application/json",
module: typing.Optional[str] = "upload-file",
) -> None
Uploads the converted Document proto to Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.upload_file
google.cloud.documentai_toolbox.wrappers.document._apply_text_offset
_apply_text_offset(
documentai_object: typing.Union[typing.Dict[str, typing.Dict], typing.List],
text_offset: int,
) -> None
Applies a text offset to all text_segments in documentai_object.
See more: google.cloud.documentai_toolbox.wrappers.document._apply_text_offset
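One way to apply a text offset to every text segment in a nested structure is a recursive walk, sketched below. The function apply_offset is hypothetical, and the key names textSegments, startIndex, and endIndex are assumed to follow the JSON form of a Document; the library's own traversal may differ.

```python
from typing import Any


def apply_offset(obj: Any, text_offset: int) -> None:
    """Shift every text segment found in a nested dict/list structure, in place."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "textSegments":
                for segment in value:
                    # Indexes may arrive as strings in JSON, so coerce to int.
                    start = int(segment.get("startIndex", 0))
                    end = int(segment.get("endIndex", 0))
                    segment["startIndex"] = start + text_offset
                    segment["endIndex"] = end + text_offset
            else:
                apply_offset(value, text_offset)
    elif isinstance(obj, list):
        for item in obj:
            apply_offset(item, text_offset)


# Hypothetical fragment of a page in JSON form.
page = {
    "layout": {
        "textAnchor": {"textSegments": [{"startIndex": "0", "endIndex": "5"}]}
    }
}
apply_offset(page, 100)
print(page["layout"]["textAnchor"]["textSegments"])
```

An offset like this is useful when shards are merged and each shard's indexes must point into the combined text.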
google.cloud.documentai_toolbox.wrappers.document._bigquery_column_name
_bigquery_column_name(input_string: str) -> str
Converts a string into a BigQuery column name.
See more: google.cloud.documentai_toolbox.wrappers.document._bigquery_column_name
google.cloud.documentai_toolbox.wrappers.document._dict_to_bigquery
_dict_to_bigquery(
dic: typing.Dict[str, typing.Union[str, typing.List[str]]],
dataset_name: str,
table_name: str,
project_id: typing.Optional[str],
) -> google.cloud.bigquery.job.load.LoadJob
Loads a dictionary into a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document._dict_to_bigquery
google.cloud.documentai_toolbox.wrappers.document._entities_from_shards
_entities_from_shards(
shards: typing.List[google.cloud.documentai_v1.types.document.Document],
) -> typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]
Returns a list of Entities and Properties from a list of documentai.Document shards.
See more: google.cloud.documentai_toolbox.wrappers.document._entities_from_shards
google.cloud.documentai_toolbox.wrappers.document._get_batch_process_metadata
_get_batch_process_metadata(
operation_name: str,
location: typing.Optional[str] = None,
timeout: typing.Optional[float] = None,
) -> google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata
Gets BatchProcessMetadata from a batch_process_documents() long-running operation.
See more: google.cloud.documentai_toolbox.wrappers.document._get_batch_process_metadata
google.cloud.documentai_toolbox.wrappers.document._get_shards
_get_shards(
gcs_bucket_name: str, gcs_prefix: str
) -> typing.List[google.cloud.documentai_v1.types.document.Document]
Returns a list of documentai.Document shards from a Cloud Storage folder.
See more: google.cloud.documentai_toolbox.wrappers.document._get_shards
google.cloud.documentai_toolbox.wrappers.document._insert_into_dictionary_with_list
_insert_into_dictionary_with_list(
dic: typing.Dict[str, typing.Union[str, typing.List[str]]], key: str, value: str
) -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Inserts value into a dictionary that can contain lists.
See more: google.cloud.documentai_toolbox.wrappers.document._insert_into_dictionary_with_list
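The return type Dict[str, Union[str, List[str]]] suggests that a repeated key turns a single string into a list. A minimal sketch of that behavior follows; insert_with_list is a hypothetical helper and the promotion rule is an assumption drawn from the signature, not a copy of the library's code.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def insert_with_list(dic: Dict[str, Value], key: str, value: str) -> Dict[str, Value]:
    """Store value under key, promoting the entry to a list when the key repeats."""
    if key not in dic:
        dic[key] = value
        return dic
    existing = dic[key]
    if isinstance(existing, list):
        existing.append(value)
    else:
        dic[key] = [existing, value]
    return dic


# Hypothetical form fields, used only for illustration.
fields: Dict[str, Value] = {}
for k, v in [("name", "Ada"), ("phone", "555-0100"), ("phone", "555-0199")]:
    insert_with_list(fields, k, v)
print(fields)
```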
google.cloud.documentai_toolbox.wrappers.document._pages_from_shards
_pages_from_shards(
shards: typing.List[google.cloud.documentai_v1.types.document.Document],
) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]
Returns a list of Pages from a list of documentai.Document shards.
See more: google.cloud.documentai_toolbox.wrappers.document._pages_from_shards
google.cloud.documentai_toolbox.wrappers.page._get_hocr_bounding_box
_get_hocr_bounding_box(
element_with_layout: typing.Union[
google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
google.cloud.documentai_v1.types.document.Document.Page,
google.cloud.documentai_v1.types.document.Document.Page.Token,
google.cloud.documentai_v1.types.document.Document.Page.Block,
google.cloud.documentai_v1.types.document.Document.Page.Symbol,
],
page_dimension: google.cloud.documentai_v1.types.document.Document.Page.Dimension,
) -> typing.Optional[str]
Returns a hOCR bounding box string.
See more: google.cloud.documentai_toolbox.wrappers.page._get_hocr_bounding_box
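In hOCR, a bounding box is written as the property "bbox x0 y0 x1 y1" in pixel coordinates. The sketch below shows how such a string could be derived from normalized vertices and a page dimension. The function hocr_bbox and its plain-tuple inputs are hypothetical simplifications of the Document AI types in the signature above, and the rounding choice is an assumption.

```python
from typing import List, Optional, Tuple


def hocr_bbox(
    normalized_vertices: List[Tuple[float, float]], width: float, height: float
) -> Optional[str]:
    """Return 'bbox x0 y0 x1 y1' in pixels, or None when there are no vertices."""
    if not normalized_vertices:
        return None
    xs = [x for x, _ in normalized_vertices]
    ys = [y for _, y in normalized_vertices]
    return (
        f"bbox {int(min(xs) * width)} {int(min(ys) * height)} "
        f"{int(max(xs) * width)} {int(max(ys) * height)}"
    )


# Four corners of a rectangle in normalized (0 to 1) page coordinates.
corners = [(0.25, 0.5), (0.75, 0.5), (0.75, 0.75), (0.25, 0.75)]
print(hocr_bbox(corners, width=1000, height=2000))
```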
google.cloud.documentai_toolbox.wrappers.page._text_from_layout
_text_from_layout(
layout: google.cloud.documentai_v1.types.document.Document.Page.Layout, text: str
) -> str
Returns the text from a single layout element.
See more: google.cloud.documentai_toolbox.wrappers.page._text_from_layout
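A layout element refers to the document text through index ranges, so extracting its text amounts to slicing the full text. The sketch below uses plain (start, end) tuples in place of the Layout type; text_from_segments is a hypothetical helper, not the library function.

```python
from typing import List, Tuple


def text_from_segments(segments: List[Tuple[int, int]], text: str) -> str:
    """Concatenate the slices of text addressed by (start, end) segments."""
    return "".join(text[start:end] for start, end in segments)


# Hypothetical document text, used only for illustration.
full_text = "Invoice 42\nTotal: 10.00\n"
print(repr(text_from_segments([(0, 10)], full_text)))
print(repr(text_from_segments([(0, 7), (11, 16)], full_text)))
```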
google.cloud.documentai_toolbox.wrappers.page._trim_text
_trim_text(text: str) -> str
Removes extra whitespace characters (blank, newline, tab, etc.) from text.
See more: google.cloud.documentai_toolbox.wrappers.page._trim_text
google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_json_response
convert_document_to_annotate_file_json_response() -> str
Converts OCR data from Document.proto to a JSON str of AnnotateFileResponse for Vision API.
google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_response
convert_document_to_annotate_file_response() -> (
google.cloud.vision_v1.types.image_annotator.AnnotateFileResponse
)
Converts OCR data from Document.proto to AnnotateFileResponse.proto for Vision API.
google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_bigquery
entities_to_bigquery(
dataset_name: str, table_name: str, project_id: typing.Optional[str] = None
) -> google.cloud.bigquery.job.load.LoadJob
Adds extracted entities to a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_bigquery
google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_dict
entities_to_dict() -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Returns a dictionary of entities in the document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_dict
google.cloud.documentai_toolbox.wrappers.document.Document.export_hocr_str
export_hocr_str(title: str) -> str
Exports a string hOCR version of the Document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.export_hocr_str
google.cloud.documentai_toolbox.wrappers.document.Document.export_images
export_images(
output_path: str, output_file_prefix: str, output_file_extension: str
) -> typing.List[str]
Exports images from Document.entities to files.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.export_images
google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_bigquery
form_fields_to_bigquery(
dataset_name: str, table_name: str, project_id: typing.Optional[str] = None
) -> google.cloud.bigquery.job.load.LoadJob
Adds extracted form fields to a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_bigquery
google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_dict
form_fields_to_dict() -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Returns a dictionary of form fields in the document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_dict
google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_metadata
from_batch_process_metadata(
metadata: google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata,
) -> typing.List[google.cloud.documentai_toolbox.wrappers.document.Document]
Loads Documents from Cloud Storage, using the output from BatchProcessMetadata.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_metadata
google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_operation
from_batch_process_operation(
location: str, operation_name: str, timeout: typing.Optional[float] = None
) -> typing.List[google.cloud.documentai_toolbox.wrappers.document.Document]
Loads Documents from Cloud Storage, using the operation name returned from batch_process_documents().
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_operation
google.cloud.documentai_toolbox.wrappers.document.Document.from_document_path
from_document_path(
document_path: str,
) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads Document from local document_path.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_document_path
google.cloud.documentai_toolbox.wrappers.document.Document.from_documentai_document
from_documentai_document(
documentai_document: google.cloud.documentai_v1.types.document.Document,
) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads Document from local documentai_document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_documentai_document
google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs
from_gcs(
gcs_bucket_name: str, gcs_prefix: str, gcs_input_uri: typing.Optional[str] = None
) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads a Document from a Cloud Storage directory.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs
google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs_uri
from_gcs_uri(
gcs_uri: str, gcs_input_uri: typing.Optional[str] = None
) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads a Document from a Cloud Storage URI.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs_uri
google.cloud.documentai_toolbox.wrappers.document.Document.get_entity_by_type
get_entity_by_type(
target_type: str,
) -> typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]
Returns the list of Entities of target_type.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.get_entity_by_type
google.cloud.documentai_toolbox.wrappers.document.Document.get_form_field_by_name
get_form_field_by_name(
target_field: str,
) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.FormField]
Returns the list of FormFields named target_field.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.get_form_field_by_name
google.cloud.documentai_toolbox.wrappers.document.Document.search_pages
search_pages(
target_string: typing.Optional[str] = None, pattern: typing.Optional[str] = None
) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]
Returns the list of Pages containing target_string or text matching pattern.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.search_pages
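The two search modes of search_pages, a literal substring or a regular expression, can be sketched over a list of page texts. The helper search_page_texts is hypothetical and returns page indexes rather than Page objects; requiring exactly one of the two arguments is an assumption made for this example.

```python
import re
from typing import List, Optional


def search_page_texts(
    page_texts: List[str],
    target_string: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[int]:
    """Return indexes of pages containing target_string or matching pattern."""
    # Assumption for this sketch: exactly one search mode must be chosen.
    if (target_string is None) == (pattern is None):
        raise ValueError("Provide exactly one of target_string or pattern")
    if target_string is not None:
        return [i for i, text in enumerate(page_texts) if target_string in text]
    regex = re.compile(pattern)
    return [i for i, text in enumerate(page_texts) if regex.search(text)]


# Hypothetical page texts, used only for illustration.
pages = ["Invoice 2023-01-15", "Terms and conditions", "Due 2023-02-01"]
print(search_page_texts(pages, target_string="Terms"))
print(search_page_texts(pages, pattern=r"\d{4}-\d{2}-\d{2}"))
```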
google.cloud.documentai_toolbox.wrappers.document.Document.split_pdf
split_pdf(pdf_path: str, output_path: str) -> typing.List[str]
Splits a local PDF file into multiple PDF files based on output from a Splitter processor.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.split_pdf
google.cloud.documentai_toolbox.wrappers.document.Document.to_merged_documentai_document
to_merged_documentai_document() -> (
google.cloud.documentai_v1.types.document.Document
)
Exports a documentai.Document from the wrapped document with shards merged.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.to_merged_documentai_document
google.cloud.documentai_toolbox.wrappers.entity.Entity.crop_image
crop_image(
documentai_page: google.cloud.documentai_v1.types.document.Document.Page,
) -> typing.Optional[PIL.Image.Image]
Returns the image cropped from the page image for the detected entity.
See more: google.cloud.documentai_toolbox.wrappers.entity.Entity.crop_image
google.cloud.documentai_toolbox.wrappers.page.Page._get_elements
_get_elements(element_type: typing.Type, attribute_name: str) -> typing.List
Helper method to create elements based on specified type.
See more: google.cloud.documentai_toolbox.wrappers.page.Page._get_elements
google.cloud.documentai_toolbox.wrappers.page.Table._extract_table_rows
_extract_table_rows(
table_rows: typing.Iterable[
google.cloud.documentai_v1.types.document.Document.Page.Table.TableRow
],
) -> typing.List[typing.List[str]]
Returns a list of rows from table_rows.
See more: google.cloud.documentai_toolbox.wrappers.page.Table._extract_table_rows
google.cloud.documentai_toolbox.wrappers.page.Table.to_dataframe
to_dataframe() -> pandas.core.frame.DataFrame
Returns pd.DataFrame from documentai.table.
See more: google.cloud.documentai_toolbox.wrappers.page.Table.to_dataframe
google.cloud.documentai_toolbox.wrappers.page._BasePageElement._get_children_of_element
_get_children_of_element(
potential_children: typing.List[
google.cloud.documentai_toolbox.wrappers.page._BasePageElement
],
) -> typing.List[google.cloud.documentai_toolbox.wrappers.page._BasePageElement]
Filters potential child elements to identify only those fully contained within this element.
See more: google.cloud.documentai_toolbox.wrappers.page._BasePageElement._get_children_of_element
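Filtering for children that are fully contained within a parent can be sketched with simple ranges. The Span class and children_of function below are hypothetical, and treating containment as a comparison of text index ranges is an assumption; the library may define containment differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for a page element covering text[start:end]."""

    name: str
    start: int
    end: int


def children_of(parent: Span, candidates: List[Span]) -> List[Span]:
    """Keep only candidates whose range lies entirely inside the parent's range."""
    return [
        c for c in candidates if parent.start <= c.start and c.end <= parent.end
    ]


line = Span("line", 10, 30)
tokens = [
    Span("inside-a", 10, 15),
    Span("inside-b", 16, 30),
    Span("overlapping", 28, 35),
    Span("outside", 40, 45),
]
print([child.name for child in children_of(line, tokens)])
```

Partially overlapping candidates are excluded, which matches the "fully contained" wording of the description.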