Module documentai_toolbox.utilities (0.4.0a0)

Document AI utilities.

Functions

create_batches

create_batches(
    gcs_bucket_name: str, gcs_prefix: str, batch_size: Optional[int] = 50
)

Creates batches of documents stored in Cloud Storage for processing with batch_process_documents().

Parameters
Name | Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.

batch_size Optional[int]

Optional. Size of each batch of documents. Default is 50.
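The Format notes above describe how a single gs:// URI maps onto the two string arguments. As a minimal sketch, the hypothetical helper split_gcs_uri (not part of documentai_toolbox) derives both values from a full URI, assuming the bucket is everything before the first slash and the prefix is the remainder without its trailing slash:

```python
from typing import Tuple


def split_gcs_uri(gcs_uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into a (gcs_bucket_name, gcs_prefix) pair.

    Hypothetical helper for illustration; not part of documentai_toolbox.
    """
    scheme = "gs://"
    if not gcs_uri.startswith(scheme):
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    # Drop the scheme, then split off the bucket name at the first slash.
    bucket, _, prefix = gcs_uri[len(scheme):].partition("/")
    # Strip the trailing slash to match the documented gcs_prefix format.
    return bucket, prefix.rstrip("/")


if __name__ == "__main__":
    print(split_gcs_uri("gs://bucket/optional_folder/target_folder/"))
    # ('bucket', 'optional_folder/target_folder')
```

The resulting pair could then be supplied as the gcs_bucket_name and gcs_prefix arguments.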

Returns
Type | Description
List[documentai.BatchDocumentsInputConfig]

A list of BatchDocumentsInputConfig objects, one per batch.
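The batch_size parameter controls how many documents end up in each BatchDocumentsInputConfig. The sketch below models only that grouping step using plain URI strings; it assumes documents are chunked in listing order, and it neither calls Cloud Storage nor builds documentai objects. The helper name chunk_gcs_uris is hypothetical.

```python
from typing import List, Sequence


def chunk_gcs_uris(uris: Sequence[str], batch_size: int = 50) -> List[List[str]]:
    """Split document URIs into consecutive batches of at most batch_size items.

    Hypothetical helper; it only illustrates the batching arithmetic.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Step through the sequence in strides of batch_size; the last batch may be shorter.
    return [list(uris[i : i + batch_size]) for i in range(0, len(uris), batch_size)]


if __name__ == "__main__":
    uris = [f"gs://bucket/folder/doc_{n}.pdf" for n in range(120)]
    batches = chunk_gcs_uris(uris)
    print([len(batch) for batch in batches])  # [50, 50, 20]
```

With the real library, each BatchDocumentsInputConfig returned by create_batches() would typically be passed as the input_documents field of a documentai.BatchProcessRequest, one request per batch.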

list_gcs_document_tree

list_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str)

Returns a dictionary of the paths to files in a Cloud Storage folder and prints the tree to the terminal.

Parameters
Name | Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.

Returns
Type | Description
Dict[str, List[str]]

The paths to documents in gs://{gcs_bucket_name}/{gcs_prefix}.
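The return type is a dictionary rather than a flat list. A plausible reading, and the assumption behind this sketch, is that each key is a folder path and each value lists the file names inside that folder. The hypothetical group_by_directory helper builds that shape from plain object names without contacting Cloud Storage:

```python
import posixpath
from typing import Dict, Iterable, List


def group_by_directory(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group object names into a {directory: [file names]} mapping.

    Hypothetical helper; the key/value layout is an assumption about the
    Dict[str, List[str]] returned by list_gcs_document_tree().
    """
    tree: Dict[str, List[str]] = {}
    for name in blob_names:
        # Cloud Storage object names use "/" separators, so split POSIX-style.
        directory, file_name = posixpath.split(name)
        tree.setdefault(directory, []).append(file_name)
    return tree


if __name__ == "__main__":
    names = ["folder/a.json", "folder/b.json", "folder/sub/c.json"]
    print(group_by_directory(names))
    # {'folder': ['a.json', 'b.json'], 'folder/sub': ['c.json']}
```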

print_gcs_document_tree

print_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str)

Prints a tree of the filenames in a Cloud Storage folder.

Parameters
Name | Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.
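To picture what a printed tree of filenames might look like, the sketch below renders a dictionary that maps each folder path to its file names as indented text. The helper name format_document_tree, the sorting, and the connector characters are all assumptions; the actual output of print_gcs_document_tree() may differ.

```python
from typing import Dict, List


def format_document_tree(tree: Dict[str, List[str]]) -> str:
    """Render a {directory: [file names]} mapping as a text tree.

    Hypothetical helper; the real print_gcs_document_tree() output may differ.
    """
    lines: List[str] = []
    for directory in sorted(tree):
        lines.append(f"{directory}/")
        files = sorted(tree[directory])
        for index, file_name in enumerate(files):
            # The last entry in each folder gets a corner connector.
            connector = "└── " if index == len(files) - 1 else "├── "
            lines.append(f"{connector}{file_name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_document_tree({"invoices/2023": ["b.json", "a.json"]}))
    # invoices/2023/
    # ├── a.json
    # └── b.json
```

Returning a string instead of printing directly keeps the rendering easy to test; a caller can still pass the result to print().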