Class Document (0.1.1a0)

Document(
    shards: List[google.cloud.documentai_v1.types.document.Document],
    gcs_bucket_name: Optional[str] = None,
    gcs_prefix: Optional[str] = None,
)

Represents a wrapped Document.

This class hides the complexity of working with the Document protobuf response returned by the BatchProcessDocuments or ProcessDocument methods. It also provides convenience methods for searching and extracting information within the Document.

Attributes

Name        Description
gcs_bucket_name Optional[str]
Optional. The name of the Cloud Storage bucket. Format: given gs://bucket/optional_folder/target_folder/, gcs_bucket_name=bucket.
gcs_prefix Optional[str]
Optional. The prefix of the JSON files in the target_folder. Format: given gs://bucket/optional_folder/target_folder/, gcs_prefix=optional_folder/target_folder. For more information, see https://cloud.google.com/storage/docs/json_api/v1/objects/list .
pages List[Page]
A list of Pages in the Document.
entities List[Entity]
A list of Entities in the Document.
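The gcs_prefix description points to the Cloud Storage objects.list API, whose prefix filter is a plain string-prefix match. The sketch below is a hypothetical helper, not part of the toolbox, that shows what that means for a list of object names. It assumes that the shards are the .json objects directly inside the target folder, and it appends a trailing "/" so that a sibling folder such as "123" versus "1234" is not matched by accident.

```python
from typing import Iterable, List


def select_json_objects(object_names: Iterable[str], gcs_prefix: str) -> List[str]:
    """Hypothetical helper: mimic a prefix-filtered object listing, keeping .json files.

    Cloud Storage prefix filtering is a plain string-prefix match, so a
    trailing "/" is appended here to avoid matching sibling folders that
    merely share the same leading characters.
    """
    folder = gcs_prefix.rstrip("/") + "/"
    return [
        name
        for name in object_names
        if name.startswith(folder) and name.endswith(".json")
    ]


names = [
    "out/123/doc-0.json",
    "out/123/doc-1.json",
    "out/123/log.txt",
    "out/1234/doc-0.json",  # sibling folder: excluded by the trailing "/"
]
print(select_json_objects(names, "out/123"))
```

Without the appended "/", a prefix of "out/123" would also match "out/1234/doc-0.json". This is why the sketch normalizes the prefix before filtering.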

Methods

from_document_path

from_document_path(document_path: str)

Loads a Document from a local document_path.

Parameter
Name        Description
document_path str

Required. The path to the document.json file.

Returns
Type        Description
Document    A Document loaded from the local document_path.
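To show what a local document.json typically contains, the sketch below reads one with the standard json module instead of the toolbox. The helper name summarize_document_json is hypothetical. The field names "pages", "entities", and "type" are assumed to follow the Document AI JSON output layout. The helper only reports the page count and how often each entity type occurs.

```python
import json
from collections import Counter


def summarize_document_json(document_path: str) -> dict:
    """Hypothetical helper: report page count and entity types in a document.json file.

    Assumes top-level "pages" and "entities" arrays, with a "type" field on
    each entity, as in Document AI JSON output.
    """
    with open(document_path, encoding="utf-8") as f:
        data = json.load(f)
    # Count entity types; entities without a "type" field are grouped under "".
    entity_types = Counter(e.get("type", "") for e in data.get("entities", []))
    return {
        "page_count": len(data.get("pages", [])),
        "entity_types": dict(entity_types),
    }
```

With the toolbox itself, the equivalent step is a call to from_document_path(document_path=...), which returns a wrapped Document rather than a plain dict.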

from_documentai_document

from_documentai_document(
    documentai_document: google.cloud.documentai_v1.types.document.Document,
)

Loads a Document from an existing documentai.Document object.

Parameter
Name        Description
documentai_document documentai.Document

Required. The Document proto, for example the document in a ProcessDocument response.

Returns
Type        Description
Document    A Document built from documentai_document.

from_gcs

from_gcs(gcs_bucket_name: str, gcs_prefix: str)

Loads Document from Cloud Storage.

Parameters
Name        Description
gcs_bucket_name str

Required. The Cloud Storage bucket name. Format: given gs://{bucket_name}/{optional_folder}/{operation_id}/, gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix to the location of the target folder. Format: given gs://{bucket_name}/{optional_folder}/{target_folder}/, gcs_prefix={optional_folder}/{target_folder}.

Returns
Type        Description
Document    A Document loaded from Cloud Storage.
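The two parameters are the pieces of a single gs:// URI, split at the first "/" after the bucket name. The sketch below is a hypothetical helper, not part of the toolbox, that performs this split as described in the Format notes above. It also drops a trailing "/" so that the prefix has the documented {optional_folder}/{target_folder} form.

```python
from typing import Tuple


def split_gcs_uri(gcs_uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split gs://bucket/folder/.../ into (gcs_bucket_name, gcs_prefix)."""
    scheme = "gs://"
    if not gcs_uri.startswith(scheme):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    # Everything before the first "/" is the bucket; the rest is the prefix.
    bucket, _, prefix = gcs_uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {gcs_uri!r}")
    # Drop the trailing slash to match the documented gcs_prefix form.
    return bucket, prefix.rstrip("/")


print(split_gcs_uri("gs://my-bucket/batch-output/12345/"))
```

The resulting pair can then be passed as from_gcs(gcs_bucket_name=..., gcs_prefix=...). The bucket and folder names above are placeholders.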

get_entity_by_type

get_entity_by_type(target_type: str)

Returns the list of Entities of target_type.

Parameter
Name        Description
target_type str

Required. The entity type to match.

Returns
Type          Description
List[Entity]  A list of the Entities whose type is target_type.
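The filtering idea behind get_entity_by_type can be sketched with a simplified stand-in for Entity. The dataclass below, with its type_ and mention_text fields, and the helper get_entities_of_type are hypothetical illustrations rather than the toolbox classes. The sketch also assumes an exact, case-sensitive comparison on the entity type and preserves document order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """Hypothetical stand-in for the toolbox Entity; only the fields used here."""
    type_: str
    mention_text: str


def get_entities_of_type(entities: List[Entity], target_type: str) -> List[Entity]:
    """Return the entities whose type equals target_type, in their original order."""
    return [entity for entity in entities if entity.type_ == target_type]


entities = [
    Entity("invoice_id", "INV-001"),
    Entity("total_amount", "$42.00"),
    Entity("invoice_id", "INV-002"),
]
print([e.mention_text for e in get_entities_of_type(entities, "invoice_id")])
```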

search_pages

search_pages(target_string: Optional[str] = None, pattern: Optional[str] = None)

Returns the list of Pages containing target_string or text matching pattern.

Parameters
Name        Description
target_string Optional[str]

Optional. A literal string to search for in each Page's text.

pattern Optional[str]

Optional. A regular expression to match against each Page's text.

Returns
Type          Description
List[Page]    A list of the matching Pages.
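The two search modes differ in how they match: target_string is a literal substring test, while pattern is a regular expression. The sketch below illustrates the difference on a simplified, hypothetical Page dataclass that holds only a page number and its text. It is not the toolbox implementation. In particular, requiring exactly one of the two arguments is an assumption made for this sketch.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for the toolbox Page: a page number and its text."""
    page_number: int
    text: str


def search_pages(
    pages: List[Page],
    target_string: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[Page]:
    # Assumption for this sketch: exactly one search criterion must be given.
    if (target_string is None) == (pattern is None):
        raise ValueError("Provide exactly one of target_string or pattern.")
    if target_string is not None:
        # Literal substring match.
        return [page for page in pages if target_string in page.text]
    # Regular-expression match anywhere in the page text.
    regex = re.compile(pattern)
    return [page for page in pages if regex.search(page.text)]


pages = [
    Page(1, "Invoice INV-001\nTotal: $42.00"),
    Page(2, "Terms and conditions"),
    Page(3, "Invoice INV-002"),
]
print([p.page_number for p in search_pages(pages, pattern=r"INV-\d{3}")])
```

Note that a literal search for "$42.00" and a regex search for the same text behave differently, because "$" and "." are special characters in a regular expression.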