Document(
shards: List[google.cloud.documentai_v1.types.document.Document],
gcs_bucket_name: Optional[str] = None,
gcs_prefix: Optional[str] = None,
)
Represents a wrapped Document.
This class hides the complexity of working with the Document protobuf response output by the BatchProcessDocuments or ProcessDocument methods, and implements convenient methods for searching and extracting information within the Document.
Attributes
Name | Type | Description |
gcs_bucket_name | Optional[str] | Optional. The name of the Cloud Storage bucket. Format: gs://{bucket}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket}. |
gcs_prefix | Optional[str] | Optional. The prefix of the JSON files in the target_folder. Format: gs://{bucket}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}/. For more information, see the Cloud Storage objects.list documentation at https://cloud.google.com/storage/docs/json_api/v1/objects/list |
pages | List[Page] | A list of Pages in the Document. |
entities | List[Entity] | A list of Entities in the Document. |
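The gcs_bucket_name and gcs_prefix attributes are two halves of one Cloud Storage URI. The sketch below splits a URI in the documented gs://{bucket}/{optional_folder}/{target_folder}/ format into those two values. split_gcs_uri is a hypothetical helper written for illustration; it is not part of the library.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://{bucket}/{optional_folder}/{target_folder}/ into (bucket, prefix).

    Hypothetical helper: the returned pair corresponds to the gcs_bucket_name
    and gcs_prefix values described in the Attributes table.
    """
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    # Everything before the first "/" is the bucket; the rest is the prefix.
    bucket, _, prefix = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    # Keep a trailing slash so the prefix names a folder, not a file stem.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return bucket, prefix


bucket, prefix = split_gcs_uri("gs://my-bucket/batch-output/123/0/")
print(bucket)  # my-bucket
print(prefix)  # batch-output/123/0/
```

Appending the trailing slash is a choice of this sketch, made so that a prefix such as `batch-output/123/0/` cannot accidentally match a sibling folder such as `batch-output/123/01/`.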
Methods
from_document_path
from_document_path(document_path: str)
Loads a Document from a local file at document_path.
Parameters
Name | Type | Description |
document_path | str | Required. The path to the local JSON file containing the Document response. |
Returns
Type | Description |
Document | A Document loaded from document_path. |
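To show what such a local file holds, the sketch below reads a Document response saved as JSON using only the standard library. load_document_json is a hypothetical helper: unlike from_document_path, it returns a plain dict rather than a wrapped Document. The field names (`text`, `entities`, `type`, `mentionText`) follow the JSON form of the Document proto; the sample content is made up for the demo. With the google-cloud-documentai package installed, the same JSON string could instead be parsed into a proto with `documentai.Document.from_json`.

```python
import json
import os
import tempfile
from typing import Any, Dict


def load_document_json(document_path: str) -> Dict[str, Any]:
    """Read a Document response saved as JSON from a local path (hypothetical helper)."""
    with open(document_path, "r", encoding="utf-8") as f:
        return json.load(f)


# Demo with a tiny hand-written JSON file, not real processor output.
sample = {
    "text": "Invoice 42",
    "entities": [{"type": "invoice_id", "mentionText": "42"}],
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
    path = f.name

doc = load_document_json(path)
os.remove(path)
print(doc["entities"][0]["type"])  # invoice_id
```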
from_documentai_document
from_documentai_document(
documentai_document: google.cloud.documentai_v1.types.document.Document,
)
Loads a Document from an in-memory documentai_document.
Parameters
Name | Type | Description |
documentai_document | documentai.Document | Required. The Document proto response. |
Returns
Type | Description |
Document | A Document wrapping documentai_document. |
from_gcs
from_gcs(gcs_bucket_name: str, gcs_prefix: str)
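Loads a Document from Cloud Storage.
Parameters
Name | Type | Description |
gcs_bucket_name | str | Required. The Cloud Storage bucket. Format: given gs://{bucket}/{optional_folder}/{target_folder}/, gcs_bucket_name={bucket}. |
gcs_prefix | str | Required. The prefix to the location of the target folder. Format: given gs://{bucket}/{optional_folder}/{target_folder}/, gcs_prefix={optional_folder}/{target_folder}/. |
Returns
Type | Description |
Document | A Document loaded from Cloud Storage. |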
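The gcs_prefix parameter selects which objects in the bucket make up the Document. The sketch below illustrates that selection on a plain list of object names, standing in for what a Cloud Storage objects.list call might return. select_shard_names is a hypothetical helper, and its rules (keep only `.json` objects, skip objects in deeper subfolders) are assumptions of this sketch, not a description of how from_gcs filters objects internally.

```python
from typing import Iterable, List


def select_shard_names(object_names: Iterable[str], gcs_prefix: str) -> List[str]:
    """Keep the JSON objects that sit directly inside the gcs_prefix folder.

    Hypothetical helper: object_names stands in for the names returned by a
    Cloud Storage listing of the bucket. Listing order is preserved.
    """
    shards = []
    for name in object_names:
        # Assumption: only JSON objects under the prefix are Document shards.
        if not name.startswith(gcs_prefix) or not name.endswith(".json"):
            continue
        # Assumption: objects in deeper subfolders of the target folder are skipped.
        if "/" in name[len(gcs_prefix):]:
            continue
        shards.append(name)
    return shards


listing = [
    "batch-output/123/0/doc-0.json",
    "batch-output/123/0/doc-1.json",
    "batch-output/123/0/notes.txt",
    "batch-output/123/1/doc-0.json",
]
print(select_shard_names(listing, "batch-output/123/0/"))
# ['batch-output/123/0/doc-0.json', 'batch-output/123/0/doc-1.json']
```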
get_entity_by_type
get_entity_by_type(target_type: str)
Returns the list of Entities whose type matches target_type.
Parameters
Name | Type | Description |
target_type | str | Required. The entity type to match. |
Returns
Type | Description |
List[Entity] | A list of Entities matching target_type. |
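The filtering that get_entity_by_type performs can be pictured as a simple comprehension over the Document's entities. In the sketch below, Entity is a hypothetical stand-in dataclass with only a type and its mention text, not the library's Entity wrapper, and the exact, case-sensitive comparison is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """Hypothetical stand-in for the toolbox Entity wrapper."""

    type_: str
    mention_text: str


def get_entity_by_type(entities: List[Entity], target_type: str) -> List[Entity]:
    # Assumption: exact, case-sensitive match on the entity type.
    return [entity for entity in entities if entity.type_ == target_type]


entities = [
    Entity("invoice_id", "INV-001"),
    Entity("supplier_name", "Acme Corp"),
    Entity("invoice_id", "INV-002"),
]
print([e.mention_text for e in get_entity_by_type(entities, "invoice_id")])
# ['INV-001', 'INV-002']
```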
search_pages
search_pages(target_string: Optional[str] = None, pattern: Optional[str] = None)
Returns the list of Pages containing target_string or text matching pattern.
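Parameters
Name | Type | Description |
target_string | Optional[str] | Optional. The literal string to search for in each Page's text. |
pattern | Optional[str] | Optional. A regular expression to match against each Page's text. |
Returns
Type | Description |
List[Page] | A list of the matching Pages. |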
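The two search modes of search_pages can be sketched as a substring test and a regular-expression search over each page's text. Here Page is a hypothetical stand-in dataclass holding only a page number and its text, not the library's Page wrapper. Requiring exactly one of target_string or pattern is a choice made by this sketch; consult the library source for how the real method handles both or neither being set.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for the toolbox Page wrapper."""

    page_number: int
    text: str


def search_pages(
    pages: List[Page],
    target_string: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[Page]:
    # Assumption of this sketch: exactly one search criterion must be given.
    if (target_string is None) == (pattern is None):
        raise ValueError("Pass exactly one of target_string or pattern.")
    if target_string is not None:
        # Literal, case-sensitive substring match.
        return [page for page in pages if target_string in page.text]
    # Regex match anywhere in the page text.
    regex = re.compile(pattern)
    return [page for page in pages if regex.search(page.text)]


pages = [
    Page(1, "Invoice number: INV-001"),
    Page(2, "Terms and conditions"),
    Page(3, "Total due: $120.00"),
]
print([p.page_number for p in search_pages(pages, target_string="Invoice")])  # [1]
print([p.page_number for p in search_pages(pages, pattern=r"\$\d+\.\d{2}")])  # [3]
```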