Class Document (2.8.1)

Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Represents the input to API methods.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
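
The oneof behaviour described above can be illustrated with a small plain-Python stand-in. This is a minimal sketch, not the proto-plus implementation: the `SourceOneof` class and its `set`, `get`, and `which` methods are hypothetical names introduced only for illustration, and using `None` to mark an unset member is an assumption of the sketch.

```python
class SourceOneof:
    """Plain-Python stand-in for the `source` oneof; not the proto-plus implementation."""

    MEMBERS = ("content", "gcs_content_uri")

    def __init__(self):
        self._active = None  # name of the member currently set, if any
        self._value = None

    def _check(self, name):
        if name not in self.MEMBERS:
            raise AttributeError(f"{name!r} is not a member of the source oneof")

    def set(self, name, value):
        self._check(name)
        # Only one slot exists, so setting a member discards the previous one.
        self._active, self._value = name, value

    def get(self, name):
        self._check(name)
        # None marks an unset member in this sketch (an assumption of the stand-in).
        return self._value if name == self._active else None

    def which(self):
        """Return the name of the member that is set, or None."""
        return self._active


source = SourceOneof()
source.set("content", "Hello, world.")
source.set("gcs_content_uri", "gs://bucket_name/object_name")  # clears content
print(source.which())
print(source.get("content"))
```

Storing a single (name, value) pair makes it impossible for two members to be set at once, which is the property the oneof guarantees.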

Attributes

Name Description
type_ google.cloud.language_v1beta2.types.Document.Type
Required. If the type is not set or is TYPE_UNSPECIFIED, returns an INVALID_ARGUMENT error.
content str
The content of the input in string format. Cloud audit logging exempt since it is based on user data. This field is a member of oneof_ source.
gcs_content_uri str
The Google Cloud Storage URI where the file content is located. This URI must be of the form: gs://bucket_name/object_name. For more details, see https://cloud.google.com/storage/docs/reference-uris. NOTE: Cloud Storage object versioning is not supported. This field is a member of oneof_ source.
language str
The language of the document (if not specified, the language is automatically detected). Both ISO and BCP-47 language codes are accepted. `Language Support
reference_web_uri str
The web URI where the document comes from. This URI is not used for fetching the content, but as a hint for analyzing the document.
boilerplate_handling google.cloud.language_v1beta2.types.Document.BoilerplateHandling
Indicates how detected boilerplate (e.g. advertisements, copyright declarations, banners) should be handled for this document. If not specified, boilerplate will be treated the same as content.
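
The rules for the two `source` fields can be checked before a document is built. The following is a rough sketch using only the standard library: `check_source` and `GCS_URI_PATTERN` are hypothetical names, and the regular expression is an assumed reading of the documented form gs://bucket_name/object_name, not a rule taken from the Cloud Storage documentation.

```python
import re

# Assumed reading of the documented form gs://bucket_name/object_name:
# a non-empty bucket segment, a slash, then a non-empty object name.
GCS_URI_PATTERN = re.compile(r"^gs://[^/]+/.+$")


def check_source(content=None, gcs_content_uri=None):
    """Return a list of problems with the two `source` fields (hypothetical helper)."""
    problems = []
    if content is not None and gcs_content_uri is not None:
        problems.append("content and gcs_content_uri are mutually exclusive")
    if gcs_content_uri is not None and not GCS_URI_PATTERN.match(gcs_content_uri):
        problems.append("gcs_content_uri is not of the form gs://bucket_name/object_name")
    return problems


print(check_source(gcs_content_uri="gs://bucket_name/object_name"))
print(check_source(content="Hello", gcs_content_uri="bucket_name/object_name"))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.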

Classes

BoilerplateHandling

BoilerplateHandling(value)

Ways of handling boilerplate detected in the document.

Values:
    BOILERPLATE_HANDLING_UNSPECIFIED (0):
        The boilerplate handling is not specified.
    SKIP_BOILERPLATE (1):
        Do not analyze detected boilerplate. Reference web URI is required for detecting boilerplate.
    KEEP_BOILERPLATE (2):
        Treat boilerplate the same as content.
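
The boilerplate rules above can be sketched with a local mirror of the enum. This is an illustration only: the class below is a stand-in built on `enum.IntEnum`, not the library's own enum class, and `resolve_handling` is a hypothetical helper. Raising a `ValueError` on the client side when the reference web URI is missing is an assumption; the source does not say how the service reports that case.

```python
from enum import IntEnum


class BoilerplateHandling(IntEnum):
    """Local mirror of the documented values (not the library's enum class)."""

    BOILERPLATE_HANDLING_UNSPECIFIED = 0
    SKIP_BOILERPLATE = 1
    KEEP_BOILERPLATE = 2


def resolve_handling(handling, reference_web_uri=""):
    """Apply the documented rules to a requested handling (hypothetical helper)."""
    handling = BoilerplateHandling(handling)
    if handling is BoilerplateHandling.BOILERPLATE_HANDLING_UNSPECIFIED:
        # Unspecified handling treats boilerplate the same as content.
        return BoilerplateHandling.KEEP_BOILERPLATE
    if handling is BoilerplateHandling.SKIP_BOILERPLATE and not reference_web_uri:
        # Assumed client-side check: detecting boilerplate needs the reference web URI.
        raise ValueError("SKIP_BOILERPLATE requires reference_web_uri")
    return handling


print(resolve_handling(0).name)
print(resolve_handling(1, "https://example.com/page").name)
```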

Type

Type(value)

The document types enum.

Values:
    TYPE_UNSPECIFIED (0):
        The content type is not specified.
    PLAIN_TEXT (1):
        Plain text
    HTML (2):
        HTML
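
Because an unset or `TYPE_UNSPECIFIED` type leads to an `INVALID_ARGUMENT` error, it can help to reject that value before sending a request. The sketch below is one possible approach, assuming a local `enum.IntEnum` mirror of the documented values rather than the library's enum class; `parse_type` is a hypothetical helper.

```python
from enum import IntEnum


class Type(IntEnum):
    """Local mirror of the documented document types (not the library's enum class)."""

    TYPE_UNSPECIFIED = 0
    PLAIN_TEXT = 1
    HTML = 2


def parse_type(value):
    """Turn a name or number into a Type, rejecting TYPE_UNSPECIFIED (hypothetical helper)."""
    member = Type[value] if isinstance(value, str) else Type(value)
    if member is Type.TYPE_UNSPECIFIED:
        raise ValueError("INVALID_ARGUMENT: document type must be PLAIN_TEXT or HTML")
    return member


print(parse_type("HTML").name)
print(parse_type(1).name)
```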