GcsSource(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Cloud Storage location for input content.
Attributes |
|
---|---|
Name | Description |
input_uris |
MutableSequence[str]
Required. Cloud Storage URIs to input files. Each URI can be up to 2000 characters long. URIs can match the full object path (for example, gs://bucket/directory/object.json) or a pattern matching one or more files, such as gs://bucket/directory/*.json.
A request can contain at most 100 files (or 100,000 files if data_schema is content). Each file can be up to 2 GB (or 100 MB if data_schema is content).
|
data_schema |
str
The schema to use when parsing the data from the source.
Supported values for document imports:
- document (default): One JSON Document per line. Each document must have a valid Document.id.
- content: Unstructured data (e.g. PDF, HTML). Each file matched by input_uris becomes a document, with the ID set to the first 128 bits of SHA256(URI) encoded as a hex string.
- custom: One custom data JSON per row, in an arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical.
- csv: A CSV file with a header conforming to the defined Schema of the data store. Each entry after the header is imported as a Document. This can only be used by the GENERIC Data Store vertical.
Supported values for user event imports:
- user_event (default): One JSON UserEvent per line.
|
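
The constraints in the table above can be checked on the client before a request is built. The following is a minimal sketch using only the Python standard library, not part of the client library itself. The helper names `check_gcs_source_args` and `content_document_id` are hypothetical. The ID derivation follows the description of the content schema, but it assumes that the URI is hashed as UTF-8 bytes and that the hex string is lowercase; the reference does not state either detail.

```python
import hashlib

# Limit and schema names copied from the attribute descriptions above.
MAX_URI_LENGTH = 2000
DOCUMENT_SCHEMAS = {"document", "content", "custom", "csv"}
USER_EVENT_SCHEMAS = {"user_event"}


def check_gcs_source_args(input_uris, data_schema):
    """Hypothetical pre-check: return a list of problems (empty if none)."""
    problems = []
    if not input_uris:
        problems.append("input_uris is required")
    for uri in input_uris:
        # Both full object paths and wildcard patterns start with gs://.
        if not uri.startswith("gs://"):
            problems.append(f"not a Cloud Storage URI: {uri}")
        if len(uri) > MAX_URI_LENGTH:
            problems.append(f"URI longer than {MAX_URI_LENGTH} characters")
    if data_schema not in DOCUMENT_SCHEMAS | USER_EVENT_SCHEMAS:
        problems.append(f"unsupported data_schema: {data_schema}")
    return problems


def content_document_id(uri):
    """First 128 bits (16 bytes) of SHA256(URI), encoded as a hex string.

    Assumption: the URI is hashed as UTF-8 bytes and the hex is lowercase.
    """
    return hashlib.sha256(uri.encode("utf-8")).digest()[:16].hex()


if __name__ == "__main__":
    uris = ["gs://bucket/directory/*.json"]
    print(check_gcs_source_args(uris, "content"))
    print(content_document_id("gs://bucket/directory/object.json"))
```

Once the check passes, the same values would presumably be passed as keyword arguments, in the form GcsSource(input_uris=..., data_schema=...), which is consistent with the **kwargs in the constructor signature. The file-count and file-size limits are not checked here, because a wildcard pattern can only be resolved against the bucket itself.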