GcsSource(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Cloud Storage location for input content.
Attributes

| Name | Description |
|---|---|
input_uris |
MutableSequence[str]
Required. Cloud Storage URIs to input files. Each URI can be up to 2000 characters long. A URI can match the full object path (for example, gs://bucket/directory/object.json) or a pattern matching one or more files, such as gs://bucket/directory/*.json.
A request can contain at most 100 files (or 100,000 files if data_schema is content). Each file can be up to 2 GB (or 100 MB if data_schema is content).
|
data_schema |
str
The schema to use when parsing the data from the source.
Supported values for document imports:
- document (default): One JSON Document per line. Each document must have a valid Document.id.
- content: Unstructured data (e.g. PDF, HTML). Each file matched by input_uris becomes a document, with the ID set to the first 128 bits of SHA256(URI) encoded as a hex string.
- custom: One custom data JSON per row in arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical.
- csv: A CSV file with header conforming to the defined Schema of the data store. Each entry after the header is imported as a Document. This can only be used by the GENERIC Data Store vertical.
Supported values for user event imports:
- user_event (default): One JSON UserEvent per line.
|
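
The constraints described for input_uris and data_schema can be sketched as a small client-side check, together with the ID derivation described for the content schema. This is a minimal illustration using only the standard library, not part of the GcsSource API: the helper names check_gcs_source_args and content_document_id are hypothetical, the UTF-8 encoding of the URI before hashing is an assumption (the description only says SHA256(URI)), and the per-request file limits are not checked because a pattern can match many objects that are only known server-side.

```python
import hashlib

# Per-URI length limit quoted from the input_uris description.
MAX_URI_LENGTH = 2000
# data_schema values listed in the description.
DOCUMENT_SCHEMAS = ("document", "content", "custom", "csv")
USER_EVENT_SCHEMAS = ("user_event",)


def check_gcs_source_args(input_uris, data_schema="document"):
    """Hypothetical pre-flight check; returns a list of problem strings."""
    problems = []
    if not input_uris:
        problems.append("input_uris is required and must not be empty")
    for uri in input_uris:
        if not uri.startswith("gs://"):
            problems.append(f"not a Cloud Storage URI: {uri[:40]}")
        if len(uri) > MAX_URI_LENGTH:
            problems.append(f"URI exceeds {MAX_URI_LENGTH} characters: {uri[:40]}...")
    if data_schema not in DOCUMENT_SCHEMAS + USER_EVENT_SCHEMAS:
        problems.append(f"unsupported data_schema: {data_schema}")
    return problems


def content_document_id(uri):
    """First 128 bits (16 bytes) of SHA-256 over the URI, as 32 hex characters.

    Assumption: the URI is hashed as UTF-8 bytes; the description does not
    state the encoding, so server-assigned IDs may differ.
    """
    return hashlib.sha256(uri.encode("utf-8")).digest()[:16].hex()
```

If the check returns an empty list, the same values could then be passed as keyword arguments, for example GcsSource(input_uris=[...], data_schema="content"), assuming the client library is installed.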