GcsSource
Cloud Storage location for input content.
JSON representation
{
  "inputUris": [
    string
  ],
  "dataSchema": string
}
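As a rough illustration of the JSON shape above, the following Python sketch assembles a GcsSource body as a plain dictionary and serializes it. The helper name `make_gcs_source` and the bucket path are hypothetical, chosen only for this example. The sketch omits `dataSchema` when none is supplied, on the assumption that the service then applies its documented default, and it treats an empty `inputUris` list as an error, which is this example's reading of "Required".

```python
import json
from typing import Optional


def make_gcs_source(input_uris: list[str], data_schema: Optional[str] = None) -> dict:
    """Build a dict shaped like the GcsSource JSON representation (hypothetical helper)."""
    if not input_uris:
        # Assumption: "Required" is read here as "at least one URI".
        raise ValueError("inputUris is required and must not be empty")
    source: dict = {"inputUris": list(input_uris)}
    if data_schema is not None:
        # Left out entirely when unset so the service-side default can apply.
        source["dataSchema"] = data_schema
    return source


# Hypothetical bucket and path, used only to show the resulting JSON.
example = make_gcs_source(["gs://example-bucket/docs/*.json"], data_schema="document")
print(json.dumps(example, indent=2))
```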
Fields
inputUris[]
string
Required. Cloud Storage URIs to input files. Each URI can be up to 2000 characters long. URIs can match the full object path (for example, gs://bucket/directory/object.json) or a pattern matching one or more files, such as gs://bucket/directory/*.json. A request can contain at most 100 files (or 100,000 files if dataSchema is content). Each file can be up to 2 GB (or 100 MB if dataSchema is content).
dataSchema
string
The schema to use when parsing the data from the source. Supported values for document imports:
document (default): One JSON Document per line. Each document must have a valid Document.id.
content: Unstructured data (e.g. PDF, HTML). Each file matched by inputUris becomes a document, with the ID set to the first 128 bits of SHA256(URI) encoded as a hex string.
custom: One custom data JSON per row, in an arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical.
csv: A CSV file with a header conforming to the defined Schema of the data store. Each entry after the header is imported as a Document. This can only be used by the GENERIC Data Store vertical.
Supported values for user event imports:
user_event (default): One JSON UserEvent per line.
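The limits and the ID rule described in the fields above can be sketched in Python as a client-side pre-check. This is only an illustration, not the service's own validation: the names `check_gcs_source` and `content_document_id` are hypothetical, the `gs://` prefix check is an assumption drawn from the example URIs, and the ID helper assumes the URI is hashed as UTF-8 bytes and rendered as lowercase hex, which the text above does not specify. Per-file size limits (2 GB, or 100 MB for content) are not checked because that would require reading object metadata.

```python
import hashlib

MAX_URI_LENGTH = 2000          # per-URI character limit
MAX_FILES_DEFAULT = 100        # files per request
MAX_FILES_CONTENT = 100_000    # files per request when dataSchema is "content"
KNOWN_SCHEMAS = {"document", "content", "custom", "csv", "user_event"}


def check_gcs_source(source: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    uris = source.get("inputUris") or []
    schema = source.get("dataSchema")

    if not uris:
        problems.append("inputUris is required")
    if schema is not None and schema not in KNOWN_SCHEMAS:
        problems.append(f"unsupported dataSchema: {schema!r}")

    for uri in uris:
        # Assumption: Cloud Storage URIs always use the gs:// scheme.
        if not uri.startswith("gs://"):
            problems.append(f"not a Cloud Storage URI: {uri}")
        if len(uri) > MAX_URI_LENGTH:
            problems.append(f"URI longer than {MAX_URI_LENGTH} characters: {uri[:40]}...")

    # Heuristic only: patterns are expanded by the service, so the number of
    # URIs is a rough proxy for the number of files, not an exact count.
    max_files = MAX_FILES_CONTENT if schema == "content" else MAX_FILES_DEFAULT
    if len(uris) > max_files:
        problems.append(f"{len(uris)} URIs exceed the {max_files}-file limit")
    return problems


def content_document_id(uri: str) -> str:
    """First 128 bits (16 bytes) of SHA-256 over the URI, as 32 hex characters.

    Assumption: the URI is hashed as UTF-8 bytes and rendered as lowercase hex.
    """
    return hashlib.sha256(uri.encode("utf-8")).digest()[:16].hex()
```

Calling `check_gcs_source` before submitting an import surfaces obvious mistakes early. The file-count check in particular can miss cases where a single pattern matches more files than the limit allows.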
Last updated 2025-03-03 UTC.