Method: projects.locations.collections.dataStores.branches.documents.import

Imports multiple Documents in bulk. Request processing may be synchronous. Documents that do not already exist are created.

Note: An import can partially succeed, with only a subset of the Documents updated.

HTTP request

POST https://discoveryengine.googleapis.com/v1alpha/{parent=projects/*/locations/*/collections/*/dataStores/*/branches/*}/documents:import

The URL uses gRPC Transcoding syntax.

Path parameters

Parameters
parent

string

Required. The parent branch resource name, such as projects/{project}/locations/{location}/collections/{collection}/dataStores/{dataStore}/branches/{branch}. Requires create/update permission.
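As an illustration of how the parent value slots into the request URL, the following minimal sketch checks a branch resource name against the pattern above and builds the endpoint. The helper name import_url and the sample resource IDs are hypothetical; only the base URL and the path pattern come from this page.

```python
import re

BASE_URL = "https://discoveryengine.googleapis.com/v1alpha"

# Mirrors the path template:
# projects/*/locations/*/collections/*/dataStores/*/branches/*
_PARENT_RE = re.compile(
    r"^projects/[^/]+/locations/[^/]+/collections/[^/]+"
    r"/dataStores/[^/]+/branches/[^/]+$"
)


def import_url(parent: str) -> str:
    """Return the documents:import URL for a branch resource name.

    Hypothetical helper; raises ValueError if `parent` does not match
    the documented resource name pattern.
    """
    if not _PARENT_RE.match(parent):
        raise ValueError(f"not a branch resource name: {parent!r}")
    return f"{BASE_URL}/{parent}/documents:import"
```

The request itself is an HTTP POST to the returned URL; sending it is left out here so the sketch stays independent of any HTTP client.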

Request body

The request body contains data with the following structure:

JSON representation
{
  "errorConfig": {
    object (ImportErrorConfig)
  },
  "reconciliationMode": enum (ReconciliationMode),
  "autoGenerateIds": boolean,
  "idField": string,

  // Union field source can be only one of the following:
  "inlineSource": {
    object (InlineSource)
  },
  "gcsSource": {
    object (GcsSource)
  },
  "bigquerySource": {
    object (BigQuerySource)
  }
  // End of list of possible types for union field source.
}
Fields
errorConfig

object (ImportErrorConfig)

The desired location of errors incurred during the Import.

reconciliationMode

enum (ReconciliationMode)

The mode of reconciliation between existing documents and the documents to be imported. Defaults to ReconciliationMode.INCREMENTAL.

autoGenerateIds

boolean

Whether to automatically generate IDs for the documents if absent.

If set to true, Document.ids are automatically generated from a hash of the payload, so IDs may not be consistent across multiple imports. In that case, ReconciliationMode.FULL is highly recommended to avoid duplicate content. If unset or set to false, Document.ids must be specified using idField; otherwise, documents without IDs fail to be imported.

Only set this field when using GcsSource or BigQuerySource, and when GcsSource.data_schema or BigQuerySource.data_schema is custom or csv. Otherwise, an INVALID_ARGUMENT error is thrown.

idField

string

The field in the Cloud Storage and BigQuery sources that indicates the unique IDs of the documents.

For GcsSource, it is the key of the JSON field. For instance, my_id for JSON {"my_id": "some_uuid"}. For BigQuerySource, it is the name of the BigQuery table column where the unique IDs are stored.

The values of the JSON field or the BigQuery column are used as the Document.ids. The JSON field or the BigQuery column must be of string type, and the values must be valid strings that conform to RFC-1034 and are 1-63 characters long. Otherwise, documents without valid IDs fail to be imported.

Only set this field when using GcsSource or BigQuerySource, when GcsSource.data_schema or BigQuerySource.data_schema is custom, and when autoGenerateIds is unset or set to false. Otherwise, an INVALID_ARGUMENT error is thrown.

If it is unset, a default value _id is used when importing from the allowed data sources.
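The ID constraints above can be pre-checked on the client before an import is submitted. The sketch below is one conservative reading, not the service's actual validation: it assumes "conform to RFC-1034" means the RFC 1034 label grammar (a letter first, then letters, digits, or hyphens, ending in a letter or digit). Under that strict reading the placeholder value some_uuid would be rejected because of its underscore, so the service may well be more lenient. The function names are hypothetical.

```python
import re

# Assumption: RFC 1034 label grammar, which also caps the length at 63.
# First char: letter. Middle: letters, digits, hyphens. Last: letter or digit.
_LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_document_id(value: object) -> bool:
    """Conservative check: a string of 1-63 chars matching the label grammar."""
    return isinstance(value, str) and _LABEL_RE.fullmatch(value) is not None


def pick_document_id(record: dict, id_field: str = "_id"):
    """Return the record's ID from `id_field`, or None if missing or invalid.

    The default of "_id" follows the documented default for idField.
    """
    value = record.get(id_field)
    return value if is_valid_document_id(value) else None
```

Records for which pick_document_id returns None are the ones that, per the text above, would be expected to fail to import.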

Union field source. Required. The source of the input. source can be only one of the following:
inlineSource

object (InlineSource)

The inline source for the input content for documents.

gcsSource

object (GcsSource)

Cloud Storage location for the input content.

bigquerySource

object (BigQuerySource)

BigQuery input source.
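The field rules above interact: exactly one source must be set, and autoGenerateIds and idField are only valid for certain sources and schemas. A minimal client-side sketch of those rules follows. It assumes the REST JSON spells data_schema as dataSchema inside the source object, and it treats an unset schema as neither custom nor csv; the function name, the inputUris key, and the sample bucket path in the usage note are illustrative assumptions. The service remains the authority on what it accepts.

```python
SOURCE_KEYS = ("inlineSource", "gcsSource", "bigquerySource")


def check_import_body(body: dict) -> list:
    """Return a list of rule violations for an import request body."""
    present = [key for key in SOURCE_KEYS if key in body]
    if len(present) != 1:
        return [f"exactly one source is required, found {len(present)}"]

    source_key = present[0]
    # Assumption: data_schema appears as "dataSchema" in the JSON body.
    schema = body[source_key].get("dataSchema")
    auto_ids = bool(body.get("autoGenerateIds", False))
    has_id_field = "idField" in body
    is_inline = source_key == "inlineSource"

    problems = []
    if auto_ids and (is_inline or schema not in ("custom", "csv")):
        problems.append(
            "autoGenerateIds needs gcsSource or bigquerySource "
            "with a custom or csv schema"
        )
    if has_id_field and (is_inline or schema != "custom"):
        problems.append(
            "idField needs gcsSource or bigquerySource with a custom schema"
        )
    if has_id_field and auto_ids:
        problems.append("idField cannot be combined with autoGenerateIds")
    return problems
```

For example, a body such as {"gcsSource": {"inputUris": ["gs://my-bucket/docs.json"], "dataSchema": "custom"}, "idField": "my_id"} passes these checks, while adding "autoGenerateIds": true to it produces one violation.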

Response body

If successful, the response body contains an instance of Operation.
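Because the response is a long-running Operation rather than the import result itself, callers typically inspect its state. The sketch below classifies an Operation that has already been parsed from JSON into a dict, using the standard done, error, and response fields of the Operation message. The function name is hypothetical, and fetching or polling the operation is left out.

```python
def operation_status(op: dict) -> str:
    """Classify a parsed Operation as 'running', 'failed', or 'succeeded'."""
    if not op.get("done", False):
        # "done" is absent or false while the import is still in progress.
        return "running"
    if "error" in op:
        # A finished operation carries either "error" or "response".
        return "failed"
    return "succeeded"
```

Since an import can update only a subset of the Documents, "succeeded" here means only that the operation finished without a top-level error, not that every document was imported.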

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • discoveryengine.documents.import

For more information, see the IAM documentation.