Method: projects.locations.documents.search

Searches for documents using provided SearchDocumentsRequest. This call only returns documents that the caller has permission to search against.

HTTP request

POST https://contentwarehouse.googleapis.com/v1/{parent}/documents:search

Path parameters

Parameters
parent

string

Required. The parent, which owns this collection of documents. Format: projects/{projectNumber}/locations/{location}.
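As a rough illustration, the request URL can be assembled from the parent resource name. The helper name `search_url` and the sample values are hypothetical; only the URL pattern comes from the reference above.

```python
BASE_URL = "https://contentwarehouse.googleapis.com/v1"


def search_url(project_number: str, location: str) -> str:
    """Build the documents:search URL for a parent resource (hypothetical helper)."""
    parent = f"projects/{project_number}/locations/{location}"
    return f"{BASE_URL}/{parent}/documents:search"


# Sample values are placeholders, not real resources.
print(search_url("123", "us"))
```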

Request body

The request body contains data with the following structure:

JSON representation
{
  "requestMetadata": {
    object (RequestMetadata)
  },
  "documentQuery": {
    object (DocumentQuery)
  },
  "offset": integer,
  "pageSize": integer,
  "pageToken": string,
  "orderBy": string,
  "histogramQueries": [
    {
      object (HistogramQuery)
    }
  ],
  "requireTotalSize": boolean,
  "totalResultSize": enum (TotalResultSize),
  "qaSizeLimit": integer
}
Fields
requestMetadata

object (RequestMetadata)

The meta information collected about the end user, used to enforce access control and improve the search quality of the service.

documentQuery

object (DocumentQuery)

Query used to search against documents (keyword, filters, etc.).

offset

integer

An integer that specifies the current offset (that is, starting result location, amongst the documents deemed by the API as relevant) in search results. This field is only considered if pageToken is unset.

The maximum allowed value is 5000; a larger value results in an error.

For example, 0 means to return results starting from the first matching document, and 10 means to return from the 11th document. This can be used for pagination (for example, pageSize = 10 and offset = 10 means to return from the second page).
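The offset arithmetic can be sketched as follows. The function name `page_offset` is hypothetical, and the checks only mirror the documented limits on the client side (offset at most 5000, pageSize between 1 and 100); the service remains the authority on what it accepts.

```python
MAX_OFFSET = 5000  # documented upper bound for `offset`


def page_offset(page_index: int, page_size: int = 10) -> int:
    """Return the `offset` for a zero-based page index (hypothetical helper)."""
    if page_index < 0:
        raise ValueError("page_index must not be negative")
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be between 1 and 100")
    offset = page_index * page_size
    if offset > MAX_OFFSET:
        raise ValueError("offset must not exceed 5000")
    return offset


# Second page with ten results per page starts at the 11th document.
request_body = {"pageSize": 10, "offset": page_offset(1, 10)}
```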

pageSize

integer

A limit on the number of documents returned in the search results. Increasing this value above the default value of 10 can increase search response time. The value can be between 1 and 100.

pageToken

string

The token specifying the current offset within search results. See SearchDocumentsResponse.next_page_token for an explanation of how to obtain the next set of query results.
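Token-based paging might look like the sketch below. The `send` callable is an assumption standing in for whatever HTTP client posts the request body and returns the decoded JSON response; it is not part of the API. The loop stops when the response carries no nextPageToken.

```python
from typing import Callable, Dict, Iterator


def iter_matching_documents(
    send: Callable[[Dict], Dict], body: Dict
) -> Iterator[Dict]:
    """Yield every matching document across pages (hypothetical helper).

    `send` is a caller-supplied function that posts a request body to
    documents:search and returns the parsed response as a dict.
    """
    request = dict(body)
    while True:
        response = send(request)
        yield from response.get("matchingDocuments", [])
        token = response.get("nextPageToken")
        if not token:
            return
        # Reuse the original body and attach the token for the next page.
        request = dict(body, pageToken=token)
```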

orderBy

string

The criteria determining how search results are sorted. For non-empty query, default is "relevance desc". For empty query, default is "uploadDate desc".

Supported options are:

  • "relevance desc": By relevance descending, as determined by the API algorithms.
  • "uploadDate desc": By upload date descending.
  • "uploadDate": By upload date ascending.
  • "updateDate desc": By last updated date descending.
  • "updateDate": By last updated date ascending.
  • "retrievalImportance desc": By retrieval importance of properties descending. This feature is still under development. Do not use it unless instructed to do so.
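A client that wants to show the effective sort order could mirror the documented defaults as in this sketch. The helper `resolve_order_by` is hypothetical, and treating any non-empty string as a "non-empty query" is an assumption; the service applies its own defaults regardless.

```python
SUPPORTED_ORDER_BY = {
    "relevance desc",
    "uploadDate desc",
    "uploadDate",
    "updateDate desc",
    "updateDate",
    "retrievalImportance desc",  # documented as still under development
}


def resolve_order_by(query: str, order_by: str = "") -> str:
    """Return the sort order that the documented defaults imply."""
    if not order_by:
        return "relevance desc" if query else "uploadDate desc"
    if order_by not in SUPPORTED_ORDER_BY:
        raise ValueError(f"unsupported orderBy: {order_by!r}")
    return order_by
```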
histogramQueries[]

object (HistogramQuery)

An expression specifying a histogram request against matching documents. Expression syntax is an aggregation function call with histogram facets and other options.

The following aggregation functions are supported:

  • count(string_histogram_facet): count the number of matching entities for each distinct attribute value.

Data types:

  • Histogram facet (aka filterable properties): Facet names with format <schema id>.<facet>. Facets will have the format of: [a-zA-Z][a-zA-Z0-9_:/-.]. If the facet is a child facet, then the parent hierarchy needs to be specified separated by dots in the prefix after the schema id. Thus, the format for a multi-level facet is: <schema id>.<parent facet name>.<child facet name>. Example: schema123.root_parent_facet.middle_facet.child_facet
  • DocumentSchemaId: (with no schema id prefix) to get histograms for each document type (returns the schema id path, e.g. projects/12345/locations/us-west/documentSchemas/abc123).

Example expression:

  • Document type counts: count('DocumentSchemaId')

  • For schema id, abc123, get the counts for MORTGAGE_TYPE: count('abc123.MORTGAGE_TYPE')
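The two example expressions above can be generated with a small helper. The name `count_expression` is hypothetical; the output strings follow the syntax shown in the examples.

```python
def count_expression(facet: str, schema_id: str = "") -> str:
    """Build a count(...) histogram expression (hypothetical helper)."""
    name = f"{schema_id}.{facet}" if schema_id else facet
    return f"count('{name}')"


# One HistogramQuery per expression, ready for the histogramQueries array.
histogram_queries = [
    {"histogramQuery": count_expression("DocumentSchemaId")},
    {"histogramQuery": count_expression("MORTGAGE_TYPE", schema_id="abc123")},
]
```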

requireTotalSize

boolean

Controls if the search document request requires the return of a total size of matched documents. See SearchDocumentsResponse.total_size.

Enabling this flag may adversely impact performance. Hint: If this is used with pagination, set this flag on the initial query but set this to false on subsequent page calls (keep the total count locally).

Defaults to false.
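The hint above, requesting the total only on the initial query, could be implemented along these lines. The helper name `page_request` is hypothetical; the caller is expected to keep the total from the first response.

```python
def page_request(base: dict, page_token: str = "") -> dict:
    """Build the body for one page, asking for totalSize only on the first."""
    request = dict(base)
    if page_token:
        request["pageToken"] = page_token
        request["requireTotalSize"] = False
    else:
        request["requireTotalSize"] = True
    return request


first = page_request({"pageSize": 10})
later = page_request({"pageSize": 10}, page_token="token-from-previous-page")
```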

totalResultSize

enum (TotalResultSize)

Controls if the search document request requires the return of a total size of matched documents. See SearchDocumentsResponse.total_size.

qaSizeLimit

integer

Experimental, do not use. The limit on the number of documents returned for the question-answering feature. To enable the question-answering feature, set DocumentQuery.isNlQuery to true.

Response body

Response message for DocumentService.SearchDocuments.

If successful, the response body contains data with the following structure:

JSON representation
{
  "matchingDocuments": [
    {
      object (MatchingDocument)
    }
  ],
  "nextPageToken": string,
  "totalSize": integer,
  "metadata": {
    object (ResponseMetadata)
  },
  "histogramQueryResults": [
    {
      object (HistogramQueryResult)
    }
  ],
  "questionAnswer": string
}
Fields
matchingDocuments[]

object (MatchingDocument)

The document entities that match the specified SearchDocumentsRequest.

nextPageToken

string

The token that specifies the starting position of the next page of results. This field is empty if there are no more results.

totalSize

integer

The total number of matched documents, which is available only if the client set SearchDocumentsRequest.require_total_size to true or set SearchDocumentsRequest.total_result_size to ESTIMATED_SIZE or ACTUAL_SIZE. Otherwise, the value will be -1. Typically a UI would handle this condition by displaying "of many", for example: "Displaying 10 of many".
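A UI label that handles the -1 sentinel might be produced as below. The function name `result_summary` is hypothetical.

```python
def result_summary(shown: int, total_size: int) -> str:
    """Format a results label, using "many" when totalSize is -1."""
    total = "many" if total_size == -1 else str(total_size)
    return f"Displaying {shown} of {total}"


print(result_summary(10, -1))  # Displaying 10 of many
```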

metadata

object (ResponseMetadata)

Additional information for the API invocation, such as the request tracking id.

histogramQueryResults[]

object (HistogramQueryResult)

The histogram results that match with the specified SearchDocumentsRequest.histogram_queries.

questionAnswer

string

Experimental. Question answer from the query against the document.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • contentwarehouse.documents.get

For more information, see the IAM documentation.

DocumentQuery

JSON representation
{
  "query": string,
  "isNlQuery": boolean,
  "customPropertyFilter": string,
  "timeFilters": [
    {
      object (TimeFilter)
    }
  ],
  "documentSchemaNames": [
    string
  ],
  "propertyFilter": [
    {
      object (PropertyFilter)
    }
  ],
  "fileTypeFilter": {
    object (FileTypeFilter)
  },
  "folderNameFilter": string,
  "documentNameFilter": [
    string
  ],
  "queryContext": [
    string
  ],
  "documentCreatorFilter": [
    string
  ],
  "customWeightsMetadata": {
    object (CustomWeightsMetadata)
  }
}
Fields
query

string

The query string that matches against the full text of the document and the searchable properties.

The query partially supports Google AIP style syntax. Specifically, the query supports literals, logical operators, negation operators, comparison operators, and functions.

Literals: A bare literal value (examples: "42", "Hugo") is a value to be matched against. It searches over the full text of the document and the searchable properties.

Logical operators: "AND", "and", "OR", and "or" are binary logical operators (example: "engineer OR developer").

Negation operators: "NOT" and "!" are negation operators (example: "NOT software").

Comparison operators: the binary comparison operators =, !=, <, >, <= and >= are supported for string, numeric, enum, and boolean values. The like operator ~~ is also supported for strings. It provides semantic search functionality by parsing, stemming, and expanding synonyms against the input query.

To specify a property in the query, the left hand side expression in the comparison must be the property id including the parent. The right hand side must be literals. For example: ""projects/123/locations/us".property_a < 1" matches results whose "property_a" is less than 1 in project 123 and us location. The literals and comparison expression can be connected in a single query (example: "software engineer "projects/123/locations/us".salary > 100").

Functions: supported functions are LOWER([propertyName]) to perform a case insensitive match and EMPTY([propertyName]) to filter on the existence of a key.

Nested expressions connected using parentheses and logical operators are supported. If there is no operator between expressions, the default logical operator is AND.

The query can be used with other filters, e.g. timeFilters and folderNameFilter. They are combined with the query using the AND operator.

The maximum number of allowed characters is 255.
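The query example above can be assembled programmatically, as in this sketch. Both helper names are hypothetical, the comparison helper only handles numeric literals, and the length check simply mirrors the documented 255-character limit.

```python
MAX_QUERY_CHARS = 255  # documented limit for the query string


def property_comparison(parent: str, prop: str, op: str, literal: float) -> str:
    """Render a comparison whose left side is the property id with its parent."""
    return f'"{parent}".{prop} {op} {literal}'


def build_query(*parts: str) -> str:
    """Join expressions with spaces; adjacent expressions default to AND."""
    query = " ".join(parts)
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError("query exceeds 255 characters")
    return query


query = build_query(
    "software engineer",
    property_comparison("projects/123/locations/us", "salary", ">", 100),
)
```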

isNlQuery

boolean

Experimental, do not use. Indicates whether the query is a natural language question. False by default. If true, the question-answering feature is used instead of search, and resultCount in SearchDocumentsRequest must be set. In addition, all other input fields related to search (pagination, histograms, etc.) are ignored.

customPropertyFilter
(deprecated)

string

This filter specifies a structured syntax to match against the PropertyDefinition.isFilterable marked as true. The syntax for this expression is a subset of SQL syntax.

Supported operators are: =, !=, <, <=, >, and >= where the left of the operator is a property name and the right of the operator is a number or a quoted string. You must escape backslash (\) and quote (") characters. Supported functions are LOWER([propertyName]) to perform a case insensitive match and EMPTY([propertyName]) to filter on the existence of a key.

Boolean expressions (AND/OR/NOT) are supported up to 3 levels of nesting (for example, "((A AND B AND C) OR NOT D) AND E"). A maximum of 100 comparisons or functions are allowed in the expression. The expression must be < 6000 bytes in length.

Sample Query: (LOWER(driving_license)="class \"a\"" OR EMPTY(driving_license)) AND driving_years > 10

timeFilters[]

object (TimeFilter)

Documents created/updated within a range specified by this filter are searched against.

documentSchemaNames[]

string

This filter specifies the exact document schema Document.document_schema_name of the documents to search against.

If a value isn't specified, documents within the search results are associated with any schema. If multiple values are specified, documents within the search results may be associated with any of the specified schemas.

At most 20 document schema names are allowed.

propertyFilter[]

object (PropertyFilter)

This filter specifies a structured syntax to match against the PropertyDefinition.is_filterable marked as true. The relationship between the PropertyFilters is OR.

fileTypeFilter

object (FileTypeFilter)

This filter specifies the types of files to return, for example ALL, FOLDER, or DOCUMENT (see FileType). If FOLDER or DOCUMENT is specified, then only folders or only non-folder documents will be returned, respectively. If ALL is specified, both folders and files will be returned.

If no value is specified, ALL files will be returned.

folderNameFilter

string

Search all the documents under this specified folder. Format: projects/{projectNumber}/locations/{location}/documents/{documentId}.

documentNameFilter[]

string

Search the documents in the list. Format: projects/{projectNumber}/locations/{location}/documents/{documentId}.

queryContext[]

string

For custom synonyms. Customers provide the synonyms based on context. One customer can provide multiple sets of synonyms based on different contexts. The search query will be expanded based on the custom synonyms of the query context set. By default, no custom synonyms will be applied if no query context is provided. It is not supported for CMEK compliant deployment.

documentCreatorFilter[]

string

The exact creator(s) of the documents to search against.

If a value isn't specified, documents within the search results are associated with any creator. If multiple values are specified, documents within the search results may be associated with any of the specified creators.

customWeightsMetadata

object (CustomWeightsMetadata)

To support the custom weighting across document schemas, customers need to provide the properties to be used to boost the ranking in the search request. For a search query with CustomWeightsMetadata specified, only the RetrievalImportance for the properties in the CustomWeightsMetadata will be honored.

TimeFilter

Filter on create timestamp or update timestamp of documents.

JSON representation
{
  "timeRange": {
    object (Interval)
  },
  "timeField": enum (TimeField)
}
Fields
timeRange

object (Interval)

timeField

enum (TimeField)

Specifies which time field to filter documents on.

Defaults to TimeField.UPLOAD_TIME.

Interval

Represents a time interval, encoded as a timestamp start (inclusive) and a timestamp end (exclusive).

The start must be less than or equal to the end. When the start equals the end, the interval is empty (matches no time). When both start and end are unspecified, the interval matches any time.

JSON representation
{
  "startTime": string,
  "endTime": string
}
Fields
startTime

string (Timestamp format)

Optional. Inclusive start of the interval.

If specified, a timestamp matching this interval will have to be the same or after the start.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

endTime

string (Timestamp format)

Optional. Exclusive end of the interval.

If specified, a timestamp matching this interval will have to be before the end.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
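A TimeFilter with correctly formatted timestamps could be built as in the following sketch. The helper names are hypothetical, the formatter drops fractional seconds for brevity, and timezone-aware datetimes are required so the conversion to UTC is unambiguous.

```python
from datetime import datetime, timezone


def rfc3339_zulu(moment: datetime) -> str:
    """Format an aware datetime as RFC3339 UTC "Zulu" with whole seconds."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def time_filter(start: datetime, end: datetime, time_field: str) -> dict:
    """Build a TimeFilter; start is inclusive and end is exclusive."""
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return {
        "timeRange": {
            "startTime": rfc3339_zulu(start),
            "endTime": rfc3339_zulu(end),
        },
        "timeField": time_field,
    }


example = time_filter(
    datetime(2014, 10, 2, 15, 1, 23, tzinfo=timezone.utc),
    datetime(2014, 10, 3, 0, 0, 0, tzinfo=timezone.utc),
    "CREATE_TIME",
)
```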

TimeField

Time field used in TimeFilter.

Enums
TIME_FIELD_UNSPECIFIED Default value.
CREATE_TIME Earliest document create time.
UPDATE_TIME Latest document update time.
DISPOSITION_TIME Time when the document becomes mutable again.

PropertyFilter

JSON representation
{
  "documentSchemaName": string,
  "condition": string
}
Fields
documentSchemaName

string

The Document schema name Document.document_schema_name. Format: projects/{projectNumber}/locations/{location}/documentSchemas/{document_schema_id}.

condition

string

The filter condition. The syntax for this expression is a subset of SQL syntax.

Supported operators are: =, !=, <, <=, >, >=, and ~~ where the left of the operator is a property name and the right of the operator is a number or a quoted string. You must escape backslash (\) and quote (") characters.

~~ is the LIKE operator. The right of the operator must be a string. The only supported property data type for LIKE is textValues. It provides semantic search functionality by parsing, stemming, and expanding synonyms against the input query. It matches if the property contains content semantically similar to the query. It is not regex matching or wildcard matching. For example, "property.company ~~ "google"" will match records whose property property.company has values like "Google Inc.", "Google LLC" or "Google Company".

Supported functions are LOWER([propertyName]) to perform a case insensitive match and EMPTY([propertyName]) to filter on the existence of a key.

Boolean expressions (AND/OR/NOT) are supported up to 3 levels of nesting (for example, "((A AND B AND C) OR NOT D) AND E"). A maximum of 100 comparisons or functions are allowed in the expression. The expression must be < 6000 bytes in length.

Only properties that are marked filterable are allowed (PropertyDefinition.is_filterable). Property names do not need to be prefixed by the document schema id (as is the case with histograms). However, property names need to be prefixed by their parent hierarchy, if any. For example: top_property_name.sub_property_name.

Sample Query: (LOWER(driving_license)="class \"a\"" OR EMPTY(driving_license)) AND driving_years > 10

CMEK compliant deployment only supports:

  • Operators: =, <, <=, >, and >=.
  • Boolean expressions: AND and OR.
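The escaping rule for quoted strings can be illustrated by rebuilding the sample query above. The helper `quote_literal` and the schema name in `property_filter` are hypothetical; the byte-length check mirrors the documented limit of fewer than 6000 bytes.

```python
def quote_literal(value: str) -> str:
    """Quote a string literal, escaping backslashes first and then quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


license_class = 'class "a"'
condition = (
    "(LOWER(driving_license)=" + quote_literal(license_class)
    + " OR EMPTY(driving_license)) AND driving_years > 10"
)

# The documented limit is expressed in bytes, so measure the encoded form.
if len(condition.encode("utf-8")) >= 6000:
    raise ValueError("condition must be shorter than 6000 bytes")

property_filter = {
    # Placeholder schema name, following the documented format.
    "documentSchemaName": "projects/123/locations/us/documentSchemas/abc123",
    "condition": condition,
}
```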

FileTypeFilter

Filter for the specific types of documents returned.

JSON representation
{
  "fileType": enum (FileType)
}
Fields
fileType

enum (FileType)

The type of files to return.

FileType

Representation of the types of files.

Enums
FILE_TYPE_UNSPECIFIED Default document type. If set, disables the filter.
ALL Returns all document types, including folders.
FOLDER Returns only folders.
DOCUMENT Returns only non-folder documents.
ROOT_FOLDER Returns only root folders.

CustomWeightsMetadata

To support the custom weighting across document schemas.

JSON representation
{
  "weightedSchemaProperties": [
    {
      object (WeightedSchemaProperty)
    }
  ]
}
Fields
weightedSchemaProperties[]

object (WeightedSchemaProperty)

List of schemas and property names. Allows a maximum of 10 schemas to be specified for relevance boosting.

WeightedSchemaProperty

Specifies the schema property name.

JSON representation
{
  "documentSchemaName": string,
  "propertyNames": [
    string
  ]
}
Fields
documentSchemaName

string

The document schema name.

propertyNames[]

string

The property definition names in the schema.

HistogramQuery

The histogram request.

JSON representation
{
  "histogramQuery": string,
  "requirePreciseResultSize": boolean,
  "filters": {
    object (HistogramQueryPropertyNameFilter)
  }
}
Fields
histogramQuery

string

An expression that specifies a histogram request against matching documents for searches.

See SearchDocumentsRequest.histogram_queries for details about syntax.

requirePreciseResultSize

boolean

Controls if the histogram query requires the return of a precise count. Enabling this flag may adversely impact performance.

Defaults to true.

filters

object (HistogramQueryPropertyNameFilter)

Optional. Filters the result of the histogram query by property names. It only works with the histogram query count('FilterableProperties'). If unset, the histogram is computed over all property names for all document schemas. Setting this field improves performance.

HistogramQueryPropertyNameFilter

JSON representation
{
  "documentSchemas": [
    string
  ],
  "propertyNames": [
    string
  ],
  "yAxis": enum (HistogramYAxis)
}
Fields
documentSchemas[]

string

Optional. This filter specifies the exact document schema(s) Document.document_schema_name to run the histogram query against. If unset, the histogram is computed for property names across all document schemas.

At most 10 document schema names are allowed. Format: projects/{projectNumber}/locations/{location}/documentSchemas/{document_schema_id}.

propertyNames[]

string

Optional. If unset, the histogram is computed for all property names. Each property must be defined in the schema with the isFilterable flag set to true, and its name must use the format "schemaId.propertyName". Example: if the schema id is abc, the name for property MORTGAGE_TYPE is "abc.MORTGAGE_TYPE".

yAxis

enum (HistogramYAxis)

By default, the yAxis is HISTOGRAM_YAXIS_DOCUMENT if this field is not set.
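A HistogramQuery that uses this filter might be assembled as sketched below. The function name and sample arguments are hypothetical; the check mirrors the documented limit of 10 document schema names.

```python
MAX_SCHEMAS = 10  # documented limit for documentSchemas


def filterable_properties_histogram(
    document_schemas=(), property_names=(), y_axis="HISTOGRAM_YAXIS_DOCUMENT"
) -> dict:
    """Build a count('FilterableProperties') query with a property name filter."""
    if len(document_schemas) > MAX_SCHEMAS:
        raise ValueError("at most 10 document schema names are allowed")
    filters = {"yAxis": y_axis}
    if document_schemas:
        filters["documentSchemas"] = list(document_schemas)
    if property_names:
        filters["propertyNames"] = list(property_names)
    return {
        "histogramQuery": "count('FilterableProperties')",
        "filters": filters,
    }


histogram_query = filterable_properties_histogram(
    property_names=["abc.MORTGAGE_TYPE"]
)
```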

HistogramYAxis

The result of the histogram query count('FilterableProperties') using HISTOGRAM_YAXIS_DOCUMENT will be:

  • invoice_id: 2
  • address: 1
  • payment_method: 2
  • line_item_description: 1

Enums
HISTOGRAM_YAXIS_DOCUMENT Count the documents per property name.
HISTOGRAM_YAXIS_PROPERTY Count the properties per property name.

TotalResultSize

The total number of matching documents.

Enums
TOTAL_RESULT_SIZE_UNSPECIFIED Total number calculation will be skipped.
ESTIMATED_SIZE Estimate total number. The total result size will be accurate up to 10,000. This option will add cost and latency to your request.
ACTUAL_SIZE Return the actual total number. It may adversely impact performance. The limit is 1,000,000.

MatchingDocument

Document entry with metadata inside SearchDocumentsResponse.

JSON representation
{
  "document": {
    object (Document)
  },
  "searchTextSnippet": string,
  "qaResult": {
    object (QAResult)
  }
}
Fields
document

object (Document)

Document that matches the specified SearchDocumentsRequest. This document only contains indexed metadata information.

searchTextSnippet

string

Contains snippets of text from the document full raw text that most closely match a search query's keywords, if available. All HTML tags in the original fields are stripped when returned in this field, and matching query keywords are enclosed in HTML bold tags.

If the question-answering feature is enabled, this field will instead contain a snippet that answers the user's natural-language query. No HTML bold tags will be present, and highlights in the answer snippet can be found in QAResult.highlights.
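For regular (non question-answering) snippets, the emphasized keywords could be pulled out as in this sketch. It assumes that "HTML bold tags" means literal `<b>...</b>` pairs, which the reference does not spell out, and the sample snippet is invented for illustration.

```python
import re
from typing import List

# Assumption: bold tags appear as <b>...</b> without attributes.
BOLD_PATTERN = re.compile(r"<b>(.*?)</b>")


def highlighted_terms(snippet: str) -> List[str]:
    """Return the text enclosed in bold tags, in order of appearance."""
    return BOLD_PATTERN.findall(snippet)


def plain_text(snippet: str) -> str:
    """Remove the bold tags and keep the text they enclose."""
    return BOLD_PATTERN.sub(r"\1", snippet)
```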

qaResult

object (QAResult)

Experimental. Additional result info if the question-answering feature is enabled.

QAResult

Additional result info for the question-answering feature.

JSON representation
{
  "highlights": [
    {
      object (Highlight)
    }
  ],
  "confidenceScore": number
}
Fields
highlights[]

object (Highlight)

Highlighted sections in the snippet.

confidenceScore

number

The calibrated confidence score for this document, in the range [0., 1.]. This represents the confidence level for whether the returned document and snippet answer the user's query.

Highlight

A text span in the search text snippet that represents a highlighted section (answer context, highly relevant sentence, etc.).

JSON representation
{
  "startIndex": integer,
  "endIndex": integer
}
Fields
startIndex

integer

Start index of the highlight.

endIndex

integer

End index of the highlight, exclusive.
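Extracting highlighted text from a snippet could look like this sketch. It assumes that startIndex and endIndex are zero-based character offsets into searchTextSnippet; the reference only states that the end index is exclusive, so verify the unit before relying on it. The sample snippet is invented.

```python
from typing import Dict, List


def highlighted_spans(snippet: str, highlights: List[Dict[str, int]]) -> List[str]:
    """Slice the snippet for each highlight (end index exclusive)."""
    return [snippet[h["startIndex"]:h["endIndex"]] for h in highlights]


spans = highlighted_spans(
    "The rate is 4.5 percent.",
    [{"startIndex": 12, "endIndex": 15}],
)
```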

HistogramQueryResult

Histogram result that matches HistogramQuery specified in searches.

JSON representation
{
  "histogramQuery": string,
  "histogram": {
    string: string,
    ...
  }
}
Fields
histogramQuery

string

Requested histogram expression.

histogram

map (key: string, value: string (int64 format))

A map from each distinct value of the facet to the number of matching entries with that value.

The key format is:

  • (for string histogram) string values stored in the field.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
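Because the counts arrive as int64-formatted strings, a client will usually convert them before doing arithmetic. A minimal sketch, with a hypothetical helper name and invented sample data:

```python
from typing import Dict


def histogram_counts(result: Dict) -> Dict[str, int]:
    """Convert the string-encoded counts of a HistogramQueryResult to ints."""
    return {key: int(value) for key, value in result.get("histogram", {}).items()}


sample_result = {
    "histogramQuery": "count('abc123.MORTGAGE_TYPE')",
    "histogram": {"FIXED": "12", "ADJUSTABLE": "3"},
}
counts = histogram_counts(sample_result)
```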