Method: projects.datasets.query

Execute an Inference query over a loaded dataset.

HTTP request

POST https://infer.googleapis.com/v1/{name=projects/*/datasets/*}:query

The URL uses gRPC Transcoding syntax.

Path parameters

Parameters
name

string

Loaded dataset to be queried.

Authorization requires the following Google IAM permission on the specified resource name:

  • infer.datasets.query

Request body

The request body contains data with the following structure:

JSON representation
{
  "queries": [
    {
      object(QueryRequest)
    }
  ]
}
Fields
queries[]

object(QueryRequest)

Queries to be executed against the loaded dataset. Only data sources which are part of the specified dataset are used during the query. Each query can be independent of the other queries from the same batch.
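
As a minimal sketch of how the URL template and the request body fit together, the following Python builds both for a batch containing one single-term query. The helper name `build_query_call` and the project and dataset IDs are hypothetical; sending the request and attaching an OAuth token are left out.

```python
import json

# Endpoint prefix taken from the HTTP request section above.
BASE_URL = "https://infer.googleapis.com/v1"


def build_query_call(project_id, dataset_id, queries):
    """Return (url, body) for a projects.datasets.query call.

    `queries` is a list of QueryRequest dicts. This helper is
    illustrative only and is not part of any client library.
    """
    name = f"projects/{project_id}/datasets/{dataset_id}"
    url = f"{BASE_URL}/{name}:query"
    body = json.dumps({"queries": queries})
    return url, body


# One QueryRequest whose query is a single TERM node.
single_term = {
    "query": {
        "type": "TYPE_TERM",
        "term": {"name": "pressure", "value": "110psi"},
    }
}

# "my-project" and "my-dataset" are placeholder identifiers.
url, body = build_query_call("my-project", "my-dataset", [single_term])
print(url)
print(body)
```

Because each query in the batch can be independent, more QueryRequest dicts can be appended to the list passed as `queries`.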

Response body

If successful, the response body contains data with the following structure:

Response for a batch of queries executed by the system.

JSON representation
{
  "name": string,
  "results": [
    {
      object(QueryResponse)
    }
  ]
}
Fields
name

string

Loaded dataset that was queried.

results[]

object(QueryResponse)

Query results.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

QueryRequest

A query we want to execute against a dataset.

JSON representation
{
  "query": {
    object(QueryNode)
  },
  "distributionConfigs": [
    {
      object(DataDistributionConfig)
    }
  ],
  "restrictStartTime": string,
  "restrictEndTime": string
}
Fields
query

object(QueryNode)

The query...

distributionConfigs[]

object(DataDistributionConfig)

Result configuration for each data name.

restrictStartTime

string (Timestamp format)

Only accumulate terms from restrictStartTime.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

restrictEndTime

string (Timestamp format)

Only accumulate terms until restrictEndTime.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".
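
The two restrict fields take Timestamp strings. A small sketch, assuming the standard library `datetime` module is the source of the times, shows one way to produce them. The helper `to_rfc3339_zulu` is a hypothetical name. Python datetimes carry microseconds, so the output has six fractional digits rather than the nine the format allows.

```python
from datetime import datetime, timezone


def to_rfc3339_zulu(dt):
    """Format a timezone-aware datetime as an RFC3339 UTC "Zulu" string."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC first so the trailing "Z" is accurate.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


start = datetime(2014, 10, 2, 15, 1, 23, 45123, tzinfo=timezone.utc)
end = datetime(2014, 10, 3, 15, 1, 23, 45123, tzinfo=timezone.utc)

# A QueryRequest restricted to a one-day window (illustrative values).
restricted_query = {
    "query": {
        "type": "TYPE_TERM",
        "term": {"name": "pressure", "value": "110psi"},
    },
    "restrictStartTime": to_rfc3339_zulu(start),
    "restrictEndTime": to_rfc3339_zulu(end),
}
print(restricted_query["restrictStartTime"])
```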

QueryNode

A query node.

JSON representation
{
  "type": enum(Type),
  "children": [
    {
      object(QueryNode)
    }
  ],
  "term": {
    object(QueryTerm)
  },

  // Union field node_options can be only one of the following:
  "andNodeOptions": {
    object(AndNodeOptions)
  },
  "orNodeOptions": {
    object(OrNodeOptions)
  },
  "termNodeOptions": {
    object(TermNodeOptions)
  }
  // End of list of possible types for union field node_options.
}
Fields
type

enum(Type)

Query node type.

children[]

object(QueryNode)

Should ONLY be set for 'AND' and 'OR' nodes; otherwise we will return a BAD_ARGUMENTS error for this query.

term

object(QueryTerm)

Should ONLY be set for 'TERM' nodes; otherwise we will return a BAD_ARGUMENTS error for this query.

Union field node_options. Node options that can be specified for AND, OR, and TERM nodes respectively. If the options specified do not match the node type, we will return a BAD_ARGUMENTS error. node_options can be only one of the following:
andNodeOptions

object(AndNodeOptions)

AND node options.

orNodeOptions

object(OrNodeOptions)

OR node options.

termNodeOptions

object(TermNodeOptions)

TERM node options.
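
The field rules above (children only on AND and OR nodes, term only on TERM nodes, options matching the node type) can be checked on the client before a request is sent. The following is a rough sketch of such a check; `check_node` is a hypothetical helper, and the service remains the authority on what actually triggers a BAD_ARGUMENTS error.

```python
# Which options field belongs to which node type, per the union above.
OPTION_FIELD = {
    "TYPE_AND": "andNodeOptions",
    "TYPE_OR": "orNodeOptions",
    "TYPE_TERM": "termNodeOptions",
}


def check_node(node, path="query"):
    """Return a list of rule violations found in a QueryNode dict."""
    node_type = node.get("type")
    if node_type not in OPTION_FIELD:
        return [f"{path}: unknown or unspecified type {node_type!r}"]
    problems = []
    if node_type == "TYPE_TERM":
        if "children" in node:
            problems.append(f"{path}: children set on a TERM node")
    elif "term" in node:
        problems.append(f"{path}: term set on an AND/OR node")
    # Options must match the node type.
    for kind, field in OPTION_FIELD.items():
        if field in node and kind != node_type:
            problems.append(f"{path}: {field} set on a {node_type} node")
    # Recurse into children so nested nodes are checked too.
    for i, child in enumerate(node.get("children", [])):
        problems.extend(check_node(child, f"{path}.children[{i}]"))
    return problems


good = {
    "type": "TYPE_AND",
    "children": [
        {"type": "TYPE_TERM", "term": {"name": "pressure", "value": "110psi"}},
        {"type": "TYPE_TERM", "term": {"name": "temperature", "value": "70F"}},
    ],
}
bad = {
    "type": "TYPE_TERM",
    "term": {"name": "pressure", "value": "110psi"},
    "children": [],
    "andNodeOptions": {"strictOrder": True},
}
print(check_node(good))
print(check_node(bad))
```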

Type

Type of a query node.

Enums
TYPE_UNSPECIFIED No such type.
TYPE_TERM TERM node (simple entry from an input data source).
TYPE_AND AND node (it requires groups to match ALL its non-optional children).
TYPE_OR OR node (it requires groups to match ANY of its children).

QueryTerm

A query term is a simple mapping between a data name (e.g. 'pressure') and a data item value (e.g. '110psi'). A query consisting of a single non-optional query term will ONLY match documents that contain that term (e.g. sensor data groups where {dataName: 'pressure' dataValue: '110psi'} exists are matched and scored).

JSON representation
{
  "name": string,
  "value": string
}
Fields
name

string

Data name (e.g. 'pressure', 'temperature').

value

string

Data item value (e.g. '110psi', '70F').

AndNodeOptions

Options that AND nodes can have.

JSON representation
{
  "maxTimespan": string,
  "strictOrder": boolean
}
Fields
maxTimespan

string (Duration format)

If set, it limits the maximum time span between ANY terms in this match node.

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".

strictOrder

boolean

If set, only documents that contain this node's children in strict order are scored (e.g. think of only matching documents that contain the sequence ({dataName: 'pressure' dataValue: '110psi'} AND {dataName: 'temperature' dataValue: '70 C'}) in that order). NOTE: this ONLY works for AND nodes that have ONLY TERM children.
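
To illustrate the two AND options together, here is a sketch that formats a Duration string and builds an AND node whose TERM children must appear in order within one minute. The helper `to_duration` is a hypothetical name, and the 60-second window is an arbitrary illustrative value.

```python
def to_duration(seconds):
    """Format seconds as a Duration string with up to nine fractional digits."""
    # Fix nine decimals, then trim trailing zeros and a dangling dot.
    text = f"{seconds:.9f}".rstrip("0").rstrip(".")
    return text + "s"


# AND node with ONLY TERM children, as strictOrder requires.
ordered_pair = {
    "type": "TYPE_AND",
    "children": [
        {"type": "TYPE_TERM", "term": {"name": "pressure", "value": "110psi"}},
        {"type": "TYPE_TERM", "term": {"name": "temperature", "value": "70 C"}},
    ],
    "andNodeOptions": {
        "maxTimespan": to_duration(60),
        "strictOrder": True,
    },
}
print(ordered_pair["andNodeOptions"])
```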

OrNodeOptions

Options that OR nodes can have.

TermNodeOptions

Options that TERM nodes can have.

JSON representation
{
  "optional": boolean,
  "matchRatio": number,
  "matchWeight": number
}
Fields
optional

boolean

If set, this term node is not required to be present in the matched group.

matchRatio

number

If set, it controls the percentage of groups in our scoring set that include this term. This is useful for enforcing diversity of matched entries in the queried datasets. NOTE: this ONLY works for 'TERM' nodes that are children of an OR node (e.g. the query {dataName: 'pressure' dataValue: '110psi' matchRatio: 0.2} OR {dataName: 'pressure' dataValue: '1500psi' matchRatio: 0.8} limits groups where the pressure is the usual reading of 110psi to 20% of the matched groups).

matchWeight

number

How much weight we should give in the final score to this term. By default each term has an equal weight of 1.
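
The matchRatio example in the field description is written in shorthand. Spelled out against the JSON representations on this page, it might look like the sketch below, where the ratio sits in termNodeOptions rather than inside the term itself. The `ratio_term` helper is a hypothetical convenience.

```python
import json


def ratio_term(name, value, match_ratio):
    """Build a TERM node that carries a matchRatio in its options."""
    return {
        "type": "TYPE_TERM",
        "term": {"name": name, "value": value},
        "termNodeOptions": {"matchRatio": match_ratio},
    }


# OR node mixing the usual reading (20%) with the unusual one (80%).
diverse_pressure = {
    "type": "TYPE_OR",
    "children": [
        ratio_term("pressure", "110psi", 0.2),
        ratio_term("pressure", "1500psi", 0.8),
    ],
}
print(json.dumps(diverse_pressure, indent=2))
```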

DataDistributionConfig

Parameters for how we should compute the returned distribution of events for a data source present in the loaded dataset.

JSON representation
{
  "dataName": string,
  "maxResultEntries": number,
  "bgprobExp": number,
  "maxBeforeTimespan": string,
  "maxAfterTimespan": string
}
Fields
dataName

string

Data name. If the distribution name is 'default', we will use this spec for all distributions that DO NOT have an explicit distribution config set. In that case we will also score and return results for all data types present in the queried dataset (and this will make the query more expensive!).

maxResultEntries

number

TopN DataDistributionEntries to return for the query.

bgprobExp

number

Parameter used to compute the score of a distribution entry. Must be in the range [0.0, 1.0]. If set to 0, the score returned by the system will be the raw conditional probability; if set to 1, the score returned by the system will be pure lift. It essentially controls how much the background probability of the event is used when scoring a returned distribution entry for the event given a query.

maxBeforeTimespan

string (Duration format)

Maximum time before a match for a query term that we want to accumulate evidence for this distribution.

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".

maxAfterTimespan

string (Duration format)

Maximum time after a match for a query term that we want to accumulate evidence for this distribution.

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".
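
A short sketch of assembling distributionConfigs follows. The helper `make_distribution_config` is hypothetical, the numeric values are arbitrary, and the range check only mirrors the documented [0.0, 1.0] bound for bgprobExp. Whole seconds are assumed for the two timespans.

```python
def make_distribution_config(data_name, max_result_entries, bgprob_exp,
                             before_seconds, after_seconds):
    """Build a DataDistributionConfig dict, rejecting an out-of-range bgprobExp."""
    if not 0.0 <= bgprob_exp <= 1.0:
        raise ValueError("bgprobExp must be in [0.0, 1.0]")
    return {
        "dataName": data_name,
        "maxResultEntries": max_result_entries,
        "bgprobExp": bgprob_exp,
        # Duration strings: whole seconds followed by 's'.
        "maxBeforeTimespan": f"{before_seconds}s",
        "maxAfterTimespan": f"{after_seconds}s",
    }


configs = [
    # Explicit config for one data name.
    make_distribution_config("temperature", 10, 0.5, 3600, 3600),
    # 'default' applies to every data name without its own config,
    # which the field description warns makes the query more expensive.
    make_distribution_config("default", 5, 1.0, 60, 60),
]
print(configs[0])
```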

QueryResponse

Response to a query.

JSON representation
{
  "distributions": [
    {
      object(DataDistribution)
    }
  ],
  "error": {
    object(Status)
  }
}
Fields
distributions[]

object(DataDistribution)

Distributions requested via DataDistributionConfig which we could compute given the dataset and incoming query.

error

object(Status)

Query execution status.

DataDistribution

Each DataSource present in the loaded DataSet can be used to generate a distribution of events for query matches. We 'count' (or 'accumulate counts') around EACH TERM Node from the incoming query.

JSON representation
{
  "dataName": string,
  "matchedGroupCount": string,
  "totalGroupCount": string,
  "entries": [
    {
      object(DataDistributionEntry)
    }
  ],
  "debugInfo": string
}
Fields
dataName

string

Data name (e.g. 'pressure').

matchedGroupCount

string (int64 format)

Number of groups that match the incoming query.

totalGroupCount

string (int64 format)

Number of groups present in the dataset.

entries[]

object(DataDistributionEntry)

Entries returned for this data distribution.

debugInfo

string

Debug information for this distribution's computation.
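
When reading a response, note that the int64 counts arrive as strings. The sketch below walks a batch response, skips results that carry an error, and converts the counts to integers. The sample payload is made up for illustration, and the `code` and `message` fields on the error object are assumed to follow the common `google.rpc.Status` shape, which this page does not spell out. Treating a missing error or a code of 0 as success is likewise an assumption.

```python
def summarize(response):
    """Return (result index, data name or 'error', matched fraction or message)."""
    rows = []
    for i, result in enumerate(response.get("results", [])):
        error = result.get("error")
        if error and error.get("code", 0) != 0:
            rows.append((i, "error", error.get("message", "")))
            continue
        for dist in result.get("distributions", []):
            # int64 fields are JSON strings, so convert before doing math.
            matched = int(dist["matchedGroupCount"])
            total = int(dist["totalGroupCount"])
            rows.append((i, dist["dataName"], matched / total if total else 0.0))
    return rows


# Hypothetical response with one successful and one failed query.
sample = {
    "name": "projects/my-project/datasets/my-dataset",
    "results": [
        {"distributions": [{
            "dataName": "temperature",
            "matchedGroupCount": "25",
            "totalGroupCount": "100",
            "entries": [],
        }]},
        {"error": {"code": 3, "message": "bad query"}},
    ],
}
print(summarize(sample))
```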

DataDistributionEntry

An entry from a computed distribution for a data source present in the loaded dataset given a query.

JSON representation
{
  "value": string,
  "score": number,
  "matchedGroupCount": string,
  "totalGroupCount": string
}
Fields
value

string

Data item value (e.g. '110psi').

score

number

Data item score. Score(event|query) = P(event|query) / (P(event) ^ bgprobExp)

matchedGroupCount

string (int64 format)

Number of groups matched that contain this entry.

totalGroupCount

string (int64 format)

Number of groups that contain this entry.
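
The score formula given for the score field, Score(event|query) = P(event|query) / (P(event) ^ bgprobExp), can be tried out directly. The sketch below is a plain transcription of that formula with made-up probabilities; `entry_score` is a hypothetical name, and how the service estimates the two probabilities is not covered here.

```python
def entry_score(p_event_given_query, p_event, bgprob_exp):
    """Score(event|query) = P(event|query) / (P(event) ** bgprobExp)."""
    if not 0.0 <= bgprob_exp <= 1.0:
        raise ValueError("bgprobExp must be in [0.0, 1.0]")
    if p_event <= 0.0:
        raise ValueError("P(event) must be positive")
    return p_event_given_query / (p_event ** bgprob_exp)


# Illustrative probabilities: the event is twice as likely given the query.
p_given_query, p_background = 0.5, 0.25

print(entry_score(p_given_query, p_background, 0.0))  # raw conditional probability
print(entry_score(p_given_query, p_background, 1.0))  # pure lift
print(entry_score(p_given_query, p_background, 0.5))  # in between
```

With bgprobExp at 0 the background probability is ignored entirely, and at 1 the score is the ratio of the conditional probability to the background probability, which matches the bgprobExp description in DataDistributionConfig.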