- Using prospective search
- Topics and result keys
- Creating documents
- Supported types for property values
- Query language overview
- Receiving match responses
Prospective search is a querying service that allows your application to match search queries against real-time data streams. For every document presented, prospective search returns the ID of every registered query that matches the document.
Prospective search allows you to register a large set of queries and simultaneously match the queries against a single document. It is particularly useful for applications that process streaming data, for example:
- Applications that match against all the updates on a social networking service, or against high-frequency comments in a chat room.
- Applications that process data sources that provide notification, monitoring, or filtering services.
To understand prospective search, it's helpful to compare it to the conventional retrospective search model. In a retrospective search application, such as Google search, the application must build, or have access to, an index of the data to be searched. Needing to pre-index the data makes it difficult and expensive to create real-time applications, because each query must be executed separately against a potentially large index.
In a prospective search application, such as Google Alerts, you register search queries and match them against new documents in real time, as the documents are inserted into your application. This allows you to create applications that efficiently monitor incoming live data. You are not limited to using existing, indexed data.
Applications often use both retrospective and prospective search capabilities to get the best of both worlds. For example, an application can use retrospective search to find matching documents indexed in the past while using prospective search to find matching documents as soon as they arrive.
Using prospective search
The life of a typical prospective search application looks something like this:
- You decide on the appropriate document schema. Your choice will depend on the type of source data that the application is designed to handle.
- The application uses the subscribe() call to register query subscriptions with prospective search using the query language.
- The application converts items in the streaming source data into documents, which are instances of a document class (see Creating documents).
- The application uses the match() call to present documents to prospective search for matching against subscribed queries.
- Prospective search returns matching subscription IDs and documents in the task queue. These results are subject to the usual Quotas and Limits.
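The lifecycle above can be pictured with a toy, in-memory model. The sketch below is not the prospective search service or its API: ToyProspectiveSearch and its method signatures are hypothetical, plain Python predicates stand in for the query language, and results are returned directly instead of through the task queue.

```python
class ToyProspectiveSearch(object):
    """Hypothetical in-memory stand-in used only to illustrate the flow."""

    def __init__(self):
        # topic -> {subscription_id: predicate}
        self._subscriptions = {}

    def subscribe(self, topic, sub_id, predicate):
        # Register a query (here: a Python callable) under a subscription ID.
        self._subscriptions.setdefault(topic, {})[sub_id] = predicate

    def match(self, topic, document):
        # Present one document and collect the IDs of every matching query.
        subs = self._subscriptions.get(topic, {})
        return sorted(sub_id for sub_id, pred in subs.items() if pred(document))


search = ToyProspectiveSearch()
search.subscribe("Comment", "mentions-rose",
                 lambda doc: "rose" in doc["body"].lower())
search.subscribe("Comment", "long-comments",
                 lambda doc: doc["length"] > 100)

body = "A rose by any other name would smell as sweet."
incoming = {"author": "Rose Jones", "body": body, "length": len(body)}
matched_ids = search.match("Comment", incoming)
print(matched_ids)
```

Only the first subscription matches here, because the comment body is shorter than 100 characters.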
Here's a summary of the essential function calls:
||Returns information about a single subscription such as the state, the query, the expiration time, and the subscription ID.|
||Returns information about a specified number of subscriptions, such as the state, the query, the expiration time, and the subscription ID.|
||Lists all topics currently in existence.|
||Matches all subscriptions within a topic. Returns results in the Task Queue rather than returning them directly, to ensure that the application can scale.|
||Registers subscriptions made up of a subscription ID and a query for a given topic. Expect a delay of a few seconds between when
||Removes a subscription.|
Topics and result keys
Prospective search applications may match queries against one or more streams of documents. Developers separate streams of documents by assigning a unique topic to documents they want grouped together and matched against a given set of queries. Generally, developers assign the same topic to documents of the same schema or format, but this convention is not enforced.
Topics are not defined as a separate step; instead, topics are created as a side effect of the subscribe() call. As soon as a new topic is passed to subscribe(), the topic exists. As soon as the last subscription using a given topic is deleted, the topic ceases to exist.
Documents are assigned to a particular topic when calling match(). The topic name is either specified explicitly in the match() call or taken from the class name of the document. See Creating documents.
Use list_topics() to list all topics that currently exist.
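The topic lifetime described above (a topic exists from its first subscription until its last subscription is removed) can be sketched with a small registry. This is an illustrative model only: ToyTopicRegistry is hypothetical, and its unsubscribe method is simply this sketch's name for removing a subscription.

```python
class ToyTopicRegistry(object):
    """Hypothetical model of topics that exist only while subscribed to."""

    def __init__(self):
        self._subs_by_topic = {}  # topic -> set of subscription IDs

    def subscribe(self, topic, sub_id):
        # The topic comes into existence as a side effect of subscribing.
        self._subs_by_topic.setdefault(topic, set()).add(sub_id)

    def unsubscribe(self, topic, sub_id):
        subs = self._subs_by_topic.get(topic, set())
        subs.discard(sub_id)
        if not subs:
            # Last subscription gone: the topic ceases to exist.
            self._subs_by_topic.pop(topic, None)

    def list_topics(self):
        return sorted(self._subs_by_topic)


registry = ToyTopicRegistry()
registry.subscribe("Comment", "sub-a")
registry.subscribe("Comment", "sub-b")
registry.subscribe("Alert", "sub-c")
topics_before = registry.list_topics()

registry.unsubscribe("Alert", "sub-c")
topics_after = registry.list_topics()
print(topics_before, topics_after)
```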
You may also specify a result_key argument in the match() call that is returned with the matching results. A result_key can be useful if you know, for example, that returned documents are too large for the task queue. In this case, you can choose to store the documents in a database and use the identifying result_key to retrieve them later.
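One way to picture the result_key pattern is shown below. It is a sketch under stated assumptions: a plain dictionary stands in for the database, and stash_document and load_document are hypothetical helpers, not part of the API. The key returned by stash_document is the value you would pass as result_key and later read back from the match results.

```python
import uuid

# Stand-in for a real database; an assumption made to keep the sketch runnable.
document_store = {}


def stash_document(document):
    """Store a large document and return a key that identifies it."""
    result_key = uuid.uuid4().hex
    document_store[result_key] = document
    return result_key


def load_document(result_key):
    """Fetch the document again once the match results arrive."""
    return document_store.get(result_key)


large_document = {"author": "Rose Jones", "body": "x" * 200000}
key = stash_document(large_document)
# ... the key travels with the match results instead of the document ...
restored = load_document(key)
print(len(restored["body"]))
```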
Creating documents
The document is a class derived from db.Model. It contains a set of properties that correspond to fields, and queries can match against these fields. For example, the following code sample creates a document definition:
from google.appengine.ext import db


class Comment(db.Model):
    author = db.StringProperty()
    body = db.TextProperty()
    length = db.IntegerProperty()
The example document will have the topic "Comment", derived from the class name, unless it is explicitly overridden in the match() call. The document defines two string fields named author and body, and one integer field named length.
Here's how to populate the db.Model object with data from a data source and create an instance of the document:
comment = Comment()
comment.author = "Rose Jones"
comment.body = "A rose by any other name would smell as sweet."
comment.length = len(comment.body)
This example stores a string, text, and an integer in the appropriate fields.
Supported types for property values
Prospective search matches the following properties:
Prospective search also supports list properties. A condition on a list property is checked against every value in the list and matches if any value matches. The following list properties are supported:
For db.ListProperty, prospective search supports the following types:
- int (32-bit int range only)
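Because integer values are limited to the 32-bit range, it can help to validate values before building a document. The helper below is a hypothetical convenience, not part of the API, and it assumes the signed 32-bit range; the text above does not say whether the range is signed or unsigned.

```python
# Assumption: "32-bit int range" means the signed range.
INT32_MIN = -2 ** 31
INT32_MAX = 2 ** 31 - 1


def fits_int32(value):
    """Return True if value is an integer inside the signed 32-bit range."""
    if isinstance(value, bool) or not isinstance(value, int):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return False
    return INT32_MIN <= value <= INT32_MAX


checks = [fits_int32(46), fits_int32(2 ** 31), fits_int32(-2 ** 31)]
print(checks)
```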
Query language overview
Prospective search uses a simple query language allowing you to query the contents of a document's fields. This query language supports numeric and text expressions and uses a field:value syntax. The field identifies the name of a property defined as part of the Entity or derived document class. The value defines the query on the specified field, either a string or a numerical value. Text fields and queries can be unicode strings.
Prospective search supports all space-delimited languages. It also supports some languages that are not segmented by spaces (specifically, Chinese, Japanese, Korean, and Thai); for these languages, prospective search segments the text automatically.
The simplest type of query consists only of a string or text type value. The value can be a word or phrase to be matched against any supported string or text fields in the document. Queries are not case sensitive.
For example, to find all documents with the word "rose" (regardless of case) in any string or text field in the document, use a query like the following:
rose
This simple query matches against any supported string or text field in the document. If your documents are "Comments" as defined in the Creating documents section, the query matches if the word "rose" appears in the author or body fields. If the schema defines additional string or text fields, such as a subject or email, rose also matches the contents of those fields.
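The behavior described above (a word or phrase, compared without regard to case, against every string or text field) can be approximated in a few lines of Python 3. This is a sketch of the semantics only: matches_any_text_field is hypothetical, and splitting on word characters is an assumption about tokenization, not the service's actual segmentation.

```python
import re


def words(text):
    # Lowercase and split into word tokens (assumed tokenization).
    return re.findall(r"\w+", text.lower())


def matches_any_text_field(phrase, document):
    """Return True if the word or phrase occurs in any text field."""
    wanted = words(phrase)
    if not wanted:
        return False
    for value in document.values():
        if not isinstance(value, str):
            continue  # skip non-text fields such as length
        tokens = words(value)
        for start in range(len(tokens) - len(wanted) + 1):
            if tokens[start:start + len(wanted)] == wanted:
                return True
    return False


body = "A rose by any other name would smell as sweet."
comment = {"author": "Rose Jones", "body": body, "length": len(body)}
word_hit = matches_any_text_field("ROSE", comment)
phrase_hit = matches_any_text_field("any other name", comment)
no_hit = matches_any_text_field("tulip", comment)
print(word_hit, phrase_hit, no_hit)
```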
To match a phrase, surround the query in quotes as follows:
"any other name"
Queries on fields
To create more complex queries that reference specific fields, use both the field and value in your query. Use a colon to delineate the two as follows:
field:value
This syntax allows you to reference any supported field defined in a schema by name. For example, to search for "Rose" only within the author field of a comment document as defined in Creating Documents, use the following query:
author:rose
To search the body field for the phrase "any other name", use the following query:
body:"any other name"
To match against multiple fields at the same time, list a series of field:value pairs together with a space between them as follows:
author:"Rose Jones" body:rose
The query language supports a number of Boolean operators as well as parentheses for grouping parts of the query together. The supported Boolean operators are AND, OR, and NOT. Always use uppercase for Boolean operators. Lowercase words are treated as part of the field or value portions of the query.
By default, when you create queries that match multiple fields at the same time, each field:value pair is combined with a Boolean AND. For the query as a whole to match, all the specified values must match.
You can also explicitly specify this by using the AND Boolean operator. The following two queries are equivalent:
author:rose body:"any other name"
author:rose AND body:"any other name"
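The equivalence of the implicit and explicit AND forms can be illustrated with a small parser. This is a simplified sketch, not the service's query parser: parse_pairs and matches_all are hypothetical, only space-separated field:value pairs and the AND keyword are handled, and values are compared by case-insensitive substring containment rather than by real tokenization.

```python
import shlex


def parse_pairs(query):
    """Split a query into (field, value) pairs, dropping explicit ANDs."""
    pairs = []
    for token in shlex.split(query):  # shlex keeps quoted phrases together
        if token == "AND":
            continue
        field, _, value = token.partition(":")
        pairs.append((field, value))
    return pairs


def matches_all(query, document):
    """Every field:value pair must match for the query to match."""
    for field, value in parse_pairs(query):
        if value.lower() not in str(document.get(field, "")).lower():
            return False
    return True


implicit = 'author:rose body:"any other name"'
explicit = 'author:rose AND body:"any other name"'
comment = {"author": "Rose Jones",
           "body": "A rose by any other name would smell as sweet."}
same_pairs = parse_pairs(implicit) == parse_pairs(explicit)
print(same_pairs, matches_all(implicit, comment))
```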
Use the OR operator if you want the query to match when either of two values matches. You can use more than one OR in a query. For example:
author:("bob" OR ("rose" OR "tom") AND "jones")
This example matches any document whose author field contains either "Rose Jones", "Tom Jones", or "Bob".
For an example of Boolean NOT, see the following:
author:rose NOT body:filligree
This example matches any document whose author field contains "rose" but whose body field does not contain "filligree".
Use parentheses to create more complex queries combining supported Boolean operators.
(author:Thomas OR author:Jones) AND (NOT body:rose)
This example matches documents whose author field contains "Thomas" or "Jones", but only if the body field of the comment does not contain "rose".
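To see how the grouping in the last query evaluates, it can be rewritten as composed Python predicates. The helper names below (contains, all_of, any_of, negate) are hypothetical and exist only for this illustration; word matching by whitespace splitting is an assumption.

```python
def contains(field, word):
    # True if the field holds the word, ignoring case.
    return lambda doc: word.lower() in doc.get(field, "").lower().split()


def all_of(*predicates):   # Boolean AND
    return lambda doc: all(p(doc) for p in predicates)


def any_of(*predicates):   # Boolean OR
    return lambda doc: any(p(doc) for p in predicates)


def negate(predicate):     # Boolean NOT
    return lambda doc: not predicate(doc)


# (author:Thomas OR author:Jones) AND (NOT body:rose)
query = all_of(
    any_of(contains("author", "Thomas"), contains("author", "Jones")),
    negate(contains("body", "rose")),
)

rose_comment = {"author": "Rose Jones",
                "body": "A rose by any other name would smell as sweet."}
other_comment = {"author": "Thomas Hardy",
                 "body": "Far from the madding crowd"}
results = [query(rose_comment), query(other_comment)]
print(results)
```

The first comment has a matching author but is rejected because its body contains "rose"; the second satisfies both groups.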
Numeric operators only match against numeric fields. Supported numeric operators are as follows:
< > <= >= =
For "not equal to", use the Boolean NOT with a numeric field name such as length. For example:
NOT length = 15
This example returns documents whose length is not 15.
You can combine numeric operators with text and Boolean operators. For example:
author:"Rose Jones" length > 15
This query matches comments whose author field is "Rose Jones" and whose length field is greater than 15, that is, comments whose body is longer than 15 characters when length is populated as in Creating documents.
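The numeric operators map naturally onto Python's operator module. The evaluator below is a minimal sketch: matches_numeric is hypothetical, it expects whitespace-separated tokens such as `length > 15`, and treating NOT as plain negation of the comparison is this sketch's assumption.

```python
import operator

NUMERIC_OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "=": operator.eq,
}


def matches_numeric(condition, document):
    """Evaluate 'field OP number', optionally prefixed with NOT."""
    tokens = condition.split()
    negated = tokens[0] == "NOT"
    if negated:
        tokens = tokens[1:]
    field, op, raw = tokens
    value = document.get(field)
    # Numeric operators only match against numeric fields.
    is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
    result = is_numeric and NUMERIC_OPERATORS[op](value, float(raw))
    return not result if negated else result


body = "A rose by any other name would smell as sweet."
comment = {"author": "Rose Jones", "body": body, "length": len(body)}
outcomes = [
    matches_numeric("length > 15", comment),
    matches_numeric("NOT length = 15", comment),
    matches_numeric("length <= 15", comment),
]
print(outcomes)
```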
Receiving match responses
The Prospective Search API returns match results by creating events on the Task Queue. This section describes how to process the match events.
The match() call specifies which task queue to use, how many subscription IDs to include in each task, and what additional information to send (such as the document itself, or a key that identifies the document).
To receive the matching subscription IDs, first map the result URL to your match response handler:
def main(argv):
    app = webapp2.WSGIApplication(
        [('/', MainHandler),
         ('/_ah/prospective_search', MatchResponseHandler)],
        debug=True)
In MatchResponseHandler you can access the parameters of the POST request, which include the matching subscription IDs and the document sent for matching:
import webapp2
from google.appengine.api import prospective_search


class MatchResponseHandler(webapp2.RequestHandler):
    """MatchResponseHandler receives match results from TaskQueue."""

    def post(self):
        # List of subscription IDs that matched the document.
        sub_ids = self.request.get_all('id')
        # Document from the match request, either a Python dict or a db.Model,
        # present if result_return_document=True in the match() call.
        doc = prospective_search.get_document(self.request)
        # Topic from the match request.
        topic = self.request.get('topic')
        # Key specified in the match() call.
        key = self.request.get_all('key')
        # Total number of matching subscriptions for the match request
        # that generated this result event.
        results_count = self.request.get_all('results_count')
        # Index of the first subscription in this match result batch.
        # 0 <= results_offset < results_count.
        results_offset = self.request.get_all('results_offset')
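Because the matching subscription IDs for one document can be spread over several tasks, a handler may want to know which part of the result set it is looking at. The sketch below simulates that batching in plain Python. It assumes each task carries a contiguous slice of the IDs, with results_offset and results_count as described in the handler comments; split_into_batches and is_final_batch are hypothetical helpers, and the order in which tasks are actually delivered is not something this sketch can guarantee.

```python
def split_into_batches(sub_ids, batch_size):
    """Simulate how a result set might be divided across tasks."""
    total = len(sub_ids)
    return [
        {
            "ids": sub_ids[offset:offset + batch_size],
            "results_count": total,     # total matches for this document
            "results_offset": offset,   # index of first ID in this batch
        }
        for offset in range(0, total, batch_size)
    ]


def is_final_batch(batch):
    """True if this batch reaches the end of the result set."""
    return batch["results_offset"] + len(batch["ids"]) >= batch["results_count"]


all_ids = ["sub-%d" % i for i in range(5)]
batches = split_into_batches(all_ids, batch_size=2)
offsets = [b["results_offset"] for b in batches]
final_flags = [is_final_batch(b) for b in batches]
print(offsets, final_flags)
```

In a real handler the request parameters arrive as strings, so values such as results_count would need converting with int() before arithmetic like this.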