The Index Class

Class Index represents an index allowing documents to be indexed, deleted, and searched.

Index is defined in the google.appengine.api.search module.

Introduction

The Index class lets you construct an index and provides methods to add, list, search, and delete documents (or an iterable collection of documents) within it. You construct an index by passing arguments to the Index class, including the name and namespace of the index.

The following code shows how to put documents into an index, then search it for documents matching a query:

from google.appengine.api import search

# Get the index.
index = search.Index(name='index-name')

# Create a document.
doc = search.Document(
    doc_id='document-id',
    fields=[search.TextField(name='subject', value='my first email'),
            search.HtmlField(name='body', value='<html>some content here</html>')])

# Index the document.
try:
    index.put(doc)
except search.PutError as e:
    result = e.results[0]
    if result.code == search.OperationResult.TRANSIENT_ERROR:
        # possibly retry indexing result.object_id
        pass
except search.Error as e:
    # possibly log the failure
    pass

# Query the index.
try:
    results = index.search('subject:first body:here')

    # Iterate through the search results.
    for scored_document in results:
        # process the scored_document
        pass

except search.Error as e:
    # possibly log the failure
    pass

Constructor

The constructor for class Index is defined as follows:

Index(name, namespace=None)

Construct an instance of class Index.

Arguments

name

Index name (see name property, below, for details).

namespace

For multitenant applications, the namespace in which index name is defined.

Result value

A new instance of class Index.

Properties

An instance of class Index has the following properties:

schema

Schema mapping field names to the list of types supported. Valid only for indexes returned by the search.get_indexes method.

name

Index name, a human-readable ASCII string identifying the index. Must contain no whitespace characters and not start with an exclamation point (!).

namespace

Namespace in which index name is defined.

storage_usage

The approximate number of bytes used by this index. The number may not reflect the results of recent changes. Valid only for indexes returned by the search.get_indexes method.

storage_limit

The maximum allowable storage for this index, in bytes. Valid only for indexes returned by the search.get_indexes method.
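
The naming rules given for the name property can be checked before an index is constructed. The following is a minimal sketch covering only the rules stated above (ASCII characters, no whitespace, no leading exclamation point). The helper is_valid_index_name is hypothetical and not part of the search module, rejecting an empty name is an assumption, and the service may enforce further limits not checked here.

```python
def is_valid_index_name(name):
    """Check a candidate name against the rules stated for Index.name.

    Hypothetical helper, not part of google.appengine.api.search.
    """
    # Assumption: an empty name is not acceptable.
    if not name:
        return False
    # The name must not start with an exclamation point.
    if name.startswith('!'):
        return False
    for ch in name:
        # Reject non-ASCII characters and any whitespace character.
        if ord(ch) > 127 or ch.isspace():
            return False
    return True
```

For instance, is_valid_index_name('index-name') returns True, while a name containing a space or starting with '!' is rejected.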

Instance Methods

Instances of class Index have the following methods:

put(self, documents, deadline=None)

Put the documents in the index. If any of the specified documents are already in the index (that is, a document with the same doc_id exists), they are reindexed with the updated contents.

Arguments

documents

Document (or iterable collection of documents) to index.

deadline

Deadline for RPC call in seconds.

Result value

List of results (PutResult), one for each document requested to be indexed.

Exceptions

PutError

One or more documents failed to index, or number indexed did not match number requested.

TypeError

Unknown attribute passed.

ValueError

Argument not a document or iterable collection of documents, or number of documents larger than MAXIMUM_DOCUMENTS_PER_PUT_REQUEST.
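
Because put raises ValueError when given more documents than MAXIMUM_DOCUMENTS_PER_PUT_REQUEST, a large collection has to be split into several calls. The following is a minimal sketch of one way to do that. The helper put_in_batches is hypothetical, and it takes the batch size as an argument rather than assuming a particular value for the limit.

```python
def put_in_batches(index, documents, batch_size):
    """Put documents into index in chunks of at most batch_size.

    Hypothetical helper. index is any object with a put(documents) method,
    such as a search.Index; batch_size would normally be
    search.MAXIMUM_DOCUMENTS_PER_PUT_REQUEST.
    """
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    documents = list(documents)
    results = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        # Each call returns one result per document in the batch.
        results.extend(index.put(batch))
    return results
```

A PutError raised by any batch propagates to the caller, so batches put before the failure remain indexed.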

delete(self, document_ids, deadline=None)

Delete documents from index.

If no document exists for an identifier in the list, that identifier is ignored.

Arguments

document_ids

Identifier (or list of identifiers) of documents to delete.

deadline

Deadline for RPC call in seconds.

Exceptions

DeleteError

One or more documents failed to delete, or number deleted did not match number requested.

ValueError

Argument not a string or iterable collection of valid document identifiers, or number of document identifiers larger than MAXIMUM_DOCUMENTS_PER_PUT_REQUEST.
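
A common use of delete is to empty an index by repeatedly fetching a page of identifiers with get_range (described later in this section) and deleting them. The following is a minimal sketch of that loop. The helper delete_all_documents is hypothetical, and it assumes that iterating over the value returned by get_range yields objects with a doc_id attribute.

```python
def delete_all_documents(index):
    """Delete every document in index, one page of identifiers at a time.

    Hypothetical helper. index is any object offering get_range(ids_only=True)
    and delete(document_ids), such as a search.Index.
    """
    deleted = 0
    while True:
        # Assumption: iterating the get_range() result yields objects
        # that expose a doc_id attribute.
        document_ids = [doc.doc_id for doc in index.get_range(ids_only=True)]
        if not document_ids:
            break
        index.delete(document_ids)
        deleted += len(document_ids)
    return deleted
```

The function returns the number of identifiers passed to delete, which is useful for logging.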

get(self, doc_id, deadline=None)

Retrieves a Document from the index using the document's identifier. If the document is not found, returns None.

Arguments

doc_id

The identifier of the document to retrieve.

deadline

Deadline for RPC call in seconds.

Result value

A Document object whose identifier matches the one supplied by doc_id, or None if no such document exists.

search(self, query, deadline=None)

Search the index for documents matching the query. The query may be either a string or a Query object.

For example, the following code fragment searches for documents where 'first' occurs in the subject field and 'good' occurs anywhere. It returns at most 20 documents, starts from the position given by the cursor (an empty Cursor() here), requests a single cursor for the response, sorts by subject in descending order, and returns the author, subject, and summary fields along with a snippet of the content field.

results = index.search(
          # Define the query by using a Query object.
          query=Query('subject:first good',
              options=QueryOptions(limit=20,
                  cursor=Cursor(),
                  sort_options=SortOptions(
                      expressions=[SortExpression(expression='subject',
                                                  default_value='')],
                      limit=1000),
                  returned_fields=['author', 'subject', 'summary'],
                  snippeted_fields=['content'])))

The following code fragment shows how to use a results cursor.

cursor = results.cursor
for result in results:
    # process result
    pass
results = index.search(
    Query('subject:first good', options=QueryOptions(cursor=cursor)))

The following code fragment shows how to use a per_result cursor:

results = index.search(query=Query('subject:first good',
                       options=QueryOptions(limit=20,
                       cursor=Cursor(per_result=True),
                       ...))
                       )

cursor = None
for result in results:
    cursor = result.cursor

results = index.search(
          Query('subject:first good', options=QueryOptions(cursor=cursor))
               )

Arguments

query

The query to match against documents in the index, given either as a query string or as a Query object. For more information, see the Query Language Overview.

deadline

Deadline for RPC call in seconds.

Result value

A SearchResults object containing the list of matched documents, the number returned, and the number matched by the query.

Exceptions

TypeError

A parameter has an invalid type, or an unknown attribute was passed.

ValueError

A parameter has an invalid value.
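
The results-cursor pattern shown earlier for search can be wrapped in a generator that keeps requesting pages until none remain. The following is a minimal sketch, not a definitive implementation. The names iter_search_results and build_query are hypothetical, and the loop assumes that results.cursor is None once there are no further results.

```python
def iter_search_results(index, build_query):
    """Yield every result of a search, following the response cursor.

    Hypothetical helper. build_query(cursor) must return the query to run
    for the given cursor; it is called with None for the first page.
    """
    cursor = None
    while True:
        results = index.search(build_query(cursor))
        for scored_document in results:
            yield scored_document
        # Assumption: results.cursor is None when no more results exist.
        cursor = results.cursor
        if cursor is None:
            break
```

With the SDK, build_query would typically return a Query whose QueryOptions carry the given cursor, substituting an empty Cursor() when it receives None so that the first response includes a cursor.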

get_range(self, start_id=None, include_start_object=True, limit=100, ids_only=False, deadline=None)

Get a range of documents from an index, in doc_id order.

Arguments

start_id

String containing the document identifier from which to list documents. By default, starts at the first document identifier.

include_start_object

If True, include the document specified by start_id.

limit

Maximum number of documents to return.

ids_only

If true, return only document identifiers instead of full documents.

deadline

Deadline for RPC call in seconds.

Result value

A GetResponse object containing a list of the retrieved documents, ordered by document identifier.

Exceptions

TypeError

Unknown attribute passed.

Error

Some subclass of Error occurred while processing request.
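
Since get_range returns at most limit documents per call, walking a whole index means calling it repeatedly and resuming after the last identifier seen. The following is a minimal sketch using the start_id and include_start_object arguments described above. The helper iter_document_ids is hypothetical, and it assumes that iterating over the returned GetResponse yields objects with a doc_id attribute.

```python
def iter_document_ids(index, page_size=100):
    """Yield every document identifier in index, in doc_id order.

    Hypothetical helper. index is any object with a get_range method
    accepting the arguments documented for Index.get_range.
    """
    start_id = None
    while True:
        # Assumption: the get_range() result is iterable over documents.
        page = list(index.get_range(
            start_id=start_id,
            # Include the start object only on the first call, where
            # start_id is None; later calls resume after the last id.
            include_start_object=(start_id is None),
            limit=page_size,
            ids_only=True))
        if not page:
            break
        for doc in page:
            yield doc.doc_id
        start_id = page[-1].doc_id
```

Passing include_start_object=False on later calls avoids yielding the last identifier of one page again at the start of the next.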