Faceted Search

Faceted search gives you the ability to attach categorical information to your documents. A facet is an attribute/value pair. For example, the facet named "size" might have values "small", "medium", and "large."

By using facets with search, you can retrieve summary information to help you refine a query and "drill down" into your results in a series of steps.

This is useful for applications like shopping sites, where you intend to offer a set of filters for customers to narrow down the products that they want to see.

The aggregated data for a facet shows you how a facet's values are distributed. For instance, the facet "size" may appear in many of the documents in your result set. The aggregated data for that facet might show that the value "small" appeared 100 times, "medium" 300 times, and "large" 250 times. Each facet/value pair represents a subset of documents in the query result. A key, called a refinement, is associated with each pair. You can include refinements in a query to retrieve documents that match the query string and that have the facet values corresponding to one or more refinements.

When you perform a search, you can choose which facets to collect and show with the results, or you can enable facet discovery to automatically select the facets that appear most often in your documents.

Adding facets to a document

Add facets to a document before you add the document to an index. Do this at the same time you specify the document's fields:

package app

import (
    "appengine"
    "appengine/search"
    "net/http"
)

type ComputerDoc struct {
    Name      search.Atom
    Type      search.Atom `search:",facet"`
    RAMSizeGB float64     `search:",facet"`
}

func handlePut(w http.ResponseWriter, r *http.Request) {
    c := appengine.NewContext(r)
    index, err := search.Open("products")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    _, err = index.Put(c, "doc1", &ComputerDoc{
        Name:      "x86",
        Type:      "computer",
        RAMSizeGB: 8.0,
    })
    // Handle err and write HTTP response.
}

A facet is similar to a document field; it has a name, and takes one value.

Facet names follow the same rules as document fields: Names are case sensitive and can only contain ASCII characters. They must start with a letter and can contain letters, digits, or underscore. A name cannot be longer than 500 characters.

The value of a facet can be either an atomic string (no longer than 500 characters) or a number (a double precision floating point value between -2,147,483,647 and 2,147,483,647).

You can assign multiple values to a facet on one document by adding a facet with the same name and type many times, using a different value each time.

There is no limit to the number of values a facet can have. There is also no limit to the number of facets that you can add to a document or the number of uniquely-named facets in an index.

Note that each time you use a facet, it can take either an atomic or numeric value. A facet with the name "size" can be attached to one document with the string value "small" and another document with the numeric value 8. In fact, the same facet can appear multiple times on the same document with both kinds of values. We do not recommend using both atom and number values for the same facet, even though it is allowed.

While a facet has a specific type when you add it to a document, the search results gather all of its values together. For example, the results for facet "size" might show that there were 100 instances of the value "small", 150 instances of "medium", and 135 instances of numeric values in the range [4, 8). The exact numeric values and their frequency distribution are not shown.

When you retrieve a document using a query, you cannot directly access its facets and values. You must request that facet information be returned with your query, as explained in the next section.

Using a faceted search to retrieve facet information

You can ask the search backend to discover the most frequently used facets for you, this is called automatic facet discovery. You can also retrieve facet information explicitly by selecting a facet by name, or by name and value. You can mix and match all three kinds of facet retrieval in a single query.

Asking for facet information will not affect the documents your query returns. It can affect performance. Performing a faceted search with the default depth of 1000 has the same effect as setting the sort options scorer limit to 1000.

Automatic facet discovery

Automatic facet discovery looks for the facets that appear most often in the aggregate in your documents. For example, suppose the documents matching your query include a "color" facet that appears 5 times with the value "red", 5 times with the value "white", and 5 times with the color "blue". This facet has a total count of 15. For the purposes of discovery, it would be ranked higher than another facet "shade" that appears in the same matching documents 6 times with the value "dark" and 7 times with the value "light".

You must enable facet discovery by setting it in your Query:

func handleSearch(w http.ResponseWriter, r *http.Request) {
    index, err := search.Open("products")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    it := index.Search(c, "name:x86", &search.SearchOptions{
        Facets: {
            search.AutoFacetDiscovery(0, 0),
        },
    })
    facets, err := it.Facets()
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    for _, results := range facets {
        for i, facet := range result {
            // The facet results are grouped by facet name.
            // Print the name of each group before its values.
            if i == 0 {
                fmt.Fprintf(w, "Facet %s:\n", facet.Name)
            }
            fmt.Fprintf(w, "    %v: count=%d", facet.Value, facet.Count)
        }
    }
}

When you retrieve facets by discovery, by default only the 10 most often occurring values for a facet will be returned. You can increase this limit up to 100 using the first parameter to AutoFacetDiscovery.

Note that automatic facet discovery is not meant to return all possible facets and their values. Facets returned from discovery may vary from run to run. If a fixed set of facets is desired, use a return_facets parameter on your query.

String values will be returned individually. The numeric values of a discovered facet are returned in a single range [min max). You can examine this range and create a smaller subrange for a later query.

Selecting facets by name

To retrieve information about a facet by its name only, add a FacetDiscovery with the facet's name to the search options for your query:

it := index.Search(c, "name:x86", &search.SearchOptions{
    Facets: {
        FacetDiscovery("Type"),
        FacetDiscovery("RAMSizeGB"),
    },
})

When you retrieve facets by name, only the 10 most frequently occurring values for a facet will be returned.

Selecting facets by name and value

To retrieve information only about particular values of a facet, add a FacetDiscovery with both the facet's name and the values of interest:

it := index.Search(c, "name:x86", &search.SearchOptions{
    Facets: {
        // Fetch the "Type" facet with Values "computer" and "printer"
        FacetDiscovery("Type",
            search.Atom("computer"),
            search.Atom("printer"),
        ),
        // Fetch the "RAMSizeGB" facet with values in the ranges [0, 4), [4, 8), and [8, max]
        FacetDiscovery("RAMSizeGB",
            search.Range{Start: 0, End: 4},
            search.Range{Start: 4, End: 8},
            search.AtLeast(8),
        ),
    },
})

The values in a single FacetDiscovery must all be the same type, either a list of search.Atom values or, for numbers, a list of search.Range, which are intervals that are closed on the left (start), and open on the right (end). If your facet has a mix of string and number values, add separate FacetDisovery options for each.

Options

You can control the minimum number of documents to evaluate to gather facet information by adding the FacetDocumentDepth option in the SearchOptions for your query. If not specified, this depth defaults to 1000.

Note that facet depth is usually much greater than the query limit. Facet results are computed to at least the depth number of documents. If you have set the sort options scoring limit higher than depth, than the scoring limit will be used instead.

Retrieving facet results

When you use faceted search parameters in a query, the aggregated facet information comes with the query result itself.

The search Iterator has a method Facets, which returns the aggregated facet information as [][]FacetResult. The results are organised so that there is one slice of facet results for each facet that appeared in a document that matched your query. For each result, you'll get:

  • The facet name
  • One value for the facet from the list of the most frequent values
  • An approximate count of how many times that value appeared

Note that the values list will include a facet's string and numeric values. If the facet was auto-discovered, its numeric values are returned as a single interval [min max). If you explicitly asked for a numeric facet with one or more ranges in your query, the list will contain one closed-open interval [start end) for each range.

The list of facet values might not include all of the values found in your documents, since query options determine how many documents to examine and how many values to return.

The aggregated information for each facet can be read from the iterator:

it := index.Search(...)
facets, err := it.Facets()  // And check err != nil.
for _, results := range facets {
    for _, facet := range results {
        ...
    }
}

For example, a query may have found documents that included a "size" facet with the string values and numeric values. The results for this facet will be constructed like this:

[][]search.FacetResult{
    {
        {Name: "size", Value: search.Range{Start: 8, End: 10}, Count: 22},
        {Name: "size", Value: search.Atom("small"), Count: 100},
        {Name: "size", Value: search.Atom("medium"), Count: 300},
        {Name: "size", Value: search.Atom("large"), Count: 250},
    },
}

Using facets to refine/filter a query

Each FacetResult can be used to further narrow your results to include only documents that have those facet values. To refine queries with one or more of these keys, pass them as search options:

it := index.Search(c, "...", &search.SearchOptions{
    Refinements: []search.Facet{
        facetResult1.Facet,
        facetResult2.Facet,
    },
})

You can combine refinements for one or more different facets in the same request. All the refinements belonging to the same facet are joined with an OR. Refinements for different facets are combined with AND.

It is also possible to create a custom Facet by hand for use as a refinement. Please see the reference for more information.