Faceted search gives you the ability to attach categorical information to your documents. A facet is an attribute/value pair. For example, the facet named "size" might have values "small", "medium", and "large."
By using facets with search, you can retrieve summary information to help you refine a query and "drill down" into your results in a series of steps.
This is useful for applications like shopping sites, where you intend to offer a set of filters for customers to narrow down the products that they want to see.
The aggregated data for a facet shows you how a facet's values are distributed. For instance, the facet "size" may appear in many of the documents in your result set. The aggregated data for that facet might show that the value "small" appeared 100 times, "medium" 300 times, and "large" 250 times. Each facet/value pair represents a subset of documents in the query result. A key, called a refinement, is associated with each pair. You can include refinements in a query to retrieve documents that match the query string and that have the facet values corresponding to one or more refinements.
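The aggregation described above can be pictured with a small pure-Python sketch. It does not use the Search API; documents are modeled as lists of (facet name, value) pairs, and `aggregate_facets` and `matching_docs` are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict


def aggregate_facets(documents):
    """Count how often each value of each facet occurs across documents.

    Each document is modeled as a list of (facet_name, value) pairs.
    This is an illustration only, not how the search backend is implemented.
    """
    counts = defaultdict(Counter)
    for facets in documents:
        for name, value in facets:
            counts[name][value] += 1
    return counts


# Hypothetical result set: four documents with "size" and "color" facets.
matching_docs = [
    [("size", "small"), ("color", "red")],
    [("size", "medium"), ("color", "red")],
    [("size", "medium"), ("color", "blue")],
    [("size", "large")],
]

summary = aggregate_facets(matching_docs)
for facet_name, value_counts in summary.items():
    print(facet_name, dict(value_counts))
```

Each (facet, value) entry in `summary` corresponds to one subset of the result set, which is the subset a refinement would select.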
When you perform a search, you can choose which facets to collect and show with the results, or you can enable facet discovery to automatically select the facets that appear most often in your documents.
Adding facets to a document
Add facets to a document before you add the document to an index. Do this at the same time you specify the document's fields:
A facet is similar to a document field; it has a name, and takes one value.
Facet names follow the same rules as document fields: Names are case sensitive and can only contain ASCII characters. They must start with a letter and can contain letters, digits, or underscore. A name cannot be longer than 500 characters.
The value of a facet can be either an atomic string (no longer than 500 characters) or a number (a double precision floating point value between -2,147,483,647 and 2,147,483,647).
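The naming and value rules above can be checked before you build a document. The following is a minimal sketch in plain Python, not part of the Search API; the function names and constants are hypothetical, and it assumes the numeric bounds are inclusive.

```python
import re

MAX_NAME_LENGTH = 500
MAX_ATOM_LENGTH = 500
# Assumption: the numeric bounds stated above are treated as inclusive.
NUMBER_MIN = -2147483647
NUMBER_MAX = 2147483647

# A letter first, then letters, digits, or underscores (ASCII only).
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def is_valid_facet_name(name):
    """Return True if name follows the facet naming rules described above."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.match(name) is not None


def is_valid_facet_value(value):
    """Return True for an atom of at most 500 characters or an in-range number."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; it is not a facet value type.
        return False
    if isinstance(value, str):
        return len(value) <= MAX_ATOM_LENGTH
    if isinstance(value, (int, float)):
        return NUMBER_MIN <= value <= NUMBER_MAX
    return False


print(is_valid_facet_name("size"), is_valid_facet_name("2size"))
print(is_valid_facet_value("small"), is_valid_facet_value(2 ** 31))
```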
You can assign multiple values to a facet on one document by adding a facet with the same name and type many times, using a different value each time.
There is no limit to the number of values a facet can have. There is also no limit to the number of facets that you can add to a document or the number of uniquely-named facets in an index.
Note that each time you use a facet, it can take either an atomic or numeric value. A facet with the name "size" can be attached to one document with the string value "small" and another document with the numeric value 8. In fact, the same facet can appear multiple times on the same document with both kinds of values. We do not recommend using both atom and number values for the same facet, even though it is allowed.
While a facet has a specific type when you add it to a document, the search results gather all of its values together. For example, the results for facet "size" might show that there were 100 instances of the value "small", 150 instances of "medium", and 135 instances of numeric values in the range [4, 8). The exact numeric values and their frequency distribution are not shown.
When you retrieve a document using a query, you cannot directly access its facets and values. You must request that facet information be returned with your query, as explained in the next section.
Using a faceted search to retrieve facet information
You can ask the search backend to discover the most frequently used facets for you; this is called automatic facet discovery. You can also retrieve facet information explicitly by selecting a facet by name, or by name and value. You can mix and match all three kinds of facet retrieval in a single query.
Asking for facet information will not affect the documents your query returns, but it can affect performance. Performing a faceted search with the default depth of 1000 has the same effect as setting the sort options scorer limit to 1000.
Automatic facet discovery
Automatic facet discovery looks for the facets that appear most often, in aggregate, in your documents. For example, suppose the documents matching your query include a "color" facet that appears 5 times with the value "red", 5 times with the value "white", and 5 times with the value "blue". This facet has a total count of 15. For the purposes of discovery, it would be ranked higher than another facet "shade" that appears in the same matching documents 6 times with the value "dark" and 7 times with the value "light", for a total count of 13.
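The ranking in this example can be reproduced with a short pure-Python sketch. It only illustrates the "total count" comparison described above; the actual discovery logic in the search backend is not documented here, and `rank_facets_by_total` is a hypothetical helper.

```python
def rank_facets_by_total(value_counts):
    """Order facets by the sum of their per-value counts, highest first."""
    totals = {name: sum(counts.values()) for name, counts in value_counts.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Per-value counts taken from the example in the text.
value_counts = {
    "shade": {"dark": 6, "light": 7},
    "color": {"red": 5, "white": 5, "blue": 5},
}

ranking = rank_facets_by_total(value_counts)
print(ranking)  # [('color', 15), ('shade', 13)]
```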
You must enable facet discovery by setting it in your Query:
When you retrieve facets by discovery, by default only the 10 most frequently occurring values for a facet will be returned. You can increase this limit up to 100 using the FacetOptions discovery_limit parameter.
Note that automatic facet discovery is not meant to return all possible facets and their values. Facets returned from discovery may vary from run to run. If a fixed set of facets is desired, use a return_facets parameter on your query.
String values will be returned individually. The numeric values of a discovered facet are returned in a single range [min max). You can examine this range and create a smaller subrange for a later query.
Selecting facets by name
To retrieve information about a facet by its name only, add a return_facets parameter to your query, including the facet name in the list:
When you retrieve facets by name, by default only the 10 most frequently occurring values for a facet will be returned. You can increase this limit up to 20 using the FacetOptions discovery_value_limit parameter.
Selecting facets by name and value
To retrieve information only about particular values of a facet, add a return_facets parameter that includes a FacetRequest object with a values list:
The values in a single FacetRequest must all be the same type, either a list of string values or, for numbers, a list of FacetRanges, which are intervals that are closed on the left (start) and open on the right (end). If your facet has a mix of string and number values, add separate FacetRequests for each.
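To make the closed-open interval convention concrete, here is a minimal pure-Python sketch that counts numeric values per range. It models each range as a (start, end) tuple rather than a FacetRange object, and `count_in_ranges` is a hypothetical helper, not part of the Search API.

```python
def count_in_ranges(values, ranges):
    """Count values falling in each [start, end) interval.

    A value equal to start is included; a value equal to end is not.
    """
    counts = {r: 0 for r in ranges}
    for value in values:
        for start, end in ranges:
            if start <= value < end:
                counts[(start, end)] += 1
    return counts


# Hypothetical numeric "size" values found on matching documents.
sizes = [4, 6, 8, 8, 9, 10, 12]
ranges = [(4, 8), (8, 10)]

bucket_counts = count_in_ranges(sizes, ranges)
print(bucket_counts)  # {(4, 8): 2, (8, 10): 3}
```

The value 8 is counted only in [8, 10), and the values 10 and 12 fall outside both ranges, so they are not counted at all.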
Options
You can control faceted search by adding the facet_options parameter to a Query call. This parameter takes a single instance of FacetOptions. Use this parameter to override the default behavior of faceted search.
options = FacetOptions(discover_facet_limit=5,
                       discover_facet_value_limit=10,
                       depth=6000)
Parameter | Description | Default |
---|---|---|
discover_facet_limit | Number of facets to discover if facet discovery is turned on. If 0, facet discovery will be disabled. | 10 |
discover_facet_value_limit | Number of values to be returned for each of the top discovered facets. | 10 |
depth | The minimum number of documents in query results to evaluate to gather facet information. | 1000 |
The depth option applies to all three kinds of facet aggregation: by name, name and value, and auto-discovery. The other options are for auto-discovery only.
Note that facet depth is usually much greater than the query limit. Facet results are computed to at least the depth number of documents. If you have set the sort options scoring limit higher than depth, then the scoring limit will be used instead.
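The interaction between depth and the scoring limit amounts to taking the larger of the two. A minimal sketch of that rule, using a hypothetical helper name and assuming the default depth of 1000 from the table above:

```python
def effective_facet_depth(depth=1000, scorer_limit=None):
    """Return how many documents are evaluated for facet information.

    Illustrates the rule stated above: a scoring limit higher than depth
    takes precedence; otherwise depth is used.
    """
    if scorer_limit is not None and scorer_limit > depth:
        return scorer_limit
    return depth


print(effective_facet_depth())                               # 1000
print(effective_facet_depth(depth=6000, scorer_limit=1000))  # 6000
print(effective_facet_depth(depth=1000, scorer_limit=5000))  # 5000
```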
Retrieving facet results
When you use faceted search parameters in a query, the aggregated facet information comes with the query result itself.
A query result will have a list of FacetResult objects. There will be one result in the list for each facet that appeared in a document that matched your query. For each result, you'll get:
- The facet name
- A list of the most frequent values for the facet. For each value there is an approximate count of how many times it appeared and a refinement key that can be used to retrieve the documents that match this query and facet value.
Note that the values list will include a facet's string and numeric values. If the facet was auto-discovered, its numeric values are returned as a single interval [min max). If you explicitly asked for a numeric facet with one or more ranges in your query, the list will contain one closed-open interval [start end) for each range.
The list of facet values might not include all of the values found in your documents, since query options determine how many documents to examine and how many values to return.
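The following pure-Python sketch shows why some values can be missing from the list. It is a simplification, not the backend's algorithm: each matching document is modeled as carrying a single facet value, only the first `depth` documents are examined, and only the `value_limit` most frequent values are kept. The helper name and sample data are hypothetical.

```python
from collections import Counter
from itertools import islice


def top_facet_values(doc_values, depth, value_limit):
    """Count values over the first `depth` documents, keep the top `value_limit`."""
    counts = Counter(islice(doc_values, depth))
    return counts.most_common(value_limit)


# One "size" value per matching document, in result order.
values_per_doc = ["small", "medium", "medium", "large", "medium", "small", "xl"]

top = top_facet_values(values_per_doc, depth=5, value_limit=2)
print(top)
```

Here "xl" never appears in the output because its document lies beyond the examined depth, and one of the less frequent values is dropped by the value limit.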
The aggregated information for each facet can be read from the search results:
query = search.Query(...)
results = index.search(query)
for facet_info in results.facets:
...
For example, a query may have found documents that included a "size" facet with both string values and numeric values. The FacetResult for this facet will be constructed like this:
FacetResult(name='size', values=[
    FacetResultValue(label='[8, 10)', count=22, refinement=refinement_key),
    FacetResultValue(label='small', count=100, refinement=refinement_key),
    FacetResultValue(label='medium', count=300, refinement=refinement_key),
    FacetResultValue(label='large', count=250, refinement=refinement_key)])
The label parameter is constructed from a facet value. For numeric values, label is the representation of a range. The refinement_key is a web/URL-safe string that can be used in a later query to retrieve the documents matching that result's facet name and value.
Using facets to refine/filter a query
The refinement associated with each FacetResultValue can be used to further narrow your results to include only documents that have those facet values. To refine queries with one or more of these keys, pass them to the query object:
query = search.Query(..., facet_refinements=[refinement_key1, refinement_key2, refinement_key3])
You can combine refinements for one or more different facets in the same request. All the refinements belonging to the same facet are joined with an OR. Refinements for different facets are combined with AND.
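The OR/AND rule can be illustrated with a small pure-Python filter. This is a sketch of the semantics only: real refinements are opaque keys returned with the facet results, whereas here each refinement is modeled as a plain (facet name, value) pair, and `matches_refinements` is a hypothetical helper.

```python
from collections import defaultdict


def matches_refinements(doc_facets, refinements):
    """Return True if a document satisfies the combined refinements.

    Refinements on the same facet are ORed; different facets are ANDed.
    Both arguments are lists of (facet_name, value) pairs.
    """
    wanted = defaultdict(set)
    for name, value in refinements:
        wanted[name].add(value)

    present = defaultdict(set)
    for name, value in doc_facets:
        present[name].add(value)

    # Every refined facet must share at least one value with the document.
    return all(present[name] & values for name, values in wanted.items())


docs = {
    "a": [("size", "small"), ("color", "red")],
    "b": [("size", "large"), ("color", "red")],
    "c": [("size", "small"), ("color", "blue")],
}

# (size is small OR medium) AND (color is red)
refinements = [("size", "small"), ("size", "medium"), ("color", "red")]

selected = [doc_id for doc_id, facets in docs.items()
            if matches_refinements(facets, refinements)]
print(selected)  # ['a']
```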
It is also possible to create a custom FacetRefinement key by hand. Please see the class documentation for more information.