Faceted search is at the heart of most complex, large-scale eCommerce sites. Not only does it play a huge role on the website, it plays a particularly significant role in those visits that can be influenced with merchandising. There are always a significant number of visits to an eCommerce site that lack any purchase intent. There are, too, a large number of visits where purchase intent is heavily focused on a specific product. Neither of these visit types is easy to influence. But for those visits and visitors who are seriously shopping, faceted search is usually at the core of the experience and analysis of facet behavior can be hugely rewarding both in terms of immediate conversion optimization and in longer term customer understanding.
That being said, most organizations do surprisingly little analysis of faceting. A big part of the reason is that our traditional digital analytics toolset is very poorly suited to this type of analysis – in part because of the form in which facet data is collected. The most common approach is to collect the facet state: typically a string that contains the full set of currently applied facets as a list of name-value pairs.
A typical faceting session might look like this:
04122015-12:02:05 “TV Type:LED”
04122015-12:04:13 “TV Type:LED;TV Screen Size:70+”
04122015-12:04:59 “TV Type:LED;TV Screen Size:70+;Brand:Samsung”
04122015-12:05:51 “TV Type:LED;TV Screen Size:70+;Brand:Sharp”
04122015-12:07:32 “TV Type:LED;TV Screen Size:70+;Brand:Sharp;Condition:New”
04122015-12:08:25 “TV Type:LED;TV Screen Size:70+;Price:1000-2000”
Like most such strings, this data is nearly impossible to analyze in a digital analytics solution and, unfortunately, it’s not a heck of a lot easier even if you’ve loaded it into an analytics warehouse. Working with these strings, it’s pretty easy to figure out stuff like which users are interested in a particular brand or size of TV. That information can be pulled out simply by scanning for a match in the strings. But if you wanted to ask questions like which brand a consumer is most interested in, whether a consumer narrowed or expanded their shopping set during a session, how often price eliminated a brand from a customer’s selection set, or which customers are most price sensitive, you’d really struggle with the data in this format. Yet these are important and interesting questions; questions that faceting can definitely help answer. So let’s take a look at building a faceting data model.
To begin with, I’m not going to spend time on the extra data that needs to go in a facet record. You will undoubtedly want things like customer and session keys in a faceting record, but I’m only going to think about the facet data itself.
The simplest and most obvious way to clean up the faceting data is to break it out into a series of faceting actions, one row per facet change:
DateTime, Facet State, Facet Action (Add or Subtract), Facet Name-Value Pair
You can significantly enhance the data here by adding a few basic calculated fields to the mix. I think having a sequence number eases queries, as does having a time between facet actions and a time during which the facet is active. In our original facet data, this would generate a first row that looked like:
04122015-12:02:05, 1, “TV Type:LED”, Add, “TV Type:LED”, 128 seconds, 380 seconds
This row tells us that the visitor’s first facet was TV Type. We know that the visitor spent about two minutes looking at the full LED TV list, and spent a little over six minutes (380 seconds) looking at LED TVs overall. That’s pretty useful. Let’s look at how the third record would map:
04122015-12:04:59, 3, “TV Type:LED;TV Screen Size:70+;Brand:Samsung”, Add, “Brand:Samsung”, 52 seconds, 52 seconds
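The unpacking described above can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: the timestamps are read as MMDDYYYY, a facet that is never removed is treated as active until the last recorded facet state, and the names `FacetAction`, `parse_state` and `explode` are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

TS_FORMAT = "%m%d%Y-%H:%M:%S"  # assumption: 04122015 means April 12, 2015

# The example session from above as (timestamp, facet state) pairs.
SESSION = [
    ("04122015-12:02:05", "TV Type:LED"),
    ("04122015-12:04:13", "TV Type:LED;TV Screen Size:70+"),
    ("04122015-12:04:59", "TV Type:LED;TV Screen Size:70+;Brand:Samsung"),
    ("04122015-12:05:51", "TV Type:LED;TV Screen Size:70+;Brand:Sharp"),
    ("04122015-12:07:32", "TV Type:LED;TV Screen Size:70+;Brand:Sharp;Condition:New"),
    ("04122015-12:08:25", "TV Type:LED;TV Screen Size:70+;Price:1000-2000"),
]


@dataclass
class FacetAction:
    timestamp: datetime
    sequence: int                     # 1-based position of the facet state
    facet_state: str
    action: str                       # "Add" or "Subtract"
    pair: str                         # e.g. "Brand:Samsung"
    seconds_to_next: Optional[float]  # None for the last state in the log
    seconds_active: Optional[float]   # None on Subtract rows


def parse_state(state: str) -> List[str]:
    """Split a facet state string into its name:value pairs."""
    return [pair for pair in state.split(";") if pair]


def explode(log: List[Tuple[str, str]]) -> List[FacetAction]:
    """Turn a list of facet states into one row per facet change."""
    events = [(datetime.strptime(ts, TS_FORMAT), state) for ts, state in log]
    session_end = events[-1][0]  # assumption: last facet state ends the session
    rows: List[FacetAction] = []
    previous: List[str] = []
    for i, (ts, state) in enumerate(events):
        current = parse_state(state)
        gap = None
        if i + 1 < len(events):
            gap = (events[i + 1][0] - ts).total_seconds()
        for pair in previous:
            if pair not in current:
                rows.append(FacetAction(ts, i + 1, state, "Subtract", pair, gap, None))
        for pair in current:
            if pair not in previous:
                # Active until the first later state that no longer contains the pair.
                end = next(
                    (t for t, s in events[i + 1:] if pair not in parse_state(s)),
                    session_end,
                )
                active = (end - ts).total_seconds()
                rows.append(FacetAction(ts, i + 1, state, "Add", pair, gap, active))
        previous = current
    return rows


rows = explode(SESSION)
```

A state change that swaps one value for another (Samsung to Sharp, for instance) comes out as two rows with the same sequence number, one Subtract and one Add. Whether that is the right grain, or whether a swap deserves its own action type, is a design choice this sketch doesn't settle.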
You can see that this will make it easier to answer questions like which brand a customer was most interested in, but it should also be clear that a really good answer to that question still can’t be easily retrieved from the data. Depending on how a user facets, some brands may never be exposed and others may be heavily exposed even when no brand is explicitly faceted.
It’s also hard to know which facets ultimately contained products selected by the user.
We can make many types of success analysis easier by adding some flags about subsequent success to the facet record. I think it’s useful to know if any product was purchased in the session, whether any product in the category was purchased, whether any product in the category was carted, and whether any product in this facet view was detail viewed, carted or purchased. That gives us six Boolean flags attached to every facet record that will make it much easier to determine if the visitor was in a buying session and whether or not the current facet set was ultimately used. You might also consider making the detail viewed field an integer counter instead of a Boolean. In most cases, a Boolean will work fine for all the other variables.
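As a rough sketch of how those six flags might be derived, here is one way to do it in Python. The `ProductEvent` shape is hypothetical: in particular, it assumes each product event can already be tied back to the facet row it came from (`facet_sequence`), which is itself a collection problem in many implementations.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple, Optional


class ProductEvent(NamedTuple):
    """Hypothetical product-level event from the same session."""
    kind: str                       # "detail_view", "cart" or "purchase"
    category: str                   # e.g. "TV"
    facet_sequence: Optional[int]   # facet row the product was reached from, if any


@dataclass(frozen=True)
class SuccessFlags:
    session_purchase: bool    # any product purchased in the session
    category_purchase: bool   # any product in the category purchased
    category_cart: bool       # any product in the category carted
    facet_detail_view: bool   # any product in this facet view detail viewed
    facet_cart: bool          # any product in this facet view carted
    facet_purchase: bool      # any product in this facet view purchased


def success_flags(events: Iterable[ProductEvent], category: str,
                  facet_sequence: int) -> SuccessFlags:
    """Compute the six flags for one facet row."""
    events = list(events)
    in_category = [e for e in events if e.category == category]
    from_facet = [e for e in events if e.facet_sequence == facet_sequence]

    def has(kind: str, subset) -> bool:
        return any(e.kind == kind for e in subset)

    return SuccessFlags(
        session_purchase=has("purchase", events),
        category_purchase=has("purchase", in_category),
        category_cart=has("cart", in_category),
        facet_detail_view=has("detail_view", from_facet),
        facet_cart=has("cart", from_facet),
        facet_purchase=has("purchase", from_facet),
    )


example_events = [
    ProductEvent("detail_view", "TV", 4),
    ProductEvent("cart", "TV", 4),
    ProductEvent("purchase", "Cables", None),
]
flags_for_row_4 = success_flags(example_events, "TV", 4)
```

Swapping `facet_detail_view` for a count is a one-line change (`sum(e.kind == "detail_view" for e in from_facet)`) if you want the integer counter instead of the Boolean.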
This is a much more convenient representation of detailed faceting behavior than the raw data contained in the text strings. As I’ve mentioned, however, it doesn’t answer every question. It’s not ideal as a representation of a customer’s overall faceting preferences. Perhaps even more important is that it doesn’t expose anything about what’s contained inside the facet views. In answering questions about brand preference or why a facet worked or didn’t, that seems to me to be a pretty huge hole.
Understanding what’s inside a facet set is really challenging. The challenges start with data collection. Most digital analytics solutions don’t collect this data and it’s not an easy item to fix. The facet search engine is usually a black box, it can be hard to intercept what it is doing, the number of products returned can be huge, and it’s hard to pass back the data in an intelligible format. I’ve talked about this problem before and it’s an issue not just for faceting but for all forms of merchandising analysis. Knowing which products are on a page and in what order is hugely important but very difficult to capture with client-side tags. This is an area where server-side data capture, whether custom or via third-party tools, can be a real advantage.
Using the data is probably even harder than collecting it. The most potentially troublesome aspect of faceted search is that some searches and categories will return hundreds of items (maybe even more). Capturing every item returned is bulky.
String-based data collection will always work but it’s inefficient and clumsy. You could use the UDF strategy I mentioned earlier and one advantage is that you may actually need fewer functions to access this product merchandising data since the variety of query types is smaller. Another alternative would be to store each item as a row – similar in fashion to the way we unpacked the facet string. You might also consider truncation of the data to a smaller set of returned items. We know that usage of internal search results tends to fall off geometrically, so knowing that a product was the 25th (or 101st) in a series is almost completely uninteresting.
One obvious strategy would be to collect the “impressed” items, not the returned items. This would tend to keep the number of items captured in the range of 20-30. The downside is that it makes the study of product narrowing a bit more challenging, since products that weren’t recorded in the original selection can leap into a narrowed subset. But it has obvious advantages and actually brings additional information to the table, in that only items potentially viewed are counted.
Another strategy might be to limit the number of recorded options to the top X – where X might plausibly be 10 or 20. In most facet situations, the usage rate of products past the first 20 is vanishingly small and can often be ignored for analytic purposes. The potential advantage to this strategy is that you could consider using fixed columns to capture the position of a SKU as well as the multi-row model suggested above. It shares the impressed model’s disadvantage when it comes to product narrowing analysis.
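To make the two storage shapes concrete, here is a small Python sketch of a top-X truncation feeding either one row per item or a single fixed-column record. The function names and the `sku_01`-style column names are placeholders of mine, not a recommended schema.

```python
from typing import Dict, List, Optional, Tuple


def item_rows(facet_sequence: int, skus: List[str],
              top_x: int = 20) -> List[Tuple[int, int, str]]:
    """Multi-row model: one (facet_sequence, position, sku) row per kept item."""
    return [
        (facet_sequence, position, sku)
        for position, sku in enumerate(skus[:top_x], start=1)
    ]


def fixed_columns(facet_sequence: int, skus: List[str],
                  top_x: int = 10) -> Dict[str, Optional[object]]:
    """Fixed-column model: one record with a column per position, padded with None."""
    record: Dict[str, Optional[object]] = {"facet_sequence": facet_sequence}
    for position in range(1, top_x + 1):
        has_item = position <= len(skus)
        record[f"sku_{position:02d}"] = skus[position - 1] if has_item else None
    return record


returned = ["SKU-A", "SKU-B", "SKU-C"]  # ordered results for one facet state
as_rows = item_rows(3, returned, top_x=2)
as_columns = fixed_columns(3, returned, top_x=4)
```

The row form makes "which facet views ever showed this SKU" a simple filter, while the fixed-column form keeps one record per facet action and makes "what was in position 1" trivial. Either way, anything past `top_x` is simply gone.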
A third strategy I’ve toyed with is to combine these data reduction strategies and create merchandising sets. A merchandising set represents an ordered, unique collection of 10 or 20 products and is a straight lookup table. The advantage of a merchandising set strategy is that it gives you a very efficient storage mechanism for the facet rows - you just store a merchandising set key with each facet row. If your system doesn’t generate a huge amount of variation in product ordering, it might be worthwhile to think about this type of structure, but I fear that for many systems it would break down under the weight of uninteresting variation.
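A merchandising set table is easy to mock up. The Python below is a minimal in-memory sketch, with the class name and methods invented for illustration; in a warehouse this would just be a lookup table keyed by a surrogate ID.

```python
from typing import Dict, List, Sequence, Tuple


class MerchandisingSets:
    """Lookup table mapping each distinct ordered top-N product list to a key."""

    def __init__(self, size: int = 10) -> None:
        self.size = size
        self._key_by_set: Dict[Tuple[str, ...], int] = {}
        self._set_by_key: List[Tuple[str, ...]] = []

    def key_for(self, skus: Sequence[str]) -> int:
        """Return the key for this ordered list, creating a new set if needed."""
        ordered = tuple(skus[: self.size])
        key = self._key_by_set.get(ordered)
        if key is None:
            key = len(self._set_by_key)
            self._key_by_set[ordered] = key
            self._set_by_key.append(ordered)
        return key

    def lookup(self, key: int) -> Tuple[str, ...]:
        """Return the ordered products stored under a key."""
        return self._set_by_key[key]

    def __len__(self) -> int:
        return len(self._set_by_key)


sets = MerchandisingSets(size=3)
key_a = sets.key_for(["SKU-A", "SKU-B", "SKU-C", "SKU-D"])  # truncated to 3
key_b = sets.key_for(["SKU-A", "SKU-B", "SKU-C"])           # same set, same key
key_c = sets.key_for(["SKU-B", "SKU-A", "SKU-C"])           # reordered: new set
```

The last line is the worry in miniature: because order is part of the set's identity, any reshuffle of the same products mints a new key, and the number of sets can grow quickly if the engine reorders results often.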
While all of these strategies are helpful, none gives really great purchase on this impression data; indeed, it's a rather difficult set of data to understand and use. In my next post, I’ll consider some more “lossy” types of facet set and facet use summarization that are less appropriate for supporting exhaustive facet analysis but may be more usable for a limited set of common problems.
[I wanted to call out this LinkedIn post by EY’s Loren Hadley on “Mastering the Fundamentals of Mobile Analytics.” It’s a terrific intro into what it takes to get mobile measurement right.]
Understanding this in a way that allows for decision-making & analysis is very tricky. I've used something like your merchandising sets in the past but you're right, it's pretty easy to get out of control very quickly.
Posted by: Matt A | April 12, 2015 at 05:59 PM
Awesome post!!! This is what I was looking for!!! I agree! Would love to see this deployed in an e-commerce integration. Thanks for the walk thru on setting up faceted search!
Posted by: Amado Cramer | April 20, 2015 at 01:08 AM