I had a rare weekend where I missed posting. Some might attribute this to Memorial Day festivities. You might think that, being from Indiana, I was perhaps too engrossed in the rejuvenated Indy 500. But, in fact, you can take the boy out of Indiana AND the Indiana out of the boy, since I really couldn't care less about auto racing. Instead, I had succumbed to one of our not infrequent San Francisco summer colds. All well again (almost) and on to a new topic – building true visitor segmentations for online properties.
Before I dive into that, I should mention that we are going to be announcing the X Change line-up this week – so stay tuned for that – and if you haven’t yet registered for X Change now is the time – especially if you want to get a room at the Ritz! If you are serious enough about web analytics to read this blog, then X Change is the perfect conference for you and you really should go.
In traditional web analytics practice, visitor segmentation has a narrow and, to the rest of the marketing and analytic world, rather misleading usage. Visitor Segmentation is the name we give to the filtering tools that allow analysts to isolate the behavior of a group of visits or visitors meeting some set of behavioral criteria.
This is segmentation. But it isn’t what most marketers think of as segmentation. In the broader marketing world, a segmentation is a complete classification of customers and/or prospects. It is a powerful marketing tool that helps define creative approaches, offer strategies and acquisition sources; indeed, it influences nearly every aspect of a marketing program. It is typically derived from a combination of demographic and psychographic information based on primary research.
Nearly every company that is at all serious about its marketing has done this type of segmentation. Unfortunately, these traditional segmentations have proven less useful in online marketing than might have been expected. The main problem has been mapping site visitors to traditional segmentations. The variables (demographics and psychographics) simply don’t exist at the same level as the behaviors. So while segmentations can be used to help target and design online properties, there’s generally no way of integrating the visitor segmentation into the web reporting or success measurement.
It’s a big issue, and it’s one we’ve tried to solve from both ends of the problem.
The first approach we've tried is based on a set of extensions to the classic strategy for building a segmentation.
To build a traditional segmentation, you start with audience identification. Do you want a segmentation of customers or prospects or some combination of the two? You may even want a known subset of possible prospects (if you’re designing a site for dog lovers you might start out limiting your segmentation to dog owners). Once you’ve identified your target audience, you choose the best method to reach and question them. Often, primary research of this type is based on pre-existing panels. Sometimes, existing lists (of customers or of specific prospect types – like registered dog owners) may be used to simplify targeting. In most cases, research begins with a small set of qualifying questions. If a respondent is qualified, then a much broader set of questions is asked, covering key demographics and attitudinal information related to the product, site, campaign or company.
Typically this respondent data is then analyzed using common statistical clustering techniques. Clustering techniques produce data-driven segmentations – they group survey respondents together based on their answers to all the questions. This type of segmentation is much more complex mathematically than rule-based segmentation – it is machine-driven scoring across dozens or even hundreds of different variables. This process creates a score or set of scores for each respondent that assigns that respondent to a unique group. In this way, the entire universe of customers and/or prospects is assigned to a segment.
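To make the clustering step concrete, here is a minimal sketch in Python. It uses k-means purely as a stand-in, since the post only says "common statistical clustering techniques" and does not name one. The three question columns and the 1-5 answer scale are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two answer vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iterations=20, seed=0):
    """Assign each respondent (one row of numeric answers) to one of k clusters."""
    rng = random.Random(seed)
    # Start from k randomly chosen respondents as the initial centroids.
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Step 1: put each respondent in the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        # Step 2: move each centroid to the mean of its current members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical answers on a 1-5 scale to three made-up questions:
# [breed_loyalty, reads_breed_forums, price_sensitivity]
responses = [
    [5, 5, 1], [5, 4, 1], [4, 5, 2],
    [1, 1, 5], [1, 2, 5], [2, 1, 4],
]
segments = kmeans(responses, k=2)
```

A real study would have far more respondents and questions, and would normally scale the variables and compare several values of k before settling on a scheme.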
Next comes the really hard step. Using the segment assignments, the analyst goes back and profiles each segment across all of the variables used in the analysis. In effect, each segment will have something distinct in its profile (some constellation of variable values) that caused a respondent to fall into the segment. For some segments, specific variables dominate. For others, a range of variables at varying levels may define membership in the segment. In any case, the analyst must translate these mathematical groupings into an understandable and usable marketing segmentation. It is in this process of naming and describing the segments that most of the real art in a segmentation lies. In effect, the analyst is interpreting the machine-driven segmentation back into a language that the marketing organization can understand.
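The mechanical half of profiling can be sketched as follows: for each segment, compute how far its average answer sits from the overall average on every question. The question names and data are hypothetical, and the naming and interpretation still have to be done by a person.

```python
from collections import defaultdict

def column_means(rows):
    """Mean of each question across a list of respondents."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def profile_segments(rows, labels, question_names):
    """For each segment, return (segment mean - overall mean) per question."""
    overall = column_means(rows)
    members_by_segment = defaultdict(list)
    for row, label in zip(rows, labels):
        members_by_segment[label].append(row)
    profiles = {}
    for label, members in members_by_segment.items():
        means = column_means(members)
        profiles[label] = {
            name: round(seg_mean - all_mean, 2)
            for name, seg_mean, all_mean in zip(question_names, means, overall)
        }
    return profiles

# Hypothetical question names, answers (1-5 scale) and segment assignments.
questions = ["breed_loyalty", "reads_breed_forums", "price_sensitivity"]
responses = [
    [5, 5, 1], [5, 4, 1], [4, 5, 2],
    [1, 1, 5], [1, 2, 5], [2, 1, 4],
]
labels = [0, 0, 0, 1, 1, 1]
profiles = profile_segments(responses, labels, questions)
```

Large positive or negative differences are the "constellation of variable values" that distinguish a segment; here segment 0 scores high on breed loyalty and low on price sensitivity, which is the kind of pattern an analyst might end up labeling "Purists".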
At this point, a traditional segmentation is pretty much finished. Unfortunately, the work of applying it to the online world is just beginning. None of the variables used to build the segmentation (demographic or psychographic) typically exist for any site visitor. So while we know that our visitors must (by definition) each belong to some group in the segmentation, we have no way of knowing which visitors fall into which groups or what the percentage of any given group is likely to be among site visitors. This lack of translation severely limits the usefulness of the segmentation.
In fact, the second of these problems (the % of site visitors that belong to a group) is quite a bit easier to solve. If the site has high penetration among the customer base, or site visitors were used as a driver of the sample, then the distribution of visitors by group may already be known. If that’s not the case, then you can deploy an online survey instrument to collect data from site visitors and match it to your segmentation scheme.
This seems trivial, but there’s usually a hidden step here. Your primary research question set is often quite large – you may ask forty, fifty, even a hundred or more questions. That's more questions than is typically practical in an online flyby survey. If you can’t ask all the questions in your survey, you can’t simply replicate the research. Instead, you have to look for a reduced set of “golden” questions that drive most of the segmentation. In practice, it turns out that a reduced set of questions can usually be discovered but it is always an open question how greatly the survey can be reduced and how much granularity is lost in the process.
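One simple way to hunt for candidate "golden" questions is to score each question by how strongly it separates the existing segments, then keep the top few. The sketch below uses a between-segment to within-segment variance ratio as that score. This is an assumed heuristic, not the method described in the post, and the question names and data are hypothetical.

```python
from statistics import mean, pvariance

def separation_scores(rows, labels, question_names):
    """Score each question by between-segment vs. within-segment variation."""
    scores = {}
    segments = sorted(set(labels))
    for index, name in enumerate(question_names):
        answers = [row[index] for row in rows]
        grand_mean = mean(answers)
        between = 0.0
        within = 0.0
        for segment in segments:
            group = [a for a, label in zip(answers, labels) if label == segment]
            between += len(group) * (mean(group) - grand_mean) ** 2
            within += len(group) * pvariance(group)
        # Small constant avoids division by zero when a segment answers uniformly.
        scores[name] = between / (within + 1e-9)
    return scores

# Hypothetical questions; the last one is deliberately unrelated to the segments.
questions = ["breed_loyalty", "reads_breed_forums", "price_sensitivity", "owns_a_car"]
responses = [
    [5, 5, 1, 3], [5, 4, 1, 2], [4, 5, 2, 4],
    [1, 1, 5, 3], [1, 2, 5, 4], [2, 1, 4, 2],
]
labels = [0, 0, 0, 1, 1, 1]

scores = separation_scores(responses, labels, questions)
# Keep the three highest-scoring questions as candidates for the short survey.
golden = sorted(scores, key=scores.get, reverse=True)[:3]
```

A ranking like this only nominates questions. How much granularity is lost still has to be checked by re-assigning the original respondents using only the reduced set and comparing against their full-survey segments.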
Adding the online survey component will answer the question “what percent of site visitors belong to each group in my segmentation.” What it still won't do is allow you to assign any online visitor to a segment. For that, another set of steps is necessary.
With the online survey you have, in theory, established a join between the online data and the primary research data. You have a group of online respondents that you can categorize into your segmentation scheme and you have, if you’ve done it right, the entire set of online behaviors for those respondents in your web analytic tool. If you joined the two (typically by capturing a survey id in the web analytics tool or a visitor id in the survey tool), you now have all the data you need.
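The join itself is conceptually simple once a shared id has been captured on both sides. Here is a minimal sketch, assuming both tools can export rows carrying the same visitor id; the field names (`visitor_id`, `segment`, `visits`, `searches`) are hypothetical.

```python
def join_survey_to_behavior(survey_rows, behavior_rows):
    """Attach each respondent's segment to their web behavioral record."""
    behavior_by_visitor = {row["visitor_id"]: row for row in behavior_rows}
    joined, unmatched = [], []
    for response in survey_rows:
        behavior = behavior_by_visitor.get(response["visitor_id"])
        if behavior is None:
            # The respondent's id never showed up in the web analytics export.
            unmatched.append(response["visitor_id"])
        else:
            joined.append({**behavior, "segment": response["segment"]})
    return joined, unmatched

# Hypothetical exports from the survey tool and the web analytics tool.
survey = [
    {"visitor_id": "v1", "segment": "Purists"},
    {"visitor_id": "v2", "segment": "Generalists"},
    {"visitor_id": "v9", "segment": "Purists"},
]
behavior = [
    {"visitor_id": "v1", "visits": 2, "searches": 0},
    {"visitor_id": "v2", "visits": 8, "searches": 5},
    {"visitor_id": "v3", "visits": 1, "searches": 1},
]
joined, unmatched = join_survey_to_behavior(survey, behavior)
```

Tracking the unmatched ids is worth the extra line, because a high unmatched rate usually means the id was not captured consistently.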
With this new data set (all online survey respondents, categorized by segment and containing their web behavioral data), you can create a behavioral profile of each segment. If you want to just color your primary segmentation with web behavioral data, that’s all you’ll do. You’ll take each specific segment (suppose we have a segment of dog lovers who care only about a specific breed – perhaps we’d call them “Purists”) and simply map their site usage vs. the site average. Perhaps “Purists” come to the site less often, don’t use Search, are more social, and are more likely to respond to offers from “breed-specific” companies. There is some art to this. The choice of descriptive behavioral variables is more difficult than it might first appear. Traditional variables like pages viewed, time on site and visits will rarely prove the most informative (I’ll talk about this in a later post).
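Mapping a segment's usage against the site average can be expressed as an index, where 100 means "same as the site average". The sketch below assumes the joined data set is a list of records per respondent. The metric names and numbers are made up, and the "site average" here is computed over the records passed in rather than over all site traffic.

```python
from statistics import mean

def segment_index(visitors, metrics):
    """Index each segment's average on each metric against the overall average (100 = average)."""
    site_avg = {m: mean([v[m] for v in visitors]) for m in metrics}
    by_segment = {}
    for v in visitors:
        by_segment.setdefault(v["segment"], []).append(v)
    index = {}
    for segment, members in by_segment.items():
        index[segment] = {}
        for m in metrics:
            seg_avg = mean([v[m] for v in members])
            # None when the overall average is zero and no index can be computed.
            index[segment][m] = round(100 * seg_avg / site_avg[m]) if site_avg[m] else None
    return index

# Hypothetical joined records: survey segment plus web behavior.
visitors = [
    {"segment": "Purists", "visits": 2, "searches": 0, "forum_posts": 4},
    {"segment": "Purists", "visits": 4, "searches": 1, "forum_posts": 6},
    {"segment": "Generalists", "visits": 8, "searches": 5, "forum_posts": 1},
    {"segment": "Generalists", "visits": 10, "searches": 6, "forum_posts": 1},
]
index = segment_index(visitors, ["visits", "searches", "forum_posts"])
```

In this toy data "Purists" index at 50 on visits and well above 100 on forum posts, which matches the kind of description given above: they come less often, search less, and are more social.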
If, however, you want to actually apply your segmentation scheme to all online visitors, you have to go one big step further. You need to find purely behavioral variables that will predict segment classification. This step turns out to be – at best – extraordinarily difficult. The data is all there. But it is my experience that making this link – from behavior to the demographic and psychographic segments – does NOT work well. And if it doesn’t happen to work well, then you are stuck, because there is no other way around this problem. You simply won’t be using your segmentation scheme to classify online visitors.
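Whether behavior predicts segment well enough is something you can test directly on the joined survey respondents: train on some, hold out the rest, and compare accuracy with the naive baseline of always guessing the largest segment. The sketch below uses a nearest-centroid classifier as an assumed, deliberately simple model; the features and data are hypothetical.

```python
from collections import Counter, defaultdict

def train_centroids(examples):
    """Average the behavioral features of each segment's training respondents."""
    grouped = defaultdict(list)
    for features, segment in examples:
        grouped[segment].append(features)
    return {
        segment: [sum(col) / len(rows) for col in zip(*rows)]
        for segment, rows in grouped.items()
    }

def predict(centroids, features):
    """Pick the segment whose centroid is closest to this visitor's behavior."""
    return min(
        centroids,
        key=lambda s: sum((f - c) ** 2 for f, c in zip(features, centroids[s])),
    )

def accuracy(centroids, examples):
    hits = sum(predict(centroids, features) == segment for features, segment in examples)
    return hits / len(examples)

def majority_baseline(train, test):
    """Accuracy of always guessing the most common training segment."""
    most_common = Counter(segment for _, segment in train).most_common(1)[0][0]
    return sum(segment == most_common for _, segment in test) / len(test)

# Hypothetical features per respondent: [visits, searches, forum_posts].
# Features are left unscaled here; real data would normally be standardized.
train = [
    ([2, 0, 4], "Purists"), ([4, 1, 6], "Purists"), ([3, 0, 5], "Purists"),
    ([8, 5, 1], "Generalists"), ([10, 6, 1], "Generalists"),
]
holdout = [
    ([3, 1, 5], "Purists"), ([9, 5, 0], "Generalists"), ([5, 2, 4], "Generalists"),
]

centroids = train_centroids(train)
model_accuracy = accuracy(centroids, holdout)
baseline_accuracy = majority_baseline(train, holdout)
```

If the holdout accuracy is not clearly better than the baseline, the behavioral variables are not carrying the segmentation, and no amount of tuning the classifier is likely to rescue it.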
Which brings me back to a point I mentioned earlier – that Semphonic has tried solving this problem in two directions. The other direction is to begin with a behavioral segmentation and then add in demographics and psychographics. In theory, this approach will have exactly the same issues. And, in practice, it sort of does, but it often seems to turn out better. In my next post, I’ll talk about the steps involved when you begin with a behavioral segmentation and I’ll try to explain why I think it usually ends up working better. After that, I plan to dive down into more specifics about the process – including the join between survey research and behavioral data and the (rather complicated) process of doing behavioral segmentation.
Gary,
An excellent post on a difficult issue -- the bridging of behavioral and traditional segmentation. I did have a couple of comments and thought I would pass them along -- never been shy on that account :-)
I am still a novice in the web analytics area (learning a lot from the gurus in the field) and so my apologies in advance for any misinterpretation of your statements or erroneous conclusions. I will admit however, that I have done my share of traditional segmentation and other analytics.
My first question is why anyone would want to predict traditional segments using online behavioral data or vice versa. For one, (in my opinion) in addition to the points you mention in your post, segmentation is also driven very much by what you are trying to accomplish with the segments you will end up with (using cluster analysis/Neural Networks or any other method). The other reason is that I have always perceived the online behavioral data as an extension of the data sources used in the traditional segmentation (much like data from any other channel). From this end, I have always strived to improve the traditional segmentation to the point that it can also explain and be used for online Marketing. Again, my perception could be misplaced.
When it comes to segmentation using online information, I too have tried to approach it from various directions. The first one entails pooling all the data about a visitor together from all sources (online, demographic/psychographic, firmographic, transactional etc.) and then picking relevant variables (either manually or using the variable selection feature in data mining tools) based on the objective of the segmentation. Once you have your sample with all key variables, you can use either cluster analysis or one of the many methodologies to create segments. And lastly, as you mention generate the profile and appropriate description/label for those segments.
The second approach I have tried is to go through the behavioral data using the same approach as traditional segmentation (using similar methodologies as cluster analysis). Once this was done, and pure behavior based profiles completed, I used the traditional segments and the demographics/firmographics aspects to sub-profile the segments. This was done less to see the overlap between the traditional and behavioral segments but more so from a contact strategy perspective.
The catch to the above approaches, as you point out, is the ability to link behavioral information at the visitor level with other information. I do believe this is not easy but definitely possible. It will depend on what kind of customer infrastructure you have in place. If you are lucky, your company might have an individual level link between your online visitors and your traditional data. So for example, you will know that John who came to your site and did XYZ also interacted with these other channels, used so and so services/products, belongs to this industry, is a SOHO, belongs to a company with $$ sales, and if he was part of a survey then he responded in a certain manner etc. etc.
Anyway, just some random thoughts on the segmentation topic. Please feel free to shoot holes in my approach or thinking.
“To know yet to think that one does not know is best;
Not to know yet to think that one knows will lead to difficulty.”
- Lao Tzu
Posted by: Ned Kumar | June 04, 2008 at 12:37 AM
Hi Gary,
Very interesting post. I look forward to reading the sequel posts.
I realise the great value we would derive from combining “traditional” segmentation models with online behaviour. It is the next step in marrying the ‘what’ with the ‘why’. Those companies that master this integration of research sources stand to gain a significant advantage over their rivals.
I agree with you that applying a segmentation scheme to all visitors would be extremely challenging. I think the most we can realistically aspire to (given current technologies) is a representative sample by joining the survey data to the behavioural data.
However, I assume that in most cases the “joint” between the survey ID and the web analytics data is a cookie. With time this joint will suffer degradation as visitors delete their cookies.
So you are left with two main options:
1. Continuously run the survey (which might annoy site visitors unless traffic levels are extremely high so you can keep launch rates relatively low)
2. Use your sample data periodically for a relatively short period of time (until the next time you run the survey)
Nielsen Online offer a product, Market Intelligence, that joins web analytics with online survey data. Unfortunately, the web analytics capabilities are pretty basic. Nonetheless, MI enables you to combine the different survey variables to create segments on the fly. You can then superimpose the selected segment on the top level web analytics behavioural data.
I’ve used MI to collect mainly demographic and some psychographic data on the biggest media websites in Israel. It was fascinating. We got over 10,000 survey completes within days of launch. However, within two months the panel had to be refreshed. The refresh wasn’t hard to do technically but did raise issues of data consistency with some people.
Instadia, which was bought by Omniture in early 2007, was another web analytics tool with integrated survey capabilities. I believe it was Omniture’s intention to introduce this product (under the name ClientStep) as a SiteCatalyst plug-in some time ago (I last spoke to them about it around November 2007) but cannot remember any announcements made.
Is the cookie deletion problem something you’ve encountered as well? Would love to hear more about your experience with this matter.
Thanks,
Michael Feiner
Posted by: Michael Feiner | June 04, 2008 at 04:48 PM
Michael,
Great point. Cookie deletion is a big issue - not just with the join - but with segmentation in general. The loss of long-term tracking data significantly limits the reach of behavioral segmentation. This is hardly a problem unique to visitor segmentation but it certainly does have an impact. A behavioral segmentation solves the problem of applying the segmentation to all visitors (but doesn't resolve the cookie issue). But behavioral segmentations have their own issues - which is what I'll be talking about next.
Thanks for the thoughts!
Gary
Posted by: Gary | June 06, 2008 at 12:46 PM
Ned,
That's an interesting point about whether it makes sense to try and predict traditional segments from online data. I think it does make sense - it's just not always possible. Why would you want to do this? From my perspective, I'd like to be able to incorporate the company segmentation into my online reporting and analysis. When a mass-media campaign is running (for example), I'd love to be able to say that it drove Segment X more than Segment Y and Segment Z not at all (or some equivalent story). This is really interesting stuff - and applicable across a wide range of reporting needs. At some level, I'd also love to be able to incorporate these segments into analytics. So if I'm studying the impact of a site tool, I could say it worked well for Segment X but not Segment Y. That's potentially both interesting and useful. However, I typically can't do any of that unless - from behaviors - I can infer what traditional segments visitors belong to; I have to do this because I don't have the survey data to apply to tool users or campaign sourced visitors. Make sense?
It's really interesting that you've approached this problem from both directions just like we have. What's been your experience about which direction worked better? I'd love to compare notes sometime. I think this is one of the most interesting and challenging tasks in web analytics. And I believe that lots of companies could significantly improve their online segmentations.
Thanks for the thoughts!
Gary
Posted by: Gary | June 06, 2008 at 12:54 PM
Gary,
I think we are on the same page but just coming at it from different semantics. In terms of incorporating the company segments into online reporting or observing which segment(s) benefited from an online campaign -- I am completely in agreement with you. I just never thought of it as 'prediction'. Maybe because I had the luxury of not having to work with cookies but actually had a unique visitor id for most customers which I could use to create the link between the online data and traditional/transactional data and segments.
Ned
Posted by: Ned Kumar | June 06, 2008 at 02:10 PM