Using Hierarchies to Classify Visitors
The Semphonic website is quite a simple one. At a guess, I’d say it has only a hundred or so pages – certainly fewer than two hundred. That’s way smaller than most of our clients’ sites, which run to thousands, tens of thousands and even hundreds of thousands of pages.
But even our small web site has many of the problems in analysis and organization that make content hierarchies such an important capability in web analytics. First, and most important, we have more than one business line. It’s actually quite rare to find a web site that has only one focus – that implies a company with a single product and no additional online services or support options.
It’s the simple fact that most web sites have very distinct groups of content, each designed to appeal to and work for a distinct type of visitor, that makes content hierarchies essential.
For us, a significant portion of the site is devoted to CampaignTracker – our PPC and SEM reporting tool. If you looked at our website, you’d probably think that’s a big part of our company. But, in fact, it’s overrepresented there: the website is the primary vehicle for CampaignTracker sales but only a secondary sales-support vehicle for web analytics consulting.
When I want to look at visitors to our web site, it’s essential that I understand what mix of content they were interested in – and I particularly want to understand whether they are a CampaignTracker buyer or a potential consulting client or both. Classifying visitors is essential to measuring web site success – and, of course, to measuring the success of individual pages and areas.
Simple visitor segmentation has usually focused on positive visitor cues: classifying a visitor as a CampaignTracker prospect when and if the visitor views a page in the CampaignTracker content area. Likewise, a visitor might be classified as a consulting prospect when and if they view a page on analytics consulting.
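To make the positive-cue approach concrete, here is a minimal Python sketch. It assumes a hypothetical content hierarchy expressed as URL path prefixes (`/campaigntracker/` and `/consulting/` are illustrative, not our actual paths) and flags a visitor into every area in which they viewed at least one page.

```python
from typing import Iterable, Optional, Set

# Hypothetical content hierarchy: URL path prefix -> top-level content area.
CONTENT_AREAS = {
    "/campaigntracker/": "CampaignTracker",
    "/consulting/": "Consulting",
}


def area_for(path: str) -> Optional[str]:
    """Return the content area whose prefix matches the path, or None."""
    for prefix, area in CONTENT_AREAS.items():
        if path.startswith(prefix):
            return area
    return None


def positive_cue_segments(pages: Iterable[str]) -> Set[str]:
    """Flag the visitor into every area where they viewed at least one page."""
    segments = set()
    for path in pages:
        area = area_for(path)
        if area is not None:
            segments.add(area)
    return segments


print(sorted(positive_cue_segments(["/", "/consulting/services", "/campaigntracker/pricing"])))
# prints ['CampaignTracker', 'Consulting']
```

A consulting prospect who glances at a single CampaignTracker page lands in both segments, which is exactly the weakness of this approach.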
Unfortunately, such an approach rarely provides a good visitor segmentation. On our site, the relationship between visitor interest in consulting and CampaignTracker happens to be asymmetrical. Most CampaignTracker prospects aren’t interested in consulting. But many (perhaps even most) consulting prospects will also check out CampaignTracker. For many other sites and products, however, the interactions between multiple product lines are too rich and too varied to capture with such a simple statement. And when you’re trying to build useful visitor segments on a large site, you almost always face one additional problem, which I discussed previously when talking about epiphenomena.
The problem is that heavily engaged users will show up in (and often drive the statistics for) virtually every area of your site. For publishing clients, a small segment of heavily engaged users inevitably shows up in every single content area. And the smaller the overall usage of an area, the more the heavily engaged component influences its results. So one group of visitors can actually color lots of different segments, significantly misleading the analyst about the nature of each area’s impact on the site and its usage.
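A quick way to check how much a heavily engaged group colors each area is to compute the share of each area’s page views that comes from that group. The sketch below assumes page views arrive as hypothetical `(visitor_id, area)` pairs and that the heavy users have already been identified somehow; the area names and counts are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def heavy_user_share(views: Iterable[Tuple[str, str]], heavy: Set[str]) -> Dict[str, float]:
    """For each area, the fraction of its page views made by heavy users."""
    views = list(views)  # allow any iterable to be traversed twice
    total = Counter(area for _, area in views)
    from_heavy = Counter(area for visitor, area in views if visitor in heavy)
    return {area: from_heavy[area] / n for area, n in total.items()}


# One heavy user ("h1") spreads activity across a big area and a small one.
views = (
    [("h1", "News")] * 5
    + [("h1", "Gardening")] * 3
    + [(f"v{i}", "News") for i in range(20)]
    + [("v99", "Gardening")]
)
print(heavy_user_share(views, {"h1"}))  # {'News': 0.2, 'Gardening': 0.75}
```

The same visitor accounts for a fifth of the large area’s traffic but three quarters of the small one’s.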
As the number of consulting visitors on our site increases, the measured behavior of my CampaignTracker pages may be increasingly influenced by traffic that arrived with a different sales goal in mind. Since the traffic is largely direct, there is no simple way to segment it by source, so I have to rely on behavioral cues to understand what each visitor was interested in.
What I’d like to be able to do is classify visitors based on both their absolute and their comparative interest in a content area. In other words, it’s nice to know that one of my hardcore engaged users looked at three pages about CampaignTracker. But when I know that he’s looked at 400 pages about web analytics, I get a better perspective on what kind of visitor I’m talking about.
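Relative interest is simply each area’s share of a visitor’s categorized page views. A minimal sketch, assuming you already have one area label per page view (the labels are illustrative):

```python
from collections import Counter
from typing import Dict, Iterable


def area_shares(area_views: Iterable[str]) -> Dict[str, float]:
    """Fraction of a visitor's categorized page views that fall in each area."""
    counts = Counter(area_views)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {area: n / total for area, n in counts.items()}


# The hardcore user from the example: 3 CampaignTracker pages, 400 analytics pages.
shares = area_shares(["CampaignTracker"] * 3 + ["Web Analytics"] * 400)
print(round(shares["CampaignTracker"], 4))  # 0.0074
```

Three pages is respectable in absolute terms, but it is less than 1% of that visitor’s attention.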
When you get right down to it, no visitor segmentation scheme is going to be very robust unless it can capture both absolute usage and relative usage. Most decent products out there provide absolute usage segmentation – but relative usage is much less supported.
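One way to capture both kinds of usage is to require a visitor to clear an absolute floor and a relative floor before assigning them to an area. The thresholds here (at least 3 pages and at least 25% of categorized views) are arbitrary placeholders, not recommendations; you would tune them against your own data.

```python
from typing import Dict, Set


def classify(area_counts: Dict[str, int], min_pages: int = 3, min_share: float = 0.25) -> Set[str]:
    """Assign a visitor to every area meeting both the absolute and relative thresholds.

    min_pages and min_share are illustrative defaults, not recommended values.
    """
    total = sum(area_counts.values())
    if total == 0:
        return set()
    return {
        area
        for area, n in area_counts.items()
        if n >= min_pages and n / total >= min_share
    }


print(classify({"CampaignTracker": 3, "Web Analytics": 400}))     # {'Web Analytics'}
print(classify({"CampaignTracker": 6, "Consulting": 1}))          # {'CampaignTracker'}
print(sorted(classify({"Consulting": 8, "CampaignTracker": 4})))  # ['CampaignTracker', 'Consulting']
```

The engaged reader drops out of CampaignTracker despite meeting the page floor, while a consulting prospect who spends a third of their time on CampaignTracker is counted in both.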
This ability to classify visitors based on their percentage usage of an area is extremely rich. Not only does it provide the best way to classify visitors by interest, it also lets you identify special kinds of customers. On a transaction site, for example, it’s much more interesting to know the ratio of customer support pages to transactions than the raw number of customer support pages. The same is true when you are trying to build "interest-baskets" on a publishing site. If you just look at how many visitors who looked at X also looked at Y, you’re likely to get lots of navigational "noise." You’ll no doubt discover, for example, that your content is highly related to the home page. But if you can analyze related content by percentage of mind-share, you get a much richer view of how visitors actually behave.
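The difference between raw co-occurrence and mind-share shows up clearly in a small sketch. It assumes each visitor is represented as a hypothetical list of area labels, one per page view, and the sample data is invented. Co-occurrence asks what fraction of visitors who viewed X also viewed Y; the mind-share version instead averages Y’s share of each X viewer’s page views.

```python
from typing import List

Visitor = List[str]  # one area label per page view


def co_occurrence(visitors: List[Visitor], x: str, y: str) -> float:
    """Fraction of visitors who viewed X that also viewed Y."""
    viewers = [v for v in visitors if x in v]
    if not viewers:
        return 0.0
    return sum(y in v for v in viewers) / len(viewers)


def mind_share(visitors: List[Visitor], x: str, y: str) -> float:
    """Average share of page views that X viewers spent on Y."""
    viewers = [v for v in visitors if x in v]
    if not viewers:
        return 0.0
    return sum(v.count(y) / len(v) for v in viewers) / len(viewers)


visitors = [
    ["Home", "Gardening", "Gardening", "Gardening", "Cooking"],
    ["Home", "Gardening", "Cooking", "Cooking", "Cooking"],
]
for area in ("Home", "Cooking"):
    print(area, co_occurrence(visitors, "Gardening", area), round(mind_share(visitors, "Gardening", area), 2))
# prints:
# Home 1.0 0.2
# Cooking 1.0 0.4
```

Both areas co-occur with Gardening for every visitor, so raw co-occurrence can’t tell them apart; mind-share ranks Cooking well above the navigational Home page.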
If it seems like a discussion of Content Hierarchies has devolved into another post on visitor segmentation, that’s because Content Hierarchies are primarily an enabler of analysis. By providing rich methods for grouping and using groups of content, a web analytics system can greatly augment your reporting, segmentation and general analytic capabilities.
After all, crisp segmentations often shed a whole new light on a wide range of measurements. And no capability is more important to producing genuinely crisp visitor segmentations than the ability to define segments based on visitors’ relative usage of content areas.