In my last post, I laid out a basic theory of how Web analytics works: Web analytics methods combine an assumption of intentionality with some mapping of the “natural structure” of a Web site.
The assumption of intentionality says nothing more than that a visitor’s navigational choices on the Web site – where they go, what they do and how long they spend there – reflect their actual interests.
On the other hand, how visitors traverse the Web site is controlled, to some extent, by the options and paths you provide. A Web site is not a virgin field free of paths. It has a structure – like a town – so that the visitor is encouraged or even forced to travel on certain paths to reach a destination. It’s quite rare – on a Web site or in a town – that you can go directly to the destination you have chosen. The presence, placement, and design of link paths on a Web site create a natural structure that tends to push visitors in certain directions.
It should take only a moment’s reflection to realize that there is a fundamental tension between these two basic principles. If we don’t take account of the structure of a Web site when we examine behavior, we are highly likely to misread intention.
Reflecting on that last post, it struck me that there was an interesting objection to the principles I described that is worth discussing in some detail.
An analyst might well counter that it’s perfectly possible to do statistical analysis on a Web site – to correlate pages viewed to outcomes, for example – without either of these principles or any other intellectual superstructure.
In fact, this is precisely the tack I see most statistical analysts take when they work with Web data. They correlate pages with outcomes and conclude that the pages highly correlated with an outcome are the ones that matter in producing it.
Of course, it doesn’t work. It’s not that the correlation doesn’t work. Correlation is simply a statistical technique and, for any given Web site, a correlation analysis will certainly show significant correlations between some pages and some outcomes. Unfortunately, correlations come in two forms: meaningful and meaningless. Meaningless correlations are still correlated – it’s just that they don’t tell us anything interesting about the world.
On a Web site, correlations are driven by both of the two fundamental principles I described. If, to place an order for Product X, you have to reach Page Y to find the “Buy” button, then Page Y (and every page necessary to get to Page Y) will be highly correlated with “Buying Product X”. The stronger the natural structure of the Web site, the stronger the correlation. Where it is impossible to achieve the outcome without viewing the page, the correlation will be perfect – and perfectly meaningless, except as a mapping of the natural structure of the Web site.
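To see the trap in miniature, here’s a quick sketch in Python. The visit data are invented, and “Page Y” stands in for any page a purchase must pass through:

```python
# A minimal sketch of the structural-correlation trap. The data are
# hypothetical: "page_y" is the only page carrying the Buy button, so
# every purchase of Product X necessarily passes through it.
import pandas as pd

# One row per visit: did the visit include the page, and did it convert?
visits = pd.DataFrame({
    "viewed_page_y":  [1, 1, 1, 0, 0, 0, 1, 0],
    "bought_product": [1, 0, 1, 0, 0, 0, 1, 0],
})

# Pearson correlation between viewing Page Y and buying Product X.
r = visits["viewed_page_y"].corr(visits["bought_product"])
print(f"correlation(page_y, purchase) = {r:.2f}")
```

The correlation comes back strong, but it is purely structural: since no one can buy without seeing Page Y, the number says nothing about whether Page Y actually persuades anyone.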
Is correlation a good technique for creating a natural mapping of the site structure? Not really.
Since visitor intentions are also shaping a significant part of navigational behavior, only a part of most correlation scores will be attributable to structure. Even where a correlation is perfect, some part of the underlying behavior is intentional, not structural.
Simple correlation analysis does nothing to separate out the impact of natural structure and visitor intention – making it almost impossible to interpret and, therefore, almost completely useless.
I can’t tell you how many times I have seen professional analysts who, when first working with online data, delivered conclusions of this sort. You probably know the kind of thing I’m talking about: viewing 5+ pages is highly correlated with conversion (on a Web site where it’s impossible to convert in fewer than 4 pages); spending more than 4 minutes on the Web site is highly correlated with conversion (on most Web sites, conversion takes time); Product Detail Pages are much better converters than Category Information Pages (duh); the Privacy Policy is causing visitors not to convert (negative correlation); etc., etc.
It was to protect analysts against this type of analytic trivialization that we first created Functionalism. The idea behind Functional analysis is simple. Really simple. Every piece of a Web site is designed to fulfill some particular set of needs and the best way to measure the effectiveness of each particular piece is with respect to the needs they were designed for. That’s it.
To facilitate functional analysis, we created a whole set of page types along with appropriate measurements for each type.
A page designed to move visitors to the right place in the Web site (section fronts or category mini-homes are common examples) we described as a Router. A Router can’t be measured by correlation to leads or sales, because most of the actual sales work is being done by subsequent pages. Instead, it should be measured by the percentage of visitors it moves into appropriate content.
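Here’s a rough illustration of what a Router measurement might look like – a sketch only, with hypothetical page names and navigation paths:

```python
# A minimal sketch of a Router measurement. A Router is scored not on
# sales but on how often it sends visitors into content that matches
# its design intent. Page names and paths are hypothetical.
ROUTER = "/electronics"                   # a section front (Router)
TARGET_PREFIX = "/electronics/products/"  # "appropriate content" for it

# One navigation path per visit (hypothetical sample data).
paths = [
    ["/", "/electronics", "/electronics/products/tv-4k"],
    ["/", "/electronics", "/privacy-policy"],
    ["/", "/electronics", "/electronics/products/headphones"],
    ["/", "/electronics"],                # visitor left from the Router
]

def router_success_rate(paths, router, target_prefix):
    """Share of Router viewers whose next page is appropriate content."""
    viewers, successes = 0, 0
    for path in paths:
        if router in path:
            viewers += 1
            i = path.index(router)
            if i + 1 < len(path) and path[i + 1].startswith(target_prefix):
                successes += 1
    return successes / viewers if viewers else 0.0

print(f"Router success: {router_success_rate(paths, ROUTER, TARGET_PREFIX):.0%}")
# -> 50%: two of the four Router viewers moved on into product content.
```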
A page designed to sell a product we described as a Convincer. A Convincer can (and should) be measured by correlation to sales. But the correlation is limited to the class of Convincer pages for a Product. This is the apples-to-apples comparison that restricts the correlation analysis to pages that are roughly equidistant from the outcome and, thereby, controls for the natural structure of the Web site.
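As a sketch of that apples-to-apples comparison (again with invented data and page names), the correlation is computed only across the Convincer pages for a single product:

```python
# A minimal sketch of a within-class Convincer comparison. Correlation
# to purchase is computed only across pages of the same type (Convincers
# for one product), so every candidate page sits roughly the same
# "distance" from the outcome and structure is held constant.
import pandas as pd

# One row per visit: which Convincer pages were seen, and the outcome.
visits = pd.DataFrame({
    "saw_specs_page":   [1, 0, 1, 1, 0, 1, 0, 0],
    "saw_reviews_page": [0, 1, 1, 0, 1, 1, 0, 1],
    "saw_gallery_page": [1, 1, 0, 0, 1, 0, 1, 0],
    "purchased":        [1, 1, 1, 0, 1, 1, 0, 0],
})

convincers = ["saw_specs_page", "saw_reviews_page", "saw_gallery_page"]
for page in convincers:
    r = visits[page].corr(visits["purchased"])
    print(f"{page}: r = {r:+.2f}")
# Because all three pages are Convincers for the same product, differences
# in r now plausibly reflect persuasive power rather than site structure.
```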
Likewise, a Thank You page is 100% correlated with Product Sales, but, of course, the correlation is entirely meaningless. Thank You pages in the Functional scheme are measured by controlled re-engagement – how many visitors were they able to steer back into some kind of site flow. This can still be done as a correlation, but it’s to an entirely different set of outcomes.
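A controlled re-engagement measure might look something like this sketch (page names and post-purchase paths are hypothetical):

```python
# A minimal sketch of measuring a Thank You page by controlled
# re-engagement: the outcome is not the sale that already happened,
# but whether the visitor was steered back into a site flow.
THANK_YOU = "/order/thank-you"
REENGAGEMENT_PAGES = {"/account/offers", "/electronics", "/blog"}

# Pages viewed after the Thank You page, one list per converting visit.
post_purchase_paths = [
    ["/account/offers", "/electronics"],  # re-engaged
    [],                                   # left the site immediately
    ["/blog"],                            # re-engaged
    ["/privacy-policy"],                  # wandered off-flow
]

re_engaged = sum(
    1 for path in post_purchase_paths
    if path and path[0] in REENGAGEMENT_PAGES
)
rate = re_engaged / len(post_purchase_paths)
print(f"Controlled re-engagement: {rate:.0%}")  # -> 50%
```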
It should be obvious from even this very perfunctory overview of Functionalism that it captures both Intentionality (Page Type classifications map to a designer’s view of visitor intention) and natural structure (measurements are restricted to class – preventing most egregious uses of statistical analysis).
One weakness of Functionalism should also be apparent. It’s based on the designer’s view of visitor intentionality. If the designer has misunderstood what visitors want to achieve on the Web site, nothing in Functionalism is likely to unearth that fact.
Whatever Functionalism’s weaknesses, however, it is a dramatically better method for online analysis than the unthinking application of statistical methods to online behavior.
I’ve found that Marketing Managers are often bitterly disappointed when they see the results of analysis projects that come from their advanced statistical teams. The same people who managed to create sophisticated direct response or media mix models often return Web analytics results that border on the inane. The problem isn’t with the methods of statistical analysis. The methods are valid and powerful, but before they can be unleashed, the online data needs to be framed appropriately. Techniques like Functionalism (and others such as Behavioral Use Case analysis) are designed to do just that.
It is in this melding of Web analytics framework and advanced statistical analysis that the future of online measurement lies. Which is why, in my next post, I’ll resume my original tack, take a look at traditional database marketing, and show how, at a very basic level, the core principles are actually quite similar to those I've described for Web analytics.
Great post Gary. I've seen these same things before, where a statistician takes two or three months to do an amazing in-depth analysis based on the visitor-level data, and comes out with findings that are either obvious (a lot of people see the homepage ...) or that we could have answered in a few minutes with an advanced segmentation tool.
I think what it comes down to, and what your approach identifies, is that you can't go off in a silo and conduct statistical analysis of web data without understanding the data, the site, or what it's telling you (and expect to generate anything interesting). Statistical knowledge PLUS business knowledge is so crucial, and if you have both, you can use statistical methodologies for great things with web data.
Posted by: Michele Hinojosa | January 26, 2011 at 08:36 AM
I think that the underlying theme here is still “get the right metrics for the right pages,” call it what you will.
For instance, a Router page: if that page is a product index, is it good or bad to drive people to product pages? If the index is designed so well that they don't need to see a specific product page (Convincer), then one could actually see Router success events going down – and have it be a good thing.
Know your site, define your goals and KPIs, and then measure, test, improve. And keep kicking the tires on new ideas and approaches to measurement to see what happens! Great post.
Posted by: David Schuette | February 01, 2011 at 06:05 AM