Overview
Sophisticated organizations are increasingly finding good reasons to move data from their web analytics tools to other data processing and analysis platforms. In this series, I’ll be discussing the “why” and “how” of taking data from your web analytics solution and moving it into other platforms. This third installment will cover capturing usage trends.
In the first installment of this series, I covered some of the reasons why companies increasingly want to move data out of their web analytics solution. When they do so, the biggest challenge they face is finding the right data models to let them use the data effectively. Web analytics data is much too large to use conveniently in just any format – so you almost always have to do some significant aggregation on it. This is as true inside the web analytics tool as it is outside. But the types of aggregation you’ll probably want outside the tool are very different from the choices vendors make to support reporting inside the tool.
The aggregation methods (or data models) I’m going to discuss today are relevant to all of the different uses of web analytics data: joining web data to secure customer data, driving actionable systems, building complicated automated reporting systems and supporting analysis in more sophisticated statistical tools.
Capturing Usage Trends
One of the most basic, interesting and actionable pieces of information you can have about a visitor is whether their usage, interest, or commitment to your web site, key tools, product, brand, etc. is going up or down.
It seems like such a simple question, but it turns out to be much harder to capture than you might expect. In everyday terms, the idea here is simple. Is a visitor doing more or less of some key activity?
Simple, yes. But in this simple question lurk two difficult and ambiguous issues. What is the “before” you should compare to, and how can you quantify degrees of “more” or “less”?
I’ll deal with visits in this post, but everything I’m going to talk about is germane to any trended behavior from purchases to page views to measures of engagement. Most of the ideas I’ll discuss are borrowed from the discipline of Technical Trading. Years ago, I worked with a company that built software for momentum traders. The techniques used there are designed to answer exactly these two issues in a way that is both clear and compact.
The two most common trending measures used in web analytics reporting are current period vs. last period and current period vs. the same period a year earlier (year over year).
Though both these measures are useful at the aggregate level, neither is really appropriate at the visitor level. Using a single period for comparison at the visitor level will cause far too much noise as visitors bounce up and down repeatedly. You can reduce the noise by extending the length of the period you’re looking at, but you do so at the cost of reducing your ability to detect actual changes.
To combat this over-volatility, you can use an average. Comparing the current period to the visitor’s average will remove most of the volatility. Unfortunately, this technique will usually remove too much volatility – especially for visitors with a long record of behavior. What’s more, it tends to overstate volatility in some visitors (those with a short track record) and understate it in others.
Consider a visit-count per month for a visitor that looks like this:
5 4 5 6 5 4 6
This visitor has seven periods of behavior and an average of 5 visits per period. Now suppose the eighth, ninth and tenth periods look like this:
10 11 7
The eighth period will show a very strong uptick (appropriately). The ninth period will also show a strong uptick – that’s probably okay. But the tenth period will still show an uptick, even though visits have fallen back sharply from the previous two periods – which may not be ideal. Given an even longer history of behavior, it can take a long time for an average to adjust.
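To make the lag concrete, here is a minimal Python sketch that compares each period to the average of everything that came before it. The function name and the choice to exclude the current period from its own baseline are assumptions made for illustration, not a prescribed method.

```python
# Monthly visit counts for the example visitor (periods 1 through 10).
visits = [5, 4, 5, 6, 5, 4, 6, 10, 11, 7]


def diff_from_running_average(series):
    """For each period after the first, return the current value minus
    the average of all earlier periods (hypothetical helper)."""
    diffs = []
    for i in range(1, len(series)):
        history = series[:i]
        diffs.append(series[i] - sum(history) / len(history))
    return diffs


diffs = diff_from_running_average(visits)

# diffs[0] belongs to period 2, so period numbers start at 2.
for period, diff in zip(range(2, len(visits) + 1), diffs):
    print(f"period {period}: {diff:+.2f}")
```

Under these assumptions, periods 8, 9 and 10 come out at roughly +5.00, +5.38 and +0.78. The tenth period still reads as growth even though visits fell from 11 to 7.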
To address this issue, technical analysts often use a moving average. The moving average is built from only the last n periods of data, so with each new period the oldest period drops out of the calculation and the newest is added.
In our series above, we might use a four-period moving average to yield a downward trend for the 10th period:
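A minimal Python sketch of that calculation follows. It assumes the trend value is the current period minus the simple average of the four preceding periods; the function name and that exact definition are illustrative choices, and other conventions (such as including the current period in the window) are equally defensible.

```python
# Monthly visit counts for the example visitor (periods 1 through 10).
visits = [5, 4, 5, 6, 5, 4, 6, 10, 11, 7]


def trend_vs_moving_average(series, window=4):
    """Map each 1-indexed period to (current value - mean of the
    preceding `window` periods). Periods without a full window of
    history are skipped (hypothetical helper)."""
    trend = {}
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        trend[i + 1] = series[i] - baseline
    return trend


trend = trend_vs_moving_average(visits, window=4)

for period in (8, 9, 10):
    print(f"period {period}: {trend[period]:+.2f}")
```

With this definition, periods 8 and 9 both come out at +4.75 and period 10 at -0.75, so the drop back to 7 visits registers as a downward move instead of a lingering uptick.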
The moving average can be further fine-tuned using a weighting system. In most cases, you’ll weight the most recent behavior most heavily and reduce the weights as the behavior gets older. A weighted moving average is even more sensitive to current periods versus past history but, done properly, it can achieve a nice balance between responsiveness to real trends and hypersensitivity. Less commonly (most weightings are either linear or exponential over past periods), a weighted average can also be used to capture elements of both current and seasonal influence.
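As a rough illustration of weighting, the Python sketch below computes a four-period baseline for the tenth period of the example visitor (visits of 5 4 5 6 5 4 6 10 11 7) with linear and with exponential weights. The specific weights (1, 2, 3, 4 and a halving factor of 0.5) and the helper name are assumptions picked for the example; the right scheme depends on your data.

```python
# Monthly visit counts for the example visitor (periods 1 through 10).
visits = [5, 4, 5, 6, 5, 4, 6, 10, 11, 7]


def weighted_moving_average(values, weights):
    """Weighted mean of `values`; both lists are ordered oldest first
    (hypothetical helper)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must be the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


window = 4
history = visits[-(window + 1):-1]  # periods 6-9: [4, 6, 10, 11]
current = visits[-1]                # period 10: 7

# Linear weights: the newest period counts four times the oldest.
linear_weights = [1, 2, 3, 4]
linear_trend = current - weighted_moving_average(history, linear_weights)

# Exponential weights: each step back in time halves the weight.
exp_weights = [0.5 ** k for k in range(window - 1, -1, -1)]
exp_trend = current - weighted_moving_average(history, exp_weights)

print(f"linear-weighted trend:      {linear_trend:+.2f}")
print(f"exponential-weighted trend: {exp_trend:+.2f}")
```

Both weighted baselines sit closer to the recent high periods than the unweighted four-period average (7.75), so the tenth period shows a steeper decline: -2.00 with the linear weights and -2.60 with the exponential ones.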
The values in the Visitor Trend I and II above are the types of values you might store in a visitor record about usage trend (be it around visits, or engagement or purchases). In effect, they reduce a fairly abstract question "is a visitor doing more or less" to a single number.
Usage difference from moving average is a compact and efficient way to represent a good answer to our first issue – is a visitor doing more or less of some key activity. However, it isn’t really a good answer to our second issue – quantifying the degree of more or less. Using the raw difference and comparing between visitors can be misleading because the range of variation for different visitors with different amounts of history can vary greatly.
I’m going to borrow another concept from Technical Trading to assist with this problem. The technique is called Bollinger Bands (named for John Bollinger, who first proposed this method). With Bollinger Bands, there are three data points captured for each period. The middle data point is the simple moving average covered above. The low data point is calculated by subtracting one or more standard deviations from the middle data point. The high data point is calculated by adding one or more standard deviations to the middle data point.
The low and high data points form a band around the middle point, which is the moving average. Technical Traders use Bollinger Bands to identify a commodity-specific meaning of “high” and “low”. To a trader, when a value breaks the band it probably means they should bet in the opposite direction – since they expect the value to come back within the historical range. For our purposes, however, the bands can help demarcate truly significant changes in individual behavior.
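Here is a minimal Python sketch of banding applied to the example visitor (visits of 5 4 5 6 5 4 6 10 11 7). It assumes a four-period window, a band width of two standard deviations, and the population standard deviation of the window; all three are illustrative choices, and the function names are hypothetical.

```python
import statistics

# Monthly visit counts for the example visitor (periods 1 through 10).
visits = [5, 4, 5, 6, 5, 4, 6, 10, 11, 7]


def bollinger_bands(history, window=4, num_std=2.0):
    """Return (low, middle, high) from the last `window` values of
    `history`: the moving average plus or minus `num_std` standard
    deviations of those same values."""
    recent = history[-window:]
    middle = statistics.mean(recent)
    spread = num_std * statistics.pstdev(recent)
    return middle - spread, middle, middle + spread


def classify(current, history, window=4, num_std=2.0):
    """Label `current` relative to the band built from prior periods."""
    low, _, high = bollinger_bands(history, window, num_std)
    if current > high:
        return "up"
    if current < low:
        return "down"
    return "within band"


# Period 8 (10 visits) against periods 4-7, then period 10 (7 visits)
# against periods 6-9.
period_8 = classify(visits[7], visits[:7])
period_10 = classify(visits[9], visits[:9])

print("period 8:", period_8)
print("period 10:", period_10)
```

Under these assumptions, the jump to 10 visits in period 8 breaks above a narrow band built from a steady history, while the drop to 7 in period 10 stays inside the much wider band created by the recent swings. Because the band scales with each visitor's own variability, the resulting labels are easier to compare between visitors than raw differences.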
Banding is particularly useful when you are measuring short-term volatility against a relatively rich set of data points (as is generally the case with registered visitors on media sites, banking sites, brokerage sites, etc.).
Technical traders have many, many different techniques for understanding momentum and direction of change. I’ve covered only some of the simplest here (though I believe that simplest is probably appropriate in most cases). Applying these simple momentum concepts to your visitor-level aggregation data can give you a powerful visitor-level way to measure trends in individual behavior.
I also hope that it's obvious from this discussion that building data models for your aggregated data is not an exercise to be left to a programmer. Not only will a technical person typically fail to even consider this type of aggregation, but they will have no basis for making intelligent decisions about which type of average or weighting scheme is actually appropriate.
Whether you are studying retention issues, engagement issues, program impact issues, or simply trying to improve customer contact effectiveness, knowing whether you are gaining or losing mind-share with a visitor is essential knowledge!
I've always thought technical trading models were a great way to explain how to create, understand and take advantage of behavioral analysis. The Future Intent to Purchase Score (FIPS) model we used at HSN was essentially a "Bollinger Bands" type of model with a rate of change component.
Never thought of applying same to web analytics ETL though - makes a ton of sense, Gary. Great stuff!
Posted by: Jim Novo | November 17, 2008 at 03:46 PM