One of the most enjoyable and interesting Huddles at X Change was the session I attended on Predictive Analytics. It was a chance to hear and talk with some of the companies that are really pushing the boundaries of Web analytics, often with advanced data warehouses, sophisticated data mining tools and a small army of dedicated analysts. I’m all for that, of course, but it can be frustrating, even discouraging, to listen to if you’re at a company that simply isn’t blessed with that level of resource or commitment. Is that what it takes to do Predictive Analytics?
In some cases, the answer simply is yes. Most advanced analytics techniques will require a new tool set and a data feed from your Web analytics tool. If your online data volumes are large, that necessarily puts you into the realm of advanced warehousing and systems like Quantivo, Netezza or Aster. It is a big commitment.
Not every predictive analytics application is quite so demanding, however. One of the techniques we discussed in that session – the use of predictive modeling to understand how industry and market forces shape your site traffic and performance – is well within the reach of any organization, regardless of its size or Web analytics tool.
Trends in your industry and in the broader economy will inevitably have a significant impact on your business. Was your traffic up 5% this quarter? That’s great, unless your industry’s traffic was up 10%. Was your traffic down 5% during the great recession? Is that good or bad? Is your site conversion performance down because people simply have less disposable income?
Questions like these are vital, particularly in turbulent economic times. To know whether your actual performance was good or bad, you have to understand what your expected performance was. That expected performance is typically a function of past site history, current marketing data, and econometric or external data.
I’m using econometric data as a catch-all for every sort of exogenous data. For one of our clients (a traffic alerts site), weather turned out to be a critical external factor! For many businesses, key external factors include high-level economic indicators, market movements, industry trends and even competitive marketing spend data.
To build this type of model, you need to collect likely econometric and external data, past campaign data, and site history for key metrics like traffic, campaign-sourced visitors, conversion rates and revenue per visitor.
What type of econometric or external data is appropriate? There’s no one good answer.
If you’re in the housing industry, you’re likely to look at measures like housing starts, interest rates, new and existing home sales and median prices, economic leading indicators, consumer confidence measures, stock indices, REIT indices, etc.
If you’re a B2B marketer, industry plan data from companies like NPD, stock baskets, PPC search term costs, and brand awareness tracking numbers might be predictive.
For almost any company, it’s worth looking at at least one or two broad econometric variables like consumer confidence, hiring plans, or stock market indices. It’s essential to include your own marketing spend data – since nothing will have as direct or dramatic an impact on your site performance as marketing spend. You’ll typically want to include that spend data either as dollars/day or in terms of the core measurement for the channel (GRPs, impressions, etc.). Either method works. If you can get (or estimate) competitive spend data, all the better.
Getting all this data isn’t easy. The best sources are those with publicly available data updated for any time point. Stock market indices are a good example: you can get prices for any date and any date range, making it much easier to integrate the data with your site metrics. Research numbers tend to be published less frequently – monthly or quarterly. The more often the numbers are published, the better from the standpoint of the analysis.
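To make the integration step concrete, here is a minimal sketch of joining daily site metrics with an external daily series. All of the data, column names, and values below are hypothetical – in practice you would load CSV exports from your analytics tool and data provider.

```python
# Sketch: joining daily site metrics with an external series.
# All data below is made up for illustration.
import pandas as pd

# Daily site metrics, as pulled from your Web analytics tool.
site = pd.DataFrame({
    "date": pd.to_datetime(["2011-01-03", "2011-01-04", "2011-01-05"]),
    "visits": [1200, 1350, 1280],
    "conversion_rate": [0.021, 0.024, 0.022],
})

# A daily external series, e.g. a stock index close.
index = pd.DataFrame({
    "date": pd.to_datetime(["2011-01-03", "2011-01-04", "2011-01-05"]),
    "index_close": [1271.9, 1270.2, 1276.6],
})

# A left join keeps every site-metrics day, even when the external
# series has gaps (weekends, market holidays, publication lags).
merged = site.merge(index, on="date", how="left")
print(merged)
```

The left join matters: site data is continuous, while most external series have holes, and you want those holes to surface as missing values rather than dropped days.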
Many sites only have a couple of years’ worth of reliable data. If you are trying to correlate quarterly data, you may only have 8 data points – not enough to work with.
Which brings up another interesting point about this type of analysis – the incorporation of time is essential. Some econometric or external factors will correlate in real-time with your site data. The case I mentioned above - weather for a traffic site - is a good example.
Many external factors function more as leading indicators of performance. Unless you’re a brokerage (and based on our studies, not even then), you wouldn’t expect stock market performance to correlate in real time with your key site metrics. Short-term jumps and dips in the market are just noise when it comes to the broader economy. Your time periods have to be long enough to have a reasonable chance of impacting the model.
Which, as you can see, leads to one of those frustrating analytic balancing acts: your time periods need to be long enough to be significant but short enough to generate sufficient data points for analysis. It’s a trade-off for which there is no one right answer.
Incorporating time doesn’t just mean picking the right unit (days/weeks/months/etc.). If the external factors are leading indicators, they may not correlate at all with site metrics from matching time periods. Many basic analysis techniques simply line up data points and look for relationships at a set of fixed periods of time. That won’t work well here. You’ll want to analyze each of the factors as a potential leading indicator across multiple time periods.
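One simple way to do that lag analysis is to correlate the external series against your site metric at several candidate lags and see where the relationship peaks. Here is a sketch of that idea with made-up numbers; the series names and values are purely illustrative.

```python
# Sketch: testing an external series as a leading indicator by
# correlating it against a site metric at several lags.
# All numbers are made up for illustration.
import numpy as np

site_metric = np.array([100, 104, 110, 108, 115, 121, 119, 126, 130, 128], float)
external = np.array([50, 53, 52, 56, 58, 57, 61, 63, 62, 66], float)

def lagged_corr(x, y, lag):
    """Correlation of x against y shifted `lag` periods later.
    lag=2 asks: does x predict y two periods out?"""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

# Scan a range of lags and report the correlation at each.
for lag in range(0, 4):
    print(lag, round(lagged_corr(external, site_metric, lag), 3))
```

Note that each additional lag costs you data points off the end of the series – one more reason to favor finer-grained external data when you can get it.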
This does take a little bit of work, but the good news is that it doesn’t require any huge investment. The real beauty of this analysis is that the number of data points is inherently quite small – even if you’re tracking at the daily level for 5 years, you’re only dealing with a few thousand rows of data. You can pull site metric data into an Excel spreadsheet or small flat file, do the joins to external data on a PC, and conduct the whole analysis on a laptop computer.
About the only thing you’ll need is a good statistical analysis package (you could use Excel but I’d recommend something a little richer). SAS and SPSS are a little bulky but both run on the PC and can easily handle this type of work. You might also consider something like JMP or Statistica.
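Whatever package you choose, the core of the model is usually a plain least-squares regression of a site metric on your marketing spend and one or two external variables. Here is a minimal sketch using only NumPy, with entirely hypothetical monthly data; a real stats package adds the diagnostics, but the fit itself is this simple.

```python
# Sketch: least-squares regression of a site metric on marketing
# spend and one econometric variable. Hypothetical monthly data.
import numpy as np

spend = np.array([10, 12, 9, 14, 15, 13, 16, 18], float)          # $K/month
confidence = np.array([60, 62, 58, 65, 66, 64, 68, 70], float)    # index level
visits = np.array([90, 101, 85, 112, 118, 108, 124, 135], float)  # thousands

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(spend), spend, confidence])
coef, *_ = np.linalg.lstsq(X, visits, rcond=None)

expected = X @ coef            # what the model says visits "should" be
residual = visits - expected   # over/under-performance vs. expectation
print(coef)
print(residual)
```

The residuals are the payoff: they are your actual performance with the expected effects of spend and the economy stripped out, which is exactly the "good or bad?" question the model is meant to answer.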
At Semphonic, we’ve done far more of this type of work for clients in the last few years. That reflects several factors: the growing sophistication and maturity of the market, improvements in thinking about Executive Dashboarding and management reporting, and, of course, a difficult economic climate whose effects on site performance are often too obvious to ignore.
Of all the types of predictive analytics we discussed at X Change, this type of project is the easiest to tackle. It doesn’t require a data warehouse or Web analytics data feed. It doesn’t take expensive tools or a large team. It should be of interest to almost any organization, and it can significantly improve the quality of your management reporting and your analysis of site and marketing-spend success.