Overview
Sophisticated organizations are increasingly finding good reasons to move data from their web analytics tools to other data processing and analysis platforms. In this series, I’ll be discussing the “why” and “how” of taking data from your web analytics solution and moving it into other platforms. This first installment will briefly cover some of the key reasons for moving data into another platform and introduce the “aggregation problem” that makes this process challenging and with which subsequent posts will struggle.
Why Move Data at All?
Users of web analytics tools tend to fall into one of two great camps: those who cannot understand why you’d ever need to move data out of such a great reporting solution; and those who cannot wait to press the button that will transfer the data!
Like most such issues, it all depends on your point of view. But I’ve noticed that the more mature a company is with web analytics, the more their sophisticated users tend to fall into the second camp. There are really four great themes that drive web analytics users from the “happy to be there” camp to the “let me out of here” camp.
Join to Secure Customer Data
For many of our clients, particularly in sectors like Financial Services where there is a great deal of highly secure customer information, the main reason to get online data out of the web analytics tool is to marry it up with customer data. That customer data includes almost everything that really matters about a visitor (account types, sizes, balances, relationships, demographics, lifestage, customer-care reports, etc.) and it’s hard to do compelling analysis or reporting of many online behaviors without at least some of that information. Nobody wants to move that data out to a SaaS vendor, and even where they want to, there’s a pretty good chance their privacy policies won’t allow it. So the only real alternative is to bring the data inside the firewall where it can be married with all that other stuff we really want to know.
Exposing Web Data to Actionable Systems
Current web analytics tools are for reporting and analysis. They aren’t designed as the control center for eMails, offer testing, direct mail, site personalization, CRM control, etc. It’s true that the level of integration is increasing in products like the Omniture Suite, but more often than not the integration level is about reporting, not control. This isn't necessarily a bad thing, just a common fact of life about information systems. For many multi-channel businesses, the biggest opportunities for using web data aren’t necessarily online and it’s a pretty good bet that the business already has a range of systems for these tasks and they DON’T integrate to any web analytics solution. Getting web analytics data into actionable places can drive more value to a business than almost any form of classic web analytics. So it’s no wonder that where opportunities of this type exist, they can become the main focus of a web analytics program.
Getting Analysis that can’t be done in the Tool
Earlier this year, my colleague June Dershewitz was working on a “Data Integration” presentation. We frequently rideshare to the office, and we began discussing why companies might want to move data out of a web analytics solution and into a data warehouse, statistical analysis or BI platform. We talked about the first two reasons already listed, but we also knew there were some measurement tasks that can’t be easily done in a web analytics tool (even the fancy ones) and we started to list those. After about five minutes, June called a halt to this since our list had grown far too long to fit on even a very dense powerpoint slide. I just did an extensive series on behavioral segmentation, and that’s a good example of the type of analysis that just won’t fit in a web analytics tool. But time series analysis, predictive modeling, scoring, and re-keying visitors are other common analytic tasks that just can’t be done in today’s web analytics tool.
Automating Complex Reporting Tasks
Yes, web analytic tools really are good for reporting. But they don’t meet every need. No web analytic tool available provides the kind of access to your data that a SQL database does. Moving data into and out of web analytics systems is cumbersome. Data transformation inside web analytic systems is either non-existent or very limited. What’s more, we find that clients that have complex distributed reporting needs simply aren’t supported by the level of automation available in web analytic tools. If you need to automate and distribute hundreds of customized reports to affiliates, franchisees or content contributors you are really going to struggle with a web analytics tool. And while the top web analytic tools provide reasonable export to Excel, Excel is not equivalent to being able to deliver complex customized reporting of the sort you might get from products like MicroStrategy or Business Objects.
In short, there are quite a few compelling reasons why your SaaS web analytics vendor isn’t a complete solution for your web measurement needs. It’s also important to realize that each of these different needs drives a distinct set of requirements for moving and transforming web analytics data. They won’t all use the same types of data or transformation techniques and they won’t all want to move data to the same kind of place.
The Aggregation Problem
Fortunately, moving data from one place to another isn’t really all that challenging these days. The internet makes transfers of large amounts of data quite easy. The costs of data storage and processing power are amazingly low. There are even convenient ways to “rent” these assets for specific projects or to handle growth. So the biggest challenge to moving web analytics data into other systems isn’t specifically technical.
The real problem is that web analytics data as it is generally structured doesn’t easily “fit” in most of the target destinations and systems you are likely to want.
By fit, I don’t mean just the raw size of the data (though the enormous amount of data produced by web sites greatly complicates the data modeling problem) – I mean that the data structure of the raw web data isn’t well suited to most real-world tasks.
When you get a true data-feed from you web analytics tool, what you’ll get is page-view level data (or server-call level data if your site is more ajax/flash). Each row of data will contain a timestamp, a visitor and session id, a bunch of web environmental information like the user agent and referral, the state of any custom variables, and, of course, the page or event name. It can be pretty broad. A standard feed from Omniture has something like 450 columns!
This page-level data is okay for some reporting automation tasks. You can move it directly into a SQL database (dropping a few hundred columns) and use it. But for joining to your customer database, exposing web data to actionable systems or doing complex analysis in tools like SAS or SPSS, the data model is all wrong.
In your customer database, you probably have 1 row per customer. Maybe that row keys into other tables that have multiple rows. But ideally, you want to fit most of the information you need about that customer into a single row. If you simply move all the page level data for a visitor into your customer database and join on a visitor id (assuming you’ve passed one to the web analytics tool), you’ll have a data model that’s pretty much unusable. It's too much data about each visitor in a structure that makes it too hard to use. Instead, you want to aggregate that data so that all of the interesting web behavior can be captured in a single row for each visitor or at least in a greatly condensed set of records.
Here’s the rub. Most simple aggregations of the data are uninteresting. Some types of variables are easy to aggregate. If you’re tracking the number and dollar value of orders, you can just sum by visitor. That’s easy. Total Orders is a great column in your visitor table.
But if you just sum up page views by visitor to get total pages and total visits, you’ve lost most of the interesting detail about the visitor. Total Pages isn’t nearly as enlightening as Total Orders.
So what I’m going to be writing about in this series are some ways to think about the web behavioral data model in the context of a SQL database or an SPSS file. Keep in mind that there is no one way to solve the aggregation problem for every business. The right data model is completely unique to each situation, but since many of the techniques and principals involved are quite general I hope to provide some genuinely useful guidance if your tackling this problem!
This aggregation problem is the primary reason I was excited to see a product like WebTrends Score and similar arrive. The idea of being able to construct an "event", which could be a series of actions on the site, creates a foundation for aggregation that could be actionable for outside systems, not to mention cutting the data down to managable size.
Posted by: Jim Novo | November 05, 2008 at 03:30 PM
Jim,
I agree. Products like Score are particularly useful for solving the problem of supporting "action" systems - one of the big four problems I mentioned. Using a product like Score, you're effectively aggregating a set of behaviors into a single column per visitor row - and using multiple scores you can capture a great deal of truly interesting behavior at the visitor level.
It's probably true to say that a fair number of the aggregation strategies I'll be talking about are similar to what you might do if you're lucky enough to be using Score!
Posted by: Gary | November 05, 2008 at 04:01 PM