Segmentation is at the heart of most real analysis – and in this series I’m focusing on analytic segmentation – not segmentation for reporting. In the last post, I showed how we used segmentation to create a control group to measure the impact of particular content and tool areas on a publisher’s site. In today’s post, I’m going to show how segmentation can be used to measure the before vs. after behavior of visitors for key events on the site.
Before I go there, another reminder about the upcoming Semphonic 2010 Vision of Web Analytics webinar. Due to a “technical glitch” (like most technical glitches a result of stupid CEO disease!), we had to cancel and re-invite attendees for this. If you were a victim of this (or if you just haven’t signed up yet), here’s the new link for registration!
And, of course, if you are looking for answers around mobile strategy and mobile measurement, click here to sign-up for a Greg Dowling breakfast on our site.
Now, to segmentation…
Example: Time-based Segmentation for Before v. After Analysis
Use Case: A financial services company wanted to understand site behavior before and after a lead was sourced.
Client Question: The client wanted to know how often visitors tended to come to the site prior to a lead and what pages they looked at. They also wanted to understand usage AFTER the lead: did visitors continue to come to the site, and how did their content consumption change?
Measurement Issues: The challenge in before vs. after segmentation is that, by default, you can’t segment based on the order of events. This makes it extremely difficult to separate visitor behavior that precedes an event from behavior that follows it. If you simply build a filter that selects all visitors who generated a lead in a given month, you’ll have no way of telling whether their page views, paths, etc. represent before-lead or after-lead behavior. The result will invariably be a mix: some visitors generated a lead on Jan. 1st, so their behavior that month is mostly after the lead, while others generated a lead on Jan. 31st, so theirs is mostly before.
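To make the mixing problem concrete, here is a minimal Python sketch over made-up visit records. The visitor IDs, dates, and the per-visitor lead date are all hypothetical. It counts, for each visitor, how many January visits fall before vs. after that visitor's own lead, which is exactly the ordering a month-level "generated a lead" filter throws away.

```python
from datetime import date

# Hypothetical January visits: (visitor_id, visit_date)
visits = [
    ("A", date(2010, 1, 1)), ("A", date(2010, 1, 8)),
    ("A", date(2010, 1, 15)), ("A", date(2010, 1, 22)),
    ("B", date(2010, 1, 5)), ("B", date(2010, 1, 12)),
    ("B", date(2010, 1, 20)), ("B", date(2010, 1, 31)),
]

# Hypothetical lead date per visitor: A converted early, B converted late.
lead_dates = {"A": date(2010, 1, 1), "B": date(2010, 1, 31)}


def before_after_counts(visits, lead_dates):
    """Count each visitor's visits before and after that visitor's own lead date.

    Visits on the lead date itself are counted separately, because date-level
    data alone cannot say whether they came before or after the lead.
    """
    counts = {}
    for visitor_id, visit_date in visits:
        lead = lead_dates[visitor_id]
        c = counts.setdefault(visitor_id, {"before": 0, "lead_day": 0, "after": 0})
        if visit_date < lead:
            c["before"] += 1
        elif visit_date > lead:
            c["after"] += 1
        else:
            c["lead_day"] += 1
    return counts


for visitor_id, c in before_after_counts(visits, lead_dates).items():
    print(visitor_id, c)
# A {'before': 0, 'lead_day': 1, 'after': 3}
# B {'before': 3, 'lead_day': 1, 'after': 0}
```

Both visitors would land in a single "lead in January" segment, so their page views would blend mostly-after behavior (A) with mostly-before behavior (B).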
Tool: We used Omniture’s Data Warehouse for this analysis.
Methodology: The real key to this segmentation turned out to be tagging. I generally preach that you can’t rely on tagging for segmentation because analysis needs to be ad hoc. That’s true, but in this case we didn’t need any custom tagging because we had done the site’s initial implementation ourselves. And because we so often do time-based segmentation, that implementation included the Omniture Time-Parting Vista Rule. With the Time-Parting Vista Rule, we could select and filter on a wide range of time-based variables, including Date, Day of Week, and Time of Day.
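The Time-Parting rule itself is configured inside Omniture; the sketch below is not that rule. It is only a hypothetical Python illustration of the kind of fields such a rule makes filterable, derived from each hit's timestamp: a date, a day of week, and a time-of-day bucket. The four bucket boundaries are assumptions chosen for the example.

```python
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def time_parts(ts: datetime) -> dict:
    """Derive date, day-of-week, and time-of-day fields from a hit timestamp.

    The time-of-day buckets are illustrative, not Omniture's definitions.
    """
    if ts.hour < 6:
        bucket = "night"
    elif ts.hour < 12:
        bucket = "morning"
    elif ts.hour < 18:
        bucket = "afternoon"
    else:
        bucket = "evening"
    return {
        "date": ts.date().isoformat(),
        "day_of_week": DAY_NAMES[ts.weekday()],  # weekday(): Monday == 0
        "time_of_day": bucket,
    }


print(time_parts(datetime(2010, 1, 4, 14, 30)))
# {'date': '2010-01-04', 'day_of_week': 'Monday', 'time_of_day': 'afternoon'}
```

Capturing these fields at collection time is what lets a segmentation engine filter on them later.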
No implementation can anticipate every possible analysis situation. And even very robust implementation planning can’t replace a flexible segmentation tool. But it’s also true that even a very flexible segmentation tool can be defeated by poor tagging. You have to collect the information before you can analyze it!
Given the existence of the Time-Parting rule, building the segments for this analysis was quite easy.
We started with a Visitor Frame (to get all qualifying visitor behavior). To this, we added a Visit frame that looked for a particular date – in this case Jan. 4. There was substantial daily lead volume on this site – enough to analyze effectively. But we created a number of different segments that selected different days.
Of course, this selection gives us everyone who visited the site on the 4th, not just those who generated a lead. To get what we wanted, we had to add a criterion to this Visit filter requiring that the lead generation process be completed.
Now, we are selecting all Visitor Behavior for anyone who finished the lead generation process on the 4th of January.
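Outside the Omniture interface, the same nested logic (a Visitor container holding a Visit container) can be sketched in Python over hypothetical visit records. The field names `visitor_id`, `visit_number`, `visit_date`, and `lead_completed` are assumptions for illustration. The point is that the condition is tested at the visit level, but the visitor's whole history is returned.

```python
from datetime import date
from typing import NamedTuple


class Visit(NamedTuple):
    visitor_id: str
    visit_number: int
    visit_date: date
    lead_completed: bool  # hypothetical flag: finished the lead process in this visit


def lead_day_segment(visits, lead_day):
    """Return every visit by visitors who completed a lead on lead_day.

    Visit-level test: a visit on lead_day that includes a lead completion.
    Visitor-level result: all of that visitor's visits, on any date.
    """
    qualifying = {
        v.visitor_id
        for v in visits
        if v.visit_date == lead_day and v.lead_completed
    }
    return [v for v in visits if v.visitor_id in qualifying]


visits = [
    Visit("A", 1, date(2009, 12, 20), False),
    Visit("A", 2, date(2010, 1, 4), True),
    Visit("A", 3, date(2010, 1, 12), False),
    Visit("B", 1, date(2010, 1, 4), False),  # visited on the 4th, no lead
    Visit("C", 1, date(2010, 1, 5), True),   # lead on a different day
]

segment = lead_day_segment(visits, date(2010, 1, 4))
print([(v.visitor_id, v.visit_number) for v in segment])
# [('A', 1), ('A', 2), ('A', 3)]
```

Building several day-based segments amounts to calling this once per lead day.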
With this segmentation in place, it’s easy to isolate the behavior you want. In Discover, we’d just look at all the behavior for this segment for December 1st-January 3rd (prior behavior) and all the behavior from January 5th to Jan. 31st (after behavior). If we were creating an ASI, we could do the same thing in SiteCatalyst.
Using the Data Warehouse, we ran a report that returns visit number, date, and pages for the entire period from December 1st to January 31st for every visitor who generated a lead on the 4th of January. We then had to split the output into before and after periods manually, from what turned out to be a pretty large file.
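That manual split can be scripted. A minimal sketch, assuming the export was saved as CSV with hypothetical columns `visitor_id`, `visit_number`, `date` (ISO format), and `page`; real Data Warehouse column names and date formats will differ. Rows from Dec. 1 to Jan. 3 become "before", rows from Jan. 5 to Jan. 31 become "after", and the lead day itself is set aside, matching the windows used in Discover.

```python
import csv
import io
from collections import defaultdict
from datetime import date

LEAD_DAY = date(2010, 1, 4)
WINDOW_START = date(2009, 12, 1)
WINDOW_END = date(2010, 1, 31)


def split_export(csv_file):
    """Group export rows into 'before' and 'after' lists; drop lead-day rows."""
    periods = defaultdict(list)
    for row in csv.DictReader(csv_file):
        d = date.fromisoformat(row["date"])
        if not (WINDOW_START <= d <= WINDOW_END):
            continue  # outside the analysis window
        if d < LEAD_DAY:
            periods["before"].append(row)
        elif d > LEAD_DAY:
            periods["after"].append(row)
        # rows dated LEAD_DAY fall through and are excluded
    return periods


sample = io.StringIO(
    "visitor_id,visit_number,date,page\n"
    "A,1,2009-12-20,Home\n"
    "A,2,2010-01-04,Lead Form\n"
    "A,3,2010-01-12,Account Tools\n"
)
periods = split_export(sample)
print(len(periods["before"]), len(periods["after"]))  # 1 1
```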
This underscores a strength and a weakness of Omniture’s Data Warehouse. You can usually get at what you want, but it often results in a rather clumsy data stream that takes quite a bit more work to analyze than if you are using ASIs or Discover.
The actual comparison of before vs. after data is usually pretty straightforward. I typically start with high-level metrics like Pages / Visit, Visit Number, and Visits / Week or Month. From there, I’ll want to compare content usage in the before v. after period. This can be at the Page Level or, if it’s well coded, at the channel/sub-channel level. To do this comparison, you typically express each channel/sub-channel as a percent of total for each period, match the two periods, and then compute the change.
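The high-level metrics can be sketched in Python, assuming hypothetical rows of `(visitor_id, visit_number, visit_date, page)` with one row per page view. Visits per week is normalized by the length of each period, since the before window (Dec. 1 to Jan. 3, 34 days) and the after window (Jan. 5 to Jan. 31, 27 days) differ in length.

```python
from datetime import date


def period_metrics(rows, period_days):
    """High-level metrics for one period (before or after the lead).

    rows: (visitor_id, visit_number, visit_date, page) tuples, assumed to be
    one row per page view. period_days: calendar length of the period.
    """
    visits = {(vid, num) for vid, num, _, _ in rows}  # a visit = visitor + visit number
    return {
        "pages_per_visit": len(rows) / len(visits),
        "avg_visit_number": sum(num for _, num in visits) / len(visits),
        "visits_per_week": len(visits) / (period_days / 7),
    }


before_rows = [
    ("A", 1, date(2009, 12, 20), "Home"),
    ("A", 1, date(2009, 12, 20), "Rates"),
    ("A", 2, date(2009, 12, 28), "Home"),
]
before_days = (date(2010, 1, 3) - date(2009, 12, 1)).days + 1  # 34, inclusive
print(period_metrics(before_rows, before_days))
```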
This gives you a report like this:
           Before   After   Change
Topic 1    8%       5%      -38%
Topic 2    6%       11%     83%
etc.
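The Change column is the relative change in each channel's share of the total: (after share − before share) / before share, so Topic 1 is (5 − 8) / 8 ≈ −38%. A minimal Python sketch, assuming each page view has already been mapped to a channel name (the mapping and the counts below are hypothetical; the counts were made up to reproduce the percentages in the table above):

```python
from collections import Counter


def share_change(before_pages, after_pages):
    """Compare each channel's share of page views across two periods.

    before_pages / after_pages: iterables of channel names, one per page view.
    Returns {channel: (before_share, after_share, relative_change)}.
    """
    before = Counter(before_pages)
    after = Counter(after_pages)
    total_b = sum(before.values())
    total_a = sum(after.values())
    report = {}
    for channel in sorted(set(before) | set(after)):
        b = before[channel] / total_b
        a = after[channel] / total_a
        # relative change is undefined for a channel absent in the before period
        change = (a - b) / b if b else float("nan")
        report[channel] = (b, a, change)
    return report


before = ["Topic 1"] * 8 + ["Topic 2"] * 6 + ["Other"] * 86
after = ["Topic 1"] * 5 + ["Topic 2"] * 11 + ["Other"] * 84
for channel, (b, a, change) in share_change(before, after).items():
    print(f"{channel:<8} {b:>4.0%} {a:>4.0%} {change:>+5.0%}")
```

The same function works for acquisition sources or site milestones: just feed it source or event names instead of channel names.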
For a complete profile, it’s a good idea to do the same thing for major acquisition sources and, of course, any important site events/milestones.
Building a Before v. After profile of this sort makes it easy to see and explain how site behavior evolves before and after a lead or other key site event.
Conclusions and Recommendations: It’s no surprise that site behavior changed pretty dramatically between the before and after periods. What did surprise the client was how much post-lead site activity there was.
In this case, a majority of lead generating visitors came back and spent large amounts of time on the site after the lead was generated. This presents a significant opportunity for customization of the site experience and actually led this company to create a special experience for these types of visitors.
Reflections: Web analytics filtering has some surprising limitations across most tools, not the least of which is the inability to look at any sequence or time-based information. However, with a little forethought in the implementation and some extra work at the segmentation level (building multiple “day” based segments), you can effectively create a Before v. After analysis. If you’re using time-parting as we were, you could extend this analysis even further and look at same-day post-lead behavior. That may not be essential in some businesses (we didn’t care much about it here), but it can make a difference when the lead cycle is short.
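For same-day analysis, lead-day hits have to be split at the moment of the lead rather than by date. A minimal sketch, assuming each hit carries a full timestamp and the lead completion time is known (both hypothetical fields here; if you only have hour-level time-parting, the hour containing the lead stays ambiguous):

```python
from datetime import datetime


def split_lead_day(hits, lead_time):
    """Split one visitor's lead-day hits at the moment the lead completed.

    hits: (timestamp, page) tuples from the lead day.
    lead_time: timestamp of the lead completion.
    Hits stamped exactly at lead_time are treated as the lead itself and
    left out of both lists.
    """
    ordered = sorted(hits)  # chronological order
    before = [page for ts, page in ordered if ts < lead_time]
    after = [page for ts, page in ordered if ts > lead_time]
    return before, after


hits = [
    (datetime(2010, 1, 4, 14, 0), "Account Tools"),
    (datetime(2010, 1, 4, 9, 0), "Rates"),
    (datetime(2010, 1, 4, 9, 10), "Lead Form"),
    (datetime(2010, 1, 4, 9, 15), "Lead Confirmation"),
]
before, after = split_lead_day(hits, datetime(2010, 1, 4, 9, 15))
print(before, after)  # ['Rates', 'Lead Form'] ['Account Tools']
```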
The main point is that time-based analysis is extremely powerful and common in nearly all types of business problems and it is absolutely essential to build your implementation so that your segmentation engine has access to date/time data for filtering. The secondary point is that no matter how flexible your segmentation engine, there is a dependent relationship between analysis and implementation – which is one of the reasons you can’t leave your implementation to people who don’t do or understand analysis!
BTW - Look me up at the Visitor Segmentation Panel at WebTrends' Engage this week if you are there!

Gary - you may have hit upon the holy grail of real time analytics. Thank you for bringing it to light :))
Would be interested to know where "tagging" would not be appropriate? Also, what do you think about the "accuracy vs. precision" debate on web measurement?
Thanks for an enlightening read.
Cheers,
Prince
Posted by: Prince | February 14, 2010 at 11:16 AM
Within the confines of traditional web analytics systems, I think tagging for date/time is nearly always appropriate. But tagging for ad hoc requests just never seems to happen. It has to be a really important analysis to get someone to change a tag. That's why I favor being extra careful about the implementation and also why I like web analytics systems that provide a lot of back-end configuration and ad hoc segmentation.
Accuracy vs. precision is certainly a fascinating topic and there's no doubt it's a legitimate one for web analytics. There are analytics problems that require very high accuracy - but many do not. Most analysis efforts do require reasonable precision.
But in web analytics we usually have a very poor idea of our accuracy and of our precision - and we'd probably be appalled to know the truth about either.
Posted by: Gary Angel | February 16, 2010 at 05:04 PM