It was a crazy end-of-week for me. This coming week Jesse Gross and I are doing Think Tank training (Functional Analytics, Use-Case Analytics, Engagement Analysis and Attribution Analysis - good stuff) for clients in Stuttgart. Then it's off to Berlin and the X Change. But in a fitting last minute frenzy, I had an orgy of talking this Thursday and Friday that left me hoarse and deeply tired of my own voice.
Thursday was one of the weirdest office days ever, as Fox News was out at Semphonic taping a documentary. Not only was our office a shambles (though like The Cat in the Hat they put everything perfectly back in place before they left), but the filming was endlessly tiring and distracting. I spent almost two hours in talking-head mode and another 30-40 minutes running a demo, not to mention shots walking up and down our offices, shots at the whiteboard, and shots at the drinking fountain (just joking)...It was exciting but exhausting. Which left me in no shape for a Friday in which I had scheduled two webinars with iJento including a 5AM start-time for an EU-convenient session. Brutal. If you missed both sessions, you should be able to check it out on our site (and iJento's) this week. Both Barry Parshall and I agreed that the afternoon session was the better of the two (imagine that) so that's the one you'll hear. It's actually pretty darn good. Barry was great, my voice didn't fail and I think you'll enjoy the content.
Anyway, it was a huge relief to go to the Symphony on Friday night. Not so much for the music* but because I didn't have to say a word for more than 2 hours!
I've covered much of my content from the webinars previously, but one piece that I haven't much covered and promised to address is the alternative techniques for building a digital segmentation. For the past year, I've been blogging and talking repeatedly about our Two-Tiered Segmentation techniques and how fundamental they've become to Semphonic's digital analytics practice. The segmentation techniques drive everything from implementation design to reporting to use-case analysis to data models in the warehouse.
The basic concept is pretty simple. Proper segmentation in the digital world requires a traditional visitor-based segmentation (the who) paired with a visit-based segmentation (the why or what). This creates a two-tiered segmentation within which live interesting metrics and KPIs, testing opportunities, and good data warehousing aggregations.
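To make the idea concrete, here's a minimal sketch of what the two tiers look like as data: every visit carries both a visitor-level segment (the who) and a visit-level segment (the why), and metrics aggregate on the pair. The segment names and fields are purely hypothetical, not an actual Semphonic scheme.

```python
from collections import Counter

# Each visit record carries BOTH tiers of the segmentation.
# Segment names are illustrative only.
visits = [
    {"visitor_segment": "Loyal Customer", "visit_segment": "Support",  "pages": 4},
    {"visitor_segment": "New Prospect",   "visit_segment": "Research", "pages": 9},
    {"visitor_segment": "Loyal Customer", "visit_segment": "Purchase", "pages": 6},
]

# KPIs, tests and warehouse aggregations live in the (who, why) cells.
cells = Counter((v["visitor_segment"], v["visit_segment"]) for v in visits)
print(cells[("Loyal Customer", "Support")])  # → 1
```

Each (visitor segment, visit segment) cell is where the interesting metrics live: a "Loyal Customer / Support" visit should be measured very differently from a "New Prospect / Research" visit.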
I think that most analysts who've had a chance to reflect on the basic concepts will almost certainly agree that this approach makes far more sense than a focus on sitewide KPIs and metrics and is a promising direction for modeling digital behavior in the CDW.
On the other hand, there's often a healthy skepticism about the challenges of building a Visit-based segmentation. How successfully can one really infer intent when it comes to web visits? That is the trick of course. I've talked and written extensively about some of the methods to do this.
What I haven't covered in great depth is three quite different approaches to actually implementing visit-based segmentations.
The first method of creating visit-based segmentations is to create a set of hierarchical filter-based rules. This is the approach we take when we're building a visit-based segmentation with Omniture, Coremetrics, or Google Analytics. In Discover, for example, you have a rich segmentation builder that allows the analyst to construct a complex segmentation based on visitor, visit and even page-based criteria. With both Exclude and Include Logic, it's perfectly possible to create quite complex systems of hierarchical rules. That's important, because we generally craft visit-based segmentations to be mutually exclusive.
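A hierarchical rule set of this kind can be sketched as an ordered series of tests: the first rule that matches wins, which is what makes the segments mutually exclusive. The segment names and criteria below are hypothetical illustrations, not a real client segmentation.

```python
# Minimal sketch of hierarchical, mutually exclusive visit segmentation.
# The ordering of the rules plays the role of Include/Exclude logic in a
# tool like Discover: the first match wins, so no visit lands in two segments.
def classify_visit(visit):
    if visit.get("completed_order"):
        return "Purchase"
    if visit.get("used_search") and visit.get("product_pages", 0) > 0:
        return "Active Shopping"
    if visit.get("support_pages", 0) > 0:
        return "Customer Support"
    if visit.get("pages_viewed", 0) <= 1:
        return "Bounce"
    return "General Browsing"

visits = [
    {"completed_order": True, "pages_viewed": 12},
    {"used_search": True, "product_pages": 3, "pages_viewed": 8},
    {"pages_viewed": 1},
]
print([classify_visit(v) for v in visits])
# → ['Purchase', 'Active Shopping', 'Bounce']
```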
This works and, in fact, it's by far the most common method we've used to create two-tier segmentations. For Use-Case analytics and Management Reporting, it's ideal.
There are some limitations inherent in this method. First, if you're building segmentations in the Web analytics tool, you're often terribly limited in the amount and nature of the offline customer data you have. One of the most frequent questions I get asked (and it came up again this last Friday) is whether an online segmentation should include offline data. To which I, like any analyst, pretty much answer "the more the merrier." It's a rare analyst who will turn down additional data for a segmentation. For many enterprises, offline data is absolutely critical.
Even where you're working exclusively in an online world, however, filter-based rule segmentation has some limitations. Unlike clustering techniques common in traditional segmentations, rule-based segments have a very hard time capturing more ambiguous groupings - especially around quantities of behavior. If you did X amount of behavior A and Y amount of behavior B, which other customers are you most like? Rule-based segmentation doesn't answer that type of question well.
Rule-based construction has other challenges as well. Rule-based systems don't handle any type of weighting or scoring. It's usually impossible to create rules of the sort "visitors who did X more than Y" without specifying particular values of X and Y - something that you generally don't want to do.
That isn't the only limitation in filter-based rule construction. You often can't access critical visit-segmentation variables around which action a visitor did first or how long it's been since they did something. When you're trying to understand visit intent, it's often essential to know whether a visitor did an internal search right away or after viewing fifteen pages, or whether they'd visited the Website and started an order earlier in the day. Sequence often makes a huge difference when it comes to understanding visit intent. Any time-based or sequence-based behavior is surprisingly hard to capture with filter-based systems.
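The kind of sequence- and recency-based variables that filter-based tools struggle with are trivial to derive once you have programmatic access to the raw visit stream. Here's a hypothetical sketch; the field names and thresholds are illustrative only.

```python
# Deriving sequence- and time-based variables from an ordered visit stream -
# exactly the sort of thing filter-based rule builders can't express.
def sequence_features(pageviews):
    """pageviews: list of (seconds_into_visit, page_type) tuples in visit order."""
    search_positions = [i for i, (_, pt) in enumerate(pageviews) if pt == "search"]
    return {
        # Did the visitor search right away, or only after browsing a while?
        "searched_in_first_3_pages": bool(search_positions) and search_positions[0] < 3,
        "pages_before_first_search": search_positions[0] if search_positions else None,
        # How deep into the visit (in time) the first search happened.
        "seconds_to_first_search": pageviews[search_positions[0]][0] if search_positions else None,
    }

visit = [(0, "home"), (35, "product"), (60, "product"), (90, "search"), (120, "product")]
print(sequence_features(visit))
# → {'searched_in_first_3_pages': False, 'pages_before_first_search': 3, 'seconds_to_first_search': 90}
```

A visitor who searches on page one is telling you something very different from one who searches after fifteen pages, and variables like these are what let a segmentation capture that distinction.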
So is there an alternative?
There is, but it's harder, more expensive and more complex (well - that's what you'd expect). By importing Web analytics data into a statistical analytics package, you can take advantage of a much richer set of variable manipulation, rule construction, and analytic methods all in an environment that joins offline and online data.
Traditional clustering techniques can be used to construct visit-based segmentations that group visits along dozens or even hundreds of variables and take full account of both the frequency and intensity of behavior.
Using clustering techniques to create visit-based segmentations will almost certainly yield a superior visit segmentation to one constructed by even a very clever analyst in a rule-based filtering system.
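In practice this work happens in a statistical package (SAS, R and the like) over dozens or hundreds of behavioral variables, but the core mechanic can be illustrated with a toy k-means on two variables. This is a sketch of the technique, not a production approach; the data is invented.

```python
import random
import math

def kmeans(points, k, iters=20, seed=0):
    """Tiny illustrative k-means. Real visit segmentation would run in a
    statistics package over many more behavioral variables."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each visit to its nearest centroid...
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # ...then move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two hypothetical behavioral variables per visit, e.g. (search_uses, product_pages).
visits = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(visits, k=2)
```

The payoff over rules is that visits get grouped by overall similarity across all the variables at once, so the "if you did X amount of A and Y amount of B, who are you most like?" question gets answered directly instead of being hand-approximated with thresholds.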
On the other hand, there are some significant drawbacks to this method from an operational standpoint.
First, it's a lot harder. You have to export, format, join and analyze the data. Worse, most Web sites spin off volumes that make it pretty much impossible to analyze the entire web population. Sampling isn't an issue if all you're looking for is a marketing model that will help you understand how visitors use your Website. But that's rarely the only (or even primary) goal of a visit segmentation. If you want to generate reports, create tests, build data warehousing or CRM aggregations, or do any form of targeting, you need a method for attaching segment codes to every visit. That means you're building a model that then needs to be translated back into some form of rules that you can actually apply to each visit/visitor. You don't want to constantly export data, analyze it in SAS, and then re-import the segmentation codes.
In some cases, you'll find that you can actually translate cluster-based segments back into rules in a rule/filter-based system and still have a pretty decent approximation of the segmentation. When that happens, it's very convenient and it means you can re-create your statistical models in a system like Discover. Not surprisingly, however, it doesn't always happen. The more you push the data and the more sophisticated your rules, the less chance there is that you'll be able to build a decent approximation of them in a Web analytics tool.
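One way to think about this translation step: take a two-cluster model, reduce the cluster boundary to a single threshold on one variable (the kind of filter a Web analytics tool can express), and measure how often the simple rule agrees with the full model. The sketch below is a hypothetical illustration of that check, with invented centroids and data.

```python
import math

def nearest_centroid(p, centroids):
    """Full model: assign the visit to its closest cluster centroid."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def threshold_rule(centroids, var):
    """Approximation: the midpoint between the two centroids on one variable -
    the kind of simple Include/Exclude filter a rule-based tool can express.
    Assumes centroids[1] has the larger value on that variable."""
    return (centroids[0][var] + centroids[1][var]) / 2

def agreement(points, centroids, var):
    """Fraction of visits where the one-variable rule matches the cluster model."""
    t = threshold_rule(centroids, var)
    matches = sum((p[var] > t) == (nearest_centroid(p, centroids) == 1) for p in points)
    return matches / len(points)

centroids = [(1.0, 1.0), (9.0, 9.0)]            # hypothetical cluster centers
points = [(0, 2), (2, 1), (8, 10), (10, 9)]     # hypothetical visits
print(agreement(points, centroids, var=0))       # → 1.0
```

On cleanly separated clusters the agreement is high and the rule-based approximation is worth building; as the clusters come to depend on subtler combinations of many variables, that agreement score drops and the translation stops being viable.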
Which brings me to the third method of instantiating a visit-based segmentation - ETL rules. At Semphonic, we've built a pretty sophisticated system for processing Web analytics data feeds and incorporating our segmentation concepts directly into the process. Initially, we create the segmentation (which can be rule or cluster based) and then we instantiate the segmentation rules into the ETL. Because we have full programmatic control of the process, ANYTHING we need to instantiate a cluster or rule-based model is available to us. This means that the segmentation is incorporated directly into the data stream as it's processed and pushed into warehouse tables.
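In spirit, the ETL approach amounts to attaching a segment code to each visit record as the feed streams through, before the rows land in warehouse tables. Here's a minimal hypothetical sketch (the feed format, field names, and classification rule are all invented for illustration):

```python
import csv
import io

# Segmentation inside an ETL step: each raw visit record gets a segment
# code attached as the feed is processed, so the code lands in the
# warehouse tables alongside the visit itself.
def tag_segments(rows, classify):
    for row in rows:
        row["segment"] = classify(row)   # full programmatic control here -
        yield row                        # any rule or cluster model will do

# A toy stand-in for a Web analytics data feed.
feed = io.StringIO("visit_id,pages_viewed,completed_order\n1,12,1\n2,1,0\n")
classify = lambda r: ("Purchase" if r["completed_order"] == "1"
                      else "Bounce" if int(r["pages_viewed"]) <= 1
                      else "Browsing")

for row in tag_segments(csv.DictReader(feed), classify):
    print(row["visit_id"], row["segment"])
# → 1 Purchase
#   2 Bounce
```

Because the classification is just code, anything a cluster model or rule hierarchy needs (sequence variables, offline joins, scoring) is available at tagging time, which is exactly the advantage of doing the segmentation in the ETL rather than in the reporting tool.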
As I spotlighted in the iJento webinar, we can do exactly the same things with the iJento system. They've built a flexible data model and an ETL system that gives us almost full access to the records and logic. What that means is that Semphonic's segmentation schemes can be directly incorporated into an iJento-based warehouse on a near real-time basis. That's pretty cool.
Digital segmentation on the Two-Tiered model isn't pie-in-the-sky stuff. It isn't "advanced" stuff you can postpone while you get on with the never-ending business of tagging, building reports and serving ad hoc requests. It's the bedrock of a good web analytics program - driving everything from useful reporting to testing to targeting. As the systems and options to deliver worthwhile segmentation continue to grow and expand, the competitive necessity for moving beyond a traditional web analytics program keeps growing.
Whether it's driving a rule-based segmentation in Omniture Discover, creating Hadoop based ETL, or creating an analytics warehouse in a system like iJento, the choices and systems for delivering segmentation in digital analysis have never been better or the need to do so more pressing!
I'll check in next week from Berlin, and I'm also intending to post a guest contribution from our own Matthias Bettag on several Websites that provide comparative information on analytics tools.
Wir sehen uns in Berlin! (See you in Berlin!)
*Though it was a beautiful Symphony program - classic MTT - with a sure-fire crowd-pleaser to anchor the program (Beethoven's Pastoral), a far less known but pristinely beautiful piece in the classic tradition (Mahler's Blumine) and a challenging modern piece by (a German composer - how appropriate!) Schnittke (Violin Concerto #4) that I and probably most of the audience had never heard before. I particularly enjoyed the placement of a violinist in the side terrace above us for Schnittke's piece. A kind of Fiddler on the Roof! Amidst dissonance, the ominous chimes and the orchestral clock ticking there lived a furious violin battle and some lovely little harmonies as well as a surprisingly complete ending. And you would have to be almost immune to the solace and joys of music not to smile with pleasure at the opening lines of the Pastoral - the musical equivalent of a girl skipping across a golden field. I could speak for a hundred years and never say anything a tenth so beautiful.