[If you’re going to be out at the Adobe Summit this week and would like to chat about big data models, digital analytics and how EY can help you get it right, just drop me a line and we can connect!]
When I talk to companies who are thinking about or trying to build a big data digital analytics warehouse, the two business problems that usually seem to be driving the train are attribution and user-journey optimization. They are deeply related, of course: attribution is just a view of the user journey in which campaign touches are of particular interest. So if these two applications are driving big data warehouses, how you model the user journey is clearly critical.
There are people who believe that a user-journey model should be constructed directly from the detail data, which for digital would typically mean link-by-link and page-by-page. I don’t think that’s right, but before I explain why, I want to revisit the notion that big data analytics is all about leaving the data in the raw form in which it’s sourced. People tend to give the data in its “raw” form special pride of place, but there is no good reason why they should.
The raw form of data always represents a series of choices by whoever designed the data capture technologies or systems, and those choices are typically made with little or no consideration of downstream analytics. Even when such consideration is given, we need to remember that the measurement system designer’s view is no more authoritative than anyone else’s. What’s more, the designer has to account for many, many different types of possible measurement, which may require data at very different levels of analysis from what is optimal for a particular problem.
Some advocates for big data analytics will argue that it’s always preferable to work with “detail”-level data. But what’s “detail”? Even a moment’s reflection should make it obvious that the concept of detail is largely meaningless. In digital, is “detail” every single mouse movement on the screen? Is it page loads? Is it visits? A raw data capture system might choose any one of these three levels as the “detail” level of data it captures. It can’t be true that an arbitrary decision by a technology designer somehow confers a favored position on that particular level of data. In general, the more detail you collect or expose in a model, the more questions you can answer. What’s the trade-off? When you add detail, you generally make it harder to answer the questions that a more general view would have answered easily.
It’s true that with sufficient analytic power, any higher-level view can be derived from the lowest level of detail. So collecting data at the lowest possible level of detail guarantees that you can build any higher-level view you need; the reverse, of course, is not true. But that simple fact doesn’t mean it’s reasonable to collect or use data at the lowest level of detail. Trust a data scientist to use an electron microscope to decide if your car has a dent!
The job of the data modeler is to transform the data into the form that best supports ongoing analysis. You’re never going to support everything, of course, and that’s why it’s always useful to keep the raw data around. But if you can build data models that make most analysis easier and faster, you’ve done your job.
So what’s the right level of data for understanding the user journey?
For most enterprises, the key fact about user-journey analysis is that the data is inherently multi-sourced. You aren’t simply stringing together one web visit after another – all with exactly the same type of detail data. More likely, you’re combining data from a digital analytics system, an email system, a store or branch system, a call-center and maybe some other stuff as well.
What makes this tricky is that the underlying data from every source is fundamentally different in kind (and oh, by the way, the level of detail captured by each source is quite different too). On the web, you typically have a fairly fine-grained view that shows page-by-page interactions. In the call center, your view may be at the call level or it may capture each transfer as an independent record. At the branch or store, you may have a CRM-like record with activities and notes, or you may only have a transaction record.
If you simply combine all this data, these differences in the level and type of detail data make analysis virtually impossible.
So the first step in building a user-journey model is to get to a level of abstraction where each touchpoint can be represented equally. Of course, we also want that level of abstraction to be as lossless as possible for every data source!
For web and mobile data, there really aren’t many alternatives. You can’t go up to the visitor level, because it’s the journey you’re trying to represent. The page level, on the other hand, is too detailed, since there isn’t likely to be a corresponding level of detail in your non-web sources. Almost by default, therefore, you’re likely to end up at the visit or unit-of-work level that I described previously.
At this level, the concepts of two-tiered segmentation are hugely important. I’ve written about two-tiered segmentation often, so I’ll just recap here. In a two-tiered segmentation, the first tier is the visitor type. This is the traditional visitor segmentation based on persona or relationship. The second tier is a visit or unit-of-work segmentation that is behavioral and is designed to capture visit intent. It changes with each new touch (or even within a touch if you’re using a unit-of-work concept).
We usually describe the two tiers as capturing who somebody is (tier 1) and what they are trying to accomplish (tier 2).
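To make the visit level concrete for web data, here is a minimal Python sketch of rolling page-level hits up into visits. It assumes a hypothetical `PageHit` record and the common 30-minute inactivity convention for ending a visit; real analytics tools apply their own visit rules, so treat the timeout as an assumption rather than a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby


# Hypothetical page-level hit, as it might arrive from a web analytics feed.
@dataclass(frozen=True)
class PageHit:
    visitor_id: str
    timestamp: datetime
    page: str


# Assumed inactivity timeout; 30 minutes is a common convention, not a rule.
VISIT_TIMEOUT = timedelta(minutes=30)


def sessionize(hits: list[PageHit], timeout: timedelta = VISIT_TIMEOUT) -> list[list[PageHit]]:
    """Group page hits into visits: a new visit starts whenever the same
    visitor has been inactive for longer than `timeout`."""
    visits: list[list[PageHit]] = []
    ordered = sorted(hits, key=lambda h: (h.visitor_id, h.timestamp))
    for _, visitor_hits in groupby(ordered, key=lambda h: h.visitor_id):
        current: list[PageHit] = []
        for hit in visitor_hits:
            # Gap since the previous hit exceeds the timeout: close the visit.
            if current and hit.timestamp - current[-1].timestamp > timeout:
                visits.append(current)
                current = []
            current.append(hit)
        if current:
            visits.append(current)
    return visits


t0 = datetime(2015, 3, 1, 9, 0)
hits = [
    PageHit("v1", t0, "/tv/led"),
    PageHit("v1", t0 + timedelta(minutes=5), "/tv/compare"),
    PageHit("v1", t0 + timedelta(hours=2), "/tv/deals"),
]
print([len(v) for v in sessionize(hits)])  # [2, 1]
```

Each resulting visit, not each page, then becomes one candidate row in the journey model.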
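As an illustration only, the two tiers can be sketched as two separate functions over the same visit: a stable tier-1 code from the visitor’s relationship and a tier-2 intent code from what happened during the visit. The `Visit` type, the segment names, and the page-prefix rules below are all hypothetical; real tier-2 rules would come from behavioral analysis of your own site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    visitor_relationship: str  # e.g. "prospect" or "customer" (hypothetical values)
    pages: tuple[str, ...]     # page paths viewed during the visit


def tier1_segment(visit: Visit) -> str:
    """Tier 1: who the visitor is (persona/relationship); stable across visits."""
    return "EXISTING_CUSTOMER" if visit.visitor_relationship == "customer" else "PROSPECT"


# Hypothetical ordered rules: the first page-path prefix that matches sets the intent.
INTENT_RULES = [
    ("/support", "SUPPORT"),
    ("/checkout", "BUY"),
    ("/compare", "COMPARISON_SHOPPING"),
    ("/reviews", "COMPARISON_SHOPPING"),
    ("/product", "RESEARCH"),
]


def tier2_segment(visit: Visit) -> str:
    """Tier 2: what the visitor is trying to accomplish in this visit."""
    for prefix, intent in INTENT_RULES:
        if any(page.startswith(prefix) for page in visit.pages):
            return intent
    return "BROWSE"


v = Visit("prospect", ("/product/tv-55", "/reviews/tv-55"))
print(tier1_segment(v), tier2_segment(v))  # PROSPECT COMPARISON_SHOPPING
```

The ordering of the rules is a design choice: a visit that touches both support and product pages is classified by its higher-priority intent.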
That second tier of visit intent is perfect for building out a user-journey model. It provides a highly aggregated but relatively non-lossy representation of what a visit was about and how successful it was. Both are critical pieces of information when you’re trying to model user-journey.
The full beauty of two-tiered segmentation comes into play in an omni-channel setting, because the same model can be used across almost ANY touchpoint. The intent-based segmentation can be applied easily and intuitively to calls, branch or store visits, social media posts, even ATM usage. It can also be applied – rather less intuitively – to display advertising and email touches.
Applying two-tiered segmentation to a user-journey data model gives us a structure something like this:
- TouchDateTime Start
- TouchType (Channel)
- TouchVisitorID
- TouchVisitorSegmentCodes (Tier 1)
- TouchVisitSegmentCode (Tier 2)
- TouchVisitSuccessCode
- TouchVisitSuccessValue
- TouchTimeDuration
- TouchPerson (Agent, Rep, Sales Associate, etc.)
- TouchSource (Campaign)
- TouchDetails
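One way to pin this structure down is as a typed record. The Python sketch below mirrors the field list one-for-one; the field types, the example channel and segment codes, the campaign and agent names, and the use of a free-form dictionary for TouchDetails are assumptions for illustration, not part of the model as described.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Touch:
    """One row per touchpoint visit or unit of work (types are assumptions)."""
    start: datetime                    # TouchDateTime Start
    channel: str                       # TouchType (Channel), e.g. "WEB", "CALL", "STORE"
    visitor_id: str                    # TouchVisitorID
    visitor_segments: tuple[str, ...]  # TouchVisitorSegmentCodes (Tier 1)
    visit_segment: str                 # TouchVisitSegmentCode (Tier 2, visit intent)
    success_code: str                  # TouchVisitSuccessCode, e.g. "SUCCESS" / "FAIL"
    success_value: float               # TouchVisitSuccessValue
    duration_seconds: int              # TouchTimeDuration
    person: Optional[str] = None       # TouchPerson (agent, rep, sales associate)
    source: Optional[str] = None       # TouchSource (campaign)
    details: dict = field(default_factory=dict)  # TouchDetails: channel-specific extras


# A web visit and a call-center call share the same row layout.
web = Touch(datetime(2015, 3, 2, 20, 15), "WEB", "v1", ("PROSPECT",),
            "RESEARCH", "SUCCESS", 0.0, 540, source="email-0302")
call = Touch(datetime(2015, 3, 4, 11, 0), "CALL", "v1", ("PROSPECT",),
             "COMPARISON_SHOPPING", "SUCCESS", 0.0, 420,
             person="agent-17", details={"transfers": 1})
```

Pushing channel-specific content into `details` is what lets touches from very different sources sit side by side in one table.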
This compact structure with one row per touchpoint visit or unit of work is a good foundation for many kinds of journey analysis. With this structure, you can analyze multi-channel paths, do rich attribution modeling, and support journey personalization.
Hearkening back to my last post, you could chain journey records in memory to create a very powerful representation of an entire user-journey for sophisticated real-time personalization.
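A minimal in-memory version of that chaining, assuming touches arrive unordered: group rows by visitor, sort each chain by start time, and read multi-channel paths straight off the chain. The trimmed `Touch` record, the function names, and the `" > "` path format are hypothetical choices for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Touch:
    # Trimmed-down touch row: a subset of the fields listed above.
    visitor_id: str
    start: datetime
    channel: str
    visit_segment: str  # Tier 2 visit intent


def chain_journeys(touches: list[Touch]) -> dict[str, list[Touch]]:
    """Index touches by visitor and time-order each visitor's chain."""
    journeys: dict[str, list[Touch]] = defaultdict(list)
    for t in touches:
        journeys[t.visitor_id].append(t)
    for chain in journeys.values():
        chain.sort(key=lambda t: t.start)
    return dict(journeys)


def channel_path(chain: list[Touch]) -> str:
    """Collapse consecutive touches in the same channel into a path string."""
    path: list[str] = []
    for t in chain:
        if not path or path[-1] != t.channel:
            path.append(t.channel)
    return " > ".join(path)


touches = [
    Touch("v1", datetime(2015, 3, 4, 11, 0), "CALL", "COMPARISON_SHOPPING"),
    Touch("v1", datetime(2015, 3, 2, 20, 15), "WEB", "RESEARCH"),
    Touch("v1", datetime(2015, 3, 3, 9, 30), "WEB", "COMPARISON_SHOPPING"),
    Touch("v1", datetime(2015, 3, 7, 14, 0), "STORE", "BUY"),
]
journeys = chain_journeys(touches)
print(channel_path(journeys["v1"]))  # WEB > CALL > STORE
```

The same chain also yields an intent path (`[t.visit_segment for t in chain]`), which is the raw material for both attribution and personalization rules.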
I think this type of data structure is a likely candidate to be a part of any user-journey data model.
However, I also think it’s worth considering whether there are useful aggregations above this level. After all, this structure still leaves us with multiple rows per visitor, and that’s not ideal for many kinds of statistical analysis.
So how can we take this type of record and aggregate it to the visitor level without losing the journey information?
One way is to model the key customer journeys in the abstract. This type of modeling (common in customer experience projects) can then be used to create a visitor-level data structure in which the individual touchpoints are rolled up.
Let’s say, for example, that you modeled the acquisition journey for a big screen TV like this:
- Initial research to Category Definition (LED vs. LCD vs. Plasma – Basic Size Parameters)
- Feature Narrowing (3D, Curved, etc.)
- Brand Definition (Choosing Brands to Consider)
- Comparison Shopping (Reviews and Product Detail Comparison)
- Price Tracking (Searching for Deals)
- Buying
With an abstract model like this in hand, you can map your touchpoint types to these stages of the user journey and capture the journey at the visitor level in a data structure that looks something like this:
- VisitorID
- Journey Sub-structure
- Journey Type (Acquisition)
- Current Stage (Feature Narrowing)
- Started Journey On (Initial Date)
- Time in Current Stage (Elapsed)
- Last Touch Channel in this Stage (Channel Type – e.g. Web)
- Last Touch Success
- Last Touch Value
- Stage History Sub-Structure
- Stage (e.g. Initial Research) Start
- Stage Elapsed
- Stage Success
- Stage Started in Channel
- Stage Completed in Channel
- Channel Usage Sub-Structure
- Web Channel Used for this Journey Recency
- Web Channel Used for this Journey Frequency
- Call Channel Used for this Journey Recency
- Call Channel Used for this Journey Frequency
- Etc.
- Stage Value
- Etc.
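The roll-up from touch rows to a visitor-level structure like this can be sketched in Python. Everything specific here is an assumption: the tier-2 intent codes, the `INTENT_TO_STAGE` mapping onto the TV acquisition stages, treating the stage of the most recent touch as the current stage, and ignoring touches (such as support visits) that don’t map to any stage. Only a subset of the fields above is filled in.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical mapping from Tier 2 visit-intent codes to TV acquisition stages.
INTENT_TO_STAGE = {
    "CATEGORY_RESEARCH": "INITIAL_RESEARCH",
    "FEATURE_RESEARCH": "FEATURE_NARROWING",
    "BRAND_RESEARCH": "BRAND_DEFINITION",
    "PRODUCT_COMPARE": "COMPARISON_SHOPPING",
    "DEAL_SEARCH": "PRICE_TRACKING",
    "PURCHASE": "BUYING",
}


@dataclass(frozen=True)
class Touch:
    visitor_id: str
    start: datetime
    channel: str
    visit_segment: str  # Tier 2 intent code
    success: bool


@dataclass
class StageHistory:
    start: datetime
    last_touch: datetime
    started_in_channel: str
    last_channel: str  # channel of the latest touch in this stage
    success: bool = False

    def elapsed(self) -> timedelta:
        return self.last_touch - self.start


@dataclass
class ChannelUsage:
    last_used: datetime  # recency
    frequency: int = 0


@dataclass
class VisitorJourney:
    visitor_id: str
    journey_type: str
    started_on: datetime
    current_stage: str
    current_stage_since: datetime
    last_touch_channel: str
    last_touch_success: bool
    stages: dict[str, StageHistory] = field(default_factory=dict)
    channels: dict[str, ChannelUsage] = field(default_factory=dict)

    def time_in_current_stage(self, as_of: datetime) -> timedelta:
        return as_of - self.current_stage_since


def roll_up(visitor_id: str, touches: list[Touch]) -> VisitorJourney:
    """Collapse one visitor's touches (assumed to all belong to that visitor)
    into a single visitor-level journey row."""
    mapped = sorted((t for t in touches if t.visit_segment in INTENT_TO_STAGE),
                    key=lambda t: t.start)
    if not mapped:
        raise ValueError(f"no journey touches for {visitor_id}")
    first = mapped[0]
    journey = VisitorJourney(visitor_id, "ACQUISITION", first.start,
                             INTENT_TO_STAGE[first.visit_segment], first.start,
                             first.channel, first.success)
    for t in mapped:
        stage = INTENT_TO_STAGE[t.visit_segment]
        if stage != journey.current_stage:  # entering (or re-entering) a stage
            journey.current_stage, journey.current_stage_since = stage, t.start
        hist = journey.stages.setdefault(
            stage, StageHistory(t.start, t.start, t.channel, t.channel))
        hist.last_touch, hist.last_channel = t.start, t.channel
        hist.success = hist.success or t.success
        usage = journey.channels.setdefault(t.channel, ChannelUsage(t.start))
        usage.last_used, usage.frequency = t.start, usage.frequency + 1
        journey.last_touch_channel = t.channel
        journey.last_touch_success = t.success
    return journey


touches = [
    Touch("v1", datetime(2015, 3, 1, 20), "WEB", "CATEGORY_RESEARCH", True),
    Touch("v1", datetime(2015, 3, 3, 21), "WEB", "FEATURE_RESEARCH", True),
    Touch("v1", datetime(2015, 3, 5, 11), "CALL", "FEATURE_RESEARCH", False),
    Touch("v1", datetime(2015, 3, 5, 12), "WEB", "SUPPORT", True),  # no stage: ignored
]
j = roll_up("v1", touches)
print(j.current_stage, j.time_in_current_stage(datetime(2015, 3, 6, 21)))
# FEATURE_NARROWING 3 days, 0:00:00
```

Here the visitor started Feature Narrowing on the web and most recently continued it by phone, which is exactly the kind of cross-channel fact the stage sub-structure is meant to surface.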
This stage mapping structure is a really intuitive representation of a visitor’s journey. It’s powerful for personalization, targeting and for statistical analysis of journey optimization. With a structure like this, think how easy it would be to answer these sorts of questions:
- Which channel does this visitor like to do [Initial Product Research] in?
- How often do visitors do comparison shopping before brand narrowing?
- When people have done brand narrowing, can they be re-interested in a brand later?
- How long does [visitor type x] typically spend price shopping?
These and many, many other journey questions will fall out of simple SQL queries and easy statistical modeling once you’ve created a journey-mapped structure at the visitor level. Imagine how much harder it would be for an analyst (or worse, a merchandiser) to answer any of these questions using the raw detail data.
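To show how such questions reduce to simple SQL, here is a sketch using Python’s built-in `sqlite3` module against a hypothetical `stage_history` table with one row per visitor per stage (a flattened form of the Stage History sub-structure). The table layout and sample rows are invented for illustration; the first query addresses “How often do visitors do comparison shopping before brand narrowing?”

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stage_history (
    visitor_id TEXT, stage TEXT, stage_start TEXT, started_in_channel TEXT)""")
rows = [
    ("v1", "BRAND_DEFINITION", "2015-03-02", "WEB"),
    ("v1", "COMPARISON_SHOPPING", "2015-03-04", "WEB"),
    ("v2", "COMPARISON_SHOPPING", "2015-03-01", "CALL"),
    ("v2", "BRAND_DEFINITION", "2015-03-03", "WEB"),
    ("v3", "COMPARISON_SHOPPING", "2015-03-05", "WEB"),
]
conn.executemany("INSERT INTO stage_history VALUES (?, ?, ?, ?)", rows)

# Of visitors who reached both stages, what share compared before narrowing brands?
# ISO-8601 date strings compare correctly as text; the comparison yields 0 or 1.
share, n = conn.execute("""
    SELECT AVG(c.stage_start < b.stage_start), COUNT(*)
    FROM stage_history AS c
    JOIN stage_history AS b ON b.visitor_id = c.visitor_id
    WHERE c.stage = 'COMPARISON_SHOPPING' AND b.stage = 'BRAND_DEFINITION'
""").fetchone()
print(share, n)  # 0.5 2

# Which channel do visitors start comparison shopping in?
channel_rows = conn.execute("""
    SELECT started_in_channel, COUNT(*) AS n
    FROM stage_history
    WHERE stage = 'COMPARISON_SHOPPING'
    GROUP BY started_in_channel
    ORDER BY n DESC
""").fetchall()
print(channel_rows)  # [('WEB', 2), ('CALL', 1)]
```

Against page-level detail, the same first question would require sessionizing, classifying intent, and ordering stages before any counting could begin.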
There really is a kind of beauty in a good data model, and the ease with which important but previously complex queries can suddenly be handled speaks to an elegance that is rather wonderful.
Next time…I’m not really sure about next time. Perhaps conversations at the Adobe Summit will provide grist for the mill. See you there!
Gary, I am always impressed by the quality level of your posts and papers.
Not only are they very powerful methodologies (and thus replicable), but they also make a lot of sense and are still simple.
“Simplicity is the ultimate sophistication.”
― Leonardo da Vinci
Thanks for sharing your knowledge. It has often provided me with a great mindset and approach to make sense of numbers, data and most importantly, to derive real insights that DO HAVE business impacts at the core.
Julien
Posted by: Julien | March 24, 2015 at 08:05 AM
Hello, thanks for this excellent post. I have a question. Can you give a short description of the requested fields and what the results should be? An example is always welcome. Fields:
•TouchVisitorSegmentCodes (Tier 1)
•TouchVisitSegmentCode (Tier 2)
•TouchVisitSuccessCode
•TouchVisitSuccessValue
Luc
Posted by: luc | April 29, 2015 at 12:18 AM