About fifty years ago, C.P. Snow published a famous and by now almost trite conceptualization of the split between science and the humanities as “two cultures” which could not communicate. It struck me at eMetrics that a similar problem exists today in web analytics – there is a vast gulf between the culture of web analytics practitioners and the culture of web analytics theorists.
These two worlds sometimes presented themselves in such gross opposition as to make the individual hourly sessions that followed the expert/case-study format at eMetrics (giving one half-hour to each) either annoying, bewildering or amusing - depending, I suppose, on your cast of mind.
What is alarming isn’t that a gap exists – it does in every profession. What is troubling is the extent of the gap - the often apparent lack of ANY connection between the problems and methods of the practitioners and the words of the professional theorists and talkers. This is not a good thing.
And nowhere is this split more telling and more damaging than around the troubling issue of web analytics data quality.
Two presentations that I saw embodied this split for me and made it particularly vivid.
The first was a beautiful presentation given by Rufus Evison of dunnhumby. Though highly specific to grocery-store data (the sort of data we all wish we had but don't - and a bit frustrating on that account), Mr. Evison's presentation was more than just an excellent piece of real-world practice. It was a primer on the importance of finding and using the right data.
The dunnhumby presentation began with the "McNamara Fallacy." (The name is ironic since McNamara was a brilliant data analyst - possibly the best I have ever read or known of. But the fallacy was attributed to him by another excellent analyst - perhaps correctly - for McNamara's role in the Vietnam war. As with many a profession, analysts are better remembered for their mistakes than their successes!)
Here is the McNamara Fallacy around which Mr. Evison built his presentation:
- The first step is to measure whatever can be easily measured. This is okay as far as it goes.
- The second step is to disregard that which can't be measured or give it an arbitrary quantitative value. This is artificial and misleading.
- The third step is to presume that what can't be measured easily really isn't very important. This is blindness.
- The fourth step is to say that that which can't be easily measured really doesn't exist. This is suicide.
When an analyst begins with this quote, it’s fair warning that they take their data seriously. And I have come to believe that taking the data seriously is the hallmark of the culture of real practitioners.
Not too much later, I sat through a presentation by someone who apparently trains people in web analytics. It included a single slide with the words “data accuracy” and a big X written over them. I cannot quote the accompanying dialog exactly but it was essentially this:
“Web analytics data isn’t always accurate. So you have to trend it.”
In terms of McNamara's Fallacy, this is at least blind and probably suicidal.
I have written before on the utter fallacy of the “trending as protection against data quality” argument. No worse idea has ever been foisted off on our community, and it is deeply disturbing to see this particular advice given to beginners. It is as if a teacher of mathematics calmly advised his students: “When you do an addition problem, don’t bother to double-check your work. If your answer is bigger than either number you added, it’s probably right.”
Trends, alas, are as likely to be the result of data quality problems as they are to protect against data quality problems.
Indeed, the whole idea of a trend as a protection against data quality problems is an intellectual muddle. It would be more correct to say that insight can be gleaned from imperfect data (luckily, since we never get a perfect data set). It would probably be correct to say that, provided the data is not too imperfect and the imperfections are (somewhat) random, changes in the data set may reflect changes in the real world.
It would be about equally true to say that our inferences from a snapshot (not trended) picture of the world are (if the data is not too imperfect and the imperfections somewhat random) possibly indicative of the real world.
Trending, you see, has absolutely nothing to do with the underlying issue - "Is the data good enough to use and how much confidence should we have in what we find?" There is no difference between using the data to describe a snapshot of the current situation (how much content did prospects look at this month compared to how much content existing customers looked at this month) and a trend (how much content did prospects look at this month compared to how much content they looked at last month).
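To make that concrete, here is a minimal sketch in Python - with entirely made-up counts and a simple helper of my own - of the point: the statistical question, "is this difference large enough, given the noise, to mean anything?", is asked and answered the same way whether the comparison is a snapshot or a trend.

```python
from math import sqrt, erfc

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value from the normal tail
    return round(p_a, 4), round(p_b, 4), round(z, 2), round(p_value, 4)

# "Snapshot": share of prospects vs. existing customers viewing key content this month.
print("snapshot:", two_proportion_z(180, 4_000, 240, 4_200))

# "Trend": the same share for prospects, this month vs. last month.
print("trend:   ", two_proportion_z(180, 4_000, 150, 3_900))
```

Nothing about trending changes the answer; and if the counts themselves are inflated or deflated by a collection problem, the test is equally blind to it in either case.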
It is a good and useful warning to beginners to make it clear that web analytics data never achieves the near-perfection of some financial or manufacturing systems. It is an imperfect and damaging falsehood to suggest that trending is somehow the catch-all response to this fact.
Given this direction, beginners will take seriously and grossly misinterpret a range of web analytics issues. They may believe that a 13-month upward trend in first-time visitors is actually meaningful. They may believe that a trend in demographic data for their site as reported by a competitive research tool is meaningful. They may believe that an upward (or downward) trend in conversion rate after they have switched tools is real. Trends are meaningful only if the trend is not, itself, the product of flaws or shortcomings in the data. And trending NEVER, EVER provides protection against problems in interpretation resulting from a lack of statistical significance in the data.
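As a small, purely illustrative simulation (all numbers assumed), here is how a 13-month "trend" in reported first-time visitors can be manufactured entirely by a measurement artifact - in this sketch, tracking-tag coverage that slowly improves - while the underlying behavior stays flat:

```python
import random

random.seed(7)
true_monthly_first_timers = 10_000          # flat real-world behaviour, every month

for month in range(1, 14):                  # a 13-month "trend"
    tag_coverage = 0.80 + 0.01 * month      # measurement artifact: tagging coverage
                                            #   creeps from ~81% up to ~93%
    noise = random.gauss(1.0, 0.01)         # ordinary month-to-month noise
    reported = int(true_monthly_first_timers * tag_coverage * noise)
    print(f"Month {month:2d}: reported first-time visitors = {reported:,}")
```

The resulting trend line looks clean and persuasive, yet it describes a change in the measurement, not in the visitors.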
Unless you understand where and how the problems in your data arise, trending is no protection against fundamentally misinterpreting what you are seeing. That’s just the way it is and every experienced, hands-on practitioner will know, understand and appreciate this simple fact.
Which brings me back to the two cultures problem.
There is nothing impossible or even unlikely about being both a practitioner and theorist. Indeed, it is hard to be good at the former without being at least adequate at the latter. And it is most certainly not the case that every theoretic presentation should be ignored or treated as the work of empty punditry.
But theory utterly divorced from practice is devoid of interest. And in no single attitude is this more apparent in web analytics than in a lack of respect for the data.
Web analytics data is messy. Difficult. Plagued by problems that are hard to understand and tricky to solve. It is so much easier if you can ignore these problems that, for the unblooded theoretician, the temptation to over-simplify is almost irresistible.
This is an explanation of why the phenomenon is so widespread as to have created or at least personified two worlds. It is not an excuse.
A real data analyst will never lose sight of this giant problem. A real analyst may spend half their presentation telling you how hard they worked to get the right data and get it clean. It may be dull. It may not be the stuff of dreams. But it is, manifestly, true.
The gap is far too wide in our field between the words of our theorists and the work of our experienced practitioners. And in no place would it be more profitable to build a bridge than around this issue of web analytics data.
For it is my general observation that if you are getting a shallow, fake and unworried explanation of handling web analytics data quality, you should expect a similar lack of intellectual effort around everything else you are hearing.
Indeed, the particular “trend your data” presentation that so annoyed me (though it was, in fact, hardly different from and better presented than a half-dozen others I have seen) was not entirely misguided. It had a fair number of sound ideas jumbled up amid a nearly equal number of bad ones.
But if you cannot trust your theorist, consultant, strategic advisor or analytics educator on this fundamental point of intellectual honesty and effort, I do not think you will be well served to trust them at all. To paraphrase a famous dictum on the effectiveness of marketing spend, you may know that 50% of what you are hearing is true, but how can you decide which 50% it is?
[Personal Postscript:
Not all - or even most - was doom and gloom. I enjoyed eMetrics, the majority of the presentations I saw, and nearly all of the conversations I shared. I hope my sessions on analytic planning, SEM Analytics and measuring engagement were worthwhile for people. Each had its joys and disappointments from my perspective!
I missed most of the Keynote presentations since I had to work mornings. But in the afternoon sessions, in addition to the Rufus Evison presentation, I enjoyed among others: Diane Hoag’s presentation of a case study dealing with an organic search traffic decline resulting from a Google release (simple but real); the more practical elements of the panel on Multivariate Testing; Alex Langshur’s talk focused on public sector analytics – particularly the parts on categorizing SEO Terms and the compelling final report he showed; and Michael Stebbins’ highly-polished presentation on Universal Search (it wasn’t analytics focused, but if you want to do Search Analytics you have to understand Search and Michael provided a good, quick introduction to "Universal" in the session we shared). There were many other presentations I wish I could have seen - I tend to choose a bit randomly and go mostly to see people I don't know!
I especially enjoyed co-presenting with Nancy Abila – probably my single favorite experience at eMetrics. Thanks Nancy!]
Gary, I'm not a web analyst, merely a passionate web user. Perhaps the gap between practitioners and theorists could be lessened by reflecting on the words of another former US Secretary of Defense:
"As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know."
- Donald Rumsfeld -
Posted by: BLarner | May 12, 2008 at 04:41 PM
Obviously, there is something very dangerous about going to work for the DoD. I will have to tell Phil Kemelor (our Washington VP) not to bid any Pentagon jobs!
Posted by: Gary | May 12, 2008 at 10:41 PM
Hi Gary,
I read a great post once about accuracy v precision that would fit well into this (wish I could find it). If your data is precise you can trend it over time because the error bars are small. If your data is accurate you get better snapshots.
Knowing that your data isn't accurate is fine, but having to work to get it into a position to be precise before you use it to work out if the changes you've made have worked is probably one of the hardest things. That is something that has always annoyed me about WA tools that limit table lengths and/or go into a sampling mode in the long tail.
Alec
Posted by: Alec Cochrane | May 14, 2008 at 06:53 AM
Alec,
That's a good point and I think it gets to the heart of what was originally intended. Analysts wanted - rightly - to make it clear that the numbers in web analytics were not precise enough to use as an official system of record. But they also wanted to make it clear that the numbers could be used to understand the business environment and changes within it. As far as it goes, this was dead-on. Unfortunately, like many good ideas this was somehow transmuted into a very bad idea - the completely mistaken belief that trending is somehow the reason you can use the data (instead of the fact that it is accurate enough for some level of confident prediction) and that if you trended you didn't need to worry if your data actually was accurate enough to use (never mind precise enough for back-office purposes). This turns out to be very bad advice indeed - especially to those just entering the field who are all too likely to believe it! The sad truth is that we see many web analytic data sets that are simply NOT accurate enough to use for many purposes.
Posted by: Gary | May 14, 2008 at 09:56 AM
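[A minimal numerical sketch (Python, with assumed numbers) of the precision-versus-accuracy point in this exchange: a precise-but-biased tool undercounts consistently, so its month-over-month changes track reality, while an accurate-but-noisy tool gets the level roughly right but its month-over-month changes are swamped by noise.]

```python
import random

random.seed(1)
true_visits = [100_000, 101_000, 102_000, 103_000]   # real traffic grows by 1,000/month

# Precise but biased: consistently undercounts by ~30%, with very little noise.
precise_biased = [int(v * 0.70 * random.gauss(1, 0.002)) for v in true_visits]

# Accurate but noisy: right on average, but wobbles by roughly 5% each month.
accurate_noisy = [int(v * random.gauss(1, 0.05)) for v in true_visits]

def month_over_month(series):
    """Month-over-month changes for a monthly series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

print("precise but biased:", precise_biased, month_over_month(precise_biased))
print("accurate but noisy:", accurate_noisy, month_over_month(accurate_noisy))
```

[The biased series misses the true level badly but shows the real direction of change; the noisy series hits the level but its "trend" is mostly noise.]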
Hi Gary,
Excellent post, and the best argument against trending as a substitute for accuracy that I have read. I'm 100% with you here; I have always detested the trend excuse for data quality laziness. I just spent a week at the TDWI World Conference (data warehouse and BI), and let me tell you that those guys are darn serious when it comes to data quality (I spent two entire days on that topic alone).
True, we make decisions on trends, but as you put it so well, provided we can trust that those trends are not caused by variations in the quality of the data! I mean, how hard is it to understand?
Anyone who has done a web analytics implementation, who's been through the "tunnel", i.e. that period when the numbers don't make any sense, knows that Web Analytics is certainly not a plug'n'play world... Data quality assessment is still mandatory, and we should not hide behind the "Internet data is messy" excuse for not addressing it.
I don't know if the gap here is really between practitioners and theorists. It seems to me to be between good and bad analysts...
Posted by: Jacques Warren | May 21, 2008 at 01:14 PM
Jacques,
Thanks. Regarding your last point - I get what you're saying, and in some ways it is more about good vs. bad analysts. But I have noticed that people who NEVER really practice are particularly susceptible to this mistake. All of us have been bad analysts at one time or another and abused rather than used our data. But if, like you and me, you've been burned by data quality issues in real life, you tend to be much more keenly aware of the problem! Theory from non-practitioners is nearly always vapid, I've found, because it lacks the practical foundation that experience provides.
Gary
Posted by: Gary | May 21, 2008 at 02:17 PM