[Continuing a short series on eMetrics] Another eMetrics presentation that got me thinking was given by Ian Houston of Visual Sciences. Ian had the misfortune (like me) of a time slot at the very end of the last day. So, of course, nobody else saw it. But Ian gave a really interesting talk about cookies, cookie deletion and their impact on analysis – an issue that is certainly hot right now. He’s also posted some very thoughtful blog entries on the topic (http://www.visioactive.com/). Ian’s done a lot of study in this area – certainly more than I have – and I thought his recommendations were well-considered. He also pointed out a few things I flat out didn’t know, particularly around overuse. Ian mentioned that each browser has domain-specific limits on the number of cookies, and those limits aren’t always as advertised. So if you are dropping too many cookies, you may be causing your own rejection/deletion problems. His surprising initial findings on Daily Cookie data, posted on his blog, are also well worth checking out.
But what struck me more than anything else in Ian’s presentation was the urgent need for an analyst to understand the data quality issues involved in an analysis. This is miles, maybe light years, away from the "trend and get over it" school of thought. You can’t ignore data quality issues. You have to understand data quality to know what analysis you can (and can’t) do. And no, trending numbers isn’t always the answer.
Suppose, for example, that Firefox users are significantly more likely to delete cookies after each session. That means that as Firefox usage goes up, your unique visitor counts will too, even if the number of actual visitors stays flat. That’s a trend. Or suppose that cookie deletion is simply becoming more common. That too will drive a trend. That’s fine, as long as you understand what’s driving the data. But if you’ve decided that you don’t need to worry about data quality because it’s your site and you’re trending your own numbers, then you are getting the wrong answer.
Wrong absolutely. Wrong trended. Just plain wrong. And you are passing those wrong answers on to your investors, your sponsors and your advertisers. That isn’t good. Indeed, it is perfectly possible for your absolute numbers to converge on reality while your trends become increasingly distorted. Think about that. Trending solves one particular kind of data quality problem. Toward other data quality issues it can be neutral, or it can actually make them worse. So unless you understand what variables drive poor data quality, you have absolutely no guarantee that trending will solve your problems.
The same is true for new visitor analysis, and here Ian makes an absolutely critical point. An analyst needs to differentiate between measurement where we are confident of what we’re measuring (if the cookie is there, it’s the same computer and we do have a repeat visitor) and measurement where we aren’t (the cookie isn’t there, so it may be a new visitor or it may be a cookie deleter). If you have the same level of confidence in both of these analyses, then you’re missing the point.
Yes, data quality does suck. No, don’t get over it. An analyst should be obsessed with data quality. Unrelenting in pursuing techniques and methods that make an analysis accurate, not flawed. Obsessed with what’s known for sure and what’s doubtful – and, perhaps most important, absolutely clear to others about which is which.
Is there a time to "get over it"? Yes. When you’re reasonably satisfied your answer is right. Not exact, but right. Because web data is riddled with problems, analysts tend to suffer from either analysis paralysis or cavalier indifference. Neither, of course, is ideal. But if you ask someone how to get to San Francisco from Los Angeles, would you rather have them make up an answer or say "I don’t know"? If you’re like me, you’d rather have an admission of ignorance. It’s a fact as old as Socrates. In the end, the better part of wisdom is knowing what you don’t know.
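To make the Firefox example and the known-versus-doubtful distinction concrete, here is a toy model in Python. It is only a sketch under loudly hypothetical assumptions: a "deleter" wipes cookies after every session, everyone else keeps one cookie for the whole period, every visitor starts the period without a cookie, and there is no cookie rejection or multi-computer use. The names `Period` and `cookie_counts`, and all the numbers, are illustrative inventions, not data from Ian's research. Actual traffic is held flat while only the share of deleters grows.

```python
"""Toy model: how cookie deletion can inflate cookie-based unique visitor counts.

All numbers are hypothetical illustrations, not measured data.
"""
from dataclasses import dataclass


@dataclass
class Period:
    true_visitors: int       # actual people (computers) who visited the site
    visits_per_visitor: int  # sessions each visitor made during the period
    deleter_share: float     # fraction who delete cookies after every session


def cookie_counts(p: Period) -> dict:
    """Split cookie-based measurement into what is known vs. what is doubtful.

    Hypothetical assumptions: a "deleter" wipes cookies after every session,
    so each of their sessions arrives with no cookie and gets a fresh ID; every
    other visitor keeps a single cookie for the whole period. Everyone starts
    the period without a cookie.
    """
    deleters = round(p.true_visitors * p.deleter_share)
    keepers = p.true_visitors - deleters
    v = p.visits_per_visitor

    # Sessions that arrive WITH an existing cookie: confidently repeat visits.
    known_repeat = keepers * (v - 1)
    # Sessions that arrive WITHOUT a cookie: a new visitor OR a returning deleter.
    no_cookie = keepers + deleters * v
    return {
        "true_visitors": p.true_visitors,
        # Every cookieless session mints a new ID, so IDs seen == cookieless sessions.
        "unique_cookies": no_cookie,
        "known_repeat_visits": known_repeat,
        "no_cookie_visits": no_cookie,
        # Repeat visits by deleters that get misreported as new visitors.
        "hidden_repeats": deleters * (v - 1),
    }


if __name__ == "__main__":
    # Actual traffic stays flat at 10,000 visitors; only the deleter share grows.
    for share in (0.05, 0.10, 0.15):
        c = cookie_counts(Period(10_000, 4, share))
        print(share, c["true_visitors"], c["unique_cookies"], c["hidden_repeats"])
```

Run as a script, this reports 11,500, then 13,000, then 14,500 "unique visitors" against a constant 10,000 real ones: a rising trend produced entirely by deletion. The `known_repeat_visits` figure is the part you can trust; `no_cookie_visits` is the doubtful bucket, and `hidden_repeats` shows how much of it is really repeat traffic counted as new.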

Gary,
Your last sentence is essential. "Knowing what [I] don't know" is a form of control over the analytical dilemma. At least I know where my numbers can fool me, so I can factor that into my analysis and make/test hypotheses that help me validate and refine the suspicious numbers.
I could not attend eMetrics, but I didn't see anywhere that there was a debate about the recent comScore study, the accuracy question, etc. If there wasn't, there should have been one, and I'd ask Jim to make room for it at the next one in Washington. It's of course not a question of denigrating the value of Web Analytics because of those problems; we all produce extremely valuable insights every day. But the data quality issue cannot just be ignored. I find your critique of the "trend excuse" quite powerful, and the "other side" should try to respond. It's time we all debated this visitor question to the bone.
Posted by: Jacques Warren | May 15, 2007 at 05:32 PM
Gary: LOVE the post and I strongly agree with you. I have heard the "data quality sucks" argument for years, and here's something I've noticed: people who tell you to "manage based on trends" usually have some ulterior motive. They work for a vendor, are paid by a vendor, or use technology so severely limited that little can be done to improve the situation (and thus it's easier to say "get over it!").
When I first published my cookie research, some people accused me of yelling "fire" in a crowded theater and called my research "irresponsible." The problem is that there IS a fire, and people need to be warned so they don't get burned.
Thanks for taking a stand on this one.
Eric T. Peterson
CEO, Web Analytics Demystified
http://www.webanalyticsdemystified.com
Posted by: Eric T. Peterson | May 16, 2007 at 10:31 PM
Hi Gary,
Thanks for the great recap of my presentation. You nailed my point about focusing on what you "know" about and from the data perhaps even better than I did in the talk. ;)
Cheers,
-Ian
Posted by: Ian Houston | May 17, 2007 at 11:44 AM