I read with interest Eric Peterson’s comments on Technorati and Avinash’s annual web analytics blog ratings. Eric’s salient point is that Technorati has some deep and serious flaws as a means of evaluating how referenced a blogger is. The flaws Eric describes include technical difficulties and massive susceptibility to gamesmanship – and the examples he pulls are from a very small sampling of web analytic sites including Avinash's own site. I see that Avinash has published his rankings anyway – though the formula has been broadened a bit to include additional data.
Avinash notes "I emailed the 35 web analytics blogs I track for my rankings and they all kindly sent me their feedburner subscriber numbers. Thanks folks (!!). Only one blogger declined to share his feed subscribers, the list is poorer for that."
I guess Eric must have sent his data and then declined to be in the study since I know I declined to send my data. And I don’t suppose that even in the notoriously data-flawed world of web analytics one and one equal one.
In truth, my reasons for declining had nothing to do with the reliability of the numbers (though Eric’s findings hardly surprised me) and more to do with the nature of the undertaking itself. Indeed, I wouldn't have normally bothered posting on the whole thing had Eric's post not triggered a flood of comments both pro and con. So here's my two cents' worth.
Before I get into my concerns about the undertaking and why I felt no particular desire to participate, I’d like to consider Eric’s post more carefully.
I, personally, know nothing about Technorati’s system. I’ve never checked my own ranking and have no idea what it is. Until a few days ago, I’d never even visited the Technorati site. If I were a professional blogger or even a semi-pro, that would be either a lie or inexcusable. Perhaps it’s inexcusable anyway. But blogging is a corporate function for me and a very, very small part of my job. And since Semphonic mostly deals with high-end companies and large engagements, I’ve never been all that concerned with raw traffic.
But what is Technorati’s system really? Strip away the fancy words like "Authority" and it boils down to a relatively simple system for helping searchers locate frequently referenced entries by subject. That’s pretty much it. And the operative word here is simple. It’s a basic search tool for new searchers to locate potentially interesting blogs.
Now one thing I do know is how extremely gameable and flawed Search Engine Rankings have historically been. I see, and measure, how silly, arcane, and meaningless are the factors that ultimately contribute to a number one ranking on many search engines for the vast majority of highly competitive terms. And I have no real reason to believe that Technorati is somehow much better at this than the very smart people at Google, Yahoo and Microsoft. In fact, from what I was able to glean about how the "Authority" ranking actually works, coupled with the obvious gamesmanship that Eric reveals in his post, I think it’s clear that the Technorati system is way less sophisticated than the current state of the art in organic rankings.
So you have to wonder why anyone would think it makes sense to use this measurement as anything more than a mediocre "starting" search proxy. Suppose you heard a claim like this: "We are the #1 authority in 'brand marketing' because Google ranks us #1 for that term." You'd laugh, right?
You’d also laugh if someone made a claim like this: "We are the most popular web site in 'brand marketing' because Google ranks us #1 for that term." Laughable.
You'd probably laugh especially hard if it were shown that the site making the claim was actively gaming the system.
But, apparently, it’s not so laughable when a renowned web analytics authority makes pretty much that same claim.
So now Avinash has adjusted his formula - turning it into a much more complicated regression formula and including Feedburner numbers provided by the sites themselves. I’m not sure I see how this helps. Not only is a major portion of the ranking still garbage, but it seems trivial to game Feedburner numbers too.
Nor am I exactly clear what all this fiddling with the formula is really about. Reading the rankings from a year ago, I assumed that the point was to measure what Technorati so loosely calls "authority." There is a direct counterpart to this in the academic world, where the number and quality of cites are studied to track influence and the spread of ideas. You could argue that using Technorati numbers like academic cites is self-serving, stupid or both – but it at least had a certain intellectual consistency to it.
Now unlike citations (which is essentially what I take Technorati to be measuring), RSS Subscribers is a measure of a certain kind of "engaged" popularity. And it would be very hard to see how it has any relation to the original concept of authority at all.
This means you have two metrics designed to measure fundamentally different things competing in a single regression equation. What exactly is it supposed to predict – authority in the academic sense or engaged popularity? This looks to me like a classic case of throwing advanced statistical techniques at a problem without really understanding what you’re about.
In my experience, this normally results in a muddle. And frankly, I think the whole exercise is a muddle.
You see, it wasn’t because of concerns about the data that I originally chose not to participate. I didn’t know anything about the data.
My concerns were more basic.
I know people love popularity contests. They love knowing who’s ahead in the Presidential election. Indeed, lots of people seem to care far more about who’s winning than who the better candidate is. But I detest that, and I think it’s corrupting to the whole system. It's also a terrible use of measurement, something I take seriously indeed.
Yes, people love this stuff at every level. From what movies made the most money, to whether people approve or disapprove of Britney’s haircut.
But for me to spend time providing numbers, I’d have to think that there was actually some useful purpose (other than self-promotion) behind the exercise.
Some people might suggest that the purpose of using the Technorati numbers in this "Top 10" fashion is to help people find blogs that they otherwise wouldn’t. That's obviously how Technorati intends them to be used in the first place. From Technorati's standpoint, trying to help a new searcher get started, their method surely looks reasonable if far from perfect.
But this, obviously, cannot be what Avinash intends. If the system measures anything at all then it’s going to favor blogs people have already "discovered." So Avinash would just be pointing people to blogs they already know about and could easily find. And it's hard to see why Avinash would need to use a machine ranking to point people to interesting web analytics blogs.
It’s also hard to see or understand why anyone already reading an extremely popular blog like Avinash's would care about which web analytics blogs are #1 on Technorati or even most viewed without having some understanding of "to/by whom" and "about what."
A few people (I hope not many in our community) might even believe that this sort of ranking has something to do with being a "good" blog.
Unfortunately, this type of "reference" metric will never be very interesting as a clue to what blogs you might find worth reading. I doubt there is such a metric. But in the spirit of thoughtful measurement, I have come up with a few that I think would be more useful in this regard:
- Navel-Gazing Index (NGI): Number of Words about the author and author's friends / Total Non-Stop Words. Lowest score is best.
- False Modesty to Real Immodesty Index (Pygmy/BigMe Index): Count all fake modest phrases / all boastful phrases. Very large or very small numbers are worst.
- Intellectual Property Index (IPI): Percent of a blog’s contents that are actually about the supposed topic. Highest numbers are best.
- "Borrowing" Shamelessly Index (BSI): The percent of ideas on a blog "borrowed" from other people that aren’t attributed. High scorers should be whipped!
- Words per Blog (Blather Index): High scores are…best (Jim Novo and I rule!).
I’ve never scored anybody’s blogs by these metrics (except sub-consciously), but I’m willing to bet that they’d provide a more useful guide to good blogs than any form of reference counting.
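For anyone tempted to actually try, here is a minimal Python sketch of the two easiest ones, the Blather Index and the NGI. Everything in it is an assumption made for illustration: the function names, the tiny stop-word list, and especially the idea of approximating "words about the author" with first-person pronouns (a serious NGI would also have to spot the author's friends, which no word list will do).

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}
SELF_WORDS = {"i", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def blather_index(text):
    """Blather Index: the total number of words in the post."""
    return len(tokenize(text))


def navel_gazing_index(text, self_words=SELF_WORDS, stop_words=STOP_WORDS):
    """NGI: self-referential words divided by total non-stop words."""
    words = [w for w in tokenize(text) if w not in stop_words]
    if not words:
        return 0.0  # an empty post gazes at no navel
    self_count = sum(1 for w in words if w in self_words)
    return self_count / len(words)


if __name__ == "__main__":
    sample = "I think my blog is the best blog in the world"
    print("Blather:", blather_index(sample))
    print("NGI:", round(navel_gazing_index(sample), 3))
```

The other three indexes need a human judge, which is rather the point.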
I mean, c'mon - it’s really this simple. There are some jobs only a machine can do. You have to rely on Google to rank billions of pages for relevance. Not because the computer does a better job than a person but because there aren’t enough people to ever do the work.
But if you want to find a web analytics blog that suits you, go read some. There aren’t that many. Sure, you couldn’t read all of them all the time. But you can read one from most of us and decide which ones you like.
They vary in so many ways and serve so many different interests that it’s pointless to think that there is one that’s right for everybody. Some people read the National Enquirer, some people read the NY Times while others read USA Today. There may even be people who read all three.
Which is why, in the end, I never even bothered with the step Eric took of actually looking at Avinash’s numbers. Because when you get right down to it, the whole exercise is really just a big waste of time.
Technorati's "authority" ranking is just a prettied up word for an eminently gameable search algorithm. No more and no less. And for anyone who’s already beyond the "first search" stage in web analytics it's pretty much useless and always will be. We all should know better. And thanks to Eric, now we pretty much do.
Because no matter whether the ranking is gamed or perfect, it’s pointless to send a machine to do a man’s work.
Scores for this Post (mostly estimated):
· NGI: 0.014
· Pygmy/BigMe: 0.017
· IPI: Debatable
· BSI: 0 (I think)
· Blather: 1266 (oof)
I like your blog, it’s always fun to come back and check what you have to tell us today.
Posted by: Andria | July 31, 2007 at 12:12 AM
Great post Gary, somehow reminds me of the "don't believe the hype" writing of Steven Levitt in his wonderful book "Freakonomics". Keep blathering.
A brief comment on Technorati: it seems that they are incredibly good at generating SEO placements in Google from their "search engine", so perhaps we should think of it as republished content rather than search, so to speak. The relevancy of their listings as an authority is clearly at fault; however, as a collection of bloggy ramblings and photos relevant to any given subject they appear to be the best in the business.
Posted by: Mike P | July 31, 2007 at 02:01 AM
Gary, FWIW I also declined to send Avinash my Feedburner stats and also asked to be excluded from his list. Thanks for the feedback on my post and the interesting perspective on the debate!
Eric T. Peterson
Web Analytics Demystified, Inc.
http://www.webanalyticsdemystified.com
Posted by: Eric T. Peterson | July 31, 2007 at 05:20 AM
Gary, thanks for the plug (I think?) I read every word of your posts!
Posted by: Jim Novo | July 31, 2007 at 05:00 PM
I think you score higher on blather.
;)
Doug
Posted by: Douglas Karr | August 01, 2007 at 08:55 AM
Great post. Thanks for putting those thoughts out there.
Besides the methodological issues, the thing I find the most odious about Technorati is that, based on a single metric which they have the audacity to call "authority", they come up with a ranking. And ranking implies best to worst.
Personally, I'd just as soon see guys like you, Eric, Avinash, and Novo post your OPINIONS about which blogs you like (and don't, if you'd go that far).
That would be a helluva lot more valuable than any Technorati ranking.
Posted by: Ron Shevlin | August 01, 2007 at 11:17 AM
Great post.
Does this mean that US magazine isn't better than the NY Times?
Posted by: Paul Holstein | August 02, 2007 at 07:41 AM