



Marshall Sponder sent me your post because we had been discussing sampling and data accuracy. While I agree that an organization needs to pay attention to avoid unintentionally skewing its analysis when it works with a subset of data, I disagree that no monitoring tool can create a robust and precise sample of data. Yes, if you rely strictly on keyword or Boolean expressions, the process of including on-topic conversations and excluding off-topic ones becomes quite brittle and ineffective. However, at Collective Intellect, we rely on semantic technology to create robust filters that collect and organize our sample datasets. In other words, our technology does not rely on someone knowing every combination or variation of a term; our engine understands context, recognizing the difference between crocs (the reptile) and Crocs (the shoes). We've tested our categorization accuracy and have achieved quite precise results. You can read more here if you are interested.

Thanks for posting about this topic; it's a good one. If you'd like to chat more, please drop me a line.


We've actually used CI with some of our clients, and while I have some reservations about the flexibility of its classification capabilities, it's certainly better than Boolean keyword-based systems.

I'm not sure, however, that even extremely precise categorization will solve all the sampling issues I discussed. It's extremely easy to bias a sample even with perfect categorization: simply miss some concept that's also relevant. Nor does this really address sampling issues at the sourcing level. I remain unconvinced that the proper function of Social Media Measurement is to create an accurate sample for customer research purposes (of the sort implied by a commitment to brand sentiment tracking), and I am very skeptical that, if such is the intent, it is actually possible.

Of course, some of my remarks on samples were actually directed more toward systems that rely on human readership for classification, and I'm guessing you'd be more inclined to agree with me here. If you believe (and I assume you don't) that human readership is necessary for accurate classification, then it stands to reason that you can't use a machine-learning system to build your sample. If the machine-learning system builds an accurate sample, there would be no need for a human reader! In point of fact, most human-reader systems do build their samples using keywords, Boolean logic, or some other criterion (such as influence), all of which significantly distort the population and may sacrifice any of the attendant benefits of improved classification...

