In Russ Rueden’s X Change Huddle on Survey and Web Analytics Integration, the topic of online survey sample size came up. It’s a big issue because many organizations find themselves deploying almost as many different surveys as tags and they don’t want to suffer from what I’d call “uncertainty principle syndrome” – damaging the user experience that they’re trying to measure.
But how many respondents are really enough?
There are two schools of thought about sample size – one school holds that as long as a survey is representative, a relatively small sample size is adequate. Perhaps 300-500 respondents can work. The other point of view is that while maintaining a representative sample is essential, having a large sample size is almost as important.
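The "small sample is fine" school usually rests on the textbook margin of error for a single top-line proportion, such as the share of visitors who are satisfied. The sketch below is a minimal Python version. It assumes the normal approximation, 95% confidence (z = 1.96) and the worst case p = 0.5. Those are standard assumptions, not figures from this post.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion.

    Normal approximation; p = 0.5 gives the widest (worst-case) interval.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (300, 500, 1000, 15000):
    # Prints the +/- band, e.g. roughly 5.7 points at n = 300.
    print(f"n={n:>6}: +/-{margin_of_error(n):.1%}")
```

At 300 to 500 respondents this gives roughly ±4.4 to ±5.7 points on a whole-sample number, which is why that range can look adequate. The catch is that the calculation only holds for the full sample. Every behavioral cut shrinks n, and the band widens accordingly.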
This is a big issue because it affects all sorts of decisions, including the length of your survey, your collection mechanism and, of course, your sampling rate.
So what’s the right answer?
If you aren’t integrating your survey data with web behavioral data, then a relatively small sample size might be okay (I’d emphasize the might). But if you want to combine behavioral analysis and survey data, then forget a sample of 300 or 500 respondents.
Those numbers simply won’t work.
Let me give you a real-world example showing why that’s true. We’re working right now with a client that samples more than 1000 site visitors a month for their satisfaction survey. They asked us to study how using either of two alternative tools on their site affected both overall site satisfaction and visit accomplishment.
On this site, the tools are used in about 10% of visits. Since the site gets more than 10 million visits a month, that still yields a heckuva lot of behavior to study – more than 1 million tool visits every month. No problem there.
But our representative sample only captured about 100 respondents who’d used either tool.
Between the two tools, one served about 70% of the queries. So for the second tool, we had about 30 respondents to deal with. Getting the picture?
For our analysis, we wanted to track visit reason vs. satisfaction vs. outcomes for tool users. With some visit reasons only accounting for about 10% of visits, there were cases where we were trying to analyze the outcomes for all of 3 visitors. Sorry – not possible.
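The shrinkage in this example is simple multiplication down a funnel. Here is a small Python sketch of it. The function name is illustrative. It assumes each filter applies to the survey sample in the same proportion it applies to site traffic, which is the same approximation the example itself makes.

```python
def expected_cell_counts(respondents: int, shares: list[float]) -> list[float]:
    """Expected respondents left after each successive filter.

    Each share is the fraction of the *previous* segment that passes the next
    filter (used a tool, then used the second tool, then had one visit reason).
    Assumes the survey sample mirrors site traffic at every step.
    """
    counts = []
    current = float(respondents)
    for share in shares:
        current *= share
        counts.append(current)
    return counts

# Funnel from the example: ~1000 respondents a month, ~10% tool users,
# ~30% of those on the second tool, a visit reason covering ~10% of visits.
steps = expected_cell_counts(1000, [0.10, 0.30, 0.10])
print([round(c, 1) for c in steps])  # [100.0, 30.0, 3.0]
```

These are expected values. In any given month the actual cell can easily land at 1 or 6, which makes a 3-respondent cell even less usable than the average suggests.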
And that’s with a survey size above 1000 and a simple cross-tabulation of visit intent and one fairly common behavior. Sure, we could add lots more months to the picture. But tracking behavior over extended periods of time adds all sorts of complications to the analysis. The combination of seasonality, site change and macro-economic change makes this dangerous. Very few of our client sites remain constant for six months.
If doing behavioral analysis with 1000 survey respondents is challenging, imagine what it would be like with a sample size of 300. Impossible. I don’t believe that with a sample of 300 there is a single site behavior you could cross-tabulate with any survey variable and still have a population large enough to be statistically meaningful.
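You can also run the funnel arithmetic backward: how large must the total sample be before the smallest cell you care about has a workable number of respondents? The Python sketch below does this. The minimum of 30 per cell is a common rule of thumb, used here as an assumption rather than a threshold from this post. Shares are passed as decimal strings and converted to exact fractions so the rounding-up step isn't thrown off by floating-point error.

```python
import math
from fractions import Fraction

def min_sample_for_cell(min_count: int, shares: list[str]) -> int:
    """Smallest total sample whose *expected* count in the target cell is >= min_count."""
    cell_share = math.prod(Fraction(s) for s in shares)  # exact product of the filter shares
    return math.ceil(min_count / cell_share)

shares = ["0.10", "0.30", "0.10"]  # tool users, second tool, one visit reason
print(min_sample_for_cell(30, shares))  # 10000
print(float(300 * math.prod(Fraction(s) for s in shares)))  # 0.9 expected respondents at n = 300
```

Under these assumptions, a cell that is 0.3% of traffic needs about 10,000 completes a month just to reach 30 expected respondents. A 300-person sample leaves less than one respondent in that cell.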
So how large a sample do you actually need?
If I had my druthers, I’d recommend that large volume sites strive for a much higher sample size – 15K would be nice on a monthly basis. With the old constraints of cost gone, there’s really no reason not to go for as large a sample as your site volume will support without impacting your user experience.
Obviously, this upper limit depends on your site volume and your take-up and completion rates.
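The relationship between volume, take-up, completion and sample size is a straight product. Here is a minimal Python sketch, assuming every invitation is an independent draw from visits. The 10 million monthly visits come from the example above. The 5% take-up and the 60% and 75% completion rates are hypothetical.

```python
def monthly_completes(visits: int, invite_rate: float, take_up: float, completion: float) -> float:
    """Expected completed surveys per month: visits x invite rate x take-up x completion."""
    return visits * invite_rate * take_up * completion

def invite_rate_needed(target: int, visits: int, take_up: float, completion: float) -> float:
    """Share of visits you would have to invite to expect `target` completes.

    A result above 1.0 means the site cannot supply the target at these rates.
    """
    return target / (visits * take_up * completion)

# Hypothetical rates: 5% of invitees start the survey, 60% of starters finish.
rate = invite_rate_needed(15_000, 10_000_000, take_up=0.05, completion=0.60)
print(f"invite {rate:.1%} of visits")  # invite 5.0% of visits

# A shorter survey that lifts completion to 75% needs fewer invitations.
print(f"invite {invite_rate_needed(15_000, 10_000_000, 0.05, 0.75):.1%} of visits")  # invite 4.0% of visits
```

The second line is the trade-off behind cutting questions. Any gain in completion rate means fewer visitors have to be interrupted to reach the same sample.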
You can’t do much about volume, but if your survey length is hurting your take-up or completion rates, then I’d be willing to sacrifice a whole bunch of questions to get a larger sample. The fact is that on many 30-40 question surveys, we’d only expect to use at most 5-6 of those questions in a behavioral analysis. I’d bet even money that your analysts feel the same way and that a heavy majority of questions on many long surveys hardly ever get studied at all.
At some point you may have to make a decision: do you want a whole lot of really shallow information or do you actually want to do analysis on a narrower set of data?
When it comes to behavioral analysis combined with survey integration, the right answer is pretty obvious. A representative sample is essential, but size really does matter. Let yourself get talked into a 300-person sample, and you might as well throw all the work you did to integrate online survey data with behavioral data into the junk pile.

Great post, Gary. Putting aside the question of whether you need to break down survey data by behavioral data, here's one way to look at it. How often does Management get together to make decisions based on the customer feedback? The frequency at which those decisions need to be made is what should determine sample size.
Let me explain... If the survey data is reviewed every month but it really takes a whole year’s worth of data (looking year over year, let’s say) to reach statistical significance on the differences, then you obviously need more data. It doesn't help to have a bunch of Executives sitting around poring over fluctuations in data that are due to common cause variation rather than an actual change in customer sentiment. So you need to either increase the sample size or reduce the frequency at which you analyze the data.
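One way to check whether a month-over-month swing is more than common cause variation is a two-proportion z-test. The Python sketch below uses only the standard library (`statistics.NormalDist`). The satisfaction counts are hypothetical, and the pooled normal approximation is an assumption that holds reasonably well at these sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for H0: both periods have the same satisfied proportion.

    Pooled two-proportion z-test (normal approximation).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical months: 780 of 1000 satisfied vs 750 of 1000 the month before.
p = two_proportion_p_value(780, 1000, 750, 1000)
print(f"p = {p:.2f}")  # p = 0.11
```

With about 1000 respondents a month, a 3-point jump in satisfaction does not clear the usual 0.05 bar. That is exactly the kind of fluctuation an executive review can over-read.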
To the question of integrating survey data with behavioral data, you're absolutely right. You need a TON of data. And in my Web Analytics Benchmarking study last year, two-thirds of respondents said they had not built the capability. http://bit.ly/3CLQUp At Intuit, I've run website optimization experiments (A/B tests) where part of the success measures were customer sentiment scores. You need to plan carefully ahead of time whether or not you'll be able to detect a difference given your sample sizes.
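Planning ahead usually means a power calculation. The sketch below uses the standard normal-approximation formula for the number of respondents needed in each test cell to detect a difference between two proportions. The 75% vs 78% satisfaction rates, the 0.05 significance level and the 80% power are illustrative assumptions, not figures from the Intuit tests.

```python
import math
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Respondents needed in EACH test cell to detect p1 vs p2 with a two-sided test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_b = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical goal: detect a lift in satisfaction from 75% to 78%.
print(n_per_group(0.75, 0.78))
```

Under these assumptions the answer is a little over 3,100 completed surveys per cell, before any behavioral segmentation. Shrinking the detectable difference drives the requirement up with the inverse square of the gap, which is why small sentiment effects need such large samples.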
If you determine you won't likely reach statistical significance but you want to go ahead anyway because some data is better than no data, that’s okay too if everyone is on board with the methodology. And if the customer sentiment needs to be manually categorized in any way, make sure that is done in a “double-blind” sort of way. This means that the person doing the categorization has no idea which comments are in which test cells when they are doing their scoring.
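A simple way to keep the coder blind is to shuffle the comments, replace the test-cell label with an opaque ID, and hold the ID-to-cell key separately until scoring is done. The Python sketch below shows one way to do this. The function names and ID format are hypothetical, and it does not catch comments whose wording itself reveals the cell.

```python
import random

def blind_for_coding(records: list[tuple[str, str]], seed: int = 0):
    """Hide test-cell labels from the person categorizing comments.

    records: (test_cell, comment) pairs.
    Returns (coding_sheet, key): the sheet holds only (blind_id, comment) in
    shuffled order; the key maps blind_id -> test_cell and stays with someone
    who is not doing the scoring.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # break any ordering by test cell
    coding_sheet = []
    key: dict[str, str] = {}
    for i, (cell, comment) in enumerate(shuffled):
        blind_id = f"C{i:05d}"
        coding_sheet.append((blind_id, comment))
        key[blind_id] = cell
    return coding_sheet, key

def unblind(scores: dict[str, str], key: dict[str, str]) -> list[tuple[str, str]]:
    """Re-attach test cells to the coder's scores after coding is finished."""
    return [(key[blind_id], score) for blind_id, score in scores.items()]

records = [("A", "Checkout was easy"), ("B", "Couldn't find the form"), ("A", "Too many steps")]
sheet, key = blind_for_coding(records, seed=42)
```

The coder sees only `sheet`. Once every comment has a category, `unblind` joins the scores back to their cells for the comparison.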
Cheers,
Jared Waxman
Posted by: Jared Waxman | September 20, 2009 at 02:10 PM