Tuesday, July 21, 2009

Surveying: How Big Should Our Sample Be?

How often, when conducting or helping clients conduct surveys, are we asked the question that plagues all survey methodologists: “How big should our sample be?” While the answer “It depends” is correct, it is neither informative nor appreciated. So what can we say instead? Well, it does depend, partly on the statistical tests that you wish to perform. Luckily, there are many web-based calculators available to help us provide a more thoughtful and informed answer. Unfortunately, we often make little use of them.

One calculator I have found very useful for answering this question comes from the Vanderbilt University (Nashville, TN) Department of Biostatistics. A link to it appears below. Once you download it, you will see that you can calculate sample sizes by identifying the type of statistical test you are most likely to conduct using your survey data. I’ll show you how I used it to calculate the sample size I would need to conduct t-tests with my survey data.

Link to Calculator:

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize

Scenario: I wanted to calculate the sample size I would need to identify whether males’ and females’ mean ratings of the importance of subsidized childcare in our states’ current economic climate were different.

1. I first went to: t-test → Output → and selected Sample Size, as that is what I wanted it to calculate.

2. I then chose Independent as my Design because I wanted to ascertain whether there was a significant difference between two different groups (males and females) of respondents.

3. Next I entered an alpha (α) value of .05. Alpha is the probability of a Type I error (falsely rejecting the null hypothesis), here for a two-tailed test.

4. For independent tests, power is the probability of correctly rejecting the null hypothesis when it is false. Traditionally we set power to .8 so that 80 times out of 100, when there is an effect, we’ll say there is.

5. δ represents the difference in population means for the variable of interest (in my case, the rated importance of subsidized childcare). If I have no data on this from previous surveys, literature reviews, etc., I’ll generally set this difference to be quite small, usually .2 standard deviations. For this example I entered .2.

6. σ represents the within-group standard deviation. Again, I may use information from prior studies or may have to provide a best guess. Usually I start by pondering whether I believe it is above 0.5, above 1.0, and so on. For this example I have set it at .3.

7. For independent tests, m is the ratio of control to experimental patients. When I expect the two groups to be the same size, I set this at 1. For this survey I assumed more females than males would respond, so I treated females as the “control” group and set m at 1.2 (meaning I would survey 12 females for every 10 males).

8. Finally I hit calculate.

And here is what I get:

“We are planning a study of a continuous response variable from independent control and experimental subjects with 1.2 control(s) per experimental subject. In a previous study the response within each subject group was normally distributed with standard deviation 0.3. If the true difference in the experimental and control means is 0.2, we will need to study 33 experimental subjects and 40 control subjects to be able to reject the null hypothesis that the population means of the experimental and control groups are equal with probability (power) 0.8. The Type I error probability associated with this test of this null hypothesis is 0.05.”
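
For readers who want a rough cross-check of the calculator's output, here is a minimal Python sketch. It uses the standard normal-approximation formula for comparing two means with unequal group sizes, which is an assumption on my part and not necessarily the method the Vanderbilt program uses internally. The function name `two_group_sample_size` is made up for this illustration.

```python
from math import ceil
from statistics import NormalDist


def two_group_sample_size(alpha, power, delta, sigma, m):
    """Approximate group sizes for a two-tailed comparison of two means.

    Normal approximation (an assumption, not the calculator's own code):
        n_exp = (1 + 1/m) * ((z_alpha + z_power) * sigma / delta)^2
    where m is the number of "control" subjects per "experimental" subject.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-tailed test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n_exp = ceil((1 + 1 / m) * ((z_alpha + z_power) * sigma / delta) ** 2)
    n_ctrl = ceil(m * n_exp)            # the larger group, m times the smaller
    return n_exp, n_ctrl


# Inputs from the steps above: alpha .05, power .8, delta .2, sigma .3, m 1.2
n_males, n_females = two_group_sample_size(0.05, 0.8, 0.2, 0.3, 1.2)
print(n_males, n_females)  # 33 40
```

With these particular inputs the approximation agrees with the calculator (33 and 40). For very small samples a t-based calculation can come out a little larger, so treat this as a sanity check only.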

Thus I know I need, at minimum, 73 persons to respond: 33 males and 40 females. Knowing that a 100% response rate is unlikely, I’ll survey more than 73 persons. Assuming my survey produces a 75% response rate, I will survey 73/.75 = 97.3 persons, which I round up to 98.

I now know I need to survey 98 persons to account for the fact that not everyone will respond to my survey. But of these 98 persons, how many should be male and how many should be female?

Using Algebra II I know:

M + F = 98 and F/M = 1.2

Dividing the first equation by M gives M/M + F/M = 98/M, leading to 1 + 1.2 = 98/M or 2.2 = 98/M

Multiplying both sides by M I get 2.2M = 98, leading to M = 98/2.2 or 44.5 (44).

I now know I need to survey 44 males and 98 - 44 or 54 females. At a 75% response rate, that should yield the 33 males and 40 females I need.
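
Another way to make the response-rate adjustment is to inflate each group separately, starting from the calculator's 33 and 40 and rounding up so that neither group falls short. The short Python sketch below does this. The 75% response rate is the assumption used in this post, and the helper name `invitations_needed` is made up for this illustration.

```python
from math import ceil


def invitations_needed(required_responses, response_rate):
    """People to invite so the expected number of responses meets the target."""
    return ceil(required_responses / response_rate)


required = {"males": 33, "females": 40}  # completed surveys needed, per the calculator
response_rate = 0.75                     # assumed, not measured

invites = {group: invitations_needed(n, response_rate) for group, n in required.items()}
total_invites = sum(invites.values())

print(invites)        # {'males': 44, 'females': 54}
print(total_invites)  # 98
```

Rounding up within each group guarantees that the expected responses in both groups reach the calculator's minimums, which rounding only the overall total does not.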

Not so hard to do – and my clients are thrilled!

2 comments:

Lance Bledsoe said...

Great info, and thanks for the detailed walk-thru. I'll plan to look back here the next time I need to estimate a sample size.

Survey Tool said...

It’s always great to visit a helpful and useful post like you did. Ideas like this are amazing; it can be applicable to my marketing. Thanks for sharing this.
http://www.surveytool.com/online-surveys-that-pay/