Between you and me, many researchers feel uncertain about getting sampling right. They know that sampling has a substantial impact on the overall quality of a project, but they are often confused by the contradictory arguments within the scientific community. Some say that only random sampling is statistically valid, while others argue that truly random samples don't exist in the social sciences. Obviously, this article cannot settle that general dispute, but it can offer some orientation on the main sampling techniques.
Generally speaking, all samples aim to be representative of a broader context. In qualitative research, the selected cases should be typical of the topic, while quantitative researchers strive for samples that allow them to draw dependable conclusions about the corresponding population. Either way, the selected cases must not be too specific, so that you can still generalize the insights to a broader context.
So let's begin our journey into sampling by looking at the alternative: conducting a full survey. To understand your target population, you would include data points from everyone in your analysis. Obviously, collecting this data becomes more challenging the bigger the population is. But data management and analysis can also cause severe headaches, and concepts like statistical significance lose much of their meaning once you already have the whole population. Even data-rich companies like Google draw samples from their big data sets to keep them manageable.
“If the extracted data is still inconveniently large, it is often possible to select a subsample for statistical analysis. At Google, for example, I have found that random samples on the order of 0.1 percent work fine for analysis of business data.”
Hal R. Varian
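To make Varian's rule of thumb concrete, here is a minimal sketch of drawing a 0.1 percent subsample from a large dataset. The dataset here is a synthetic stand-in, not real business data:

```python
import random

random.seed(0)  # fix the seed so the draw is reproducible
dataset = list(range(1_000_000))   # synthetic stand-in for a large business dataset
fraction = 0.001                   # 0.1 percent, per Varian's rule of thumb
subsample = random.sample(dataset, k=int(len(dataset) * fraction))
print(len(subsample))  # 1000
```

The subsample is small enough to analyze comfortably, yet — because it was drawn at random — it still supports statistical inference about the full dataset.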
So, let’s talk about sampling: How should you select the right cases?
Probability sampling comprises a couple of different approaches. They all have in common that every element in your sample must have a known probability of being selected. This is what allows you to project conclusions from the sample onto the corresponding population.
The simplest approach is random sampling. Since you draw the sample by chance, you can be sure that there is no systematic bias in it. However, you need a complete overview of all members of your basic population (the "sampling frame") before you can randomly select your sample. In most cases, this is not feasible. And even if it is, you should be aware that random samples are vulnerable to random sampling error (surprise!). This is one of the reasons why results should always be replicated before they are deemed true.
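As a sketch: once you have a complete sampling frame, drawing a simple random sample is a single library call. The member identifiers below are made up for illustration:

```python
import random

# Hypothetical sampling frame: a complete list of all population members.
sampling_frame = [f"member_{i}" for i in range(10_000)]

random.seed(42)  # fix the seed so the draw is reproducible
sample = random.sample(sampling_frame, k=100)  # draw 100 members without replacement
```

The hard part in practice is not this call but obtaining `sampling_frame` in the first place — a genuinely complete list of the population.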
Systematic random sampling means that you select every nth person in a flow of people, for example every 10th visitor to a website. This method can be a very practical choice, but it also has its pitfalls. If the flow of people doesn't represent the population you are looking for, or if the flow contains unnoticed systematic patterns, you will not be able to generalize the insights. Even worse, since this effect is hard to detect, you won't be able to quantify the accuracy of your research.
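A minimal sketch of the every-nth rule, assuming the "flow" is simply an iterable of visitors (the identifiers are invented):

```python
def systematic_sample(flow, step):
    """Select every `step`-th person from a flow of people."""
    return [person for i, person in enumerate(flow, start=1) if i % step == 0]

# Hypothetical flow: 100 website visitors in arrival order.
visitors = [f"visitor_{i}" for i in range(1, 101)]
sample = systematic_sample(visitors, step=10)
# sample now holds visitor_10, visitor_20, ..., visitor_100
```

Note the pitfall described above in code terms: if every 10th request happens to come from, say, a scheduled bot, this sample would consist entirely of bots — and nothing in the procedure would warn you.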
We will leave it at these two examples, even though there are more sophisticated approaches to probability sampling (e.g. stratified sampling, cluster sampling, multi-stage sampling). It should have become clear, though, that probability sampling can be challenging in research practice. In theory, you can't generalize without a probability sample. But even if you have one, you're not safe from ending up with a biased sample. And in most cases, probabilistic sampling will be a rather costly and labor-intensive option.
As probability sampling can be expensive and inefficient, researchers have always looked for practicable alternatives. This is where non-probability sampling comes into play. All of these methods have in common that the probability of being included in the sample is not known, but apart from that, their approaches can differ considerably. And admittedly, there are so many inadequate ones that the general perception of non-probability sampling is pretty bad.
Whether it's convenience sampling, self-selection sampling or snowball sampling, the main concern is that the resulting bias cannot be addressed. Which population can these samples be regarded as representative of? It's very likely that they just stand for themselves and don't allow you to generalize the insights. In other words, they are not samples in the proper sense.
However, there are also reputable methods of non-probability sampling that try to address the sampling bias with scientific arguments. One of them is quota sampling. As a prerequisite, you need to know the demographics of the basic population in order to determine quotas for the characteristics you're looking for.
Quotas are especially worth considering if you worry about sample failures. A typical problem with opinion polls is low response rates. You can draw a perfect random sample at the beginning of the research process, but your data will still be skewed if you only get partial feedback. Since non-respondents may differ significantly from respondents, you will not be able to project your results onto the general population. You can prevent this by making sure that you have enough interviews for every subgroup of your target group. So, all in all, quota sampling can contribute to data quality.
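The quota logic above boils down to a fill-until-full loop over incoming respondents. The age groups and quota sizes below are invented purely for illustration:

```python
from collections import defaultdict

# Hypothetical quotas derived from known population demographics.
quotas = {"18-34": 30, "35-54": 40, "55+": 30}

def fill_quotas(respondents, quotas):
    """Accept respondents until every demographic quota is filled."""
    counts = defaultdict(int)
    sample = []
    for person in respondents:
        group = person["age_group"]
        if counts[group] < quotas.get(group, 0):
            counts[group] += 1
            sample.append(person)
        if all(counts[g] >= q for g, q in quotas.items()):
            break  # all quotas met — stop interviewing
    return sample

# Synthetic stream of willing respondents, grouped for clarity.
respondents = [{"age_group": g}
               for g in ["18-34"] * 50 + ["35-54"] * 50 + ["55+"] * 50]
sample = fill_quotas(respondents, quotas)
# the resulting sample matches the quotas: 30 / 40 / 30 respondents per group
```

Surplus respondents from an over-represented group are simply turned away, which is exactly how quota sampling guards against skewed response rates.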
WHICH SAMPLING APPROACH WORKS BEST?
As we have seen in previous posts, data quality can mean a lot of things: you can keep an eye on the costs or define quality by the timeliness of data collection. You can focus on statistical validity or on comparability with previous studies. This is why there is no general answer to the question. Most approaches have their place — which one works best depends on the context in which it is used.