Sampling

Summary

This section details the process of constructing a study sample. It covers sampling methods, practical steps for creating sampling frames, and sample code. See the randomization resource for details on how to randomize units within the sample to treatment and comparison groups.

Overview

Random assignment of units to treatment and control helps to get an unbiased estimate of the average treatment effect (ATE) within the sample and thus supports internal validity, i.e., the degree to which the estimated effect reflects the causal relationship between treatment and outcome. However, random assignment does not determine a study’s external validity, i.e., the degree to which the estimated effect is generalizable to the full population from which the sample is selected. Maximizing external validity is particularly important in cases where there are plans to scale up a given program to the larger population if it is found to be effective within the sample. Note, however, that maximizing external validity through sampling is not sufficient to ensure that the estimated effects will generalize to the larger population or to other populations (see Muralidharan and Niehaus 2017).

External validity depends on how well the sample selected for a study represents the broader population of interest—here meaning the population eligible for the program or intervention—and, consequently, on how the study sample was selected from the underlying population. In this resource, we will cover two different ways of selecting a sample: randomized samples and convenience samples, as well as different methods of determining the list from which to draw observational units—also known as the sampling frame.

Randomized samples

Creating a sample representative of an underlying population involves 1) creating a sampling frame representative of or identical to the target population, and 2) randomly drawing units from that frame for the study sample.

The sampling frame

The sampling frame is a list of units from which the sample will be drawn. A randomly drawn sample from a sampling frame will—if large enough—be representative of the sampling frame, but if the sampling frame itself is not representative of the underlying population the study sample will also not be representative of the underlying population. Thus, ideally, the sampling frame is identical to your underlying population of interest (i.e., consists of all units eligible for the program). However, it will not always be possible to obtain a list of the entire underlying population of interest—for example, you may not have access to the list of all households in hard-to-reach villages of a district, or if randomization happens upon arrival (see the randomization resource) you might not know the sampling frame before the randomization is conducted.

As the sampling frame has implications for the external validity of the study, it is important to understand exactly how the sampling frame was constructed. It would also be worth trying to create a sampling frame identical to or representative of the target population when possible.

Pre-existing lists

The ideal situation is one where the sampling frame is given by a pre-existing list, say, from the government or an NGO. If using such a list, be sure to understand how it was created (to better understand whether it is, in fact, representative of the larger target population) and when it was last updated.

Sources of pre-existing lists include:

A list of respondents already created by the partner organization.
Administrative data (such as a list of hospital patients, students in local schools, etc.).
Local government institutions such as the registry office, national agencies such as ministries, professional organizations, or agencies and institutions such as hospitals.

If using a pre-existing list, note that the definition of the observational unit is not always unambiguous; for example, “a business” or “a household” can mean different things in various settings. It is therefore important to clearly define the observational unit and then to follow up with a couple of households (or businesses, etc.) to check the reliability of the list for your use.

Creating a sampling frame

Sometimes no list exists, which can be the case for populations where creating a comprehensive list would be difficult, such as business clients, users of a specific service, or households within a region. This may also apply to populations like undocumented immigrants, migrant workers, or informal businesses. Options for creating a sampling frame without a pre-existing list include:

Following a standardized procedure: For example, the sampling frame could consist of every patient visiting the emergency room in a given period, or it could be built through random digit dialing or shoe leather sampling, in which a person identifies households (manually or using satellite imaging) based on a randomized distance from randomly chosen points.
Conducting a community meeting to map all of the households (or businesses, or other units) in the area. Note that this method may miss units and therefore may not be appropriate in all areas, for example in larger towns or when populations are mobile.
Door-to-door census listing: A census can be expensive and time-consuming but may be the only option. The time required depends on how spread out the units are in the enumeration area, how much information is collected on each unit, and whether there are any administrative hurdles, e.g., permission from local leaders to work in the area. If doing a door-to-door listing exercise, keep the following points in mind:
- Using the same survey team for the census and the actual data collection can make it easier for enumerators to find households later, but be sure to budget enough time to complete the listing exercise so that the survey start is not delayed.
- Drawing a map of the area with landmarks and zone divisions can help in allocating households to enumerators for surveying.
- Taking GPS readings during the household listing can help enumerators find respondents later. The readings can also be used to map out the area, e.g., to ensure that there are clear boundaries between villages or other cluster units.

Regardless of the method for creating a sampling frame, it is helpful to:

Collect sufficient contact information so that respondents can be found later. This includes verified phone numbers of the respondent and family members (e.g., the listing team can call the number on the spot to verify), names and potentially nicknames of the household head and spouse, etc.
Collect information to verify membership in the target population (e.g., eligibility for the tested program) as well as any variables needed for stratification. For example, a study that targets adult women will need to collect gender and age information on all household members and may need to collect information on household income in order to stratify on this variable.
When some of the target population is illiterate, consider adding a unique way to verify the identity of the household that does not depend on the spelling of names. This could be a token with intrinsic value that has a unique household ID on it. Alternatively, ask for national IDs (when available) to confirm the spelling of names, though note that being perceived as linked to the government is not always advantageous.

Multi-frame designs

Research teams may have access to multiple sources of eligible respondents (e.g., lists of customers from different mobile phone companies). If none of the lists are large enough on their own, one option is to pool the lists together to create one sampling frame. Key advantages of this approach include:

Increasing the potential sample size (especially for groups of interest).
Lowering the cost of sampling if particular frames are expensive to sample (and can be replaced by sampling frames that are less expensive to access) (Lohr & Rao 2006).
Validating across lists (e.g., using a shorter, high quality list to validate a longer list of possibly lower quality).

If pooling multiple frames, it is important to identify ex ante whether respondents appear on multiple lists. This is often done by including questions in the survey that identify any of the possible frames to which the respondent belongs; see the World Bank’s Guidelines on Sampling Design for more information. Additionally, it is important to document the process of pooling sampling frames so sample weights can be constructed appropriately (Wu 2008). For a discussion of how to calculate weights in a multi-frame design, see Lohr & Rao (2006).
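As a rough illustration of checking for overlap when pooling two frames, the Stata sketch below merges two toy lists on a shared identifier and records how many frames each unit appears on. The lists, the phone identifier, and the variable names in_a, in_b, and n_frames are hypothetical. Real lists usually need cleaning before they can be matched, and the resulting count is only an input to the weighting approaches discussed in Lohr & Rao (2006), not a weight in itself.

```stata
* Toy frame A (hypothetical customer list from one provider)
clear
input str4 phone
"0701"
"0702"
"0703"
end
gen byte in_a = 1
tempfile frame_a
save `frame_a'

* Toy frame B (hypothetical customer list from a second provider)
clear
input str4 phone
"0702"
"0704"
end
gen byte in_b = 1

* Pool the two lists: one row per unique unit, flagging frame membership
merge 1:1 phone using `frame_a', nogenerate
replace in_a = 0 if missing(in_a)
replace in_b = 0 if missing(in_b)
gen byte n_frames = in_a + in_b   // 2 = unit appears on both lists

sort phone
list phone in_a in_b n_frames
count if n_frames == 2
```

Keeping the membership flags in the pooled frame documents which list each unit came from, which is the information needed later when constructing weights.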

Assigning units to the study sample

Once the sampling frame is determined, assigning units from the sampling frame to the RCT sample works conceptually the same way as assigning units to treatment and control groups, with “in the sample” and “not in the sample” taking the place of the two groups. One possibility is simple randomization, where each unit is assigned independently through a method such as flipping a coin, drawing from an urn with red/black balls with replacement, or, preferably, using statistical software. In most cases, permutation sampling is used, in which a sample of fixed size (or a given share of the sampling frame) is randomly selected, e.g., by drawing red/black balls from an urn without replacement, or similarly using statistical software. The “randomization methods” section of the randomization resource covers these methods in greater detail.
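The difference between the two approaches can be sketched in Stata as follows. This is a minimal illustration on a made-up frame of 1,000 units; the variable names, the seed, the 20% selection probability, and the target of 200 units are all assumptions chosen for the example.

```stata
clear
set obs 1000                      // hypothetical frame of 1,000 units
gen long unit_id = _n
isid unit_id

sort unit_id, stable              // replicable starting order
set seed 20240501                 // arbitrary seed, chosen for illustration

* Simple randomization: each unit enters the sample independently with
* probability 0.2, so the realized sample size varies around 200
gen byte in_sample_simple = runiform() < 0.2

* Permutation (fixed-size) sampling: rank units on a random number and
* keep exactly the first 200
gen double u = runiform()
sort u unit_id
gen byte in_sample_fixed = _n <= 200

tabulate in_sample_simple
tabulate in_sample_fixed
```

The fixed-size draw always selects exactly 200 units, while the size of the simple draw depends on the random numbers, which is one reason fixed-size draws are usually preferred when the sample size is budgeted in advance.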

Heterogeneous treatment effects and stratified sampling

As mentioned above, the main reason for randomly sampling units from the full population of interest is to maximize external validity of the ATE. However, although selecting a sample using randomization ensures that the sample will be representative of the underlying population in expectation and the treatment and control groups will be comparable in expectation, randomization does not ensure that this happens for each sample due to sampling variance. This uncertainty is particularly relevant if there is reason to expect that the treatment effects may be unit-specific and differ across groups (i.e., if there are heterogeneous treatment effects).

Imagine, for example, that you expect that the treatment effect for a given program will be larger for women than for men. In that case, external validity would be challenged if the sample included more men than the underlying population, just as the internal validity would be challenged if the treatment group included more men than the control group. One way to address this challenge is by using proportionate stratified sampling, in which a fixed share from each population stratum is allocated to the RCT sample, where the share of the stratum in the sample is proportional to its share in the population. Sampling should be stratified based on observed covariates that are expected to moderate the treatment effect.

Similar to how stratifying at the treatment assignment stage creates treatment and control groups that are more similar to each other, stratifying at the sampling stage creates a sample that is on average more similar to the underlying population (if stratified sampling is proportionate), which reduces the variance of the sample relative to the underlying population. Proportionate stratified sampling can ensure that individuals or groups are not inadvertently underrepresented in the sample or in one study group due to random sampling. However, if a stratum is small overall, proportionate stratified sampling may not yield enough units from that stratum to compare groups to each other with sufficient statistical power (see also the power calculation resource).

If there are constraints on the total sample size, or if individuals/groups with the characteristics of interest occur with relatively low frequency in the population, the researcher may decide to use disproportionate stratified sampling (i.e., the frequency of the strata of interest in the sample is by design not proportional to their representation in the underlying population) and focus on the statistical power for estimating stratum-specific treatment effects. If the researcher knows how sampling was conducted, sampling weights (the inverse of each unit’s probability of being sampled) should be used to help obtain (closer to) externally valid treatment effect estimates. However, as Muralidharan and Niehaus (2017) note, this does not fully resolve the lack of representativeness.
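To make the role of sampling weights concrete, the sketch below builds a made-up frame in which 5% of households are widowed, oversamples that stratum so that it makes up 20% of the sample, and then computes inverse-probability weights. All names (hh_id, widow, samp_wt), the seed, and the stratum sizes are assumptions for illustration only.

```stata
clear
set obs 1000                       // hypothetical frame of 1,000 households
gen long hh_id = _n
gen byte widow = hh_id <= 50       // 5% of the frame is in the small stratum

sort hh_id, stable
set seed 4821                      // arbitrary seed
gen double u = runiform()

* Disproportionate draw: 40 of 50 widowed and 160 of 950 other households
bysort widow (u hh_id): gen byte in_sample = ///
    (widow == 1 & _n <= 40) | (widow == 0 & _n <= 160)

* Sampling probability within each stratum and its inverse as the weight
bysort widow: egen double p_sample = mean(in_sample)
gen double samp_wt = 1 / p_sample

keep if in_sample
summarize widow                      // unweighted share in the sample
summarize widow [aweight = samp_wt]  // weighted share
```

By construction, the unweighted share of widowed households in this sample is 20%, while the weighted share returns to the 5% found in the frame. The weights restore the frame's composition but do not add information about units that were never eligible to be sampled.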

Multi-stage sampling

Multi-stage sampling involves multiple rounds of sampling at increasingly smaller levels of aggregation to draw the study sample. For example, one could obtain a representative sample of households with children under 5 in a country by sampling at the district, then village, then household level (3 stages). This approach is commonly employed with nationally representative surveys, often with disproportionate stratified sampling to ensure sufficient representation of certain types of groups (e.g. female-headed households). Multi-stage sampling is conducted using the same general approach as described above (create a sampling frame and assign units within the sampling frame to the study sample) at each stage. As above, for the study sample to be representative of the underlying population, each sampling frame should be identical to or representative of the population, and each sample should be randomly drawn from the frame.

The procedure should be carefully documented, as sample weights need to be calculated and used in the analysis for the results to generalize to the underlying population. See the data analysis resource for more.
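A two-stage version of this procedure, including the resulting weights, might look like the sketch below. It assumes a made-up setting with 20 villages of 30 households each, from which 5 villages and then 10 households per village are drawn; the variable names, seed, and counts are hypothetical.

```stata
clear
set obs 20                          // stage-1 frame: 20 hypothetical villages
gen int village_id = _n
sort village_id, stable
set seed 9137                       // arbitrary seed

* Stage 1: draw 5 of the 20 villages
gen double u1 = runiform()
sort u1 village_id
keep if _n <= 5
gen double p1 = 5 / 20              // village selection probability

* Stage-2 frame: list the 30 (hypothetical) households in each sampled village
expand 30
sort village_id, stable
by village_id: gen int hh_no = _n

* Stage 2: draw 10 of the 30 households within each sampled village
gen double u2 = runiform()
bysort village_id (u2 hh_no): keep if _n <= 10
gen double p2 = 10 / 30             // household selection probability

* Overall weight is the inverse of the product of the stage probabilities
gen double samp_wt = 1 / (p1 * p2)
summarize samp_wt
```

In this equal-probability example every sampled household receives the same weight, so the weights matter mainly once selection probabilities differ across villages or strata. Storing p1 and p2 alongside the sample is one simple way to document the procedure.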

Convenience samples

In some cases, random sampling from the population of interest is not practical or even possible (for example, if you are unable to obtain a census of the study population and hence cannot construct a representative sampling frame). So-called “convenience samples” may be selected for logistical, cost, or other external reasons, rather than because they are representative of an underlying population of interest. This could be, for example, all households in a given city or area, everyone who signed up for the experiment on an online survey platform, or a set of schools that were already scheduled for the next roll-out wave of a program.

It is worth noting that “convenience” samples are by far the most common in the social sciences. For example, most if not all laboratory experiments in economics are conducted with convenience samples (often students at the university), or an NGO may not find it logistically feasible to test out a teacher training program in very remote schools. However, “convenience” can be a bit of a misnomer, as these designs are not always chosen purely for convenience or to reduce costs. While significant cost and information constraints on the part of the researcher or the partner organization can play a role, in field experiments, important ethical or legal considerations also often determine the sample, or at least restrict the population from which the sample can be selected.

For example, there may be ethical reasons to not withhold a treatment from a control group who would otherwise be eligible to receive it (such as the set of schools already scheduled for the next roll-out wave of a program in the example above). In these instances, designs such as “randomization at the margin” or “randomized phase-in” may be more appropriate. In these cases, the sampling frame involves a (non-random) population that is not yet eligible for the treatment and will later be randomized into treatment and control. Using one of these designs may be the only way to ethically conduct a randomized experiment (e.g., it would be unethical to randomize access to entitlements, so an encouragement design can be used as an alternative). These alternative designs require greater care in drawing inferences for the whole population of interest. The same holds true for the case where political constraints or the implementing organization’s priorities (e.g., regional focus) constrain the population from which the sample can be drawn. See below for some ways to improve external validity with convenience samples. See also How to Randomize and the discussions in Glennerster & Takavarasha (2013), Duflo et al. (2007), and Heard et al. (2017).

Improving external validity with convenience samples

There are a few things the researcher can do to limit the consequences for external validity when (full) random sampling is not possible:

As much as possible, document criteria used to select the population. For example, if cost constraints mean that surveyors can visit no more than three villages within a day’s travel from the capital, describe how these villages were chosen.
Measure key characteristics of the sample population, such as wealth and income levels, demographics, and other covariates that are likely to influence outcome levels. You can compare such descriptive statistics of the sample population with other populations of interest (e.g., national population or potential intervention target population in other countries or regions) to help characterize the uniqueness or generalizability of the study sample.
Within the constrained sampling population, use random sampling to select experimental units. For example, if household income is a cutoff for program eligibility and the study employs randomization at the margin to estimate the program’s impact (i.e., the income cutoff is raised), randomly selecting the study sample from newly eligible households will ensure the sample is representative of the sampling frame (though not the underlying population of all households).
When possible, try to link the convenience sample with a representative sample of the underlying population. For example, in a study on microcredit access, Crepon et al. (2015) combine a convenience sample of households likely to borrow with a larger, random sample of households in the study villages, to capture results representative at the village level. See also the World Bank’s Guidelines on Sampling Design for high frequency mobile phone surveys for more.
If you know how sampling was conducted, consider using sampling weights in the analysis, i.e., reweighting the estimates by the inverse of the probability of a household being sampled. See the data analysis resource for more details.

In combination, these measures may facilitate generalizability down the line, for example, by making it easier to combine convenience samples from different studies in a meta-analysis to construct population ATE estimates.

Sample code

In Stata, the sample command randomly selects units, without replacement, from the sampling frame. The default is that units are chosen with an equal probability of selection, but the command can accommodate stratification and different probabilities of selection.

As with random treatment assignment, the procedure should be verifiable, replicable, and stable. More details are provided in the sample code section of the randomization resource, but, as with random assignment, the basic procedure is as follows:

Create a file that contains only one entry per sampling unit (e.g., one line per household, one line per cluster). This might mean creating a new file that temporarily drops all but one observational unit per cluster.
Sort this file in a replicable and stable way (use stable sort in Stata, i.e., sort varlist, stable).
Set the seed for the random number generator (in Stata, set seed). Make sure that the seed is:
1. Preserved: Some operations (such as preserve/restore in Stata) erase the seed, and then any random number sequence following that is not determined by the seed anymore and therefore not replicable.
2. Used only once across parallel operations: Every time the same seed is set, the exact same random number sequence will be produced afterward. This is more often a concern with random assignment (for example, if assigning daily batches to treatment arms) but is theoretically also a concern in random sampling.
Randomly assign units to the study sample, then merge the random assignment back with the original file to obtain a list of all observational units with assignments.
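One way to check that a draw is replicable is to run it twice from the same starting point and compare the results, as in the sketch below. The frame, variable names, seed, and sample size of 100 are hypothetical. The seed is deliberately set twice here only to run the check; in the actual sampling do-file it would be set once.

```stata
clear
set obs 500                        // hypothetical frame, one row per household
gen long hh_id = _n
isid hh_id                         // confirm one entry per sampling unit

* Run the same draw twice to confirm that it is replicable
forvalues run = 1/2 {
    sort hh_id, stable             // stable, replicable sort order
    set seed 310745                // arbitrary seed
    gen double u`run' = runiform()
    sort u`run' hh_id
    gen byte in_sample`run' = _n <= 100
}

assert in_sample1 == in_sample2    // identical selections in both runs
sort hh_id
```

If the sort before the draw were not unique and stable, or if the seed were not set, the final assert could fail, which is exactly the problem the steps above are meant to prevent.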

Sample commands for clustered sampling and stratified sampling are below:

Clustered sampling: When the cluster is the sampling unit, the procedure involves separating the population into clusters/groups (e.g., villages or schools), randomly drawing a sample of clusters, and sampling units in the cluster (either all or a random sample of them). For example, in Stata, using school_id as the cluster variable:

set seed 821771239

* treat each cluster as a single observation and drop duplicates
keep school_id
duplicates drop school_id, force // force is required when a varlist is given
sort school_id, stable

sample x // x denotes % of clusters in the population to select

* merge sampled clusters back with original list, keeping sampled clusters only
merge 1:m school_id using "original_dataset", keep(match) nogenerate
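If only a random subset of units within each selected cluster is needed, a second draw can be made within clusters. The sketch below assumes a made-up dataset that already contains only the sampled schools, with hypothetical names (school_id, student_id) and an arbitrary seed, and draws 10 students per school using the count and by() options of sample.

```stata
* Hypothetical data: 4 already-sampled schools with 25 students each
clear
set obs 4
gen int school_id = _n
expand 25
sort school_id, stable
by school_id: gen long student_id = school_id * 100 + _n
isid student_id

sort student_id, stable
set seed 58213                     // arbitrary seed

* Draw 10 students within each sampled school
sample 10, count by(school_id)

tabulate school_id
```

Drawing a fixed number per cluster means that students in larger schools have a lower probability of selection, which would need to be reflected in the sampling weights.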

Stratified sampling can be proportionate or disproportionate:
- Proportionate stratification: The share of each stratum in the sample is proportionate to its share in the population, ensuring that the sample is representative of the overall population and that small groups in particular are not undersampled. For example, if widowed households comprise 5% of the population, then they comprise 5% of the sample. In Stata, this can be done as follows:

sample x, by(stratum)
/* stratum denotes the (categorical) stratifying variable
(e.g., widow) and x denotes the percent to be sampled
within each stratum. The by() option does not require the
data to be sorted by stratum. */

sample x, count by(stratum)
/* alternatively, to draw x observations rather than x% from
each stratum, specify the count option */

- Disproportionate stratification: The share of each stratum in the sample is not proportionate to its share in the population (e.g., widowed HHs comprise only 5% of the population but 20% of the sample). It is useful when you want to ensure sufficient power to detect heterogeneous treatment effects by stratum. In Stata, this can be done as follows:

sample x if stratum==1
sample y if stratum==2

/* x denotes the percent of stratum 1 to be sampled and y the
percent of stratum 2 to be sampled; observations that do not
meet the if condition are kept by each command */

See the experimental design description in Blimpo & Dower (2019).
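The two stratified commands can be tried out on a toy frame, as sketched below. The frame of 1,000 households with a 5% widowed stratum, the variable names, the seed, and the sampling rates and counts are all assumptions for illustration.

```stata
clear
set obs 1000                       // hypothetical frame
gen long hh_id = _n
gen byte widow = hh_id <= 50       // stratifying variable: 5% widowed
sort hh_id, stable
set seed 77120                     // arbitrary seed
tempfile frame
save `frame'

* Proportionate: the same 20% rate in every stratum
sample 20, by(widow)
tabulate widow

* Disproportionate: fixed counts per stratum
use `frame', clear
sample 40 if widow == 1, count     // non-widowed rows are all kept here
sample 160 if widow == 0, count    // the 40 widowed rows are all kept here
tabulate widow
```

The first tabulation should show widowed households at roughly their 5% population share, while the second shows them at 20% of the sample (40 of 200), which is the case where sampling weights become necessary.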

See the data analysis resource for more information on when and how to use weights in analysis.

Additional Resources

Heard, Kenya, Elisabeth O’Toole, Rohit Naimpally, and Lindsey Bressler. 2017. “Real World Challenges to Randomization and Their Solutions.” https://www.povertyactionlab.org/sites/default/files/research-resources/2017.04.14-Real-World-Challenges-to-Randomization-and-Their-Solutions.pdf
StataCorp LLC. 2023. Stata Base Reference Manual: Release 18. College Station, TX: Stata Press. https://www.stata.com/manuals/

References

Blimpo, Moussa. 2019. “Asymmetry in Civic Information: An Experiment on Tax Incidence among SMEs in Togo.” https://doi.org/10.1257/rct.4394-1.0.

Crépon, Bruno, Florencia Devoto, Esther Duflo, and William Parienté. 2015. “Estimating the Impact of Microcredit on Those Who Take It Up: Evidence from a Randomized Experiment in Morocco.” American Economic Journal: Applied Economics 7 (1): 123–50. https://doi.org/10.1257/app.20130535.

Duflo, Esther, Rachel Glennerster, and Michael Kremer. 2007. “Using Randomization in Development Economics Research: A Toolkit.”

Glennerster, Rachel, and Kudzai Takavarasha. 2013. Running Randomized Evaluations: A Practical Guide. Princeton, NJ: Princeton University Press. http://runningres.com/chapter-four.

Lohr, Sharon, and J. N. K. Rao. 2006. “Estimation in Multiple-Frame Surveys.” Journal of the American Statistical Association 101 (475): 1019–30. https://www.jstor.org/stable/27590779.

Muralidharan, Karthik, and Paul Niehaus. 2017. “Experimentation at Scale.” Journal of Economic Perspectives 31 (4): 103–24. https://doi.org/10.1257/jep.31.4.103.

World Bank. 2020. Guidelines on Sampling Design. Washington, D.C.: World Bank. https://documents1.worldbank.org/curated/en/742581588695955271/pdf/Guidelines-on-Sampling-Design.pdf.

Wu, Changbao. 2008. “Multiple-frame Sampling.” In Encyclopedia of Survey Research Methods, edited by Paul J. Lavrakas, 488-489. California: SAGE Publications, Inc. http://dx.doi.org/10.4135/9781412963947.
