Finite Population Correction

I want to estimate proportions and their confidence intervals for different IPUMS International samples. Most of these samples include 10% of the population, which is usually considered a relatively large fraction. The data sets do not provide a finite population correction (fpc) factor, and I would like to know what the usual practice is in this case.

Should I include a finite population correction when computing confidence intervals or not?

If yes, should I use the same factor for all households or a different factor for each stratum (especially when households have unequal weights)?

In general, you can use either the household (HHWT) or person (PERWT) level sampling weights available via IPUMS International. This should allow you to calculate relatively accurate confidence intervals around your population estimates. More specifically, the IPUMS International project has this page dedicated to variance estimation with IPUMS International data. The details listed there should be helpful. Let us know if you have any further questions.

Thanks, Jeff! I’ve read the guidelines on how to estimate the variance with IPUMS International data, and there is nothing written about finite population correction.

However, the cited paper by Cleveland et al. (2011) says: “A finite population correction factor is necessary to adjust the standard error of the mean or proportion for samples of more than 5% of the total population.” (p. 9)

I am therefore confused. Should I use it or not? If yes, how should I do it?

The answer to your question really depends on the specific details of your analysis. As the Cleveland et al. (2011) paper states (p. 13):

The IPUMS samples are large, and for the great majority of studies there is little risk of drawing invalid inferences because of underestimated variance. Geographic clustering can lead to overestimated standard errors for a set of variables describing household characteristics, but analysis based on these estimates will be conservative at worst. For studies of weak relationships or small population subgroups, however, there can be risk of misleading estimates of statistical significance. The effects of clustering are of greater concern because underestimated standard errors have the potential to lead to erroneous findings of statistical significance. However, most census research has minimal household clustering because it focuses on particular subpopulations that rarely cluster in households.

Additionally, as is discussed on the IPUMS International variance estimation page:

An alternative, thanks to improvements in the analytical power of modern statistical software, is to incorporate information about sample design into estimation procedures. All major statistical software programs, including SAS, Stata, SPSS, and R, now allow researchers to specify basic elements of complex sample design. These programs make use of Taylor Series linearization to adjust variance estimates and tests of statistical significance. IPUMS users can specify the household identifier (SERIAL) as the cluster variable (or primary sampling unit) for any analysis that might be influenced by household clustering, and can also specify the weight variable (PERWT) to account for the effects of heterogeneous sample weights. The IPUMS staff is developing a cluster variable that will offer the potential for more refined variance estimates. The new variable will identify geographic clustering as well as household clustering.

As of September 2015, we have added a new variable to aid in accounting for the effects of stratification on sample variance. As discussed above, stratification improves the precision of samples, and findings of statistical significance without adjustments for stratification will be conservative. Accordingly, adjusting for stratification effects is of less concern than adjusting for clustering. The new STRATA variable includes information about explicit strata whenever such information is available, and includes geographic pseudo-strata for systematic samples following the procedure described in Davern et al. (2009).

So, if you want to ensure that your standard errors are calculated as accurately as possible, you should incorporate sampling weights (as discussed above), use the household identifier (SERIAL) as the cluster variable, and use the STRATA variable to identify stratification. You can incorporate these variables in the survey set up options in most statistical software. However, in most cases, these procedures will not make too much of a difference.
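To make the role of PERWT, SERIAL, and STRATA concrete, here is a minimal sketch of a Taylor-linearized standard error for a weighted proportion. It is not the IPUMS procedure or any package's implementation: the function name and the tuple layout of `rows` are assumptions made for illustration, households are treated as the primary sampling units, and strata that contain a single household are simply skipped.

```python
from collections import defaultdict
from math import sqrt


def weighted_proportion_se(rows):
    """Estimate a weighted proportion and its linearized standard error.

    rows: iterable of (strata, serial, perwt, y) tuples, with y equal to 0 or 1.
    This layout is hypothetical; it mirrors the IPUMS variable names only.
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    p = sum(w * y for _, _, w, y in rows) / total_w

    # Sum the linearized scores w * (y - p) / total_w within each household,
    # keeping households grouped by stratum.
    scores = defaultdict(lambda: defaultdict(float))
    for stratum, serial, w, y in rows:
        scores[stratum][serial] += w * (y - p) / total_w

    variance = 0.0
    for clusters in scores.values():
        n_h = len(clusters)
        if n_h < 2:
            continue  # a one-household stratum contributes nothing in this sketch
        mean_h = sum(clusters.values()) / n_h
        ss = sum((z - mean_h) ** 2 for z in clusters.values())
        variance += n_h / (n_h - 1) * ss
    return p, sqrt(variance)


# Made-up example: two strata, three households each, unequal person weights.
example = [
    ("north", 1, 10.0, 1), ("north", 1, 10.0, 1),
    ("north", 2, 12.0, 0),
    ("north", 3, 9.0, 1),
    ("south", 4, 8.0, 0), ("south", 4, 8.0, 0),
    ("south", 5, 11.0, 1),
    ("south", 6, 10.0, 0),
]
p_hat, se_hat = weighted_proportion_se(example)
print(f"proportion = {p_hat:.3f}, linearized SE = {se_hat:.3f}")
```

In practice the survey procedures in Stata, SAS, SPSS, or R do this for you once the weight, cluster, and strata variables are declared; the sketch only shows what those declarations feed into.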

Jeff, thanks again for your prompt reply! I believe however that my question addresses a different point.

I am incorporating the effects of sampling weights, cluster structure and stratification when computing standard errors and confidence intervals. I am adjusting the estimation for these features of the sampling design following IPUMS International guidelines.

My question is whether, on top of doing this, I should adjust my standard errors with a finite population correction factor. It is on this particular point that I do not find a ‘consensus’. Both the IPUMS notes and the Cleveland et al. (2011) paper refer to sampling weights, clustering and stratification. I am okay with that. But Cleveland et al. (2011) also include a finite population correction that is not mentioned in the IPUMS notes.

I am therefore wondering what the usual and most accepted practice is.

The answer to this specific question really depends on the priorities and objectives of your research. Most guides say that a finite population correction is necessary if sampling is done without replacement and the sampling fraction is greater than 5%. The reason is that, when a large share of a finite population is sampled without replacement, the observations are no longer approximately independent and the usual formulas overstate the sampling variance; without the correction, standard error estimates will be too large. Most IPUMS census samples meet those criteria: most are 10% samples taken without replacement. That said, census samples in IPUMS International are almost all very large, so standard errors are already small and the additional precision gained from a finite population correction is usually not needed. For studies of smaller sub-populations and/or rarer phenomena, it could be more useful.
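For a sense of scale, the textbook correction for a proportion can be sketched as below. This assumes simple random sampling without replacement and ignores weights, clustering, and stratification, so it is only an illustration of the size of the adjustment; the function names and the example population size are made up.

```python
from math import sqrt


def fpc_factor(n, N):
    """Finite population correction for a sample of n units drawn from N."""
    return sqrt((N - n) / (N - 1))


def proportion_se(p, n, N=None):
    """Standard error of a proportion; applies the fpc when N is given."""
    se = sqrt(p * (1 - p) / n)
    return se if N is None else se * fpc_factor(n, N)


# Hypothetical 10% sample: 100,000 persons drawn from 1,000,000.
n, N, p = 100_000, 1_000_000, 0.30
print("fpc factor:     ", fpc_factor(n, N))
print("SE without fpc: ", proportion_se(p, n))
print("SE with fpc:    ", proportion_se(p, n, N))
```

With a 10% sampling fraction the factor is about 0.949, so the corrected standard error is roughly 5% smaller than the uncorrected one. Leaving the correction out therefore errs on the conservative side.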

On top of all of this, there is a recent movement across the social sciences away from placing too much emphasis on statistical significance without at least equal emphasis on social or economic significance. The March issue of The American Statistician discusses this in detail (see the introduction to the issue here). Additionally, the journal Nature recently published a note, cosigned by a long list of scientists from many academic disciplines, discussing the perceived overemphasis on statistical significance.

Although I cannot give you a certain recommendation for whether you should or shouldn’t use a finite population correction, I hope these resources help.