If I want to use a 10-year ACS file, which weight should I use?

ACS multi-year files pool 5 years of single-year ACS data at most. If the population I am interested in is really small, is it possible for me to pool ten years of data? What weight should I use? Thanks!

Some researchers do pool data to increase the sample size. This is essentially what the Census Bureau is doing with the 5-year files, combining five 1% files into a 5% sample. It is important to understand, however, that this is now a sample of the 5-year period instead of a 1-year period. Similarly, if you pool two 5-year files to create a 10% sample, you are now sampling from the 10-year period.

It is important to use weights to make the sample representative of the entire population and to account for some sampling error; however, which weight(s) you use will depend on your analysis. For some additional information about weights, you can refer to this forum thread.

I had an econometrics instructor who used to say, “This is a free country, you can do whatever, but in our discipline, you would have to defend the methodology of what you are doing.” There is little problem with combining the data sets in the software, but you indeed won’t have weights that would work for that pooled data set.

The multi-year data sets (currently, the 5-year data set; there also used to be a 3-year data set that the Census had to drop due to lack of funding) are rather complicated: the samples are pooled by month, the weights are averaged, and then post-stratified to the averaged control totals over the time period (paying attention to the vintage of those control totals). For a 10-year hypothetical product, you will have seam effects when the new Census data are incorporated into the demographic projections. The control totals are some combinations of housing units, group quarters, and population (no real details are given in the methods document). Then there are some adjustments to geography, to the admin records, etc.

There is no freaking chance you’d be able to reproduce that sort of methodology on your own. The best you can do is to divide the annual weights (main + replicate) by 10 (= the number of years you pool), but you don’t really know what kind of biases you are building into your sample, as you are unable to calibrate it properly to the underlying population.
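
For what it is worth, the "divide by the number of years" step could look roughly like the sketch below. This is only an illustration under assumptions: the field names (`perwt`, `repwt1`) and the toy numbers are made up, a real file would carry the full set of replicate weights, and two years are pooled here instead of ten to keep the example short. It does nothing to address the calibration problem described above.

```python
def rescale_pooled_weights(records, weight_fields, n_years):
    """Return copies of the records with each weight divided by n_years.

    records       -- list of dicts, one per person record in the pooled file
    weight_fields -- names of the main and replicate weight fields (assumed names)
    n_years       -- number of single-year files that were pooled
    """
    rescaled = []
    for rec in records:
        new_rec = dict(rec)  # copy so the original records are left untouched
        for field in weight_fields:
            new_rec[field] = rec[field] / n_years
        rescaled.append(new_rec)
    return rescaled


# Toy pooled file: two years, two person records each (made-up numbers).
pooled = [
    {"year": 2010, "perwt": 100.0, "repwt1": 90.0},
    {"year": 2010, "perwt": 200.0, "repwt1": 210.0},
    {"year": 2011, "perwt": 150.0, "repwt1": 160.0},
    {"year": 2011, "perwt": 150.0, "repwt1": 140.0},
]

rescaled = rescale_pooled_weights(pooled, ["perwt", "repwt1"], n_years=2)

# Before rescaling the weights sum to roughly n_years times the population;
# afterwards they sum to an average population over the pooled period.
total_before = sum(r["perwt"] for r in pooled)
total_after = sum(r["perwt"] for r in rescaled)
print(total_before, total_after)
```

The same divisor is applied to the main weight and to every replicate weight, so replicate-based variance estimates stay on the same scale as the point estimates.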

Also (direct quote), “In addition to the adjustments to the single-year weighting methodology for weighting the multiyear data, there are other steps involved in the multiyear estimation that are not weighting related. These include standardizing definitions of variables, updating the geography for place of work and migration characteristics, and the adjustment of income, value and other dollar amounts for inflation over the period.”

Finally, if the population you are interested in is really small, chances are that it has experienced sweeping changes over the past 10 years, so weights that peg it to the middle of that period may not do justice to that change.

If you are happy with sweeping all these details under the carpet, then go ahead and pool.