Collapsing pooled Census and ACS samples: weighting

Dear all

I would greatly appreciate your help on how to use the weights when collapsing over multiple samples.

I am trying to collapse Census and ACS files from 1950 to the present at the birthplace (BPLD) by state level. First, I want to calculate the shares of immigrants from birthplace country c in state s for various time windows, e.g. 1950-1960, 2010-2020, etc. Second, I would also like to calculate the averages of various variables, such as INCINVST.
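A minimal sketch of the first step, on made-up toy data rather than real IPUMS records. It assumes the share is the weighted fraction of each state's residents born in country c; if the denominator should instead be all immigrants in the state, restrict to foreign-born persons before the collapse. The window cut points and the birthplace codes 100, 200 and 300 are hypothetical. Pooling several samples inside one window adds up their weights, which is the population-size issue raised in point 2 of the reply.

```stata
* Minimal sketch on made-up toy data (not real IPUMS records).
* Assumption: share = weighted fraction of a state's residents born in bpld,
* computed separately for each time window.
clear
input int year byte statefip long bpld double perwt
1950  6 100 120
1950  6 200  80
1960  6 200 100
2010 36 100 300
2010 36 300 100
2020 36 300 200
end

* Hypothetical windows, labeled by their first year; adjust to your own
gen int period = 1950 if inrange(year, 1950, 1960)
replace period = 2010 if inrange(year, 2010, 2020)

* Weighted head count by window x state x birthplace (sum of person weights)
collapse (sum) pop = perwt, by(period statefip bpld)

* Divide by the weighted state total within each window
bysort period statefip: egen double statepop = total(pop)
gen double share = pop / statepop
list, sepby(period statefip)
```

With these toy numbers, birthplaces 100 and 200 hold shares of 0.4 and 0.6 in state 6 for the 1950 window, and 100 and 300 hold 0.5 each in state 36 for the 2010 window.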

Would it be reasonable to specify perwt as the weight and simply run collapse, or are there potential pitfalls, in particular when pooling multiple samples from both the Census and the ACS? How should I handle cases where perwt is zero?

Here is a minimal example:

* Stata variable names are case-sensitive: use incinvst, matching perwt, statefip and bpld
collapse (mean) incinvst [pw=perwt], by(statefip bpld)
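A self-contained version of this collapse on made-up toy data, as a quick check that [pw=perwt] produces the weighted mean. Zero-weight records are dropped explicitly; they contribute nothing to a weighted mean anyway, so this mainly makes the universe restriction visible. It assumes lowercase variable names, and in a real extract INCINVST's N/A and missing-value codes would need to be recoded to missing first (check the IPUMS codebook; no specific code is assumed here).

```stata
* Minimal sketch on made-up toy data (not real IPUMS records).
clear
input byte statefip long bpld double incinvst double perwt
6 100 1000  50
6 100 3000 150
6 100 9000   0
6 200  500 100
end

* In real data, first recode INCINVST's N/A / missing codes to . (see codebook)

* Zero-weight records are outside the weighted universe; drop them explicitly
drop if perwt == 0

* Weighted mean of investment income by state x birthplace
collapse (mean) incinvst [pw=perwt], by(statefip bpld)
list
```

Here the cell for birthplace 100 gets (1000*50 + 3000*150) / 200 = 2500, and the cell for birthplace 200 gets 500.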

Thank you in advance for your help!

In general, when pooling multiple samples, I would say there are two main things to be aware of:

  1. Changes in the universe across samples (this is related to the zero weights). You'd generally need to drop individuals from some of your samples in order to maintain a consistent universe.

  2. The potentially changing population size across samples. If you want each sample to carry equal weight, you'll need to rescale the weights so that they sum to the same total in every sample. This is usually only an issue over longer time horizons.
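The two adjustments could look roughly like this, again on made-up toy data. The age cutoff is a hypothetical stand-in for whatever universe your variables actually share, and year is used as the sample identifier (SAMPLE may be more precise if an extract mixes different ACS products). The rescaled weight eqwt, a name chosen here for illustration, sums to 1 within each sample, so every sample gets equal total weight in the pooled data; whether that is preferable to population weighting depends on your question.

```stata
* Minimal sketch on made-up toy data (not real IPUMS records).
clear
input int year byte statefip byte age double perwt
1980  6 30 200
1980  6 10 100
1980 36 40 200
2019  6 50 300
2019 36 20 500
2019 36 60 200
end

* 1. Common universe: the age >= 15 cutoff is a hypothetical placeholder;
*    take the real universe from the codebook of the variables you use.
*    Zero-weight records are dropped as well.
keep if age >= 15 & perwt > 0

* 2. Rescale so the weights within each sample (here: year) sum to 1
bysort year: egen double totwt = total(perwt)
gen double eqwt = perwt / totwt
list, sepby(year)
```

You would then run the collapse with [pw=eqwt] in place of [pw=perwt]. Within a single sample this rescaling leaves weighted means and shares unchanged; it only changes how much each sample counts relative to the others.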

Once you’ve decided how to deal with these issues, your example looks correct. This Census Bureau report is a helpful summary of the issues that arise in interpretation of multi-sample data.