Weighting ASEC pooled data - accounting for repeating records?


We want to pool ASEC data across years to increase power/sample size when looking at state-level estimates of job characteristics for subgroups by race/ethnicity. However, the pooled dataset will contain repeated observations of the same household (and of the people in that household) in consecutive years. We are not matching these repeated observations, but we want to account for them in the estimates and SEs. Can the sample Stata code for replicate weights provided by IPUMS CPS (below) account for the multiple observations per household that appear in consecutive years of pooled ASEC data?

svyset [iw=wtsupp], jkrweight(repwtp1-repwtp160, multiplier(.025)) ///
    vce(jackknife) mse

If not, is there any guidance on Stata code that does this?

In the Source and Accuracy documentation for CPS, there is a section that gives a correlation coefficient formula for calculating the SE of an estimate averaged over n years. But we want a 5-year estimate, rather than a 5-year average (that is, an estimate across the pooled years, rather than the average of each single year estimate). So I’m not sure this formula applies to what we are trying to do.

Some papers I found have used every other year to avoid repeating observations entirely, but ideally we would like to use 5 consecutive years. Others just seem to ignore the issue. Someone from NBER suggested using just months in sample 1-4 of each year, but this would not increase our sample size as much as 5 whole years of data would. Another NBER source suggested using the Huber option for robust SEs, but I am not sure whether this is needed if we are using the replicate weights. If the replicate weights do not account for repeating observations across consecutive years, do you have any suggestions on how to obtain accurate SEs?

Thanks for your advice,

The replicate weights do not account for repeated households. I would recommend limiting your analysis to households in their first rotation (i.e., MISH = 1-4), which will remove repeated households from your data: under the CPS rotation pattern, a household interviewed in MISH 1-4 in one March returns in MISH 5-8 the following March. You could then perform your analysis again without removing repeated households to see whether including them changes your estimates. As your NBER source recommended, I have also seen Census documentation that suggests using the Huber option to account for repeated households.
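As a rough sketch of these suggestions in Stata, something like the following could work. It reuses the svyset line from your question. The variable names mish, cpsid, and statefip are assumed to be the IPUMS CPS names in your extract, and outcome is a hypothetical placeholder for your job-characteristic variable. The last part is only one possible reading of "the Huber option" (linearized, cluster-robust SEs with the household as the cluster), not something taken from Census documentation.

```stata
* Sketch only: outcome is a hypothetical variable name; mish, cpsid, and
* statefip are assumed to be present in the pooled extract.

* 1) First-rotation households only (MISH 1-4), with replicate weights
preserve
keep if inrange(mish, 1, 4)
svyset [iw=wtsupp], jkrweight(repwtp1-repwtp160, multiplier(.025)) ///
    vce(jackknife) mse
svy: mean outcome, over(statefip)
restore

* 2) Full pooled sample with the same replicate weights, for comparison
svyset [iw=wtsupp], jkrweight(repwtp1-repwtp160, multiplier(.025)) ///
    vce(jackknife) mse
svy: mean outcome, over(statefip)

* 3) Assumed alternative: cluster-robust (sandwich) SEs with the household
*    as the cluster. Check for records where cpsid is 0 or missing first,
*    since they would all be treated as a single cluster.
svyset cpsid [pw=wtsupp]
svy: mean outcome, over(statefip)
```

Comparing the point estimates from (1) and (2) shows whether including repeated households moves the estimates, and comparing the SEs from (2) and (3) gives a sense of how much the within-household repetition matters.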

Hope this helps.