Hi,
I am conducting an analysis of long-run outcomes. Our topic is relatively esoteric, so to increase our sample size, I was wondering whether it is okay to merge two non-overlapping 5-year datasets (i.e., merge the 2005-2009 ACS with the 2010-2014 ACS).
I wanted to ask whether this is statistically acceptable or justified. I guess the biggest concern would be the same household being sampled in two different 5-year periods, since the Census Bureau samples a household only once in a five-year period. Furthermore, how difficult or cumbersome would it be to standardize variables across the two files, and what should we watch out for? Is this something you would recommend?
In general this is OK, as long as you're OK with your data representing an average over a ten-year period. You may want to combine single-year samples instead, since you won't be able to use most of the unique features of the 5-year samples once you combine two of them. If you combine ten one-year samples, you'll need to divide the weights by 10. I recommend reading the IPUMS USA page on multi-year ACS samples, in particular the Census Bureau paper about combining samples.
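Pooling files amounts to stacking the records and dividing the weights by the number of samples combined: 10 for ten one-year files, or 2 if you stack the two 5-year files. Below is a minimal sketch in Python. It assumes each extract has been loaded as a list of dict records keyed by IPUMS variable names such as PERWT and HHWT. `pool_samples` and the `SAMPLE_LABEL` column are hypothetical names used for illustration, not part of any IPUMS tool, and the weights in the example are made up.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def pool_samples(
    samples: Dict[str, List[Record]],
    weight_vars: Sequence[str] = ("PERWT", "HHWT"),
) -> List[Record]:
    """Stack several ACS extracts into one list and rescale their weights.

    Each extract's weights already sum to (roughly) the full population, so
    stacking k extracts inflates weighted totals k-fold. Dividing every weight
    by k brings totals back to population scale, averaged over the pooled years.
    """
    k = len(samples)
    if k == 0:
        raise ValueError("need at least one sample to pool")
    pooled: List[Record] = []
    for label, records in samples.items():
        for rec in records:
            row = dict(rec)  # copy so the input extracts are left unchanged
            row["SAMPLE_LABEL"] = label  # hypothetical column naming the source file
            for w in weight_vars:
                if w in row:
                    row[w] = row[w] / k
            pooled.append(row)
    return pooled


# Toy records with made-up weights: two 5-year files, so k = 2.
toy = {
    "2005-2009": [{"PERWT": 100.0}, {"PERWT": 50.0}],
    "2010-2014": [{"PERWT": 80.0}],
}
pooled = pool_samples(toy)
print(sum(r["PERWT"] for r in pooled))  # 115.0
```

Keeping a column that records the source file also makes it easy to apply file-specific adjustments later, such as converting each file's income to a common dollar year.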
I wouldn't worry about resampling a household. It will be very rare, and even if a household is sampled twice, the pooled data are still a representative sample.
What did you have in mind in terms of standardization? One thing you should definitely consider is adjusting income and other dollar amounts for inflation. For this purpose, it may be easier to use one-year samples. In the multi-year samples, income is adjusted to the dollar level of the final year, so if you use the 2005-2009 and 2010-2014 samples, be aware that income and other dollar amounts are in 2009 dollars in the first file and in 2014 dollars in the second.
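Putting both files on a common dollar basis means multiplying each file's dollar amounts by the ratio of a price index in the target year to the index in that file's reference year (2009 or 2014 here). Below is a minimal sketch, assuming you supply annual index values yourself (for example CPI-U from BLS). The function name, the `DOLLAR_YEAR` field, and the index values in the example are hypothetical placeholders, not real CPI figures or an IPUMS API.

```python
from typing import Any, Collection, Dict, List, Sequence

Record = Dict[str, Any]


def to_constant_dollars(
    records: List[Record],
    cpi: Dict[int, float],
    target_year: int,
    income_vars: Sequence[str] = ("INCTOT",),
    missing_codes: Collection[float] = (),
) -> List[Record]:
    """Convert each record's income variables to target_year dollars.

    Assumes every record carries a hypothetical DOLLAR_YEAR field giving the
    year its dollar amounts are expressed in (2009 for the 2005-2009 file,
    2014 for the 2010-2014 file). Values listed in missing_codes (check the
    variable's codebook for its N/A codes) are passed through unchanged.
    """
    adjusted: List[Record] = []
    for rec in records:
        row = dict(rec)  # copy so the input records are left unchanged
        base_year = row["DOLLAR_YEAR"]
        for v in income_vars:
            if v in row and row[v] not in missing_codes:
                # Scale by the ratio of the price index in the target year to
                # the index in the year the amount is currently expressed in.
                row[v] = row[v] * cpi[target_year] / cpi[base_year]
        row["DOLLAR_YEAR"] = target_year  # amounts are now in target-year dollars
        adjusted.append(row)
    return adjusted


# Placeholder index values for illustration only, NOT real CPI-U figures.
cpi = {2009: 100.0, 2014: 110.0}
records = [
    {"INCTOT": 50000.0, "DOLLAR_YEAR": 2009},
    {"INCTOT": 60000.0, "DOLLAR_YEAR": 2014},
]
adjusted = to_constant_dollars(records, cpi, target_year=2014)
print([r["INCTOT"] for r in adjusted])  # [55000.0, 60000.0]
```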