Sampling for longitudinal analysis

I am interested in tracking changes in MIGRATE1 responses by race and income for all the PUMAs or available counties in one state, both by decade and year to year.

I am confused about which samples to choose for year-to-year comparison. I understand that standard errors are smaller in 5-year collections because they pool five years. Does this mean that I’m better off using 1-year samples if I want 2017, 2016, 2015,…,2011 estimates? Or, for better accuracy, should I forget about year-to-year changes and just use the non-overlapping 5-year sets 2008-2012 and 2013-2017?

What about mixing the length of collected samples, using 1-year files, then 3-year files? Since each sample contains weights and error, would it be too detrimental to use varying lengths of samples?

Thank you!
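
For a rough sense of the trade-off raised above, here is a minimal sketch of how pooling affects standard errors. It assumes simple random sampling and made-up numbers (a 15% to 18% change in the share of movers, and 400 respondents per year in one PUMA subgroup); the function names are hypothetical, and real ACS standard errors depend on the survey weights and design, so treat this only as an illustration of the arithmetic.

```python
import math

def se_proportion(p, n):
    """Approximate standard error of a proportion from n respondents.

    Assumes simple random sampling, which understates the complexity of
    the real survey design.
    """
    return math.sqrt(p * (1.0 - p) / n)

def z_for_change(p1, se1, p2, se2):
    """z-statistic for the difference of two independent estimates,
    e.g. two non-overlapping periods."""
    return (p2 - p1) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical inputs, not taken from any real extract.
p_before, p_after = 0.15, 0.18
n_one_year = 400                # respondents in the subgroup, one 1-year file
n_five_year = 5 * n_one_year    # roughly five times as many in a 5-year file

# Pooling five years shrinks the standard error by a factor of 1/sqrt(5).
se_1yr = se_proportion(p_before, n_one_year)
se_5yr = se_proportion(p_before, n_five_year)
ratio = se_5yr / se_1yr

# The same 3-point change, tested with 1-year versus 5-year sample sizes.
z_1yr = z_for_change(p_before, se_1yr,
                     p_after, se_proportion(p_after, n_one_year))
z_5yr = z_for_change(p_before, se_5yr,
                     p_after, se_proportion(p_after, n_five_year))

print(f"SE ratio (5-year / 1-year): {ratio:.3f}")
print(f"z with 1-year sizes: {z_1yr:.2f}, with 5-year sizes: {z_5yr:.2f}")
```

Under these assumed numbers the same change falls short of the usual 1.96 threshold with 1-year sample sizes but clears it with 5-year sizes, which is why small geographic or demographic slices tend to favor the pooled files. The independence assumption in `z_for_change` only holds for non-overlapping periods.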

The choice between 1-year and multi-year files depends on the specific details of your project and research question. If you are looking at a small slice of the sample (e.g., a particular geographic location, occupation, or demographic group), then the larger sample size in the pooled multi-year files will likely be worth the loss of year-to-year detail. If, on the other hand, you are not concerned about the issues associated with a small sample size (i.e., large standard errors), then the single-year files will likely serve you better. In general, I’d recommend not mixing files of different lengths, though it is difficult to give any broad recommendations. If you have additional specific questions about your research that relate to IPUMS data, feel free to send an email to