I have been working with the ACS 5-year sample for about two years with tremendous success. Now, some of our partner organizations have asked us for information on how our variables have changed over time. What is the recommended method for combining data sets from multiple years (i.e., 2000-2020) without creating a massive data extract of six gigabytes?
There are a few ways to work with large numbers of samples if you have limited computing power or storage space.
You can limit the variables in your extract to only those you actually use, which keeps the file size manageable. If you are studying only a subset of observations (for example, a few states or a specific age range), you can use the select cases feature to limit your extract to the population of interest. You can also split your data across several extracts by sample, each containing a few years of data; however, since your goal is to study change over time, this may not be the best approach for your research. Finally, if your research does not need every year from 2000 to 2020, you could select one ACS 1-year sample from each five-year range rather than using the 5-year ACS samples to cover every year. The resulting dataset still supports studying change over time but has fewer observations overall.
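Limiting variables and cases in the extract system keeps the download itself small. If you already have a large CSV extract on disk, similar trimming can be done locally. The Python sketch below streams the file one row at a time and writes out only the chosen columns and cases, so memory use stays small regardless of file size. The column names follow common IPUMS USA variable names (YEAR, STATEFIP, AGE, PERWT, INCTOT), but they, along with the kept years, states, and age range, are illustrative assumptions; check them against your own extract's codebook.

```python
import csv
import io

# Illustrative settings (assumptions): adjust to match your own extract.
KEEP_COLUMNS = ["YEAR", "STATEFIP", "AGE", "PERWT"]
KEEP_YEARS = {2005, 2010, 2015, 2020}  # one 1-year sample per five-year range
KEEP_STATES = {6, 36}                  # state FIPS codes, compared as integers
AGE_MIN, AGE_MAX = 25, 64              # inclusive age range


def subset_rows(source, sink):
    """Copy rows from `source` to `sink`, keeping only chosen columns and cases.

    Reads one row at a time, so even a multi-gigabyte file is never held in
    memory. Returns the number of data rows written.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(sink, fieldnames=KEEP_COLUMNS)
    writer.writeheader()
    written = 0
    for row in reader:
        if int(row["YEAR"]) not in KEEP_YEARS:
            continue  # drop years outside the chosen samples
        if int(row["STATEFIP"]) not in KEEP_STATES:
            continue  # drop states outside the population of interest
        if not AGE_MIN <= int(row["AGE"]) <= AGE_MAX:
            continue  # drop people outside the age range
        writer.writerow({col: row[col] for col in KEEP_COLUMNS})
        written += 1
    return written


# Small in-memory example; INCTOT is dropped because it is not in KEEP_COLUMNS.
sample = io.StringIO(
    "YEAR,STATEFIP,AGE,PERWT,INCTOT\n"
    "2005,6,30,100,40000\n"
    "2006,6,30,100,40000\n"
    "2010,48,40,90,50000\n"
    "2015,36,70,80,30000\n"
    "2020,36,50,120,60000\n"
)
out = io.StringIO()
n = subset_rows(sample, out)
print(n)  # 2

# For real files (hypothetical names), open both with newline="" as the csv module recommends:
# with open("usa_extract.csv", newline="") as src, open("usa_subset.csv", "w", newline="") as dst:
#     subset_rows(src, dst)
```

Filtering row by row trades speed for a small memory footprint; on a machine with enough RAM, a dataframe library would be faster, but the streaming approach works on modest hardware.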
If you have adequate computing power but insufficient computer storage to hold a very large file, you may consider storing your data on an external hard drive.
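Note that comparing multi-year ACS datasets with overlapping periods is not recommended: overlapping periods share some of the same survey years (for example, 2015-2019 and 2016-2020 have four years in common), so differences between them are hard to interpret.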
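If you do use multi-year samples, a quick check for overlapping periods can help when choosing which ones to compare. The sketch below treats each period as an inclusive (start year, end year) pair and greedily keeps periods that do not overlap any already kept. The list of 5-year periods is illustrative, not a statement of which samples are available in your extract.

```python
def periods_overlap(a, b):
    """Return True if two inclusive (start_year, end_year) periods share any year."""
    return a[0] <= b[1] and b[0] <= a[1]


def non_overlapping(periods):
    """Walk periods in chronological order, keeping each one that overlaps none already kept."""
    kept = []
    for p in sorted(periods):
        if all(not periods_overlap(p, k) for k in kept):
            kept.append(p)
    return kept


# Illustrative 5-year periods ending in 2009 through 2020: (2005, 2009) ... (2016, 2020).
five_year = [(end - 4, end) for end in range(2009, 2021)]

print(periods_overlap((2015, 2019), (2016, 2020)))  # True
print(non_overlapping(five_year))  # [(2005, 2009), (2010, 2014), (2015, 2019)]
```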
Thank you for this! I will look into using the 1-year samples from multiple years to create a more manageable extract.