Generating state-level estimates using YEAR in multi-year ACS samples

I am using ACS 5-year samples to create yearly, state-level estimates of the state population by race. My question is about how best to assign a single year to these estimates. I understand that SAMPLE identifies the IPUMS sample from which the case is drawn. For the multi-year ACS samples, YEAR indicates the last year of data included. For example, when browsing the data, YEAR 2016 lists the 2012-2016 ACS 5-year sample for SAMPLE. As I understand it, this means the 2016 YEAR sample uses data from 2012, 2013, 2014, 2015, and 2016 (rather than a moving average like 2014, 2015, 2016, 2017, and 2018).

When I collapse the data by state and year, is the YEAR variable the correct variable to use with multi-year samples? In other words, if I want population estimates for 2016 by state, is collapsing the 2016 ACS 5yr sample (as selected below) the best approach?

MULTYEAR is the variable used to identify the specific year of survey in multi-year ACS/PRCS samples, whereas YEAR reports the last year of data included in these multi-year samples. In the 2016 5-year sample, all observations have YEAR = 2016, but have MULTYEAR values ranging from 2012 to 2016. Both variables come preselected in all multi-year ACS/PRCS data extracts. To obtain annual population estimates for each year from 2012 to 2016 in the 2016 ACS 5-year sample, you will want to collapse the data by MULTYEAR and then multiply by five (since the weights in the 5-year sample come scaled by a factor of five). Collapsing the data by YEAR would instead give you an estimate of the total population averaged over the 2012-2016 period (there is no need to multiply by five in this case).
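As a rough illustration of the two collapses described above, here is a minimal pandas sketch. It assumes the extract has been loaded into a data frame with the standard IPUMS variables YEAR, MULTYEAR, STATEFIP, RACE, and PERWT; the rows below are made-up toy values, not real ACS data, and the output column name `pop` is just an illustrative choice.

```python
import pandas as pd

# Toy stand-in for an extract of the 2016 ACS 5-year sample (values are invented).
acs = pd.DataFrame({
    "YEAR":     [2016, 2016, 2016, 2016, 2016, 2016],
    "MULTYEAR": [2012, 2012, 2013, 2014, 2015, 2016],
    "STATEFIP": [27, 27, 27, 27, 27, 27],
    "RACE":     [1, 2, 1, 1, 1, 1],
    "PERWT":    [20.0, 5.0, 22.0, 21.0, 23.0, 24.0],
})

# Collapse by YEAR: a period estimate for 2012-2016 as a whole.
# The person weights are summed as-is; no multiplication by five.
period = (
    acs.groupby(["STATEFIP", "YEAR", "RACE"], as_index=False)["PERWT"]
       .sum()
       .rename(columns={"PERWT": "pop"})
)

# Collapse by MULTYEAR: one estimate per survey year.
# Each year's weights are then multiplied by five to undo the 5-year scaling.
annual = (
    acs.groupby(["STATEFIP", "MULTYEAR", "RACE"], as_index=False)["PERWT"]
       .sum()
       .rename(columns={"PERWT": "pop"})
)
annual["pop"] = annual["pop"] * 5

print(period)
print(annual)
```

With a real extract, you would replace the toy data frame with your own file; the two groupby steps stay the same.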

Thank you, Ivan. To clarify, when collapsing by MULTYEAR, do I lose the averaging (or “smoothing”) work of the ACS 5-year samples? I want to keep the 5-year averages to create my yearly assigned estimates. However, I want to be sure the year I assign to those estimates follows best practices, because it is not intuitive to me to assign the year by the end year of the 5-year estimate period. Rather, my instinct is to use the middle year (e.g., a 5-year average of 2012-2016 would be assigned to 2014).

I plan to repeat this process for multiple 5-year samples. When doing so and collapsing by MULTYEAR, I end up with overlapping estimates for each year (e.g., the 2016 and the 2015 5-year ACS sample each include 2012, 2013, 2014, & 2015). It is not clear to me how to determine which of these 2012 estimates, for example, to assign to 2012. Or should they all be the same (i.e., the 2012 cases in each 5-year sample will be the same cases)?

The ACS 5-year samples are reweighted or “smoothed” relative to the 1-year samples. The 2022 5-year Accuracy of the Data report explains that the pooled 5-year data are reweighted using the procedures developed for the 1-year estimates with a few adjustments concerning geography, month-specific weighting steps, and population and housing unit controls. Since the multiyear estimates represent estimates for the period, the controls are not a single year’s housing or population estimates from the Population Estimates Program, but rather are an average of these estimates over the period. This smoothing is not lost when collapsing by MULTYEAR, but estimates will differ slightly from the 1-year data due to these adjustments.

However, as explained in this Census Bureau blog post, estimates using the entire period of the 5-year file do not represent a midpoint or an average of the five 1-year estimates. Rather, they characterize the 5-year period as a whole. Assigning estimates from the 2016 5-year ACS to 2014 may not be accurate if conditions drastically changed during the collection period. If 2014 was an outlier year, this information would be swallowed up by the pooled data. Additionally, the Census Bureau strongly recommends against comparing estimates in overlapping 5-year periods since much of the data in each estimate are the same. The difference between the overlapping 5-year estimates in essence is measuring the difference between the non-overlapping portions. Rather, users should compare 5-year estimates that don’t have any overlapping years of data such as the 2016 5-year ACS and the 2021 5-year ACS.
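To make the overlap point concrete, here is a small sketch that lists the survey years two 5-year samples share, identified by their end year (the YEAR value). The helper names `five_year_span` and `overlapping_years` are hypothetical, written only for this illustration.

```python
def five_year_span(end_year):
    """Survey years covered by the 5-year sample ending in end_year."""
    return set(range(end_year - 4, end_year + 1))


def overlapping_years(end_year_a, end_year_b):
    """Sorted list of survey years that two 5-year samples have in common."""
    return sorted(five_year_span(end_year_a) & five_year_span(end_year_b))


# The 2015 and 2016 samples share four of their five years.
print(overlapping_years(2016, 2015))

# The 2016 and 2021 samples share none, so they can be compared.
print(overlapping_years(2016, 2021))
```

An empty list means the two periods do not overlap, which is the case the Census Bureau recommends for comparisons.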
