Working with mixed PUMAs in 5-Year Data

I am working on some calculations for New York City — specifically around the number of non-citizens in NYC’s labor force — and using the 2022 5-Year ACS sample.

The problem I keep running into is that I cannot seem to get an accurate total count of people living in NYC because the PUMA codes changed from 2021 to 2022. No matter how I write out the R code, I keep getting a total sum of 7.4 million people living in New York, when I know I should see closer to 8.6 million.

Here is the code I have tried so far:

ACS_NYC <- ACS2022[STATEFIP == 36 & PUMA >= 3701 & PUMA <= 4503] # This version just tries to catch all the possible PUMA codes across all the years

ACS_NYC <- ACS2022[(MULTYEAR <= 2021 & STATEFIP == 36 & PUMA >= 3701 & PUMA <= 4114) | (MULTYEAR == 2022 & STATEFIP == 36 & PUMA >= 4103 & PUMA <= 4503)] # This version tries to sort out PUMAs based on the year the survey was taken

Both methods yield the same result. Any help you can offer is greatly appreciated. I have already run the same code with the 2022 1-year data, but I would like to use the 5-year data to keep this project consistent with the rest of my work in other areas.

PUMAs are population-based geographic areas that contain at least 100,000 residents. Since populations change over time, PUMAs also change over time. The 2010 PUMA definitions are used in the 2012-2021 ACS, while the 2020 PUMA definitions are used in the 2022-2031 ACS. The 2022 5-year ACS covers survey years 2018-2022, so it includes data that use 2010 PUMA definitions (2018-2021) and data that use 2020 PUMA definitions (2022). You’ll need to adjust your PUMA-based definition of New York City for the different PUMA definitions that are used in the 2022 5-year data.

Your second approach looks correct to me. When I define the NYC population as you did in your second approach, and sum the value of PERWT (the person-level sampling weight) for all those who live in NYC in the 2022 5-year ACS, I get an estimated population of 8,615,432. Code review is beyond the scope of IPUMS User Support, but I suggest checking all of your code, paying careful attention to applying the sampling weights and ensuring that you have used parentheses to group the and/or operators appropriately when accounting for the year-specific PUMAs.
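For reference, here is a minimal base R sketch of that weighted count. The data frame `acs` is a toy stand-in for the extract (its rows and weights are made up), and the PUMA ranges are copied from the question rather than checked against the PUMA definitions, so treat them as assumptions.

```r
# Toy stand-in for a 2022 5-year extract; rows and weights are made up.
acs <- data.frame(
  MULTYEAR = c(2018, 2020, 2021, 2022, 2022, 2022),
  STATEFIP = c(36,   36,   36,   36,   36,   34),
  PUMA     = c(3701, 4114, 4200, 4103, 4503, 4200),
  PERWT    = c(20,   25,   30,   15,   10,   40)
)

# PUMA ranges as given in the question (assumed, not verified here).
in_2010_pumas <- acs$MULTYEAR <= 2021 & acs$PUMA >= 3701 & acs$PUMA <= 4114
in_2020_pumas <- acs$MULTYEAR == 2022 & acs$PUMA >= 4103 & acs$PUMA <= 4503

# Explicit parentheses make the and/or grouping unambiguous.
is_nyc <- acs$STATEFIP == 36 & (in_2010_pumas | in_2020_pumas)

# Population estimate: sum the person weights, not the number of rows.
nyc_pop  <- sum(acs$PERWT[is_nyc])
nyc_rows <- sum(is_nyc)

# Weighted total by survey year, useful for spotting a year that drops out.
pop_by_year <- tapply(acs$PERWT[is_nyc], acs$MULTYEAR[is_nyc], sum)
```

Comparing `nyc_pop` with `nyc_rows` shows whether a total is a weighted estimate or an unweighted record count, and `pop_by_year` shows whether every survey year is contributing to the filter.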

Note that the population in a 5-year ACS sample is the estimated average population across the 5-year period, and will differ from the estimated population in any of the individual 1-year data files.

I want to add that you can also identify New York City using the IPUMS USA variable CITY. The CITY code for NYC is the same across all years, accounting for PUMA changes over time, reducing the work you would need to do to identify the city in a multi-year dataset. Also see the CITY comparability tab, where we provide crosswalks between PUMAs and large places, which could be helpful to confirm which PUMAs correspond to NYC.
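A minimal base R sketch of the CITY-based approach follows. The toy data frame is made up, and the value 4610 for New York, NY is an assumption that should be confirmed on the CITY codes tab before use.

```r
# Toy stand-in for an extract that includes CITY; values are made up.
acs <- data.frame(
  MULTYEAR = c(2019, 2021, 2022, 2022),
  CITY     = c(4610, 0,    4610, 4610),
  PUMA     = c(3701, 3100, 4103, 4503),
  PERWT    = c(30,   50,   20,   10)
)

# Assumed CITY code for New York, NY; confirm on the CITY codes tab.
nyc_city_code <- 4610

# One condition covers every survey year, regardless of PUMA vintage.
is_nyc  <- acs$CITY == nyc_city_code
nyc_pop <- sum(acs$PERWT[is_nyc])

# Cross-check: list the PUMAs that fall inside the city under each vintage.
vintage <- ifelse(acs$MULTYEAR[is_nyc] == 2022, "2020 PUMAs", "2010 PUMAs")
pumas_by_vintage <- lapply(split(acs$PUMA[is_nyc], vintage),
                           function(x) sort(unique(x)))
```

On a real extract, `pumas_by_vintage` can be compared with the PUMA ranges used in a PUMA-based filter to see where the two definitions disagree.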