Household Count Mismatch

I am running into a strange mismatch in PUMA household counts between my downloaded 2015-2019 IPUMS data and tidycensus get_pums() estimates. In my original extract, I made no alterations to the sample size (other than keeping the default sample of 7,613,000 households) or to the sample cases. I downloaded the PUMS2019 household-level data to “PUMS18195” and ran the following to get the estimated total number of households, using household weights (HHWT), for multiple PUMAs.

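library(dplyr)

# Sum household weights (HHWT) by PUMA, state, and year
Agg <- PUMS18195 %>%
  group_by(PUMA, STATEFIP, YEAR) %>%
  summarize(total_HH = sum(HHWT), .groups = "drop")

# Alabama (STATEFIP 1), PUMA 100, 2019
Agg[which(Agg$STATEFIP == 1 & Agg$PUMA == 100 & Agg$YEAR == 2019), ]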

The output total_HH is 92,126 households for PUMA 100.

When I run the equivalent code in tidycensus with get_pums():

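library(tidycensus)
library(dplyr)

# Request PUMS records for Alabama from the 2015-2019 ACS 5-year file
ALdf <- get_pums(
  variables = c("PUMA"),
  state = "AL",
  survey = "acs5",
  year = 2019
)

# Keep one record per household and sum household weights (WGTP) by PUMA
ALdf %>%
  distinct(SERIALNO, .keep_all = TRUE) %>%
  group_by(ST, PUMA) %>%
  summarize(total_HH = sum(WGTP), .groups = "drop")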

The output total_HH is 74,488 for PUMA 100.

Does anyone know why this may be the case? I am using the PUMA populations in a crosswalk to move from PUMA to county, so I want to make sure my estimates are accurate as I am using those estimates to further calculate the parameters of my eventual model. Any help would be appreciated!

I am now also running into another issue. When trying to collect PUMA-level electricity cost data for the 5-year files ending in 2012 through 2015, I receive an error that “PUMA” is not an available variable, for example with:

library(tidycensus)

# One record per household (SPORDER == 1) with PUMA, household weight, and utility costs
get_pums(
  variables = c("SERIALNO", "SPORDER", "PUMA", "WGTP", "ELEP", "FULP", "GASP"),
  variables_filter = list(SPORDER = 1),
  state = "AL",
  survey = "acs5",
  year = 2012
)

After looking into it, I am under the impression that PUMAs are inconsistent across the 2012-2015 end-years. Is there a way to collect PUMA-level data for these years?

I am not able to replicate your estimate of 92,126, but I can exactly replicate the tidycensus estimate when I restrict to one person per household (PERNUM == 1) and omit group quarters (keeping GQ == 1, 2, or 5).
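To illustrate why that restriction matters, here is a minimal sketch on a small made-up data frame, assuming a rectangular person-level extract in which household variables such as HHWT are repeated on every person record. The column names mirror the IPUMS variables SERIAL, PERNUM, GQ, and HHWT; the values are invented. Summing HHWT over all rows counts each household once per member and also picks up group-quarters records.

```r
# Hypothetical person-level records for one PUMA (values invented).
# HHWT is a household variable, so it repeats on every person record.
persons <- data.frame(
  SERIAL = c(1, 1, 1, 2, 3),
  PERNUM = c(1, 2, 3, 1, 1),
  GQ     = c(1, 1, 1, 1, 3),   # GQ 3 = group quarters (institution)
  HHWT   = c(20, 20, 20, 15, 30),
  PUMA   = 100
)

# Naive sum over every person record: household 1 is counted three times
# and the group-quarters record is included.
naive_total <- sum(persons$HHWT)

# Keep one record per household (PERNUM == 1) and drop group quarters
# (GQ codes 1, 2, and 5 identify households).
hh <- subset(persons, PERNUM == 1 & GQ %in% c(1, 2, 5))
hh_totals <- aggregate(HHWT ~ PUMA, data = hh, FUN = sum)

print(naive_total)  # 20*3 + 15 + 30 = 105
print(hh_totals)    # PUMA 100, HHWT 35
```

The same filter can be added before group_by() in the dplyr pipeline from the original question.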

Regarding your second post, I cannot help with questions about tidycensus, and I am not aware of any comparability issues with PUMAs from 2012 forward. There may be some ideas on the ACS Data Users Group forum.

I should clarify: there aren’t comparability issues among the 2012-forward PUMAs, but the 2012-2015 5-year ACS files span the 2011/2012 change in PUMA definitions. Because of this, it is possible that tidycensus or the Census API simply withholds the PUMA variable for those files (but I don’t know that). As you navigate the definition change, you may be interested in the IPUMS USA variable CPUMA0010, which provides codes for Consistent Public Use Microdata Areas (ConsPUMAs); these are the smallest geographic units that can be consistently identified across the change.
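As a rough sketch of how ConsPUMAs can be used, the example below sums household weights by CPUMA0010 instead of PUMA, so that totals from years on either side of the 2011/2012 redefinition refer to the same areas. The data are invented and assume records have already been restricted to one row per household with group quarters removed; the PUMA and CPUMA0010 values are illustrative only.

```r
# Hypothetical household records (one row per household, group quarters
# removed) from two survey years that straddle the PUMA redefinition.
hh <- data.frame(
  YEAR      = c(2011, 2011, 2012, 2012, 2012),
  PUMA      = c(100, 200, 100, 101, 200),  # codes not comparable across the change
  CPUMA0010 = c(1, 2, 1, 1, 2),            # made-up ConsPUMA codes
  HHWT      = c(50, 40, 30, 25, 45)
)

# Summing by ConsPUMA rather than PUMA yields areas that can be compared
# across years on both sides of the 2011/2012 change.
by_cpuma <- aggregate(HHWT ~ YEAR + CPUMA0010, data = hh, FUN = sum)
print(by_cpuma)
```

The trade-off is coarser geography: a ConsPUMA may combine several PUMAs, so a PUMA-to-county crosswalk would need to be built at the ConsPUMA level instead.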

According to the get_pums() documentation, you specify the PUMAs you want with the "puma" argument. You can also select all PUMAs within specific states using the "state" argument. Based on my understanding, you don’t request PUMA as a variable in the "variables" argument; instead, you use the "state" or "puma" argument.
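The IPUMS extract in the question stores PUMA as an integer (100), while Census PUMS files use five-character PUMA codes. A small helper such as the hypothetical `to_puma_code()` below can format codes for the "puma" argument or for joining the two datasets. This is a sketch that assumes get_pums() accepts and returns PUMA codes as zero-padded strings such as "00100"; check the tidycensus documentation before relying on it.

```r
# Hypothetical helper: convert integer PUMA codes (as stored in IPUMS USA)
# to five-character, zero-padded strings as used in Census PUMS files.
to_puma_code <- function(puma) {
  sprintf("%05d", as.integer(puma))
}

# Hypothetical helper: prefix the two-digit state FIPS code to build a
# seven-character state-PUMA identifier.
to_state_puma <- function(statefip, puma) {
  paste0(sprintf("%02d", as.integer(statefip)), to_puma_code(puma))
}

print(to_puma_code(c(100, 2302)))  # "00100" "02302"
print(to_state_puma(1, 100))       # "0100100"

# Sketch of a request (not run here; needs network access and a Census API key):
# ALdf <- tidycensus::get_pums(variables = "NP", state = "AL",
#                              puma = to_puma_code(100),
#                              survey = "acs5", year = 2019)
```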