I am running into a weird mismatch in PUMA household counts between my downloaded 2015-2019 IPUMS dataset and tidycensus get_pums() estimates. I made no changes to the original extract's sample size (I kept the default 7,613,000-household sample) or to the selected sample cases. I downloaded the 2019 PUMS household-level data as “PUMS18195” and ran the following to estimate total households, using household weights (HHWT), for multiple PUMAs.
Agg <- PUMS18195 %>%
  group_by(PUMA, STATEFIP, YEAR) %>%
  summarize(
    total_HH = sum(HHWT))
Agg[which(Agg$STATEFIP == 1 & Agg$PUMA == 100 & Agg$YEAR == 2019), ]
The output total_HH is 92,126 households for PUMA 100.
When I run the code in tidycensus with get_pums()
get_pums(
  variables = c("PUMA"),
  state = "AL",
  survey = "acs5",
  year = 2019
) -> ALdf
ALdf %>%
distinct(SERIALNO, .keep_all = T) %>%
group_by(ST, PUMA) %>%
summarize(
total_HH = sum(WGTP),
)
The output total_HH is 74,488 for PUMA 100.
Does anyone know why this may be the case? I am using the PUMA populations in a crosswalk to move from PUMA to county, so I want to make sure my estimates are accurate as I am using those estimates to further calculate the parameters of my eventual model. Any help would be appreciated!
get_pums(
  variables = c("SERIALNO", "SPORDER", "PUMA", "WGTP", "ELEP", "FULP", "GASP"),
  variables_filter = list(SPORDER = 1),
  state = "AL",
  survey = "acs5",
  year = 2012
)
I am now also running into the following issue. When trying to collect PUMA-level electricity cost data for the end-years 2012-2015, I receive an error that “PUMA” is not an available variable. After looking into it, I am under the impression that there are inconsistencies with PUMAs in the 2012-2015 end-years. Is there a way to collect PUMA-level data for these years?
I am not able to replicate your estimate of 92,126, but I am able to exactly replicate the tidycensus estimate when I restrict to one person per household (PERNUM == 1) and omit group quarters (keeping GQ == 1, 2, or 5).
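As a rough illustration of why those two restrictions matter, here is a minimal sketch on a made-up, person-level toy data frame (the values and the object names toy, naive, and households are hypothetical, not taken from your extract). It assumes HHWT is repeated on every person record of a household, so an unfiltered sum counts multi-person households more than once and also includes group quarters records. This may or may not be what produced 92,126.

```r
library(dplyr)

# Toy person-level data (invented values): HHWT repeats on each person
# record within a household (SERIAL); GQ == 3 marks a group quarters record.
toy <- data.frame(
  SERIAL = c(1, 1, 2, 3, 4, 4),
  PERNUM = c(1, 2, 1, 1, 1, 2),
  GQ     = c(1, 1, 1, 3, 2, 2),
  PUMA   = 100,
  HHWT   = c(20, 20, 15, 10, 5, 5)
)

# Unfiltered sum: every person record contributes its household weight.
naive <- toy %>%
  group_by(PUMA) %>%
  summarize(total_HH = sum(HHWT), .groups = "drop")

# One record per household (PERNUM == 1), households only (GQ 1, 2, or 5).
households <- toy %>%
  filter(PERNUM == 1, GQ %in% c(1, 2, 5)) %>%
  group_by(PUMA) %>%
  summarize(total_HH = sum(HHWT), .groups = "drop")
```

In this toy example the unfiltered total is larger than the filtered one because two households have two person records each and one record belongs to group quarters.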
Regarding your second post, I cannot help with questions about tidycensus, but I am not aware of any comparability issues with PUMAs from 2012 forward. There may be some ideas on the ACS Data Users Group forum.
According to the get_pums() documentation, you need to specify the PUMAs you want in the "puma = " argument. You can also select the PUMAs you want within specific states using the "state = " argument. Based on my understanding, you don’t request the PUMA as a variable in the variables argument; instead you use the state or puma argument.
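A minimal sketch of that approach, assuming the tidycensus package is installed and a Census API key is configured. The wrapper name fetch_puma_households and its defaults are hypothetical; the PUMA code "00100" assumes the zero-padded five-digit form, and I have not checked whether the same call works for the 2012-2015 end-years.

```r
# Hypothetical helper: request household-level records for specific PUMAs
# through the puma argument. Calling it requires the tidycensus package,
# a Census API key, and network access.
fetch_puma_households <- function(state = "AL", pumas = "00100", year = 2019) {
  tidycensus::get_pums(
    variables = "PUMA",
    state = state,
    puma = pumas,                         # PUMAs within the single state above
    survey = "acs5",
    year = year,
    variables_filter = list(SPORDER = 1)  # keep one person record per household
  )
}

# Example call (commented out because it hits the Census API):
# al_100 <- fetch_puma_households()
# sum(al_100$WGTP)
```

Filtering on SPORDER in the request is an alternative to de-duplicating on SERIALNO afterwards, as was done in the original post.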