Household counts larger than expected

Ethan_Jantz · February 16, 2022, 7:22pm

I’m trying to estimate household counts from PUMS data and am getting numbers about 2.5 times larger than the ACS estimates for the same period. Is the problem my counting method (summing HHWT)? Here is the code I used to generate the counts. From the PUMS data I get ~12 million households in IL, whereas the ACS estimate for the same period is ~5 million with an MoE of about 10,000.

library(ipumsr)
library(dplyr)

ddi <- ""         # path to the DDI codebook (masked)
ipums_path <- ""  # path to the IPUMS data file (masked)

ipums_data <- read_ipums_micro(ddi = ddi,
                               data_file = ipums_path)

# Sum household weights (HHWT) for Illinois (STATEFIP 17), GQ == 1 only
IL_hh_pums <- ipums_data %>%
  filter(GQ == 1, STATEFIP == 17) %>%
  group_by(STATEFIP) %>%
  summarize(households = sum(HHWT))

# Published 2015-2019 ACS 5-year household total for Illinois, for comparison
IL_hh_acs5 <- tidycensus::get_acs(
  survey = "acs5",
  year = 2019,
  geography = "state",
  state = "IL",
  variables = c("households" = "B11012_001"),
  output = "wide"
)

IL_hh_pums # 11,899,895 households
IL_hh_acs5 # 4,846,134 households, 10,459 moe

KariWilliams · February 17, 2022, 4:09pm

When calculating household totals from the PUMS data available from IPUMS USA, there are two key considerations. First, you should restrict to one person per household (e.g., PERNUM == 1) to avoid counting a household more than once. Second, while your code does address group quarters, it counts only GQ value 1; values 2 and 5 (additional households under the 1990 and 2000 definitions) should probably be included as well.
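Applied to a person-level extract, the two suggestions above amount to filtering on PERNUM and GQ before summing HHWT. A minimal sketch on a small hypothetical data frame (the values are made up for illustration; with a real extract, `persons` would come from `read_ipums_micro()`):

```r
library(dplyr)

# Hypothetical toy person-level extract: one row per person, with the
# household weight HHWT repeated on every person row of that household.
persons <- tibble(
  STATEFIP = c(17, 17, 17, 17, 17, 17),
  SERIAL   = c(1, 1, 1, 2, 3, 3),
  PERNUM   = c(1, 2, 3, 1, 1, 2),
  GQ       = c(1, 1, 1, 5, 3, 3),
  HHWT     = c(100, 100, 100, 50, 20, 20)
)

# Naive total: each household's weight is added once per person.
naive <- persons %>%
  filter(STATEFIP == 17, GQ %in% c(1, 2, 5)) %>%
  summarize(households = sum(HHWT))

# Suggested approach: one row per household (PERNUM == 1),
# keeping the household GQ codes 1, 2, and 5.
households <- persons %>%
  filter(STATEFIP == 17, PERNUM == 1, GQ %in% c(1, 2, 5)) %>%
  summarize(households = sum(HHWT))

naive$households       # 350
households$households  # 150
```

Household 3 (GQ 3, group quarters) drops out of both totals; household 1 is counted three times in the naive total and once in the restricted one.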

Ethan_Jantz · February 17, 2022, 6:30pm

My data doesn’t include PERNUM, since my extract contains only household-level variables. I was working through this with Ivan in a previous thread (see here). Adding more GQ values to my filter would only increase the household count, so I’m still unsure why my counts are so large. Is there a variable in household-level data that is equivalent to PERNUM?

Ethan_Jantz · February 18, 2022, 8:01pm

I believe I’ve figured this out. Household-level extracts have no PERNUM variable, but after making sure there were no duplicate SERIAL values, I got a count within the ACS 5-year estimate ± its margin of error. See below:

library(ipumsr)
library(dplyr)

ddi <- ""         # path to the DDI codebook (masked)
ipums_path <- ""  # path to the IPUMS data file (masked)

ipums_data <- read_ipums_micro(ddi = ddi,
                               data_file = ipums_path)

# Keep one row per household (unique SERIAL) before summing HHWT
IL_hh_pums <- ipums_data %>%
  filter(GQ == 1, STATEFIP == 17) %>%
  distinct(SERIAL, .keep_all = TRUE) %>%
  group_by(STATEFIP) %>%
  summarize(households = sum(HHWT))

# Published 2015-2019 ACS 5-year household total for Illinois, for comparison
IL_hh_acs5 <- tidycensus::get_acs(
  survey = "acs5",
  year = 2019,
  geography = "state",
  state = "IL",
  variables = c("households" = "B11012_001"),
  output = "wide"
)

IL_hh_pums # 4,844,000
IL_hh_acs5 # 4,846,134; moe 10,459
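Why duplicate SERIAL rows inflate the total can be checked directly: if every row carries its household's HHWT, a household that appears on k rows is counted k times. A sketch on hypothetical toy data that first looks for repeated SERIAL values and then compares totals before and after `distinct()`:

```r
library(dplyr)

# Hypothetical toy extract: household 1 appears on three rows,
# and each row repeats that household's weight.
hh <- tibble(
  SERIAL = c(1, 1, 1, 2, 3),
  HHWT   = c(100, 100, 100, 50, 80)
)

# Which SERIAL values appear on more than one row?
dup_check <- hh %>%
  count(SERIAL, name = "rows") %>%
  filter(rows > 1)

# Total with duplicates: household 1 is counted three times.
before <- sum(hh$HHWT)

# Total after keeping one row per SERIAL.
after <- hh %>%
  distinct(SERIAL, .keep_all = TRUE) %>%
  summarize(households = sum(HHWT))

dup_check         # SERIAL 1 appears on 3 rows
before            # 430
after$households  # 230
```

If `dup_check` comes back empty on a real extract, duplicate SERIAL rows are not the cause of an inflated total.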

Zachary_Marhanka · February 25, 2022, 12:50am

Hi Ethan, I am running into the same issue, but I’m still getting larger household counts than those reported by sources such as QuickFacts. I’m also seeing large year-to-year shifts in household counts. I did not use the tidycensus package; I downloaded the data directly from IPUMS. I removed duplicates using SERIAL, but that does not fix the overestimates. Has anyone else run into this?
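One possibility worth ruling out, assuming the extract pools more than one sample (for example, several ACS years): in IPUMS USA, SERIAL is unique only within a sample, so deduplication should use SAMPLE and SERIAL together, and totals should be computed separately for each YEAR rather than over the whole file. A sketch on hypothetical toy data (the sample codes and weights are illustrative only):

```r
library(dplyr)

# Hypothetical toy extract pooling two annual samples. SERIAL numbering
# restarts in each sample, so SERIAL alone does not identify a household.
multi <- tibble(
  YEAR   = c(2018, 2018, 2018, 2019, 2019, 2019),
  SAMPLE = c(201801, 201801, 201801, 201901, 201901, 201901),
  SERIAL = c(1, 1, 2, 1, 2, 2),
  GQ     = c(1, 1, 1, 1, 1, 1),
  HHWT   = c(100, 100, 60, 90, 80, 80)
)

# One row per household within each sample, then a total per year.
per_year <- multi %>%
  filter(GQ %in% c(1, 2, 5)) %>%
  distinct(SAMPLE, SERIAL, .keep_all = TRUE) %>%
  group_by(YEAR) %>%
  summarize(households = sum(HHWT))

# Summing without grouping by YEAR would add the two years together.
pooled <- sum(per_year$households)

per_year  # 2018: 160, 2019: 170
pooled    # 330
```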

Ethan_Jantz · February 25, 2022, 1:15am

I didn’t download the IPUMS data using tidycensus. I masked my file path for my IPUMS data and codebook. The tidycensus call is pulling the ACS data I compared my counts with. Sorry if there was any confusion there.

Zachary_Marhanka · February 25, 2022, 1:52am

When you pulled the ACS data through tidycensus, did it automatically calculate the MoE?

Ethan_Jantz · February 25, 2022, 2:25am

Yes, you can copy/paste the tidycensus call into R and get the same output. The Census API provides MoEs with all ACS estimates.
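The PUMS-based count has its own sampling error, which the MoE from the Census API does not describe. A minimal sketch of the ACS successive difference replication formula, assuming the extract also includes the 80 household replicate weights REPWT1 to REPWT80 (they are not in the extracts above): compute the total once with HHWT (X) and once with each replicate weight (X_r); then SE = sqrt(4/80 × Σ(X_r − X)²), and the 90% MoE is 1.645 × SE. The toy weights below are hypothetical.

```r
# 90% margin of error from 80 replicate estimates, using the ACS
# successive difference replication formula:
#   SE = sqrt(4/80 * sum((X_r - X)^2)),  MoE = 1.645 * SE
acs_moe <- function(estimate, replicate_estimates, z = 1.645) {
  stopifnot(length(replicate_estimates) == 80)
  se <- sqrt(4 / 80 * sum((replicate_estimates - estimate)^2))
  z * se
}

# Hypothetical toy household file: three households with HHWT and
# replicate weights REPWT1..REPWT80 that move household 1 by +/-10.
hh <- data.frame(HHWT = c(100, 50, 80))
for (r in 1:80) {
  shift <- if (r %% 2 == 0) 10 else -10
  hh[[paste0("REPWT", r)]] <- hh$HHWT + c(shift, 0, 0)
}

estimate   <- sum(hh$HHWT)                        # full-sample total
replicates <- colSums(hh[paste0("REPWT", 1:80)])  # one total per replicate
moe        <- acs_moe(estimate, replicates)

estimate  # 230
moe       # 32.9 (every replicate is 10 away, so SE = 20)
```

With a real extract, `hh` would be the deduplicated household rows used for the HHWT total, so the estimate and its replicates are computed on the same records.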
