Poverty estimate inconsistency in ACS 2022 5-year estimates

I’ve identified an inconsistency between IPUMS USA ACS 2022 5-year estimates and ACS 2022 1-year estimates for poverty and was hoping another user or staff member could help me understand what might be behind the discrepancy.

Analysis: I’m running an analysis to understand the number of individuals (PERWT) living below 200% of the poverty level (0 - 199% POVERTY) according to 2022 ACS 5-year estimates. I excluded observations for which POVERTY = “000” or “N/A” and found that:

  • 78,978,881 individuals live below 200% poverty (0-199%).
  • 243,532,353 individuals live above 200% poverty (200% - 501%).
  • 8,586,360 individuals are N/A (POVERTY = 0).
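For anyone who wants to reproduce this kind of tabulation, here is a minimal sketch of the grouping logic in plain Python. It assumes each person record carries the IPUMS variables POVERTY (0 = N/A, 1-501 otherwise) and PERWT. The toy records and the helper names `poverty_bucket` and `weighted_counts` are hypothetical and are not drawn from any actual extract.

```python
from collections import defaultdict


def poverty_bucket(poverty):
    """Map an IPUMS POVERTY value to a reporting category."""
    if poverty == 0:
        return "N/A"             # POVERTY = 0 is the N/A code
    if poverty < 200:
        return "below 200%"      # values 1-199
    return "200% and above"      # values 200-501


def weighted_counts(records):
    """Sum the person weight (PERWT) within each poverty category."""
    totals = defaultdict(int)
    for rec in records:
        totals[poverty_bucket(rec["POVERTY"])] += rec["PERWT"]
    return dict(totals)


# Hypothetical toy records, only to show the mechanics.
sample = [
    {"POVERTY": 0, "PERWT": 20},
    {"POVERTY": 150, "PERWT": 100},
    {"POVERTY": 199, "PERWT": 50},
    {"POVERTY": 200, "PERWT": 80},
    {"POVERTY": 501, "PERWT": 120},
]
counts = weighted_counts(sample)
print(counts)
```

Keeping the N/A code in its own bucket, rather than folding it into the 0-199 range, makes it easy to confirm that the three categories add up to the total weighted population.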

Discrepancy: When I ran the same analysis on IPUMS 2022 ACS 1-year estimates, I found that:

  • 92,105,144 individuals live below 200% poverty.
  • 233,661,114 individuals live above 200% poverty.
  • 7,521,304 individuals are N/A.

…and Census Bureau tables show that:

Would you be able to help me understand this discrepancy between the IPUMS and Census ACS 5-year estimates?

Hi Sophia. I replied to your email and am posting the same information in case others in the forum have the same question.

Using the online data analysis tool, I tabulated POVERTY for those with a POVERTY value between 1 and 199 (I excluded 0s, which are N/A codes). For the 2022 1-year ACS sample, I got an estimate of 85,905,126 people with family income under 200% of the poverty line. For the 2022 5-year ACS sample, I got an estimate of 78,978,881.

These estimates differ from the Census Bureau’s published estimates for two reasons. First, we generally expect Census Bureau estimates to differ from estimates that use IPUMS data because the Census Bureau creates its estimates using a dataset only available internally. The Public Use Microdata Sample (or PUMS, which is what IPUMS harmonizes and provides to users) differs from this internal data because the Census Bureau makes edits and allocations for various reasons, including to protect respondent privacy. The PUMS is also a smaller subset of the entire ACS sample, whereas the Census Bureau estimates are based on the full sample.

Second, IPUMS and the Census Bureau define poverty using different definitions of families. In both cases, poverty is defined according to family income. IPUMS defines poverty at the family level based on our variable FAMUNIT. FAMUNIT is constructed by IPUMS. From the FAMUNIT description:

The Census Bureau defines “primary families” as groups of persons related to the head of household, and “primary individuals” as household heads/householders residing without kin. In the IPUMS, primary families and primary individuals are identified in FAMUNIT with a code of 1; each secondary family or secondary individual receives a higher code. Note that IPUMS primary families (FAMUNIT=1) may also include individuals that the Census Bureau does not consider to be in the primary family if they are linked to someone related to the household head by SPLOC, MOMLOC, or POPLOC. For example, IPUMS links unmarried partners of the head to the household head using SPLOC and so these partners will be included in the IPUMS primary family unit, but because they are not related by blood or marriage to the household head, they will not be included in the Census Bureau’s primary family unit. To recreate the Census Bureau’s definition of the primary family, users can select only those individuals in the IPUMS primary family whose RELATE value is less than or equal to 1100.
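As a rough illustration of the selection rule described in that excerpt, the sketch below keeps only members of the IPUMS primary family (FAMUNIT = 1) whose relationship code is 1100 or lower. It assumes the detailed relationship code sits in a column named RELATED (the excerpt refers to the variable as RELATE). The toy household and the specific relationship codes in it are assumptions for illustration, and the function name `census_primary_family` is hypothetical. Please check the RELATE codes page before relying on any particular code.

```python
def census_primary_family(household, relate_key="RELATED", max_relate=1100):
    """Return members of FAMUNIT 1 whose relationship code is <= max_relate."""
    return [
        person for person in household
        if person["FAMUNIT"] == 1 and person[relate_key] <= max_relate
    ]


# Hypothetical household; relationship codes are assumed for illustration.
household = [
    {"PERNUM": 1, "FAMUNIT": 1, "RELATED": 101},   # head/householder
    {"PERNUM": 2, "FAMUNIT": 1, "RELATED": 301},   # child of the head
    {"PERNUM": 3, "FAMUNIT": 1, "RELATED": 1114},  # unmarried partner (linked via SPLOC)
    {"PERNUM": 4, "FAMUNIT": 2, "RELATED": 1115},  # housemate in a secondary unit
]
members = [p["PERNUM"] for p in census_primary_family(household)]
print(members)
```

In this toy case the unmarried partner is in the IPUMS primary family unit but is dropped by the filter, which is the difference the excerpt describes.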

You can read more about how poverty is measured by IPUMS here. This IPUMS forum post on the same topic from 2015 may also be helpful. If you are hoping to replicate the Census Bureau’s estimates, you could try using the source variables, which are the original variables provided by the Census Bureau, rather than the variables that IPUMS harmonizes and edits. The source variable measuring family income (using the Census Bureau’s definition of family) in the 2022 1-year ACS is US2022A_FINCP. You can find source variables by toggling to source variables in the extract system.

Hi Isabel,

Thanks for getting back to me. That makes sense, but I am still unsure why I would get two different results when doing the same coding in different IPUMS data extracts (e.g., extract I created on 2/6 from IPUMS is close to Census; extract I created on 2/16 from IPUMS exactly matches your estimates below). Did the weights change between 2/6 and 2/16/2024?

There were no changes to those IPUMS USA samples between February 6th and February 16th. I took a look at your extracts from those dates and compared the weighted (PERWT) estimates of the number of people with POVERTY values between 1 and 199 (excluding the 2013 and 2014 data) and got identical estimates. Since there was no change in the data extracted on the two dates, you must have done something different to one of the data files to arrive at different estimates with the two files.

Hello Isabel,

I share the concern that Sophia expressed above: there is potentially an issue with the POVERTY variable in the current version of the IPUMS 2022 5-year ACS microdata. I just did a run in SDA of the count of people with POVERTY values of 1-199 across all of the 1-year microdata samples from 2015-2022, and compared the average of the 2018-2022 values to what I get from the 2022 5-year ACS sample. The results are below. As you can see, all of my numbers from the 1-year files align with what Sophia got from her testing of the extracts she made on 2/6/2024. They do not align with her numbers from the extracts made on 2/16/2024 or using the SDA, but that’s beside the point.

The big issue I see here is that there doesn’t seem to be a good reason why the average of the estimates across the 1-year files for 2018-2022 would differ so dramatically from the estimate made using the 2022 5-year file (about 93 million vs. 79 million, respectively), a difference of nearly 18 percent. A difference between how IPUMS calculates poverty and how the Census Bureau does would not explain this gap, because I’d presume the poverty calculation made by IPUMS is consistent between the 1-year and 5-year samples. It’s possible that this stark difference also exists in the original data from the Census when comparing the 1-year and 5-year files (I’ll explore that when I get a chance). But it’s also possible that a simple error was made, such as not adjusting the poverty thresholds in the 5-year file for inflation to reflect 2022 values in all years by MULTYEAR (while properly adjusting the income values to 2022 dollars). That would result in a substantial undercount of the number of people below 200% FPL.
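To make the size of the gap concrete, here is a small sketch of the percent-difference arithmetic. It uses only the rounded figures from this thread: roughly 79 million for the 2022 5-year file (78,978,881 in the posts above) and roughly 93 million for the average of the 2018-2022 1-year files. The function name `pct_difference` is hypothetical, and the result depends on which estimate is treated as the base.

```python
def pct_difference(value, base):
    """Percent difference of value relative to base."""
    return 100.0 * (value - base) / base


five_year = 79_000_000      # rounded 2022 5-year estimate
one_year_avg = 93_000_000   # rounded average of the 2018-2022 1-year estimates

# 1-year average relative to the 5-year estimate
above_five_year = round(pct_difference(one_year_avg, five_year), 1)
# 5-year estimate relative to the 1-year average
below_one_year = round(pct_difference(five_year, one_year_avg), 1)

print(above_five_year)  # 17.7
print(below_one_year)   # -15.1
```

The "nearly 18 percent" figure corresponds to using the 5-year estimate as the base. Measured against the 1-year average, the 5-year estimate is about 15 percent lower.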

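[Image: table of SDA estimates of people with POVERTY values of 1-199 from the 2015-2022 1-year samples, with the 2018-2022 average compared to the 2022 5-year sample]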

The notion that the difference in how IPUMS derives the POVERTY variable and how it’s done by the Census Bureau is not the source of the big difference we’re seeing here is reinforced by the table below, where I compare the 1-year estimates from IPUMS SDA to the 1-year estimates from the ACS summary file across the same set of years. As you can see, the estimates are very similar across the years.

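[Image: table comparing 1-year estimates from IPUMS SDA with 1-year estimates from the ACS summary file across the same years]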

Edit: in the blue box above, it should say “average of 1-year 2018-2022.”

I’m having the same issue.
When calculating the poverty numbers and rates for Los Angeles County, California, using both the 2022 1-year and 5-year PUMS, I get very different numbers and rates than the Census published data. In past years the differences were well within the margin of error.

I run this analysis every year and have never been so far off the published data.

1-year, below 100% of poverty

Source              Universe     Below 100%   %
Census published    9,571,103    1,327,645    13.9%
IPUMS calculated    9,522,045    1,133,459    11.9%

5-year, below 100% of poverty

Source              Universe     Below 100%   %
Census published    9,782,602    1,343,978    13.7%
IPUMS calculated    9,722,123    1,050,948    10.8%
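For reference, the rates above are simply the count below 100% of poverty divided by the universe. The sketch below recomputes them from the Los Angeles County counts posted here and shows the gap in percentage points between the published and IPUMS-calculated rates. The dictionary layout and the helper name `poverty_rate` are hypothetical.

```python
def poverty_rate(below, universe):
    """Share of the poverty universe below the threshold, in percent."""
    return 100.0 * below / universe


# (universe, below 100% of poverty) as posted above for Los Angeles County
counts = {
    "1-year": {"census": (9_571_103, 1_327_645), "ipums": (9_522_045, 1_133_459)},
    "5-year": {"census": (9_782_602, 1_343_978), "ipums": (9_722_123, 1_050_948)},
}

gaps = {}
for sample, sources in counts.items():
    census_rate = poverty_rate(sources["census"][1], sources["census"][0])
    ipums_rate = poverty_rate(sources["ipums"][1], sources["ipums"][0])
    # Gap in percentage points, computed from the unrounded rates
    gaps[sample] = round(census_rate - ipums_rate, 1)
    print(sample, round(census_rate, 1), round(ipums_rate, 1), gaps[sample])
```

On these numbers the IPUMS-calculated rate is about 2.0 percentage points below the published rate in the 1-year data and about 2.9 points below it in the 5-year data.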