Poverty estimate inconsistency in ACS 2022 5-year estimates

I’ve identified an inconsistency between IPUMS USA ACS 2022 5-year estimates and ACS 2022 1-year estimates for poverty and was hoping another user or staff member could help me understand what might be behind the discrepancy.

Analysis: I’m running an analysis to understand the number of individuals (PERWT) living below 200% of the poverty level (0 - 199% POVERTY) according to 2022 ACS 5-year estimates. I excluded observations for which POVERTY = “000” or “N/A” and found that:

  • 78,978,881 individuals live below 200% poverty (0-199%).
  • 243,532,353 individuals live at or above 200% poverty (200% - 501%).
  • 8,586,360 individuals are N/A (POVERTY = 0).

Discrepancy: When I ran the same analysis on IPUMS 2022 ACS 1-year estimates, I found that:

  • 92,105,144 individuals live below 200% poverty.
  • 233,661,114 individuals live at or above 200% poverty.
  • 7,521,304 individuals are N/A.

…and Census Bureau tables show that:

Would you be able to help me understand this discrepancy between the IPUMS and Census ACS 5-year estimates?

Hi Sophia. I replied to your email and am posting the same information in case others in the forum have the same question.

Using the online data analysis tool, I tabulated POVERTY for those with a POVERTY value between 1 and 199 (I excluded 0s, which are N/A codes). For the 2022 1-year ACS sample, I got an estimate of 85,905,126 people with family income under 200% of the poverty line. For the 2022 5-year ACS sample, I got an estimate of 78,978,881.
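For anyone running this tabulation outside the online tool, it is just a weighted sum. Below is a minimal Python sketch. It assumes each person-level record is a dict holding the IPUMS variables POVERTY and PERWT, and the sample rows are made up. It splits weighted counts into below 200%, at or above 200%, and N/A (POVERTY = 0):

```python
def weighted_poverty_breakdown(records):
    """Sum PERWT into three groups: POVERTY 1-199, POVERTY 200+, and N/A (0)."""
    below = at_or_above = na = 0.0
    for r in records:
        if r["POVERTY"] == 0:          # 0 is the N/A code
            na += r["PERWT"]
        elif r["POVERTY"] < 200:       # 1-199: below 200% of the poverty line
            below += r["PERWT"]
        else:                          # 200-501: at or above 200%
            at_or_above += r["PERWT"]
    return {"below_200": below, "at_or_above_200": at_or_above, "na": na}


# Made-up person records for illustration only.
sample = [
    {"POVERTY": 0, "PERWT": 12.0},
    {"POVERTY": 150, "PERWT": 30.0},
    {"POVERTY": 199, "PERWT": 10.0},
    {"POVERTY": 200, "PERWT": 25.0},
    {"POVERTY": 501, "PERWT": 40.0},
]
print(weighted_poverty_breakdown(sample))
```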

These estimates differ from the Census Bureau’s published estimates for two reasons. First, we generally expect Census Bureau estimates to differ from estimates that use IPUMS data, because the Census Bureau creates its estimates using a dataset only available internally. The Public Use Microdata Sample (PUMS, which is what IPUMS harmonizes and provides to users) differs from this internal data: the Census Bureau makes edits and allocations for various reasons, including to protect respondent privacy. The PUMS is also a smaller subset of the full ACS sample, on which the Census Bureau bases its estimates.

Second, IPUMS and the Census Bureau use different definitions of families when measuring poverty. In both cases, poverty is defined according to family income. IPUMS defines families for this purpose with FAMUNIT, a variable constructed by IPUMS. From the FAMUNIT description:

The Census Bureau defines “primary families” as groups of persons related to the head of household, and “primary individuals” as household heads/householders residing without kin. In the IPUMS, primary families and primary individuals are identified in FAMUNIT with a code of 1; each secondary family or secondary individual receives a higher code. Note that IPUMS primary families (FAMUNIT=1) may also include individuals that the Census Bureau does not consider to be in the primary family if they are linked to someone related to the household head by SPLOC, MOMLOC, or POPLOC. For example, IPUMS links unmarried partners of the head to the household head using SPLOC and so these partners will be included in the IPUMS primary family unit, but because they are not related by blood or marriage to the household head, they will not be included in the Census Bureau’s primary family unit. To recreate the Census Bureau’s definition of the primary family, users can select only those individuals in the IPUMS primary family whose RELATE value is less than or equal to 1100.
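Below is a minimal Python sketch of that selection. It assumes person records are dicts, and it assumes the 1100 cutoff applies to the 4-digit detailed relationship codes, stored here under RELATED. In this made-up data, 101, 1114, and 301 are treated as the head, an unmarried partner, and a child. Check the RELATE/RELATED codebook before relying on specific codes.

```python
def census_style_primary_family(persons, max_related=1100):
    """Keep members of the IPUMS primary family (FAMUNIT == 1) whose detailed
    relationship code is <= max_related, following the FAMUNIT description."""
    return [p for p in persons if p["FAMUNIT"] == 1 and p["RELATED"] <= max_related]


# Made-up household for illustration only.
persons = [
    {"FAMUNIT": 1, "RELATED": 101},   # head
    {"FAMUNIT": 1, "RELATED": 1114},  # unmarried partner, linked to the head via SPLOC
    {"FAMUNIT": 1, "RELATED": 301},   # child
    {"FAMUNIT": 2, "RELATED": 1115},  # member of a secondary family unit
]
print([p["RELATED"] for p in census_style_primary_family(persons)])
```

The partner stays in the IPUMS primary family but drops out of the Census-style one, which is the difference the description points to.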

You can read more about how poverty is measured by IPUMS here. This IPUMS forum post on the same topic from 2015 may also be helpful. If you are hoping to replicate the Census Bureau’s estimates, you could try using the source variables, which are the original variables provided by the Census Bureau, rather than the variables harmonized and edited by IPUMS. The source variable measuring family income (using the Census Bureau’s definition of family) in the 2022 1-year ACS is US2022A_FINCP. You can find source variables by toggling to source variables in the extract system.

Hi Isabel,

Thanks for getting back to me. That makes sense, but I am still unsure why I would get two different results when applying the same coding to different IPUMS data extracts. For example, the extract I created on 2/6 is close to the Census figures, while the extract I created on 2/16 exactly matches your estimates. Did the weights change between 2/6 and 2/16/2024?

There were no changes to those IPUMS USA samples between February 6th and February 16th. I took a look at your extracts from those dates and compared the weighted (PERWT) estimates of the number of people with POVERTY values between 1 and 199 (excluding the 2013 and 2014 data) and got identical estimates. Since there was no change in the data extracted on the two dates, you must have done something different to one of the data files to arrive at different estimates with the two files.

Hello Isabel,

I share the concern Sophia expressed above that something is potentially wrong with the POVERTY variable in the current version of the IPUMS 2022 5-year ACS microdata. I just ran a count in SDA of people with POVERTY values of 1-199 across all of the 1-year microdata samples from 2015-2022. I then compared the average of the 2018-2022 values to what I get from the 2022 5-year ACS sample. The results are below. As you can see, all of my numbers from the 1-year files align with what Sophia got from the extracts she made on 2/6/2024. They do not align with her numbers from the extracts made on 2/16/2024 or from SDA, but that’s beside the point.

The big issue I see here is that there doesn’t seem to be a good reason why the average of the 1-year estimates for 2018-2022 would differ so dramatically from the estimate made using the 2022 5-year file (about 93 million vs. 79 million, respectively), a difference of nearly 18 percent. A difference between how IPUMS and the Census Bureau calculate poverty would not explain this gap, because I’d presume the IPUMS poverty calculation is consistent between the 1-year and 5-year samples. It’s possible that this stark difference also exists in the original Census Bureau data when comparing the 1-year and 5-year files (I’ll explore that when I get a chance). But it’s also possible that a simple error was made. For example, the income values in the 5-year file might have been adjusted to 2022 dollars without also adjusting each year’s poverty thresholds (by MULTYEAR) to 2022 dollars. That would result in a substantial undercount of the number of people below 200% FPL.

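[Image: table of 1-year SDA estimates for 2015-2022, with the 2018-2022 average compared to the 2022 5-year estimate]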

The table below reinforces that the difference between how IPUMS and the Census Bureau derive poverty status is not the source of the large gap we’re seeing here. It compares the 1-year estimates from IPUMS SDA with the 1-year estimates from the ACS summary file across the same set of years, and the estimates are very similar in every year.

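[Image: table comparing 1-year IPUMS SDA estimates with 1-year ACS summary file estimates for the same years]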

Edit: in the blue box above, it should say “average of 1-year 2018-2022.”
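The "nearly 18 percent" figure can be checked in Python, along with a toy version of the suspected failure mode: incomes expressed in 2022 dollars compared against a threshold still in an earlier year's dollars. The income, threshold, and inflation factor below are made up. This only illustrates the hypothesis in the post; it is not a confirmed cause.

```python
def pct_difference(a, b):
    """Relative difference of a versus b, in percent."""
    return (a - b) / b * 100


def income_to_poverty_pct(income, threshold):
    """Income-to-poverty ratio expressed as a percentage."""
    return income / threshold * 100


# Rounded figures from the post: 1-year 2018-2022 average vs. 2022 5-year file.
gap = pct_difference(93e6, 79e6)

# Toy illustration of the hypothesized error (all values are made up).
income_2019 = 50_000      # family income in 2019 dollars
threshold_2019 = 26_000   # poverty threshold in 2019 dollars
factor = 1.15             # hypothetical 2019 -> 2022 inflation factor

# Both sides converted to 2022 dollars: the ratio is unchanged.
consistent = income_to_poverty_pct(income_2019 * factor, threshold_2019 * factor)
# Income inflated but threshold left in 2019 dollars: the ratio is overstated.
mismatched = income_to_poverty_pct(income_2019 * factor, threshold_2019)

print(round(gap, 1), round(consistent, 1), round(mismatched, 1))
```

In this toy case, a family at about 192% of the poverty line would be recorded above 200%. Errors in that direction would shrink the below-200% count.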

I’m having the same issue.
When calculating poverty numbers and rates for Los Angeles County, California, using both the 2022 1-year and 5-year PUMS, I get very different numbers and rates than the Census Bureau's published data. In past years, the differences were well within the margin of error.

I run this analysis every year and have never been so far off the published data.

1-year, 100% poverty
Source             Universe    Below 100%   %
Census published   9,571,103   1,327,645    13.9%
IPUMS calculated   9,522,045   1,133,459    11.9%

5-year, 100% poverty
Source             Universe    Below 100%   %
Census published   9,782,602   1,343,978    13.7%
IPUMS calculated   9,722,123   1,050,948    10.8%
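As a quick check on these figures, the rates can be recomputed from the counts, along with the gap between the published and IPUMS-calculated rates in percentage points. Below is a small Python sketch that uses only the numbers reported above:

```python
# Figures from the post: (universe, persons below 100% of poverty).
rows = {
    ("1-year", "Census published"): (9_571_103, 1_327_645),
    ("1-year", "IPUMS calculated"): (9_522_045, 1_133_459),
    ("5-year", "Census published"): (9_782_602, 1_343_978),
    ("5-year", "IPUMS calculated"): (9_722_123, 1_050_948),
}

# Poverty rate in percent for each sample/source combination.
rates = {key: below / universe * 100 for key, (universe, below) in rows.items()}

for sample in ("1-year", "5-year"):
    published = rates[(sample, "Census published")]
    ipums = rates[(sample, "IPUMS calculated")]
    print(f"{sample}: published {published:.1f}%, IPUMS {ipums:.1f}%, "
          f"gap {published - ipums:.1f} percentage points")
```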

In this forum thread, multiple people have asked questions or raised concerns related to the IPUMS USA variable POVERTY, specifically estimates derived from 5-year ACS samples. We appreciate your patience as we looked into the various questions.

First, I will provide some information about recent changes to the POVERTY variable and another variable that is used to calculate POVERTY, FAMUNIT. Then I will address comparisons between estimates derived from IPUMS data to estimates published by the Census Bureau. Finally, I will provide some information about why estimates from the 5-year ACS samples do not match estimates from the 1-year ACS samples.

1. Recent changes to POVERTY and FAMUNIT

On February 6th of this year, IPUMS released an update to IPUMS USA that included fixes to the POVERTY and FAMUNIT variables. From the IPUMS USA revision history:

POVERTY has been modified to improve internal consistency. POVERTY is now determined based on the IPUMS family unit (FAMUNIT), which differs from the Census Bureau family unit for primary families, for all family units within a household. Previously, for those with FAMUNIT > 1, the value of POVERTY was based on an IPUMS calculation using family size, number of children, and FTOTINC in combination with Census Bureau published thresholds, and everyone with FAMUNIT == 1 in the ACS and PRCS samples was assigned the Census Bureau recode value of poverty.

An error was found and corrected in FAMUNIT. The spouse of the head of household as identified by SPLOC was not identified as being related to the head of household in RELATE. As a result, the coresident grandchildren of the head of household and the head of household’s spouse were reported to not be part of the appropriate family unit in FAMUNIT. These third-generation children are now included in the IPUMS primary family unit.

These changes affect measurement of the POVERTY variable and explain differences between extracts from IPUMS USA created before and after February 6, 2024.
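Under the revised approach, each person's POVERTY value comes from their own FAMUNIT. That means the family's size, number of children, and family income (FTOTINC) are compared against a threshold. Below is a rough Python sketch of that grouping logic, with several assumptions made for illustration:

- `toy_threshold` is a hypothetical stand-in with made-up dollar values, not the Census Bureau threshold matrix.
- Children are taken to be persons under 18.
- The percentage is truncated to an integer and clamped to the 1-501 code range.
- N/A cases and the conversion to 1989 dollars are not handled.

It is not IPUMS's production code.

```python
from collections import Counter


def toy_threshold(family_size, n_children):
    """HYPOTHETICAL threshold lookup with made-up dollar values.
    The real POVERTY variable uses the Census Bureau threshold matrix."""
    return 10_000 + 5_000 * (family_size - 1) - 500 * n_children


def poverty_by_famunit(persons, threshold_for=toy_threshold):
    """Return an income-to-poverty percentage for each person, based on the
    size, number of children (assumed: age < 18), and FTOTINC of that
    person's own (SERIAL, FAMUNIT) family unit."""
    size = Counter()
    children = Counter()
    for p in persons:
        key = (p["SERIAL"], p["FAMUNIT"])
        size[key] += 1
        if p["AGE"] < 18:
            children[key] += 1

    values = []
    for p in persons:
        key = (p["SERIAL"], p["FAMUNIT"])
        threshold = threshold_for(size[key], children[key])
        pct = int(100 * p["FTOTINC"] / threshold)  # truncation is an assumption
        values.append(min(max(pct, 1), 501))       # clamp to the 1-501 code range
    return values


# Made-up household: a three-person primary family plus an unrelated roomer.
household = [
    {"SERIAL": 1, "FAMUNIT": 1, "AGE": 40, "FTOTINC": 30_000},  # head
    {"SERIAL": 1, "FAMUNIT": 1, "AGE": 38, "FTOTINC": 30_000},  # partner linked via SPLOC
    {"SERIAL": 1, "FAMUNIT": 1, "AGE": 10, "FTOTINC": 30_000},  # child
    {"SERIAL": 1, "FAMUNIT": 2, "AGE": 30, "FTOTINC": 8_000},   # unrelated roomer
]
print(poverty_by_famunit(household))
```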

2. Differences between Census Bureau published estimates of poverty rates and estimates that use IPUMS data

We do not generally expect estimates from IPUMS microdata to exactly match Census Bureau published estimates for multiple reasons. One reason is that the data used by the Census Bureau to generate their official estimates are the full dataset available only internally to the Census Bureau, while the public use microdata (provided by IPUMS) are a subset of those original data. In the case of poverty estimates, IPUMS and the Census Bureau have differing definitions of families. As noted above, IPUMS recently made an update that improves consistency in the POVERTY variable by using IPUMS family definitions and the income for those family members when assigning values to POVERTY; because persons in FAMUNIT == 1 were previously assigned the Census Bureau measure and most households have only one family, the previous version of POVERTY produced estimates closer to those reported by the Census Bureau. While using IPUMS family definitions and the income for those family members limits the ability to replicate published estimates, it allows researchers to assess poverty with consideration for the income provided by all family members for more family types (e.g., those with a householder and an unmarried partner). Users can read more about the different definitions of families used by IPUMS and the Census Bureau in the FAMUNIT variable description (which links to other relevant variables and documentation). We plan to offer a “CBPOVERTY” variable in the future that will integrate the original Census Bureau income-to-poverty ratio recode; in the interim you can use the unharmonized source variables available from IPUMS USA (e.g., US2022A_POVPIP for the 2022 1-year ACS) for analyses where you need to replicate Census Bureau poverty estimates.
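For the replication route, here is a minimal Python sketch of a Census-style weighted count below 200% using the source recode. It rests on three assumptions:

- Records are dicts.
- The income-to-poverty ratio is stored under US2022A_POVPIP as an integer percent, with N/A stored as None.
- The person weight is under PERWT.

Check the source variable documentation for the actual N/A codes and for which weight to use.

```python
def census_style_count_below_200(records, ratio_key="US2022A_POVPIP", weight_key="PERWT"):
    """Weighted count of persons with an income-to-poverty ratio below 200%,
    skipping records whose ratio is N/A (assumed to be stored as None)."""
    total = 0.0
    for r in records:
        ratio = r.get(ratio_key)
        if ratio is None:       # assumed N/A representation
            continue
        if ratio < 200:
            total += r[weight_key]
    return total


# Made-up records for illustration only.
records = [
    {"US2022A_POVPIP": None, "PERWT": 5.0},
    {"US2022A_POVPIP": 0, "PERWT": 2.0},
    {"US2022A_POVPIP": 150, "PERWT": 10.0},
    {"US2022A_POVPIP": 200, "PERWT": 7.0},
    {"US2022A_POVPIP": 501, "PERWT": 3.0},
]
print(census_style_count_below_200(records))
```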

3. Discrepancies between 5-year ACS samples and 1-year ACS samples in estimates using POVERTY

Inflation adjustment practices explain the differences in POVERTY values between the 1-year ACS samples and the corresponding single-year estimates in the 5-year ACS samples (identified using MULTYEAR). When the Census Bureau releases 5-year ACS sample data, it inflates all dollar amounts (including the income values used to calculate poverty status) to the last year in the 5-year file. IPUMS then converts the dollar amounts to 1989 dollars in order to determine poverty thresholds. See the IPUMS USA user note on poverty for more information on the general process. Note, however, that the user note describes the 1999 matrix and says we adjust to 1999 dollars, which is incorrect; we use 1989 dollars and the corresponding 1989 matrix in the POVERTY variable. We then apply the Census Bureau adjustment factor to each individual year within the 5-year files, using MULTYEAR and ADJUST. As a result, the last year in a 5-year sample (e.g., 2022 in the 2022 5-year file) matches the corresponding 1-year file (the 2022 1-year ACS). The other years will not match, because each 5-year file makes a different inflation adjustment.

Estimates of POVERTY calculated from the PUMS data for the 5-year and 1-year samples are expected to differ.
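One way to see this in your own extract is to split a 5-year sample by MULTYEAR, then compare each year's weighted share below 200% with the matching 1-year file. Per the explanation above, only the final year should line up. Below is a minimal Python sketch, assuming dict records with POVERTY, PERWT, and MULTYEAR (the sample rows are made up). It uses shares rather than counts, so the comparison does not depend on how person weights are scaled within a multi-year file.

```python
from collections import defaultdict


def share_below_200_by_year(records):
    """Weighted share of persons with POVERTY 1-199 among those with a valid
    POVERTY value (N/A code 0 excluded), for each MULTYEAR."""
    below = defaultdict(float)
    universe = defaultdict(float)
    for r in records:
        if r["POVERTY"] == 0:
            continue
        year = r["MULTYEAR"]
        universe[year] += r["PERWT"]
        if r["POVERTY"] < 200:
            below[year] += r["PERWT"]
    return {year: below[year] / universe[year] for year in sorted(universe)}


# Made-up records from a hypothetical 5-year extract.
records = [
    {"MULTYEAR": 2021, "POVERTY": 150, "PERWT": 10.0},
    {"MULTYEAR": 2021, "POVERTY": 300, "PERWT": 30.0},
    {"MULTYEAR": 2022, "POVERTY": 120, "PERWT": 20.0},
    {"MULTYEAR": 2022, "POVERTY": 0, "PERWT": 5.0},
    {"MULTYEAR": 2022, "POVERTY": 250, "PERWT": 20.0},
]
print(share_below_200_by_year(records))
```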

Thank you for this detailed answer. While I understand that some discrepancy is expected, with all due respect, I don’t understand how any explanation could make researchers comfortable with a discrepancy this large. It suggests that about 14 million fewer people lived below 200% of the FPL in 2022 when comparing Census 5-year to IPUMS 5-year estimates (92 million vs. 78 million).

Furthermore, based on ScogginJ’s answer on 3/20/24, it looks like the estimates were changed again on 3/20/24 such that they matched my 2/6/2024 extract, but have since been changed back such that they do not and once again reflect the discrepancy.