Poverty estimate inconsistency in ACS 2022 5-year estimates

I’ve identified an inconsistency between IPUMS USA ACS 2022 5-year estimates and ACS 2022 1-year estimates for poverty and was hoping another user or staff member could help me understand what might be behind the discrepancy.

Analysis: I’m running an analysis to understand the number of individuals (PERWT) living below 200% of the poverty level (0 - 199% POVERTY) according to 2022 ACS 5-year estimates. I excluded observations for which POVERTY = “000” or “N/A” and found that:

  • 78,978,881 individuals live below 200% poverty (0-199%).
  • 243,532,353 individuals live above 200% poverty (200% - 501%).
  • 8,586,360 individuals are N/A (POVERTY = 0).
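A minimal sketch of the weighted tabulation described above. It assumes each record is a (POVERTY, PERWT) pair from an extract, with POVERTY as an integer percent (0 = N/A, 501 = top code). The helper name `tabulate_poverty` and the bucket labels are hypothetical, and the records are toy values, not ACS data.

```python
from collections import defaultdict

def tabulate_poverty(records):
    """Sum person weights (PERWT) into three POVERTY groups.

    records: iterable of (poverty, perwt) pairs, where poverty is the
    POVERTY value (0 = N/A, 1-501 = family income as % of the poverty line).
    """
    totals = defaultdict(float)
    for poverty, perwt in records:
        if poverty == 0:
            bucket = "n/a"              # N/A code, kept out of both income groups
        elif poverty < 200:
            bucket = "below_200"        # 1-199 percent of the poverty line
        else:
            bucket = "at_or_above_200"  # 200-501 percent (501 is the top code)
        totals[bucket] += perwt
    return dict(totals)

# Toy records, not real ACS data.
sample = [(0, 10.0), (85, 25.0), (199, 5.0), (200, 40.0), (501, 20.0)]
print(tabulate_poverty(sample))
```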

Discrepancy: When I ran the same analysis on IPUMS 2022 ACS 1-year estimates, I found that:

  • 92,105,144 individuals live below 200% poverty.
  • 233,661,114 individuals live above 200% poverty.
  • 7,521,304 individuals are N/A.

…and Census Bureau tables show that:

Would you be able to help me understand this discrepancy between the IPUMS and Census ACS 5-year estimates?

Hi Sophia. I replied to your email and am posting the same information in case others in the forum have the same question.

Using the online data analysis tool, I tabulated POVERTY for those with a POVERTY value between 1 and 199 (I excluded 0s, which are N/A codes). For the 2022 1-year ACS sample, I got an estimate of 85,905,126 people with family income under 200% of the poverty line. For the 2022 5-year ACS sample, I got an estimate of 78,978,881.

These estimates differ from the Census Bureau’s published estimates for two reasons. First, we generally expect Census Bureau estimates to differ from estimates that use IPUMS data because the Census Bureau creates their estimates using a dataset only available internally. The Public Use Microdata Sample (or PUMS, which is what IPUMS harmonizes and provides to users) differs from this internal data—edits and allocations are made by the Census Bureau for various reasons, including to protect respondent privacy. The PUMS is also a smaller subset of the entire ACS sample, and the Census Bureau estimates are made based on the full sample.

Second, IPUMS and the Census Bureau define poverty using different definitions of families. In both cases, poverty is defined according to family income. IPUMS defines poverty at the family level based on our variable FAMUNIT. FAMUNIT is constructed by IPUMS. From the FAMUNIT description:

The Census Bureau defines “primary families” as groups of persons related to the head of household, and “primary individuals” as household heads/householders residing without kin. In the IPUMS, primary families and primary individuals are identified in FAMUNIT with a code of 1; each secondary family or secondary individual receives a higher code. Note that IPUMS primary families (FAMUNIT=1) may also include individuals that the Census Bureau does not consider to be in the primary family if they are linked to someone related to the household head by SPLOC, MOMLOC, or POPLOC. For example, IPUMS links unmarried partners of the head to the household head using SPLOC and so these partners will be included in the IPUMS primary family unit, but because they are not related by blood or marriage to the household head, they will not be included in the Census Bureau’s primary family unit. To recreate the Census Bureau’s definition of the primary family, users can select only those individuals in the IPUMS primary family whose RELATE value is less than or equal to 1100.
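The filter in the last sentence can be sketched as follows. This assumes person records are dicts carrying FAMUNIT and the 4-digit relationship code the FAMUNIT description refers to, stored under the key "RELATE" as in the quote (in IPUMS the 4-digit codes are the detailed version; treat that mapping as an assumption). The relationship codes in the example are illustrative only and should be checked against the codebook.

```python
def census_primary_family(persons):
    """Keep IPUMS primary-family members the Census Bureau would also count.

    persons: list of dicts with at least 'FAMUNIT' and 'RELATE' keys, where
    'RELATE' holds a 4-digit detailed relationship code (assumption).
    A person is kept when FAMUNIT == 1 and the code is <= 1100.
    """
    return [p for p in persons if p["FAMUNIT"] == 1 and p["RELATE"] <= 1100]

household = [
    {"PERNUM": 1, "FAMUNIT": 1, "RELATE": 101},   # head (illustrative code)
    {"PERNUM": 2, "FAMUNIT": 1, "RELATE": 1114},  # unmarried partner (illustrative code)
    {"PERNUM": 3, "FAMUNIT": 1, "RELATE": 301},   # child (illustrative code)
    {"PERNUM": 4, "FAMUNIT": 2, "RELATE": 1200},  # secondary unit member (illustrative code)
]
print([p["PERNUM"] for p in census_primary_family(household)])
```

The partner stays in the IPUMS primary family (FAMUNIT = 1) but drops out once the Census Bureau definition is applied.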

You can read more about how poverty is measured by IPUMS here. This IPUMS forum post on the same topic from 2015 may also be helpful. If you are hoping to replicate the Census Bureau’s estimates, you could try using the source variables, which are the original variables provided by the Census Bureau, rather than the variables harmonized and edited by IPUMS. The source variable measuring family income (using the Census Bureau’s definition of family) in the 2022 1-year ACS is US2022A_FINCP. You can find source variables by toggling to source variables in the extract system:

Hi Isabel,

Thanks for getting back to me. That makes sense, but I am still unsure why I would get two different results when doing the same coding in different IPUMS data extracts (e.g., the extract I created on 2/6 is close to the Census figures, while the extract I created on 2/16 exactly matches your estimates). Did the weights change between 2/6 and 2/16/2024?

There were no changes to those IPUMS USA samples between February 6th and February 16th. I took a look at your extracts from those dates and compared the weighted (PERWT) estimates of the number of people with POVERTY values between 1 and 199 (excluding the 2013 and 2014 data) and got identical estimates. Since there was no change in the data extracted on the two dates, you must have done something different to one of the data files to arrive at different estimates with the two files.

Hello Isabel,

I share the same concern that Sophia expressed above: there is potentially something wrong with the POVERTY variable in the current version of the IPUMS 2022 5-year ACS microdata. I just ran a count in SDA of people with POVERTY values of 1-199 across all of the 1-year microdata samples from 2015-2022, and compared the average of the 2018-2022 values to what I get from the 2022 5-year ACS sample. The results are below. As you can see, all of my numbers from the 1-year files align with what Sophia got from the extracts she made on 2/6/2024. They do not align with her numbers from the extracts made on 2/16/2024 or from SDA, but that’s beside the point.

The big issue I see here is that there doesn’t seem to be a good reason why the average of the estimates across the 1-year files for 2018-2022 would differ so dramatically from the estimate made using the 2022 5-year file (about 93 million vs. 79 million, respectively), a difference of nearly 18 percent. A difference between how IPUMS calculates poverty and how the Census Bureau does would not explain this, because I’d presume the IPUMS poverty calculation is consistent between the 1-year and 5-year samples. It’s possible that this stark difference also exists in the original Census data when comparing the 1-year and 5-year files (I’ll explore that when I get a chance). But it’s also possible that a simple error was made, such as not adjusting the poverty thresholds in the 5-year file for inflation to reflect 2022 values in all years by MULTYEAR (while properly adjusting the income values to 2022 dollars). That would result in a substantial undercount of the number of people below 200% FPL.
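The "nearly 18 percent" figure is the gap expressed as a share of the 5-year estimate. A small sketch of that arithmetic, using the rounded totals quoted above; the function name is hypothetical, and the one-element list simply stands in for the already-averaged 1-year value since the per-year figures are in the (missing) table.

```python
from statistics import fmean

def five_year_gap(one_year_estimates, five_year_estimate):
    """Percent by which the mean of 1-year estimates exceeds a 5-year estimate."""
    mean_1yr = fmean(one_year_estimates)
    return 100 * (mean_1yr - five_year_estimate) / five_year_estimate

# Rounded figures from the post: about 93 million (1-year average) vs. about 79 million (5-year).
print(round(five_year_gap([93_000_000], 79_000_000), 1))
```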


The table below, where I compare the 1-year estimates from IPUMS SDA to the 1-year estimates from the ACS summary file across the same set of years, reinforces that the difference between how IPUMS and the Census Bureau derive poverty is not the source of the big gap we’re seeing here. As you can see, the estimates are very similar across the years.


Edit: in the blue box above, it should say “average of 1-year 2018-2022.”

I’m having the same issue.
When calculating poverty counts and rates for Los Angeles County, California, using both the 2022 1-year and 5-year PUMS, I get very different numbers and rates than the Census published data. In past years the differences were well within the margin of error.

I run this analysis every year and have never been so far off the published data.

1-year ACS, below 100% poverty
  Source              Universe     Below 100%    %
  Census published    9,571,103    1,327,645     13.9%
  IPUMS calculated    9,522,045    1,133,459     11.9%

5-year ACS, below 100% poverty
  Source              Universe     Below 100%    %
  Census published    9,782,602    1,343,978     13.7%
  IPUMS calculated    9,722,123    1,050,948     10.8%
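Recomputing the rates from the counts in the tables confirms the reported percentages and makes the gap explicit in percentage points. The counts are copied from the tables; the dictionary layout and function name are just for illustration.

```python
# Los Angeles County counts copied from the tables: (universe, below 100%).
TABLES = {
    "1-year": {"census": (9_571_103, 1_327_645), "ipums": (9_522_045, 1_133_459)},
    "5-year": {"census": (9_782_602, 1_343_978), "ipums": (9_722_123, 1_050_948)},
}

def poverty_rate(universe, below):
    """Share of the poverty universe below 100% of the poverty line, in percent."""
    return 100 * below / universe

for sample, sources in TABLES.items():
    census = poverty_rate(*sources["census"])
    ipums = poverty_rate(*sources["ipums"])
    print(f"{sample}: Census {census:.1f}%, IPUMS {ipums:.1f}%, gap {census - ipums:.1f} pts")
```

The gap grows from about 2.0 percentage points in the 1-year sample to about 2.9 in the 5-year sample.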

In this forum thread, multiple people have asked questions or raised concerns related to the IPUMS USA variable POVERTY, specifically estimates derived from 5-year ACS samples. We appreciate your patience as we looked into the various questions.

First, I will provide some information about recent changes to the POVERTY variable and another variable that is used to calculate POVERTY, FAMUNIT. Then I will address comparisons between estimates derived from IPUMS data to estimates published by the Census Bureau. Finally, I will provide some information about why estimates from the 5-year ACS samples do not match estimates from the 1-year ACS samples.

1. Recent changes to POVERTY and FAMUNIT

On February 6th of this year, IPUMS released an update to IPUMS USA that included fixes to the POVERTY and FAMUNIT variables. From the IPUMS USA revision history:

POVERTY has been modified to improve internal consistency. POVERTY is now determined based on the IPUMS family unit (FAMUNIT), which differs from the Census Bureau family unit for primary families, for all family units within a household. Previously, for those with FAMUNIT > 1, the value of POVERTY was based on an IPUMS calculation using family size, number of children, and FTOTINC in combination with Census Bureau published thresholds, and everyone with FAMUNIT == 1 in the ACS and PRCS samples was assigned the Census Bureau recode value of poverty.
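The post-February 6 logic (one income-to-threshold ratio per FAMUNIT, shared by every member) can be sketched as below. This is not the IPUMS implementation: the thresholds are hypothetical and keyed only on family size (the real ones also depend on number of children), the ratio is truncated to an integer because the post does not state the rounding rule, and it is clamped to 1-501 so it never collides with the 0 N/A code.

```python
from collections import defaultdict

# Hypothetical thresholds by family size, for illustration only.
THRESHOLDS = {1: 15_000, 2: 19_000, 3: 23_000, 4: 30_000}

def poverty_by_famunit(persons, thresholds=THRESHOLDS, top_code=501):
    """Assign each person a POVERTY-style percent from their family unit's income.

    persons: list of dicts with 'SERIAL' (household), 'FAMUNIT', and 'INCTOT'.
    """
    family_income = defaultdict(float)
    family_size = defaultdict(int)
    for p in persons:
        key = (p["SERIAL"], p["FAMUNIT"])
        family_income[key] += p["INCTOT"]  # FTOTINC-style sum over the family unit
        family_size[key] += 1
    result = []
    for p in persons:
        key = (p["SERIAL"], p["FAMUNIT"])
        size = min(family_size[key], max(thresholds))  # cap size at the largest key
        ratio = 100 * family_income[key] / thresholds[size]
        result.append(max(1, min(int(ratio), top_code)))
    return result

people = [
    {"SERIAL": 1, "FAMUNIT": 1, "INCTOT": 20_000},  # householder
    {"SERIAL": 1, "FAMUNIT": 1, "INCTOT": 10_000},  # partner in the IPUMS primary family
    {"SERIAL": 1, "FAMUNIT": 2, "INCTOT": 12_000},  # secondary individual
]
print(poverty_by_famunit(people))
```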

An error was found and corrected in FAMUNIT. The spouse of the head of household as identified by SPLOC was not identified as being related to the head of household in RELATE. As a result, the coresident grandchildren of the head of household and the head of household’s spouse were reported to not be part of the appropriate family unit in FAMUNIT. These third-generation children are now included in the IPUMS primary family unit.

These changes affect measurement of the POVERTY variable and explain differences between extracts from IPUMS USA created before and after February 6, 2024.

2. Differences between Census Bureau published estimates of poverty rates and estimates that use IPUMS data

We do not generally expect estimates from IPUMS microdata to exactly match Census Bureau published estimates for multiple reasons. One reason is that the data used by the Census Bureau to generate their official estimates are the full dataset available only internally to the Census Bureau, while the public use microdata (provided by IPUMS) are a subset of those original data. In the case of poverty estimates, IPUMS and the Census Bureau have differing definitions of families. As noted above, IPUMS recently made an update that improves consistency in the POVERTY variable by using IPUMS family definitions and the income for those family members when assigning values to POVERTY; because persons in FAMUNIT == 1 were previously assigned the Census Bureau measure and most households have only one family, the previous version of POVERTY produced estimates closer to those reported by the Census Bureau. While using IPUMS family definitions and the income for those family members limits the ability to replicate published estimates, it allows researchers to assess poverty with consideration for the income provided by all family members for more family types (e.g., those with a householder and an unmarried partner). Users can read more about the different definitions of families used by IPUMS and the Census Bureau in the FAMUNIT variable description (which links to other relevant variables and documentation). We plan to offer a “CBPOVERTY” variable in the future that will integrate the original Census Bureau income-to-poverty ratio recode; in the interim you can use the unharmonized source variables available from IPUMS USA (e.g., US2022A_POVPIP for the 2022 1-year ACS) for analyses where you need to replicate Census Bureau poverty estimates.
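To see how far the harmonized and source measures diverge in an extract, one option is to run the same weighted count on both. A sketch under stated assumptions: the helper name is hypothetical, POVERTY uses 0 as its N/A code, and N/A in US2022A_POVPIP is assumed to arrive as a missing value (None); check how your extract actually codes it. The records are toy values.

```python
def weighted_count_below(records, var, cutoff=200, weight="PERWT",
                         is_missing=lambda v: v is None):
    """Sum `weight` over records whose `var` is present and below `cutoff`."""
    total = 0.0
    for r in records:
        value = r[var]
        if is_missing(value):
            continue  # skip the variable's N/A code
        if value < cutoff:
            total += r[weight]
    return total

# Toy records, not real ACS data.
records = [
    {"PERWT": 10, "POVERTY": 150, "US2022A_POVPIP": 180},
    {"PERWT": 20, "POVERTY": 210, "US2022A_POVPIP": 190},
    {"PERWT": 5, "POVERTY": 0, "US2022A_POVPIP": None},
]
harmonized = weighted_count_below(records, "POVERTY", is_missing=lambda v: v == 0)
source = weighted_count_below(records, "US2022A_POVPIP")
print(harmonized, source)
```

A person like the second record, whose family income under the IPUMS definition includes more members, can land above 200% in POVERTY while falling below it in the Census Bureau recode.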

3. Discrepancies between 5-year ACS samples and 1-year ACS samples in estimates using POVERTY

Inflation adjustment practices explain the differences in POVERTY values between the 1-year ACS samples and the corresponding single-year estimates in the 5-year ACS samples (using MULTYEAR). When IPUMS releases 5-year ACS sample data, we inflate all dollar amounts (including income values used to calculate poverty status) to the last year in the 5-year file. IPUMS then converts the dollar amounts to 1989 dollars in order to determine poverty thresholds. See the IPUMS USA user note on poverty for more information on the general process (though I should note that the user note reports the 1999 matrix and says we adjust to 1999 dollars, which is incorrect; we use 1989 dollars and the corresponding 1989 matrix in the POVERTY variable). We then apply the Census Bureau adjustment factor to each individual year within the 5-year files, using MULTYEAR and ADJUST. As a result, the last year in a 5-year sample (e.g., 2022 in the 2022 5-year file) matches the corresponding 1-year file (the 2022 1-year ACS), but other years will not, because each 5-year file makes a different inflation adjustment.
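A sketch of why the same MULTYEAR can carry different dollar values in two 5-year files. The factors below are hypothetical placeholders, not the real ADJUST values; the point is only that each file scales a given year to its own final year, so a 2018 record is expressed in 2021 dollars in one file and 2022 dollars in the other.

```python
def adjusted(amount, multyear, factors):
    """Express a dollar amount from `multyear` in the file's final-year dollars."""
    return round(amount * factors[multyear], 2)

# Hypothetical per-year factors, for illustration only.
FACTORS_2021_FILE = {2017: 1.10, 2018: 1.08, 2019: 1.06, 2020: 1.05, 2021: 1.0}
FACTORS_2022_FILE = {2018: 1.17, 2019: 1.15, 2020: 1.13, 2021: 1.08, 2022: 1.0}

income_2018 = 25_000
print(adjusted(income_2018, 2018, FACTORS_2021_FILE))  # 2018 income in 2021 dollars
print(adjusted(income_2018, 2018, FACTORS_2022_FILE))  # 2018 income in 2022 dollars
```

The final year has a factor of 1.0 in this sketch, which is why that year lines up with its 1-year file while earlier years do not.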

Estimates of POVERTY calculated from the PUMS data for the 5-year and 1-year samples are expected to differ.

Note: This post previously incorrectly stated that the Census Bureau adjusts dollar amounts in the 5-year ACS to account for inflation (inflating all dollar amounts to the last year of the file). It has been corrected to indicate that IPUMS applies this inflation adjustment, not the Census Bureau.

Thank you for this detailed answer. While I understand that some discrepancy is expected, with all due respect, I guess I don’t understand how any explanation could make researchers feel comfortable with a discrepancy this large–a discrepancy that suggests that 15 million fewer people lived below 200% of the FPL when comparing Census 5-year to IPUMS 5-year estimates (92 million vs. 78 million) in 2022.

Furthermore, based on ScogginJ’s answer on 3/20/24, it looks like the estimates were changed again on 3/20/24 such that they matched my 2/6/2024 extract, but have since been changed back such that they do not and once again reflect the discrepancy.

Hello! Thank you so much for your help with this! Not to belabor this point, I’m just new to IPUMS USA and want to make sure I understand fully. I ran some quick tabulations today using the SDA to get the raw respondent counts for <200% poverty by ACS 1-years and for 5-year multyears. I figured that was the best way to see what real inflation adjustment changes occurred when combining the single year into the 5-year files. This is what I’m seeing:


The 2022 5-year differences seem much larger than the 2021 and 2019 differences. Are these being driven by inflation adjustments? It makes sense that the earliest MULTYEAR would have a larger adjustment given it’s the farthest from the file year, but the jump between the earliest MULTYEAR in the 2021 file (-76,210) and the earliest MULTYEAR in the 2022 file (-120,572) seems like a lot, especially since MULTYEAR 2018 is in both files and its difference is -60,297 in the 2021 file and -120,572 in the 2022 file. Is there anything else driving this change?
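The comparison above amounts to an unweighted count of records with POVERTY 1-199 for each year, taken once from the 5-year file (by MULTYEAR) and once from the matching 1-year file, then subtracted. A sketch with a hypothetical function name and toy (year, POVERTY) pairs, not the counts from the table:

```python
from collections import Counter

def count_below_200(records):
    """Unweighted count of records with POVERTY 1-199, keyed by year."""
    return Counter(year for year, poverty in records if 1 <= poverty <= 199)

def multyear_differences(five_year_records, one_year_records):
    """Per year: 5-year file count (by MULTYEAR) minus 1-year file count."""
    five = count_below_200(five_year_records)
    one = count_below_200(one_year_records)
    return {year: five[year] - one[year] for year in sorted(five)}

# Toy (year, POVERTY) pairs, not real ACS records.
five_year = [(2018, 150), (2018, 250), (2019, 100)]
one_year = [(2018, 150), (2018, 190), (2019, 100)]
print(multyear_differences(five_year, one_year))
```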

Please let me know if I’m misunderstanding anything, or if I conducted my SDA analysis incorrectly. And thank you so much again for your time and patience!

After @Sophia_Autor’s last message, we asked the IPUMS USA team to take a closer look at all of the component pieces of the POVERTY variable. It seems that there is a double-inflation issue, as @scogginj suggested. I hope to provide a more detailed update about the source of the error (as well as an offer of an IPUMS mug to those who helped us flag the error) in the next few days. Thanks for your ongoing patience.


That’s awesome – thank you so much!


Yes, that’s wonderful, thank you so much for the update!

I am following up to confirm that the large discrepancies between the 5-year and the 1-year files for POVERTY stem from IPUMS double-adjusting for inflation in the FTOTINC variable. In my earlier message I incorrectly stated that the Census Bureau inflates variables reporting dollar amounts in the 5-year files so that all values are standardized to the final year of the file. In fact, IPUMS performs this adjustment for inflation on the 5-year files, as the original multi-year files from the Census Bureau are not adjusted for inflation.

IPUMS applies this inflation adjustment on a variable-level basis. In this specific circumstance, we first adjust person-level total income (INCTOT). The variable FTOTINC sums INCTOT for all persons in the same FAMUNIT; we then incorrectly apply the inflation adjustment to most values of FTOTINC even though it has already been applied to the component INCTOT values. FTOTINC is, in turn, used to create the POVERTY variable.

This issue became obvious with the recent change to the IPUMS USA POVERTY variable. Before February 6, 2024, POVERTY reported the original PUMS income-to-poverty-threshold value for the primary family, so the double inflation adjustment affected only secondary families. On February 6, 2024, we introduced a change to compare each FAMUNIT’s FTOTINC value to the corresponding poverty threshold for all families instead of only secondary families.
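A toy illustration of the error described above: inflating each INCTOT and then inflating their sum again pushes family income, and therefore the poverty ratio, upward, which moves some families across the 200% line and undercounts people below it. The inflation factor, threshold, and incomes are hypothetical, and `ftotinc` here is an illustrative helper, not IPUMS code.

```python
ADJUST = 1.10  # hypothetical inflation factor for one MULTYEAR, for illustration only

def ftotinc(inctots, factor, double_adjust=False):
    """Family total income built from person incomes inflated once.

    inctots: unadjusted INCTOT values for the members of one FAMUNIT.
    With double_adjust=True, the sum is inflated a second time (the bug).
    """
    total = sum(x * factor for x in inctots)  # correct: each INCTOT adjusted once
    if double_adjust:
        total *= factor  # error: adjusting the already-adjusted family sum again
    return total

threshold = 15_000          # hypothetical poverty threshold, final-year dollars
members = [15_000, 10_000]  # hypothetical unadjusted INCTOT values

correct = 100 * ftotinc(members, ADJUST) / threshold
buggy = 100 * ftotinc(members, ADJUST, double_adjust=True) / threshold
print(round(correct), round(buggy))
```

Here the family sits at about 183% of the threshold when adjusted once, but about 202% when adjusted twice.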

Note that there will still be some minor differences between the 1-year and 5-year files as the 5-year files will still be adjusted for inflation (but only once).

We are working to correct this error and update the data files on IPUMS USA. To convey our gratitude to @Sophia_Autor and @scogginj for helping us find and correct this error, we would like to send you an IPUMS mug as a token of our appreciation. Please email ipums@umn.edu with a mailing address where we can send this small thank you gift.


Thank you so much, Kari and IPUMS team – this is great news.

Thank you @KariWilliams for getting to the bottom of this, and @Sophia_Autor keeping the conversation going! Glad to hear that this will be corrected since we rely heavily on the POVERTY variable in our work.