New to IPUMS. Need help creating a US Census variable merge

I am trying to create a chart of US Census data by county in NY State showing the number of low-to-moderate income (LMI) households (HH) in each county as a share of that county's total households. Because HUD LMI cut-offs increase with HH size, I can't just count the HHs under a single HH income figure; I need to apply a different HH income limit depending on the number of persons in each HH.

I’m struggling with the online analysis tool on the site to figure out a way to control for those two variables by county and was wondering if anyone had any tips or guidance they could provide, or if controlling for those variables isn’t possible in the IPUMS tool?

Any suggestions would be most appreciated.



There are ways you can do this with the online analysis tool. However, the lowest geographic unit identified in the ACS microdata is the public use microdata area, or PUMA; PUMAs are areas that contain at least 100,000 people. Where possible, IPUMS infers other geographic identifiers (e.g., counties, cities) from PUMAs. Not all counties can be identified based on PUMAs, meaning some counties in New York state may be omitted from your analysis if you are using the microdata.

For this reason you might be interested in using IPUMS NHGIS, which provides summary or tabular data from published estimates at a variety of geographic levels, including county. If you are looking to generate summary counts of households, you may want to see if NHGIS has a table that suits your needs first. For example, if you filter on the Household and Family Income topic and the 2018-2022 year in the Data Finder tool, you can find tables reporting federal poverty guidelines (e.g., B17026 Ratio of Income to Poverty Level of Families in the Past 12 Months) and Area Median Income (e.g., B19019 Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) by Household Size).

If you choose to use the online analysis tool for IPUMS USA microdata instead, you can create a new variable (see instructions for creating a new variable in SDA) that indicates whether a given household, based on its size and county, is above or below the AMI threshold. Creating a new variable allows you to assign codes based on combinations of other variables (e.g., household income (HHINCOME) and the number of person records in the sampled housing unit or household (NUMPREC)). Be sure to use the household weight (HHWEIGHT) and filter to PERNUM == 1 to generate weighted household estimates. Note that in the microdata files each record is a person, and person records are organized by household; they can also be organized into families (a household could contain multiple families). If you are interested in a family-level instead of a household-level analysis, you will need to use the variables indicating family income (FTOTINC) and family size (FAMSIZE). Also note that the PERNUM(1) filter will only capture the first family in multi-family households (I am not aware of a way to use the SDA tool to restrict to one observation per family).
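If you end up downloading an extract rather than working in SDA, the same logic can be sketched in a few lines of Python. This is only an illustration of the approach described above: the income limits in `LMI_LIMITS` are made-up placeholders (not real HUD figures, which also vary by area and year), the helper names are hypothetical, capping larger households at the last table entry is a simplification, and missing-value codes in HHINCOME are not handled. It assumes each record carries COUNTYFIP, PERNUM, NUMPREC, HHINCOME, and HHWEIGHT.

```python
from collections import defaultdict

# Placeholder income limits by household size (ASSUMPTION: not real HUD values).
LMI_LIMITS = {1: 50000, 2: 57000, 3: 64000, 4: 71000}


def lmi_limit(size):
    """Illustrative limit for a household size; larger sizes reuse the last entry."""
    return LMI_LIMITS[min(size, max(LMI_LIMITS))]


def lmi_share_by_county(records):
    """Return {county: (weighted LMI households, weighted total households, share)}."""
    total = defaultdict(float)
    lmi = defaultdict(float)
    for r in records:
        # Keep one record per household so HHWEIGHT is counted once.
        if r["PERNUM"] != 1:
            continue
        county = r["COUNTYFIP"]
        total[county] += r["HHWEIGHT"]
        # Compare household income to the limit for this household's size.
        if r["HHINCOME"] <= lmi_limit(r["NUMPREC"]):
            lmi[county] += r["HHWEIGHT"]
    return {c: (lmi[c], total[c], lmi[c] / total[c]) for c in total}


# Tiny made-up sample: person records from three households in two counties.
records = [
    {"COUNTYFIP": 1, "PERNUM": 1, "NUMPREC": 2, "HHINCOME": 40000, "HHWEIGHT": 100},
    {"COUNTYFIP": 1, "PERNUM": 2, "NUMPREC": 2, "HHINCOME": 40000, "HHWEIGHT": 100},
    {"COUNTYFIP": 1, "PERNUM": 1, "NUMPREC": 1, "HHINCOME": 90000, "HHWEIGHT": 300},
    {"COUNTYFIP": 5, "PERNUM": 1, "NUMPREC": 6, "HHINCOME": 70000, "HHWEIGHT": 50},
]

result = lmi_share_by_county(records)
for county, (lmi_hh, total_hh, share) in sorted(result.items()):
    print(county, lmi_hh, total_hh, round(share, 3))
```

Because the limit lookup is keyed on household size, swapping in the actual limits for your area only requires replacing the placeholder table (or extending the lookup to be keyed on county as well).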

Hi Kari,

I meant to thank you for your detailed suggestions in your email.

While I couldn’t get exactly what I was hoping for, they were helpful in allowing me to navigate the database and get close.

Thanks for taking the time to help. I really appreciate it!

Have a great day!

