Help with FAMINC variable

Your help with the FAMINC variable would be greatly appreciated. I am having trouble getting an accurate number of households when using FAMINC.

As an example, the only sample I am pulling is 2018 (for simplicity's sake right now), and the only variables are STATEFIP and FAMINC.

Creating the data extract gives me 12 variables (YEAR, SERIAL, MONTH, HWTFINL, CPSID, ASECFLAG, ASECWTH, PERNUM, WTFINL, CPSIDP, STATEFIP, and FAMINC).

Using R, I remove duplicate households by grouping by SERIAL and then using slice() to keep one row per SERIAL. When I sum the weights, I get a total of 8.8 million households. According to census data for 2018, though, there were roughly 7.6 million households (MOE +/- 23,468).
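Below is a minimal base-R sketch of the deduplication step described above, run on a small invented extract. It assumes, per the IPUMS SERIAL documentation, that SERIAL identifies a household only within a single sample. An extract that mixes ASEC and Basic Monthly records can therefore reuse the same SERIAL values. Here NA in ASECFLAG simply stands in for non-ASEC records, and all values are made up.

```r
# Invented toy extract. Assumption: SERIAL is unique only within a sample, so
# ASEC records (ASECFLAG == 1) and non-ASEC records (NA here) can share values.
toy <- data.frame(
  ASECFLAG = c(NA, NA, 1, 1, 1, 1),
  SERIAL   = c(1, 2, 1, 1, 2, 2),
  PERNUM   = c(1, 1, 1, 2, 1, 2),
  ASECWTH  = c(NA, NA, 1500, 1500, 2000, 2000)
)

# Deduplicating on SERIAL alone keeps the first row for each value,
# which here are the two non-ASEC records that carry no ASEC weight.
naive <- toy[!duplicated(toy$SERIAL), ]

# Restricting to ASEC records first, then keeping one row per SERIAL,
# leaves exactly one row per ASEC household.
asec <- toy[!is.na(toy$ASECFLAG) & toy$ASECFLAG == 1, ]
households <- asec[!duplicated(asec$SERIAL), ]

nrow(naive)              # 2
sum(households$ASECWTH)  # 3500
```

Whether this is what inflated the 8.8 million total depends on which weight was summed and which samples are in the extract. The sketch only shows why the order of filtering and deduplicating matters.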

I have tried multiple times to get the number of households even remotely close to 7.6 million and cannot.

Once I feel comfortable getting the correct number of households, I can expand the research as I am trying to gauge the fluctuation of household income over the past 36 months.

Is there an additional variable I need to add to the original data pull?

Any assistance would be greatly appreciated. Thank you.

Your question isn’t precise enough for someone to help (e.g., when you say “according to census data,” specify the measure, data source, estimator, etc.). I suggest reading the SERIAL documentation: https://cps.ipums.org/cps-action/variables/SERIAL#description_section

I am sharing a few ideas about what might be happening to give you these numbers. Please follow up if you have questions after reviewing them or continue to have problems replicating the household counts you expect.

A good way to calculate household estimates is to keep only records with PERNUM == 1, which removes duplicates within households by keeping only the household head (person 1) of each household. It looks like you may be trying to estimate the number of households for a particular state (please correct me if I’m wrong), in which case you can filter on STATEFIP and then use group_by() in R to group your results by state. Also be sure to filter on ASECFLAG == 1 if you included any Basic Monthly data in your extract, because Basic Monthly records require a different sample weight variable than the ASEC household weight, ASECWTH.

These data training exercises might be helpful to you for best practices in using IPUMS data with your statistical package, in your case R. Below is some example code you could use to estimate the number of households in Florida (2018), which comes out to about 8.6 million:

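```r
library(dplyr)

# Keep one record per household (PERNUM == 1), Florida only (STATEFIP == 12),
# and ASEC records only (ASECFLAG == 1), then sum the ASEC household weight
data %>%
  filter(PERNUM == 1, STATEFIP == 12, ASECFLAG == 1) %>%
  group_by(STATEFIP) %>%
  summarize(n = sum(ASECWTH))
```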
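For anyone not using dplyr, the same calculation can be sketched in base R. This version runs on a small made-up data frame so it is self-contained: only the column names follow the IPUMS extract, and the values are invented. Instead of filtering to Florida, `aggregate()` sums ASECWTH for every state at once.

```r
# Toy ASEC-style records (invented values); column names follow the IPUMS extract.
toy <- data.frame(
  STATEFIP = c(12, 12, 12, 13, 13),
  ASECFLAG = c(1, 1, 1, 1, 1),
  SERIAL   = c(10, 10, 11, 20, 20),
  PERNUM   = c(1, 2, 1, 1, 2),
  ASECWTH  = c(1500, 1500, 2200, 1800, 1800)
)

# Keep one record per household (PERNUM == 1), ASEC records only.
heads <- subset(toy, PERNUM == 1 & ASECFLAG == 1)

# Weighted household count per state: the sum of the household weight ASECWTH.
by_state <- aggregate(ASECWTH ~ STATEFIP, data = heads, FUN = sum)
by_state
```

On real data, `toy` would be replaced by the extract read into R. Summing ASECWTH over one row per household is what turns sample records into a household count.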

One other factor to consider if you are getting conflicting numbers is the difference between families and households. Not all household members are necessarily family members. The U.S. Census Bureau defines a family as a group of two or more people related by birth, marriage, or adoption who reside together, whereas a household includes everyone who lives in the housing unit, including unrelated people. This matters for FAMINC, which is a household-level variable that measures family income. HHINCOME, by contrast, is a household-level variable that measures household income (you can read more about this here).
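A conceptual sketch of why family and household income can differ for the same household. The column names `in_householder_family` and `income` and all values are hypothetical, not IPUMS variables, and this is not how IPUMS constructs FAMINC or HHINCOME. It only illustrates the definitions above: a married couple plus an unrelated roommate.

```r
# One hypothetical household: a married couple plus an unrelated roommate.
# Column names and values are invented for illustration; they are not IPUMS variables.
hh <- data.frame(
  person = c("householder", "spouse", "roommate"),
  in_householder_family = c(TRUE, TRUE, FALSE),
  income = c(50000, 30000, 25000)
)

# Family income counts only members related to the householder.
family_income <- sum(hh$income[hh$in_householder_family])

# Household income counts everyone living in the housing unit.
household_income <- sum(hh$income)

c(family = family_income, household = household_income)
```

Here the family income is 80,000 while the household income is 105,000, because the roommate's income counts toward the household but not the family.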