Estimating income quintiles

I am trying to calculate income quintiles and am getting slightly different figures than I would expect. I was hoping someone might take a look at my code and let me know if something is amiss. I’m using 2017-2021 ACS data with the variables HHWT, RACE, and HHINCOME, and the weightedcalcs module in Python.

import pandas as pd
import weightedcalcs as wc

# drop records where HHINCOME is N/A (coded 9999999)
df = df[df['HHINCOME'] < 9999999].reset_index(drop=True)

# use the household weight for all weighted calculations
calc = wc.Calculator('HHWT')

# weighted quintile cutoffs: minimum, 20th/40th/60th/80th percentiles, maximum
df_quintile_cutoffs = (
    [df['HHINCOME'].min()]
    + [calc.quantile(df, 'HHINCOME', i / 5) for i in range(1, 5)]
    + [df['HHINCOME'].max()]
)

# assign each record a quintile label from 0 (lowest) to 4 (highest)
df['quintile'] = pd.cut(df['HHINCOME'], bins=df_quintile_cutoffs, labels=False, include_lowest=True)

# weighted mean income within each quintile
df_group = df.groupby(['quintile'])
calc.mean(df_group, 'HHINCOME').round().astype(int)

I can’t provide any specific feedback on your code, but there are a few things that you should consider in your calculations. First, for household-level analysis, you should limit your sample to one observation per household (usually the person with PERNUM = 1). HHINCOME and HHWT are repeated on every person record in a household, so leaving all person records in the sample counts larger households multiple times. Second, published Census Bureau estimates will differ slightly from calculations using the ACS microdata, since the public use microdata sample available through IPUMS is a subset of the entire ACS sample used in published estimates.
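To illustrate the first point, here is a minimal sketch in plain Python using made-up records. The values are hypothetical, and it assumes your extract includes SERIAL and PERNUM alongside HHWT and HHINCOME. The `weighted_mean` helper is my own illustration, not part of weightedcalcs.

```python
# Toy person-level records (hypothetical values). SERIAL identifies the
# household and PERNUM the person within it; HHWT and HHINCOME are
# repeated on every person record of the same household.
records = [
    {"SERIAL": 1, "PERNUM": 1, "HHWT": 10, "HHINCOME": 20000},
    {"SERIAL": 1, "PERNUM": 2, "HHWT": 10, "HHINCOME": 20000},
    {"SERIAL": 1, "PERNUM": 3, "HHWT": 10, "HHINCOME": 20000},
    {"SERIAL": 2, "PERNUM": 1, "HHWT": 30, "HHINCOME": 50000},
    {"SERIAL": 3, "PERNUM": 1, "HHWT": 20, "HHINCOME": 90000},
    {"SERIAL": 3, "PERNUM": 2, "HHWT": 20, "HHINCOME": 90000},
]


def weighted_mean(rows, value, weight):
    """Weighted mean of rows[value], using rows[weight] as weights."""
    total_weight = sum(r[weight] for r in rows)
    return sum(r[value] * r[weight] for r in rows) / total_weight


# Keep one record per household: the person with PERNUM == 1.
households = [r for r in records if r["PERNUM"] == 1]

# Without the filter, each household counts once per person in it.
person_level_mean = weighted_mean(records, "HHINCOME", "HHWT")
household_level_mean = weighted_mean(households, "HHINCOME", "HHWT")

print(person_level_mean, household_level_mean)
```

In your pandas code, the equivalent filter would be something like `df = df[df['PERNUM'] == 1]` applied before computing the cutoffs. The same overcounting affects the weighted quintile cutoffs, not just the means.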

If you could explain how your results differ from your expectations or provide the source for the estimates that you’re comparing your results to, I’ll be able to provide more targeted support.