I am trying to calculate income quintiles and am getting slightly different figures than I would expect. I was hoping someone could take a look at my code and let me know if something is amiss. I’m using 2017-2021 ACS data with the variables HHWT, RACE, and HHINCOME, and the weightedcalcs module in Python.
import pandas as pd
import weightedcalcs as wc

# remove N/A values (HHINCOME is coded 9999999 when not applicable)
df = df[df['HHINCOME'] < 9999999].reset_index(drop=True)
# set the weighting variable
calc = wc.Calculator('HHWT')
# build the bin edges: minimum, weighted 20th/40th/60th/80th percentiles, maximum
df_quintile_cutoffs = (
    [df['HHINCOME'].min()]
    + [calc.quantile(df, 'HHINCOME', i / 5) for i in range(1, 5)]
    + [df['HHINCOME'].max()]
)
# assign quintile labels based on income (labels=False gives integer codes 0-4)
df['quintile'] = pd.cut(df['HHINCOME'], bins=df_quintile_cutoffs, labels=False, include_lowest=True)
# weighted mean income within each quintile
df_group = df.groupby('quintile')
calc.mean(df_group, 'HHINCOME').round().astype(int)
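
One way to sanity-check the cutoffs is to compute a weighted quantile by hand and compare. Below is a minimal pure-Python sketch under stated assumptions: `weighted_quantile` is a hypothetical helper (not part of weightedcalcs), the toy incomes and weights are made up for illustration, and it returns the smallest value whose cumulative weight share reaches `q`, with no interpolation. weightedcalcs may use a different convention at the boundaries, so small differences between the two would not necessarily indicate a bug.

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight share reaches q.

    Hypothetical helper for cross-checking; no interpolation is applied.
    """
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equal length")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")

    # Sort observations by value, carrying their weights along.
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Walk up the sorted values until the cumulative weight reaches q * total.
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= q * total:
            return value
    return pairs[-1][0]


# Toy data (made up): one household carries most of the weight.
incomes = [10, 20, 30, 40, 50]
hhwt = [1, 1, 1, 1, 6]

weighted_median = weighted_quantile(incomes, hhwt, 0.5)
unweighted_median = weighted_quantile(incomes, [1] * len(incomes), 0.5)
print(weighted_median, unweighted_median)
```

On real data, the same function could be called with `df['HHINCOME'].tolist()` and `df['HHWT'].tolist()` at `q = 0.2, 0.4, 0.6, 0.8` and the results compared against `df_quintile_cutoffs`.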