I am doing a household-level analysis. I am restricting records to PERNUM = 1 so that only one individual per household is included. I am interested in the average household income of the bottom 20 percent of households (by household income) in a particular PUMA. I am sorting the household income variable within each PUMA and multiplying each household income record by the household weight to get the average income of households below the 20th percentile. Am I using the weights correctly?
It sounds like you're trying to identify the households in the bottom 20 percent of household income and then calculate the average income of that group. If so, restricting your sample to cases where PERNUM equals 1, as you are doing, is the right first step for a household-level analysis. To get the cutoff for the 20th percentile (within each PUMA, if you are working with several), first sort by household income and then create a new variable that holds the cumulative sum of weights (CUWT): for each household, CUWT equals its HHWT plus CUWT for the previous observation. Then divide CUWT by the total sum of HHWT to get each household's cumulative share of the weighted households. The income cutoff for the 20th percentile is the income of the first observation where this share reaches 0.2; with real weights it will rarely equal 0.2 exactly. In Stata, the pctile command does this for you: pctile IncomeQuintile = hhincome [pw=hhwt], nquantiles(5) stores the quintile cutoffs in a new variable, and the related xtile command assigns each household to a quantile group.
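As a rough illustration of the cumulative-weight procedure, here is a minimal Python sketch (Python is an assumption; the thread itself uses Stata). It assumes you already have parallel lists of HHINCOME and HHWT values for the PERNUM = 1 records in one PUMA. The function name `weighted_cutoff` and the constant `NA_CODE` are hypothetical, and the data are toy values, not real ACS records.

```python
NA_CODE = 9999999  # HHINCOME N/A code (group quarters); excluded before ranking


def weighted_cutoff(incomes, weights, p=0.2):
    """Income of the first household (sorted by income) whose cumulative
    weight share CUWT / sum(HHWT) reaches p. Hypothetical helper."""
    # Drop N/A incomes, then sort households from lowest to highest income.
    pairs = sorted((inc, w) for inc, w in zip(incomes, weights) if inc != NA_CODE)
    if not pairs:
        raise ValueError("no valid household incomes")
    total = sum(w for _, w in pairs)
    cuwt = 0.0
    for inc, w in pairs:
        cuwt += w  # CUWT = this household's HHWT + CUWT of the previous row
        if cuwt / total >= p:  # ">=" because the share rarely hits p exactly
            return inc
    return pairs[-1][0]  # guards against floating-point shortfall when p = 1


# Toy data for illustration only (not real ACS records).
incomes = [40000, 10000, 30000, 20000, NA_CODE]
weights = [150, 50, 200, 100, 80]
print(weighted_cutoff(incomes, weights))  # 20000
```

Testing `>=` rather than `==` is the key design choice: the cumulative share jumps in steps of each household's weight, so the cutoff is the household at which the share first crosses 0.2.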
To get the average income for this subsample, sum the product of HHINCOME and HHWT over these households and divide that total by the sum of their HHWT. In Stata, you can set up the weights with svyset [pw=hhwt] and then run svy: mean hhincome if hhincome < [insert 20th percentile cutoff]. The point estimate is the same either way, but if you also want standard errors, the subpop() option of svy is generally preferred to an if restriction. Please also note that HHINCOME values of 9999999 are N/A codes for respondents living in group quarters, not earnings of $10 million, so drop them before computing both the cutoff and the mean.
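The weighted mean described above can be sketched in Python the same way, again assuming parallel lists of HHINCOME and HHWT for one PUMA's PERNUM = 1 records; the function name `weighted_mean_at_or_below` is hypothetical and the data are toy values. One choice differs from the Stata line: this sketch keeps households whose income equals the cutoff (`<=`), since that household is the one that brings the cumulative share up to 20 percent, while `if hhincome < cutoff` excludes it. With lumpy or tied incomes the two conventions can give different results, so pick one and apply it consistently.

```python
NA_CODE = 9999999  # HHINCOME N/A code (group quarters)


def weighted_mean_at_or_below(incomes, weights, cutoff):
    """sum(HHINCOME * HHWT) / sum(HHWT) over households with income <= cutoff.
    Hypothetical helper for illustration."""
    num = 0.0
    den = 0.0
    for inc, w in zip(incomes, weights):
        if inc == NA_CODE or inc > cutoff:
            continue  # skip N/A codes and households above the cutoff
        num += inc * w  # weighted income
        den += w        # total weight of the subsample
    if den == 0:
        raise ValueError("no households at or below the cutoff")
    return num / den


# Toy data for illustration only; 20000 stands in for a 20th-percentile cutoff.
incomes = [40000, 10000, 30000, 20000, NA_CODE]
weights = [150, 50, 200, 100, 80]
print(weighted_mean_at_or_below(incomes, weights, 20000))  # about 16666.67
```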