I am posting my question here in the hope that someone can help, although I understand that it may be more of an analytic question than a question about the data I am using.
I would like to include a measure of family income (not poverty status) in a logistic regression analysis of self-rated health using the 2000-2018 NHIS data. To handle the missing information on family income in the variable incfam97on2, I rely on the five imputed family income variables (incimp1-incimp5). Rather than using the imputed income categories in those variables, I would like to:

1) use the midpoint of each interval of the imputed income variables;
2) convert those midpoints to 2018 dollars; and
3) adjust the 2018-dollar income values from step 2 for family size.

Given the common view that imputed variables should not be transformed after imputation, I wonder whether I could do my own multiple imputation as follows:

1) use incimp1 as the "original" income variable and treat cases with imputed values (i.e., impyfamflag1 == 1 | impyfamflag1 == 2) as missing;
2) transform incimp1, with the imputed cases now set to missing and the reported values kept, as described in steps 1-3 above (convert intervals to midpoints, convert to 2018 dollars, and adjust for family size); and
3) multiply impute the missing values of the new transformed variable.
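To make the transformation concrete, here is a minimal Python sketch of what steps 1 and 2 of my proposed procedure would do for a single case. It rests on assumptions that do not come from the NHIS documentation: the interval bounds would have to be taken from the codebook, the open-ended top category is assigned 1.5 times its lower bound (an arbitrary convention), the price index values for the survey year and for 2018 are supplied by the user (e.g., from the CPI-U series), and family size is handled with a square-root equivalence scale, which is only one of several possible choices. All function names are hypothetical.

```python
from typing import Optional


def interval_midpoint(lower: float, upper: Optional[float],
                      top_multiplier: float = 1.5) -> float:
    """Midpoint of an income interval [lower, upper].

    For an open-ended top category (upper is None), assume
    lower * top_multiplier; this is an arbitrary convention, not an NHIS rule.
    """
    if upper is None:
        return lower * top_multiplier
    return (lower + upper) / 2.0


def to_2018_dollars(amount: float, cpi_survey_year: float,
                    cpi_2018: float) -> float:
    """Rescale a nominal amount by the ratio of the 2018 price index to the
    survey-year price index (both index values are supplied by the caller)."""
    return amount * cpi_2018 / cpi_survey_year


def equivalize(amount: float, family_size: int) -> float:
    """Adjust for family size with a square-root equivalence scale
    (an assumption; other scales could be used instead)."""
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    return amount / family_size ** 0.5


def transformed_income(lower: float, upper: Optional[float],
                       imp_flag: int, family_size: int,
                       cpi_survey_year: float,
                       cpi_2018: float) -> Optional[float]:
    """Steps 1-2 of the proposed procedure for one case.

    Cases whose income was imputed (imp_flag 1 or 2, mirroring
    impyfamflag1 == 1 | impyfamflag1 == 2) are returned as None so that the
    new imputation model can fill them in; any other flag value is treated
    as a reported income and transformed.
    """
    if imp_flag in (1, 2):
        return None
    midpoint = interval_midpoint(lower, upper)
    real_income = to_2018_dollars(midpoint, cpi_survey_year, cpi_2018)
    return equivalize(real_income, family_size)
```

For example, with a hypothetical interval of $35,000 to $45,000, made-up index values of 200 (survey year) and 250 (2018), a reported (non-imputed) income, and a family of four, `transformed_income(35000, 45000, 0, 4, 200, 250)` gives 40,000 × 250 / 200 / √4 = 25,000. The cases returned as None would then be the ones filled in at step 3.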
I am open to other suggestions. Thank you.