Hello All,
I am using the multiply imputed variables incimp1 through incimp5 in a logit regression. I am wondering if anyone is familiar with doing this in Stata and could provide feedback on my code. Is this an appropriate use of these variables?
Some of the variables used in the logit have been cleaned and renamed, so I am hoping you can tell what they are by looking at the code:
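* Open data file and save a working copy for the mi conversion
use "Data\mydata_nhis_00010.dta", clear
save "Data\miconverted_mydata_nhis_00010.dta", replace
use "Data\miconverted_mydata_nhis_00010", clear
* Create the income variable to be registered as imputed; codes above 95 are set to missing
gen incimp=incfam97on2
replace incimp=. if incfam97on2>95
* Declare the data as mi (wide style), with incimp1-incimp5 as the five imputations of incimp
mi import wide, imputed(incimp=incimp1 incimp2 incimp3 incimp4 incimp5) clear
* Logit with imputed income, estimated across the imputations
mi estimate: logit CVD i.birthcohort10 c.age c.age#c.age i.female i.incimp if insample==1
* Same logit without the income variable
logit CVD i.birthcohort10 c.age c.age#c.age i.female if insample==1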
For help with using Stata, including advice on using mi, I would suggest posting on statalist.org. IPUMS User Support can provide assistance with IPUMS data and sites, but generally cannot help troubleshoot code or provide analytical guidance. Below I’ll provide some information about the IPUMS variables you are using that may be helpful.
It looks like you’re using IPUMS NHIS data to estimate a regression model with the variable you’ve named CVD as your dependent variable and a number of covariates, including INCFAM97ON2. The variable INCFAM97ON2 reports total grouped family income. Because this variable is available for NHIS samples from 1997 to 2018, its income ranges have been harmonized to be compatible with the information available in the original data in each of these survey years. Note that narrower income range categories are available in the similar variables INCFAM07ON and INCFAM9706. These variables may be preferable depending on the range of samples you are using.
The variables INCIMP1 through INCIMP5 are imputed family income variables available for 1997 to 2018. For guidance on how to use these variables appropriately, see the NCHS publication “Multiple Imputation of Family Income and Personal Earnings in the National Health Interview Survey: Methods and Examples.”
The NCHS documentation for the imputed income files directs that each of the five versions of an imputed income variable be analyzed separately, using methods and software that are appropriate for such survey data. Only then can the estimates and standard errors be combined, using the combining rules described in the NCHS publication cited above. The 2018 imputed income file documentation further warns:
The extra variability due to imputation CANNOT be incorporated by simply analyzing a SINGLE completed data set as if the imputed values were true values. Moreover, analysts SHOULD NOT create a single completed data set using the AVERAGE of the five sets of imputed values.
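As a rough sketch of what the combining step involves: assuming the combining rules referenced above are the standard multiple-imputation rules due to Rubin (please confirm this against the NCHS publication), let M = 5 be the number of imputations, and let \hat{Q}_m and U_m be the point estimate and its squared standard error from the separate analysis of imputation m. These symbols are my own notation, not taken from the NCHS documentation.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m
  && \text{(combined point estimate)} \\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M}U_m
  && \text{(average within-imputation variance)} \\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B
  && \text{(total variance)}
\end{align*}
```

The combined standard error is \sqrt{T}. The between-imputation term B is the part that goes missing when a single completed data set is analyzed on its own, which is why the warning above applies.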
Though you are using Stata, you might find our sample code for using these variables in SAS useful. You can find this code in our data brief.