I’m pooling 10 years of data to analyze sample adults. To pool, I multiply SAMPWEIGHT by the fraction (number of observations in sample x)/(number of observations in pooled sample).
I count the number of observations as the number of sample adults with a record (I remove all ASTATFLG!=1 participants before calculating sample weight). Should I be calculating this adjusted weight using all participants, not just sample adults? I think my current way is correct since I am only interested in sample adults but wanted to check. Thank you!
This user note on variance estimation using NHIS data provides guidelines on pooling multiple years of data. Regarding weights when pooling samples, the simplest adjustment method is to divide the weight by the number of years of data pooled. The method you mention seems reasonable; you could try both and compare the results, though the two methods are likely to perform similarly well. Whether to adjust weights using all participants or just sample adults depends on your variable of interest and the types of estimates that you are looking to produce. If your outcome is only available for sample adults and you are looking to calculate estimates for this group (e.g. percent of US adults that have x), then you only need to use sample adults for your weight adjustment. In cases where your estimates are for the entire US population (e.g. percent of Americans that have x), then you will need to adjust your weights using all respondents. Note that the guide also provides recommended syntax for subpopulation analysis that does not compromise the sample design information. Rather than removing all cases with ASTATFLG != 1, the syntax allows researchers to run their analysis on this subsample in order to retain sample design information and compute correct standard errors.
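To make the two weight adjustments discussed above concrete, here is a minimal Python sketch. It assumes the extract has already been restricted to the records used for the adjustment (here, sample adults with ASTATFLG == 1) and that each record carries YEAR and SAMPWEIGHT. The function name `pooled_weights`, the `method` labels, and the toy records are hypothetical and purely illustrative; this is not syntax from the user note.

```python
from collections import Counter

def pooled_weights(records, method="proportional"):
    """Return adjusted weights for pooled survey years.

    method="equal":        weight / (number of years pooled)
    method="proportional": weight * (n in that year / n in pooled sample)
    """
    n_by_year = Counter(r["YEAR"] for r in records)
    n_years = len(n_by_year)
    n_pooled = len(records)
    adjusted = []
    for r in records:
        if method == "equal":
            factor = 1 / n_years
        elif method == "proportional":
            factor = n_by_year[r["YEAR"]] / n_pooled
        else:
            raise ValueError(f"unknown method: {method}")
        adjusted.append(r["SAMPWEIGHT"] * factor)
    return adjusted

# Toy data (made up): two years, four sample adults, one other record.
# ASTATFLG = 0 here is only a placeholder for "any value other than 1".
toy = [
    {"YEAR": 2010, "ASTATFLG": 1, "SAMPWEIGHT": 1000},
    {"YEAR": 2010, "ASTATFLG": 1, "SAMPWEIGHT": 1000},
    {"YEAR": 2010, "ASTATFLG": 1, "SAMPWEIGHT": 1000},
    {"YEAR": 2010, "ASTATFLG": 0, "SAMPWEIGHT": 0},
    {"YEAR": 2011, "ASTATFLG": 1, "SAMPWEIGHT": 3000},
]

# Counts are taken over sample adults only, as in the question above.
sample_adults = [r for r in toy if r["ASTATFLG"] == 1]
w_prop = pooled_weights(sample_adults, method="proportional")
w_equal = pooled_weights(sample_adults, method="equal")
```

For estimates about the whole population rather than sample adults, the same function could be given all respondents instead of the filtered list, so that the per-year counts reflect everyone.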
Thank you! We have tried both and they perform similarly, so we use the more precise method.
We are only interested in Sample Adults as we are focused on occupation and industry (these data are only collected for Sample Adults). In this case, would removing everyone with ASTATFLG != 1 affect the sample design? We use subpopulation methodology as you described for subsamples of occupation and industry, but we don’t analyze anyone who does not have employment information, so all of our analyses exclude anyone who is not ASTATFLG==1 and our sample weight adjustments similarly only include Sample Adults.
Out of curiosity I ran the samples as if we were keeping all participants and using the subpopulation method. The results were exactly the same, since everyone who is not ASTATFLG==1 is not in the sample design due to the weight adjustment. Or would we adjust the non-Sample Adult weights by the Sample Adult weight fraction described above? Wouldn’t that be inappropriate, since the fraction does not include non-Sample Adults in its calculation?
Removing observations from a sample that uses a complex survey design, rather than using the subpop() option, will result in losing information on relevant sample design parameters such as strata and clusters, even if your analysis only focuses on this subpopulation. While your point estimates will be the same either way, the standard error of your estimate will likely be different when sample design parameters from observations outside of your subpopulation of interest are included. Adjusting the weights for observations outside of your analysis, however, is not necessary (unless using replicate weights). You can find further information in the Stata documentation on the subpop() option.
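The point above can be illustrated with a small Python sketch of a textbook Taylor-linearized standard error for a subpopulation mean under a stratified, clustered design (with-replacement approximation, no finite population correction). It is meant to show the general idea only and is not a reproduction of Stata's implementation. The function name `domain_mean_se`, the field names, and the toy data are all hypothetical.

```python
from collections import defaultdict

def domain_mean_se(rows, keep_all_psus=True):
    """Weighted mean of y in the domain and its linearized SE.

    keep_all_psus=True  mimics a subpopulation analysis: every PSU stays
                        in the design, and non-domain rows contribute 0.
    keep_all_psus=False mimics dropping non-domain rows beforehand.
    """
    if not keep_all_psus:
        rows = [r for r in rows if r["in_domain"]]
    domain = [r for r in rows if r["in_domain"]]
    w_sum = sum(r["w"] for r in domain)
    mean = sum(r["w"] * r["y"] for r in domain) / w_sum

    # Sum the linearized scores within each PSU of each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r["w"] * (r["y"] - mean) / w_sum if r["in_domain"] else 0.0
        psu_totals[r["strata"]][r["psu"]] += z

    variance = 0.0
    for psus in psu_totals.values():
        n = len(psus)
        if n < 2:
            raise ValueError("stratum with a single PSU")
        z_bar = sum(psus.values()) / n
        variance += n / (n - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return mean, variance ** 0.5

def row(strata, psu, w, y, in_domain):
    return {"strata": strata, "psu": psu, "w": w, "y": y, "in_domain": in_domain}

# Toy design (made up): PSU 3 of stratum 1 has no domain members.
toy_rows = [
    row(1, 1, 1.0, 1, True), row(1, 1, 1.0, 0, True),
    row(1, 2, 1.0, 1, True), row(1, 2, 1.0, 1, True),
    row(1, 3, 1.0, 0, False),
    row(2, 1, 1.0, 0, True), row(2, 2, 1.0, 1, True),
]

mean_sub, se_sub = domain_mean_se(toy_rows, keep_all_psus=True)
mean_drop, se_drop = domain_mean_se(toy_rows, keep_all_psus=False)
```

In this toy design the point estimate is identical under both approaches, while the standard error changes because one stratum loses a PSU when non-domain rows are dropped. Since rows outside the domain contribute a score of zero, their weights do not enter the calculation. When every PSU contains at least one domain member, the two approaches give the same standard error.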
Thank you for your reply, I understand what you’re saying. We re-ran one of our analyses, this time applying the adjusted adult sample weight to all observations and then running the subpop option for ASTATFLG==1. There was no difference in the estimates; the estimation design had the same number of strata, and the only difference was an additional PSU. Should we be concerned and re-run the rest of our analyses using the subpop option? Thank you again!
It’s not surprising to me that some estimates would be more affected by the use of subpop() than others. IPUMS advises users to account for the complex sampling design of the NHIS appropriately, but it is up to individual researchers to determine how to run their analyses.
Understood, we re-ran more of our analyses using the subpop(if ASTATFLG==1) option and there was no difference in estimates compared to only including Sample Adults in the dataset, except for the additional PSU when running the analyses. The sample weights we adjusted considered only Sample Adults but we applied them to all participants and did not use replicate weights. It’s reassuring that there does not appear to be a difference, but we will keep this in mind for the future.