I am interested in conducting an analysis comparing the prevalence of cardiovascular risk factors (diabetes, hypertension, etc.) by survey year in NHIS. My analysis uses pooled IPUMS NHIS data from 2013-2018. To generate estimates for each year, would I use the pooled sample weight procedure outlined here and apply that adjusted sample weight to each year?
When pooling multiple NHIS samples together to create estimates that are representative of the entire time period, you need to adjust the sampling weights accordingly. You should follow the procedure outlined in the user note you linked to, that is, divide the sampling weights by the total number of samples you are pooling together. In your case, you would divide the weights by six. The particular sampling weight you use will depend on the universe of the variables you are analyzing. For example, if you were analyzing the variable DIABETICEV, the universe is sample persons, so you would use the weight SAMPWEIGHT. The weights tab of each variable links to the weight to use with that variable, by sample. You don’t need to modify the STRATA variable when pooling samples, since IPUMS modifies the strata to make them unique across samples, as well as within samples.
The section of the note titled “Combining Sampling Weights When a Variable is Located in Different Files Across Years” may be relevant depending on which variables you are using.
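As a rough illustration of the adjustment described above, here is a minimal Python sketch. It assumes a toy set of rows in place of a real extract: the weights are made up, and has_diabetes is a hypothetical yes/no indicator that you would derive from DIABETICEV using the codebook. Only SAMPWEIGHT and the divide-by-six step come from the procedure itself.

```python
# Toy rows standing in for a pooled 2013-2018 extract; all values are made up.
# Each row is (YEAR, SAMPWEIGHT, has_diabetes). has_diabetes is a hypothetical
# indicator that would be derived from DIABETICEV using the codebook.
RECORDS = [
    (2013, 6000.0, True), (2013, 12000.0, False),
    (2014, 6000.0, False), (2014, 12000.0, False),
    (2015, 9000.0, True), (2015, 9000.0, False),
    (2016, 6000.0, False), (2016, 12000.0, True),
    (2017, 12000.0, False), (2017, 6000.0, False),
    (2018, 9000.0, True), (2018, 9000.0, False),
]

N_SAMPLES = 6  # six annual samples, 2013 through 2018


def pooled_weight(sampweight, n_samples=N_SAMPLES):
    """Divide the original weight by the number of pooled samples."""
    return sampweight / n_samples


def pooled_summary(rows):
    """Return (weighted population total, weighted prevalence) for the pooled period."""
    total = sum(pooled_weight(w) for _, w, _ in rows)
    cases = sum(pooled_weight(w) for _, w, has in rows if has)
    return total, cases / total


total, prevalence = pooled_summary(RECORDS)
print(f"average annual population represented: {total:,.0f}")
print(f"pooled prevalence: {prevalence:.3f}")
```

Note that dividing every weight by the same constant leaves a weighted proportion unchanged, so the adjustment mainly matters for weighted counts and totals: with it, the weights sum to an average annual population rather than six times that. This sketch produces point estimates only and does not compute standard errors, which would need survey software that accounts for STRATA.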
Thanks Isabel. I understand creating a pooled sample weight by dividing SAMPWEIGHT by the number of samples pooled together. However, if I wanted to compare yearly prevalence for each sample year (2013 vs. 2014 vs. 2015, etc.), would I still apply this pooled sample weight (SAMPWEIGHT/6) to generate these yearly prevalences?
You only need to adjust the sampling weights with pooled samples if you are creating estimates that are representative of the entire time period. If you are creating estimates for each year separately, you do not need to adjust the sampling weights by dividing them by the number of samples. You would just use them as-is. In that case, while your extract may contain data from multiple samples, you are not conducting a “pooled” analysis.
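For the year-by-year case, a minimal Python sketch might look like the following. It assumes made-up toy rows rather than a real extract, and has_diabetes is a hypothetical yes/no indicator that you would derive from DIABETICEV using the codebook. The point it shows is that SAMPWEIGHT is used as-is within each year, with no division by the number of samples.

```python
from collections import defaultdict

# Toy rows standing in for a multi-year extract; all values are made up.
# Each row is (YEAR, SAMPWEIGHT, has_diabetes). has_diabetes is a hypothetical
# indicator that would be derived from DIABETICEV using the codebook.
RECORDS = [
    (2013, 6000.0, True), (2013, 12000.0, False),
    (2014, 6000.0, False), (2014, 12000.0, False),
    (2015, 9000.0, True), (2015, 9000.0, False),
]


def prevalence_by_year(rows):
    """Weighted prevalence per survey year, using the unadjusted SAMPWEIGHT."""
    total = defaultdict(float)  # sum of weights per year
    cases = defaultdict(float)  # sum of weights among cases per year
    for year, weight, has_condition in rows:
        total[year] += weight
        if has_condition:
            cases[year] += weight
    return {year: cases[year] / total[year] for year in sorted(total)}


for year, prevalence in prevalence_by_year(RECORDS).items():
    print(year, f"{prevalence:.3f}")
```

As with any hand-rolled calculation, this gives point estimates only. Standard errors for comparing years would need survey software that accounts for the sample design, including STRATA.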