Which weight to use when analyzing a subpopulation?

I am analyzing several variables in NHIS related to current workers for the years 2007-2018. My groups of interest rely on work information (which is only available for Sample Adults), and most of the health outcomes are collected in the Sample Adult survey. All of my analyses are comparing variables across work groups (for example, prevalence of asthma across different industries).

However, there are some variables (age, sex, race) that are collected in the Person or Household surveys rather than the Sample Adult survey. I have thus far been conducting my subpopulation analyses (focusing on specific industries) using SAMPWEIGHT (adjusted for pooled years), but I’m wondering if I should be using PERWEIGHT or HHWEIGHT instead for these variables. For example, if I want to look at the age distribution in a specific industry (for example, the number of workers 55+ in the agriculture industry), should I be using SAMPWEIGHT or PERWEIGHT? And what if I want to combine variables from different surveys into a model (for example, the prevalence of asthma in the agriculture industry, adjusted for age)?

Thank you so much!!

When combining variables from different sections of the NHIS, it is appropriate to use the weight for the most restrictive sample among the variables in your analysis (SAMPWEIGHT in your case). PERWEIGHT would be appropriate if you were only analyzing variables from the person (or possibly family) file and wanted to create person-level estimates. HHWEIGHT would be appropriate if your unit of analysis were households. You may be interested in the IPUMS NHIS user guidance on sampling weights and variance estimation.
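As a rough illustration of the point above, here is a minimal Python sketch that estimates both an age count and an asthma prevalence for one industry using SAMPWEIGHT throughout. The records, the lowercase field names (`industry`, `asthma`, `age`), and the helper functions are all invented for the example. Dividing the weight by the number of pooled years is an assumption about how the pooled-year adjustment was made, so check it against the user guidance. Variance estimation, which needs the survey design variables, is not covered here.

```python
# Toy sample-adult records pooled over two survey years. All values are invented;
# only YEAR and SAMPWEIGHT are meant to mirror IPUMS NHIS variable names.
records = [
    {"YEAR": 2007, "SAMPWEIGHT": 4000.0, "industry": "agriculture", "asthma": 1, "age": 60},
    {"YEAR": 2007, "SAMPWEIGHT": 6000.0, "industry": "agriculture", "asthma": 0, "age": 40},
    {"YEAR": 2008, "SAMPWEIGHT": 2000.0, "industry": "agriculture", "asthma": 0, "age": 58},
    {"YEAR": 2008, "SAMPWEIGHT": 8000.0, "industry": "agriculture", "asthma": 1, "age": 35},
    {"YEAR": 2008, "SAMPWEIGHT": 3000.0, "industry": "construction", "asthma": 0, "age": 45},
]

N_YEARS = len({r["YEAR"] for r in records})  # number of pooled survey years (2 here)


def pooled_weight(record, n_years):
    """Assumed pooling adjustment: annual weight divided by the number of pooled years."""
    return record["SAMPWEIGHT"] / n_years


def weighted_count(rows, n_years, condition):
    """Sum of pooled weights over rows that satisfy `condition`."""
    return sum(pooled_weight(r, n_years) for r in rows if condition(r))


def weighted_prevalence(rows, n_years, in_subpop, has_outcome):
    """Weighted share of the subpopulation that has the outcome."""
    denominator = weighted_count(rows, n_years, in_subpop)
    numerator = weighted_count(rows, n_years, lambda r: in_subpop(r) and has_outcome(r))
    return numerator / denominator


def in_agriculture(r):
    return r["industry"] == "agriculture"


# Age comes from the person file, but the subpopulation is defined by a
# sample-adult variable, so the same sample-adult weight is used.
older_ag_workers = weighted_count(
    records, N_YEARS, lambda r: in_agriculture(r) and r["age"] >= 55
)

# Asthma is a sample-adult outcome; the weight is again SAMPWEIGHT.
asthma_prevalence = weighted_prevalence(
    records, N_YEARS, in_agriculture, lambda r: r["asthma"] == 1
)

print(older_ag_workers, asthma_prevalence)
```

The pooling divisor cancels out of the prevalence, which is a ratio, but it does change the estimated counts.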

Thank you Kari! Just so I’m clear: since I’m focusing my analyses on a subpopulation defined by information that is only collected in the Sample Adult survey, it doesn’t matter what outcome I’m analyzing; I should always use the same weight (because SAMPWEIGHT is the most restrictive)? Thanks!

This is correct. Because only sample adults will have the full slate of relevant variables, your analytical subsample will be restricted to sample adults, and using PERWEIGHT instead of SAMPWEIGHT would underestimate the population counts (PERWEIGHT is scaled so that all sampled persons together represent the population, whereas only the sample adults would contribute to your estimates). If your outcome measure is from an even more selective supplemental file (i.e., not the person or family file), you would want to use the supplemental weight (e.g., SUPP1WT for analyzing variables from the functioning and disability supplement in 2011-2017). However, it sounds like your analytical subpopulation is sample adults and you are looking at an outcome measure that is available for all persons; in this circumstance, SAMPWEIGHT is appropriate.
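To make the underestimation concrete, here is a small Python sketch with invented, deliberately idealized weights: one sample adult is drawn per family, and that adult's SAMPWEIGHT is set so that it stands in for every adult in the family. Real NHIS weights include further adjustments, and the field names `family` and `is_sample_adult` are hypothetical.

```python
# Idealized toy data: five adults in two families, one sample adult per family.
# SAMPWEIGHT is None for adults who were not selected as the sample adult.
persons = [
    {"family": "A", "is_sample_adult": True,  "PERWEIGHT": 1000.0, "SAMPWEIGHT": 2000.0},
    {"family": "A", "is_sample_adult": False, "PERWEIGHT": 1000.0, "SAMPWEIGHT": None},
    {"family": "B", "is_sample_adult": True,  "PERWEIGHT": 1500.0, "SAMPWEIGHT": 4500.0},
    {"family": "B", "is_sample_adult": False, "PERWEIGHT": 1500.0, "SAMPWEIGHT": None},
    {"family": "B", "is_sample_adult": False, "PERWEIGHT": 1500.0, "SAMPWEIGHT": None},
]

sample_adults = [p for p in persons if p["is_sample_adult"]]

# Adult population represented when every person record is used with PERWEIGHT.
adults_all_persons = sum(p["PERWEIGHT"] for p in persons)

# The same total is recovered from sample adults alone when SAMPWEIGHT is used.
adults_sampweight = sum(p["SAMPWEIGHT"] for p in sample_adults)

# Restricting to sample adults but keeping PERWEIGHT drops the adults who were
# not selected, so the estimated count falls short.
adults_wrong_weight = sum(p["PERWEIGHT"] for p in sample_adults)

print(adults_all_persons, adults_sampweight, adults_wrong_weight)
```

In this toy setup the first two totals agree, while the third covers only part of the adult population.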
