Use PERWEIGHT or SAMPWEIGHT for constructing total counts of cancer diagnoses by age?


For the NHIS, if I am attempting to construct total counts of cancer diagnoses by age, should I use PERWEIGHT or SAMPWEIGHT? The IPUMS documentation says SAMPWEIGHT, but only by using PERWEIGHT can I weight up the entire survey to the whole U.S. population. Within the population of cancer diagnoses, SAMPWEIGHTed counts and PERWEIGHTed counts diverge considerably.

Thank you

PERWEIGHT represents the inverse probability of selection into the NHIS sample and should be used with variables for which information was collected about all persons. Meanwhile, SAMPWEIGHT represents, with a few exceptions, the random selection of a respondent in a sampled household to complete a supplement survey. The correct weight to use will depend on whether the question your variable of interest relates to was asked of all persons or just of sample adults or sample children selected to answer supplemental questions.

The universe tab for CANCEREV (ever told had cancer) notes that only sample adults were asked questions relating to cancer. This means that non-sample adults will have a not-in-universe (NIU) value for this variable. For additional clarity, IPUMS NHIS provides a weights tab for each variable that specifies which weight should be used. In the case of CANCEREV and other cancer variables, the tab notes that SAMPWEIGHT should be used for all samples except for 1983. If you weight analyses of these variables with PERWEIGHT, you will underestimate population totals: PERWEIGHT values are lower than SAMPWEIGHT values because, under SAMPWEIGHT, a smaller group of selected respondents must be inflated to the same population total.
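To illustrate why the two weights give different totals, here is a minimal sketch in Python with pandas. The data are made up, not real NHIS records. The `sample_adult` and `ever_cancer` columns are hypothetical recodes (the latter standing in for CANCEREV), and giving non-sample persons a SAMPWEIGHT of zero is an assumption made for the toy example.

```python
import pandas as pd

# Toy data, NOT real NHIS records. PERWEIGHT and SAMPWEIGHT follow IPUMS NHIS naming;
# `sample_adult` and `ever_cancer` are hypothetical, pre-recoded columns.
df = pd.DataFrame({
    "sample_adult": [True, False, True, False, True, False],
    "ever_cancer":  ["yes", "niu", "no", "niu", "yes", "niu"],  # "niu" = not in universe
    "PERWEIGHT":    [1000.0, 1000.0, 1200.0, 1200.0, 900.0, 900.0],
    # Assumption for this sketch: non-sample persons carry a sample weight of zero.
    "SAMPWEIGHT":   [2000.0, 0.0, 2400.0, 0.0, 1800.0, 0.0],
})

def weighted_total(frame, flag, weight):
    """Sum `weight` over rows where `flag` equals "yes"."""
    return float(frame.loc[frame[flag] == "yes", weight].sum())

# Both weights inflate the toy sample to the same overall population...
pop_perweight = df["PERWEIGHT"].sum()
pop_sampweight = df["SAMPWEIGHT"].sum()

# ...but only SAMPWEIGHT does so using the sample adults alone, so the
# PERWEIGHT total for a sample-adult-only variable comes out too low.
total_sampweight = weighted_total(df, "ever_cancer", "SAMPWEIGHT")
total_perweight = weighted_total(df, "ever_cancer", "PERWEIGHT")

print(pop_perweight, pop_sampweight)
print(total_sampweight, total_perweight)
```

In this toy example the PERWEIGHT total is half the SAMPWEIGHT total, because the non-sample adults who would carry the rest of the PERWEIGHT sum are not in universe for the cancer question.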

Note that due to a redesign of the NHIS in 2019, all respondents from 2019 onward are either sample adults or sample children and should be analyzed with SAMPWEIGHT for person-level estimates. I also recommend referring to this user guide on weights for information on combining and pooling weights over time if this is of interest.


So, just to clarify, if I am trying to estimate the total population of 18+ people in the U.S. diagnosed with bladder cancer at age 13 using the variable CNBLADAG I should use SAMPWEIGHT?

Thank you.

Yes, SAMPWEIGHT is the correct weight to use when estimating the total population of 18+ people in the U.S. diagnosed with bladder cancer at age 13 using CNBLADAG. This is because this question is only ever asked of sample adults aged 18+ who were ever told they had bladder cancer.

Note, however, that the small number of survey respondents who ever reported having bladder cancer (CNBLAD) makes producing such an estimate particularly difficult. For example, in the 2021 NHIS not a single respondent reported being diagnosed with bladder cancer at age 13. This does not mean that there wasn’t a single adult in the entire U.S. in 2021 who had been diagnosed with bladder cancer at age 13, but that this was rare enough that none of the approximately 30,000 sample adults in the 2021 NHIS fell into this category. Such precise statistics can be difficult to estimate with survey data because so few respondents satisfy the criteria. The result is a large standard error.

There are several approaches that you might consider in order to increase your sample size and reduce the standard error of your estimate. These include grouping multiple ages into brackets and pooling data from multiple years of the NHIS (in which case you will need to divide SAMPWEIGHT by the number of years pooled). Aggregating across cancer types (e.g. genitourinary cancers instead of bladder cancer only) might also help.
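As a rough sketch of the pooling adjustment, the Python/pandas snippet below divides SAMPWEIGHT by the number of pooled years before summing within grouped diagnosis ages. The values are made up, and `dx_age_group` is a hypothetical recode of an age-at-diagnosis variable such as CNBLADAG. It produces point estimates only. Standard errors would additionally require the survey design variables, which are not shown here.

```python
import pandas as pd

# Toy pooled extract, NOT real NHIS records. YEAR and SAMPWEIGHT follow IPUMS NHIS
# naming; `dx_age_group` is a hypothetical grouping of age at diagnosis.
pooled = pd.DataFrame({
    "YEAR":         [2019, 2019, 2020, 2020, 2021, 2021],
    "dx_age_group": ["<40", "40+", "40+", "40+", "<40", "40+"],
    "SAMPWEIGHT":   [3000.0, 6000.0, 4500.0, 6000.0, 1500.0, 9000.0],
})

# Each year's weights already sum to that year's population, so dividing by the
# number of pooled years makes the pooled totals an average annual estimate.
n_years = pooled["YEAR"].nunique()
pooled["pooled_weight"] = pooled["SAMPWEIGHT"] / n_years

avg_annual_totals = pooled.groupby("dx_age_group")["pooled_weight"].sum()
print(avg_annual_totals)
```

Without the division, the summed weights would count roughly three years of population at once and overstate the totals about threefold.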