Weighting when pooling the 1986-2014 NHIS linked mortality files


I’m using the NHIS linked mortality files pooled from 1986-2014, with mortality follow-up through 2015. My independent variable is “classwk” from 1986-2000 (available for all persons ages 18+ through 1996 and for sample adults from 1997-2000) and “classwk2” from 2001-2014 (available for sample adults only). My dependent variable is “mortstat”. I’m planning to exclude those ages <25 or >65. I’m also planning to incorporate survey weights into my analyses. There are a couple of weighting and sample issues I need to consider, and I’m hoping you can offer advice on them. My analysis will be conducted in R.

First, I need to decide whether to use “mortwt” or a mixture of “mortwt” and “mortwtsa”. For example, should I use “mortwt” throughout my analyses, or should I use “mortwt” from 1986-1996 and “mortwtsa” from 1997-onwards (or otherwise)?

Second, I need to decide how to handle the Hispanic oversample in 1992. Based on your documentation, it seems like I should exclude the oversample in 1992. However, if I do so, do I need to adjust the survey weights or design? I’m unsure how to do so in R, as the NCHS guidance is based on SAS. Would it be sufficient to just “subset” the survey design object to exclude the oversample [e.g., svy_dsg_sub <- subset(svy_dsg, hispanic_oversample != 1)]?

Thanks so much for your advice.

Regarding your question about combining weights for a variable that moves between the person file and the sample adults file, it is appropriate to use MORTWT for 1986-1996 and MORTWTSA for 1997-2014. I am sharing an example about creating a weight when working with variables located in different sections of the survey over time from the IPUMS NHIS User Note on sampling weights:

In some cases, a variable of interest may be located in different original NHIS files with different sampling schemes across the years. For example, the IPUMS NHIS variable PAPEVER indicates whether a woman ever had a Pap test. For the years 1982, 1992, and 2002, the variable comes from three different files: the 1982 Preventive Care supplement, the 1992 Cancer Control supplement, and the 2002 Sample Adult section. Accordingly, the sampling weights for each individual variable are PERWEIGHT, SUPP2WT, and SAMPWEIGHT, respectively. For analysis, these weights will need to be combined in a new variable. Researchers should generate a new weight, perhaps called PAPWEIGHT, such that PAPWEIGHT = PERWEIGHT if year = 1982; PAPWEIGHT = SUPP2WT if year = 1992; and PAPWEIGHT = SAMPWEIGHT if year = 2002.
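
Applied to the mortality weights, the same pattern might look like the following in R. This is a minimal sketch on a made-up data frame: the column names `year`, `mortwt`, and `mortwtsa` are assumed to match the variable names in your extract, and `mortwt_comb` is a hypothetical name for the new weight.

```r
# Toy data standing in for a pooled extract (all values are invented).
nhis <- data.frame(
  year     = c(1986, 1996, 1997, 2014),
  mortwt   = c(1200, 1500, 1800, 2100),
  mortwtsa = c(NA,   NA,   5400, 6300)
)

# Person-level weight through 1996, sample adult weight from 1997 onward.
nhis$mortwt_comb <- ifelse(nhis$year <= 1996, nhis$mortwt, nhis$mortwtsa)

print(nhis)
```

The combined column can then be passed as the weight when the survey design object is defined.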

It sounds like you have read the IPUMS NHIS user note on the 1992 Hispanic Oversample. The recommendation is to adjust PERWEIGHT92 for linking ineligibility as outlined in Appendix III of this NCHS technical documentation (which uses a SUDAAN procedure called PROC WTADJUST). Although the process uses SUDAAN, the comments in the code do provide some guidance; I am also sharing another paper explicitly about adjusting sample weights for linkage eligibility (also using SUDAAN) that closely relates to Appendix III but goes into more detail. Unfortunately, my initial Google searches haven’t turned up an R equivalent of the WTADJUST procedure outlined in these papers (this forum post has some general guidance on using weights in R, but no counterpart to PROC WTADJUST). I am not an R user and don’t have experience adjusting weights based on linkage eligibility, but based on my understanding of the technical documentation, a simple subset command would only affect how the standard errors are computed; it would not adjust the weights as the NCHS recommends. I am sure you were hoping for a more definitive answer!
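
For readers looking for a starting point in R, here is a rough sketch of a weighting-class ratio adjustment for linkage eligibility. It is a simplified stand-in and not a reimplementation of PROC WTADJUST, which fits a model-based adjustment. All names (`cell`, `wt`, `elig`, `wt_adj`) and data values are hypothetical. Within each adjustment cell, the weights of eligible respondents are scaled up so that they sum to the weighted total of all respondents in that cell.

```r
# Invented example: 6 respondents in 2 adjustment cells.
d <- data.frame(
  cell = c("A", "A", "A", "B", "B", "B"),
  wt   = c(100, 200, 300, 150, 150, 300),
  elig = c(1,   1,   0,   1,   0,   1)    # 1 = eligible for mortality linkage
)

# Weighted totals per cell: all respondents, and eligible respondents only.
tot_all  <- tapply(d$wt, d$cell, sum)
tot_elig <- tapply(d$wt * d$elig, d$cell, sum)

# Ratio adjustment factor per cell, looked up for each row by cell name.
ratio <- tot_all / tot_elig
adj   <- as.numeric(ratio[as.character(d$cell)])

# Eligible respondents get the inflated weight; ineligible ones get zero.
d$wt_adj <- ifelse(d$elig == 1, d$wt * adj, 0)

print(d)
```

In this toy example the adjusted weights of the eligible respondents sum to the same total as the original weights of everyone, which is the property a ratio adjustment is meant to preserve. Which cells to use, and whether this simpler approach is an acceptable substitute for the NCHS procedure, are open questions this sketch does not settle.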

Thanks so much for the help! Regarding the first issue, I’ll use mortwt for 1986-1996 and mortwtsa for 1997-2014. Regarding the second issue, I’ll investigate further, although cursory googling hasn’t turned up much for me either. Perhaps a workaround would be to estimate inverse probability of selection weights (IPSW) myself, with a dependent variable of whether the respondent was included in the sample (vs. excluded because they were in the Hispanic oversample) and sociodemographic characteristics as predictors? I could then multiply together the survey weights and the IPSW and incorporate the combined weights into my analysis.
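
A minimal sketch of this IPSW idea in base R, on simulated data, might look like the following. It only illustrates the mechanics of the proposed workaround and is not NCHS-endorsed guidance. The variable names (`included`, `age`, `female`, `wt`, `ipsw`, `wt_comb`) and the choice of predictors are assumptions made for illustration.

```r
set.seed(1)

# Simulated respondents (all values invented for illustration).
n <- 500
d <- data.frame(
  age    = sample(25:65, n, replace = TRUE),
  female = rbinom(n, 1, 0.5),
  wt     = runif(n, 500, 3000)   # stand-in for the survey weight
)

# included = 1 if kept in the analytic sample, 0 if dropped.
d$included <- rbinom(n, 1, plogis(1.5 - 0.01 * d$age + 0.3 * d$female))

# Model the probability of inclusion from sociodemographic predictors.
fit <- glm(included ~ age + female, family = binomial, data = d)
d$p_incl <- fitted(fit)

# Inverse probability of selection weight, then the combined weight.
d$ipsw    <- 1 / d$p_incl
d$wt_comb <- d$wt * d$ipsw

# Only included respondents carry forward into the analysis.
analytic <- d[d$included == 1, ]
summary(analytic$wt_comb)
```

Whether the selection model should itself be fit with the survey weights, and which predictors belong in it, are choices this sketch leaves open.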

It is beyond the scope of the IPUMS User Support team to provide analytical advice, but you may be interested in contacting the NCHS Data Linkage team with this question as it relates to the linked mortality files.