Weighting when pooling the 1986-2014 NHIS linked mortality files


I’m using the NHIS linked mortality files pooled from 1986-2014, with mortality follow-up through 2015. My independent variable is “classwk” for 1986-2000 (available for all persons ages 18+ through 1996 and for sample adults from 1997-2000) and “classwk2” for 2001-2014 (available for sample adults only). My dependent variable is “mortstat”. I’m planning to exclude those ages <25 or >65. I’m also planning to incorporate survey weights into my analyses. There are a couple of weighting and sample issues I need to consider, and I’m hoping you can offer advice on them. My analysis will be conducted in R.

First, I need to decide whether to use “mortwt” or a mixture of “mortwt” and “mortwtsa”. For example, should I use “mortwt” throughout my analyses, or should I use “mortwt” for 1986-1996 and “mortwtsa” from 1997 onward (or some other combination)?

Second, I need to decide how to handle the Hispanic oversample in 1992. Based on your documentation, it seems like I should exclude the oversample in 1992. However, if I do so, do I need to adjust the survey weights or design? I’m unsure how to do so in R, as the NCHS guidance is based on SAS. Would it be sufficient to just “subset” the survey design object to exclude the oversample [e.g., svy_dsg_sub <- subset(svy_dsg, hispanic_oversample != 1)]?

Thanks so much for your advice.

Regarding your question about combining weights for a variable that moves between the person file and the sample adults file, it is appropriate to use MORTWT for 1986-1996 and MORTWTSA for 1997-2014. I am sharing an example about creating a weight when working with variables located in different sections of the survey over time from the IPUMS NHIS User Note on sampling weights:

In some cases, a variable of interest may be located in different original NHIS files with different sampling schemes across the years. For example, the IPUMS NHIS variable PAPEVER indicates whether a woman ever had a Pap test. For the years 1982, 1992, and 2002, the variable comes from three different files: the 1982 Preventive Care supplement, the 1992 Cancer Control supplement, and the 2002 Sample Adult section. Accordingly, the sampling weights for each individual variable are PERWEIGHT, SUPP2WT, and SAMPWEIGHT, respectively. For analysis, these weights will need to be combined in a new variable. Researchers should generate a new weight, perhaps called PAPWEIGHT, such that PAPWEIGHT = PERWEIGHT if year = 1982; PAPWEIGHT = SUPP2WT if year = 1992; and PAPWEIGHT = SAMPWEIGHT if year = 2002.
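Applied to the mortality weights discussed in this thread, the same year-dependent selection might look like the minimal sketch below. It is written in Python rather than R so that it runs without any survey package; the record layout (dictionaries with lowercase keys), the helper name `combined_mort_weight`, and the weight values are assumptions made up for illustration, not anything supplied by IPUMS NHIS.

```python
def combined_mort_weight(record):
    """Pick the mortality weight by survey year.

    MORTWT for 1986-1996, MORTWTSA for 1997-2014, following the advice
    in this thread. The dict-based record layout is hypothetical.
    """
    year = record["year"]
    if 1986 <= year <= 1996:
        return record["mortwt"]
    if 1997 <= year <= 2014:
        return record["mortwtsa"]
    raise ValueError(f"year {year} is outside the pooled 1986-2014 range")


# Made-up example records; real weights come from the IPUMS NHIS extract.
records = [
    {"year": 1990, "mortwt": 1500.0, "mortwtsa": None},
    {"year": 1996, "mortwt": 1800.0, "mortwtsa": None},
    {"year": 1997, "mortwt": 2100.0, "mortwtsa": 5200.0},
    {"year": 2010, "mortwt": 2400.0, "mortwtsa": 6100.0},
]

for rec in records:
    # New analysis weight, analogous to PAPWEIGHT in the quoted example.
    rec["analysis_wt"] = combined_mort_weight(rec)
```

In R the same rule is a single vectorized conditional on the year variable; the point is only that one new weight column is built before the survey design is declared.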

It sounds like you have read the IPUMS NHIS user note on the 1992 Hispanic Oversample. The recommendation is to adjust PERWEIGHT92 for linking ineligibility as outlined in Appendix III of this NCHS technical documentation (which uses a SUDAAN procedure called PROC WTADJUST). While the process uses SUDAAN, the comments in the code do provide some guidance; I am also sharing another paper explicitly about adjusting sample weights for linkage eligibility (also using SUDAAN) that closely relates to Appendix III but goes into more detail. Unfortunately, my initial Google searches haven’t turned up an R equivalent of the WTADJUST procedure outlined in these papers (this forum post has some general guidance on using weights in R, but no counterpart to PROC WTADJUST). I am not an R user and don’t have experience adjusting weights based on linkage eligibility, but based on my understanding of the technical documentation, a simple subset command would only change how the standard errors are computed and would not adjust the weights as recommended by the NCHS. I am sure you were hoping for a more definitive answer!
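To make the difference between subsetting and adjusting weights concrete, here is a rough sketch of a simple weighting-class adjustment. This is not PROC WTADJUST, which fits a model-based adjustment; it is a much simpler swapped-in technique that scales the weights of eligible records within each cell so they also carry the weight of the ineligible ones. The function name, field names, cell definitions, and weights are all hypothetical.

```python
from collections import defaultdict


def weighting_class_adjust(rows, cell_key, weight_key="weight",
                           eligible_key="eligible"):
    """Scale eligible records' weights so each cell keeps its weight total.

    Simplified weighting-class adjustment (NOT SUDAAN's PROC WTADJUST).
    Returns new dicts for eligible rows only, with an added "adj_weight".
    """
    total = defaultdict(float)  # sum of weights over all rows in the cell
    elig = defaultdict(float)   # sum of weights over eligible rows only
    for row in rows:
        cell = row[cell_key]
        total[cell] += row[weight_key]
        if row[eligible_key]:
            elig[cell] += row[weight_key]

    adjusted = []
    for row in rows:
        if not row[eligible_key]:
            continue  # ineligible rows are dropped, but their weight is kept
        factor = total[row[cell_key]] / elig[row[cell_key]]
        adjusted.append({**row, "adj_weight": row[weight_key] * factor})
    return adjusted


# Made-up records in two hypothetical adjustment cells.
rows = [
    {"id": 1, "cell": "A", "weight": 100.0, "eligible": True},
    {"id": 2, "cell": "A", "weight": 100.0, "eligible": False},
    {"id": 3, "cell": "A", "weight": 200.0, "eligible": True},
    {"id": 4, "cell": "B", "weight": 300.0, "eligible": True},
    {"id": 5, "cell": "B", "weight": 100.0, "eligible": False},
]
adjusted = weighting_class_adjust(rows, cell_key="cell")
```

With these made-up numbers, a plain subset would keep the three eligible records at their original weights, which sum to 600, while the adjusted weights sum to the full 800 represented by all five records.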

Thanks so much for the help! Regarding the first issue, I’ll use mortwt for 1986-1996 and mortwtsa for 1997-2014. Regarding the second issue, I’ll investigate further, although cursory googling hasn’t turned up much for me either. Perhaps a workaround would be to estimate inverse probability of selection weights (IPSW) myself, using a model whose dependent variable is whether the respondent was included in the sample (vs. excluded because they were in the Hispanic oversample) and whose predictors are sociodemographic characteristics? I could then multiply the survey weights by the IPSW and incorporate the combined weights into my analysis.

It is beyond the scope of the IPUMS User Support team to provide analytical advice, but you may be interested in contacting the NCHS Data Linkage team with this question as it relates to the linked mortality files.

Hi KariWilliams,
Thanks for your answers about the use of weights. According to the IPUMS description, MORTWTSA is a mortality weight designed for use with sample adult variables and is available for the NHIS years 1997-2014. However, I have another question. If I want to pool the 1997-2004 NHIS (all adults ages 18+) with linked mortality follow-up through December 31, 2006 (using MORTUCOD), which weight variable (MORTWTSA, MORTWT, or SAMPWEIGHT) should be used in the analysis?

This is a good question.

As you allude to, MORTUCOD is from an earlier release of the NHIS-LMF data that only includes decedents from the National Death Index (NDI) through December 31, 2006 (and persons in the 1986-2004 NHIS). The currently available mortality weighting variables are for a version of the linked files that cover the 1986-2014 NHIS data linked to the NDI through December 31, 2015. As you can imagine, there are many more people in the more recent sample; accordingly, the weights are not appropriate for use with MORTUCOD as this is a smaller subsample. The IPUMS NHIS team is planning to release all previous versions of these variables to facilitate reproducibility (and address situations like the one you describe for differences in variable availability); I don’t have a specific timeline for this work, but think it could be available as early as this summer.

In general, which weight you use would not just depend on the vintage of the NHIS-LMF data, but also on the variables you include. Once the vintage-specific weights are available, you would still choose between MORTWTSA and MORTWT based on other variables in your analysis (i.e., whether or not you are using the linked mortality files alongside variables originally included in the sample adult questionnaire). You may also be interested in the IPUMS NHIS User Note on Sampling Weights, which has information on pooling multiple years of data together.

Thanks very much. All variables in my study were included in the sample adult questionnaire. MORTUCOD (through December 31, 2006) was used in our study. Therefore, we need to use MORTWTSA according to the IPUMS NHIS recommendation.

Hi KariWilliams,
Thanks very much. All variables in my study were included in the sample adult questionnaire. MORTUCOD (through December 31, 2006) was used in our study. Can I use MORTWTSA in my study now?

The current version of MORTWTSA is based on a different vintage of the NHIS linked mortality files (NHIS-LMF) than the weights that are appropriate for use with MORTUCOD. MORTUCOD is from a previous release of the data; while we continue to offer this variable because it has additional detail not available in later vintages of the data, we did not preserve the weights specific to this vintage of the NHIS-LMF data on the IPUMS NHIS website. We are working to offer vintage-specific weights soon. I would advise against using the currently available MORTWTSA for analyses including MORTUCOD. We may be able to provide previous vintages of MORTWTSA directly to you ahead of the anticipated release date; email ipums@umn.edu for more information.

Thanks very much. In my study, both MORTUCOD (through December 31, 2006) and MORTUCODLD (through December 31, 2015) were used. Adults (18+ years old) in the 1997-2004 NHIS were included. However, follow-up began at the date of recruitment and ended either at the date of death (if on or before December 31, 2006) or on December 31, 2006, which was the censoring date for those who were still alive at the end of follow-up. All-cause death was defined by the MORTUCODLD variable. If someone died in 2007, they were treated as alive in our study. Cause-specific death, however, was defined by the MORTUCOD variable. In this case, I think MORTWTSA may be usable. Do you think it is okay?
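The follow-up rule described in this post can be sketched roughly as below. The sketch covers only the event and follow-up-time coding, not the weighting question; exact interview and death dates are an assumption for illustration (the timing detail actually available in the files may be coarser), and the function and argument names are hypothetical.

```python
from datetime import date

# Administrative censoring date stated in the post.
CENSOR_DATE = date(2006, 12, 31)


def follow_up(interview_date, death_date, censor_date=CENSOR_DATE):
    """Return (event, days_of_follow_up) under administrative censoring.

    event is 1 only if death occurred on or before censor_date; deaths
    after that date are treated as alive and censored at censor_date.
    death_date is None for respondents with no recorded death.
    """
    if death_date is not None and death_date <= censor_date:
        return 1, (death_date - interview_date).days
    return 0, (censor_date - interview_date).days


# Made-up examples: a death within follow-up, a death in 2007, and a survivor.
died_in_window = follow_up(date(2000, 1, 1), date(2005, 1, 1))
died_in_2007 = follow_up(date(2000, 1, 1), date(2007, 6, 30))
still_alive = follow_up(date(2003, 5, 15), None)
```

Under this coding, the 2007 death and the survivor are both censored on December 31, 2006, which is the mismatch with the current weights that the reply below addresses.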

The current MORTWTSA variable is designed for estimates about decedents from the 1997-2014 NHIS who died before December 31, 2015. This weight will not produce correct estimates if your population of interest is persons who died before December 31, 2006. While your analysis may count those who died in 2007 as alive, MORTWTSA does not; these persons are counted as deceased and will be assigned a mortality weight. Accordingly, I expect estimates for a shorter study period that use the current versions of MORTWT or MORTWTSA would produce an undercount. I recommend waiting for the vintage-specific weights (and other variables) that the IPUMS NHIS team is working to release soon.