This report talks about how to create a new weight for 2019-2020 and another new weight for 2021-2022. If I want to do research using the 2019-2022 data overall, does that mean I should do it in two-year blocks as on this page and then combine them, or should I still pool all four years together and divide by four?
To conduct a pooled analysis using the 2019-2022 samples, you will first need to adjust the 2019-2020 samples following the steps in "Step 2. Extra adjustments needed when pooling 2019 and 2020 samples" in the variance estimation user note (see example syntax below).
Unless you are including COVID-19 variables in your analysis, you do not need to make additional adjustments to the weight for the 2020-2021 samples. If you are including COVID-19 variables in your analysis, you will need to follow the steps in "Step 3. Sampling weight adjustments needed when analyzing COVID-19 data" in the variance estimation user note.
Once you’ve made the above adjustments needed for pooling 2019-2020 samples and using COVID-19 data from 2020-2021 samples, you can follow the standard recommendations for pooling samples by dividing the weight (SAMPWEIGHT) by the number of samples in your pooled dataset, which in your case would be 4.
I have provided Stata syntax below showing one way to make these adjustments, including the adjustments needed when pooling the 2019 and 2020 data together.
I hope this helps. Please follow up with any additional questions.
*drop duplicate sample adult records from the 2019-2020 follow back sample
drop if age > 17 & partweight == 0 & year == 2020
*set pooled_weight equal to sampweight
gen pooled_weight = sampweight
*replace the value of pooled weight with partweight for sample adults in 2020 who were NOT part of the follow back sample
replace pooled_weight = partweight if year == 2020 & age > 17 & age != .
*adjust the pooled weight by dividing by the number of samples in the combined sample
gen pooled_weight_adj = pooled_weight/4
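To see what these steps do, here is a minimal sketch that runs the same four commands on a handful of made-up records. The values, including the missing PARTWEIGHT entries outside the 2020 sample adults, are hypothetical and chosen only to show which rows are dropped and which weight each remaining row receives. They are not real NHIS data.

```stata
* Toy records (hypothetical values, NOT real NHIS data)
clear
input year age sampweight partweight
2019 45 1000 .
2019 30 1200 .
2020 50  900 0
2020 28  800 1500
2020 12  700 .
2021 60 1100 .
2022 35 1300 .
end

* Same steps as in the syntax above
drop if age > 17 & partweight == 0 & year == 2020
gen pooled_weight = sampweight
replace pooled_weight = partweight if year == 2020 & age > 17 & age != .
gen pooled_weight_adj = pooled_weight/4

* Inspect the result: the 2020 adult with partweight == 0 is gone,
* and the remaining 2020 adult carries partweight instead of sampweight
list year age sampweight partweight pooled_weight pooled_weight_adj, sepby(year)
```

In this toy example, the 2020 sample adult with a PARTWEIGHT of 0 is removed, the remaining 2020 sample adult takes PARTWEIGHT as the pooled weight, and every other record keeps SAMPWEIGHT before the division by 4.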
Hi Megan, could you clarify the first step, the “drop”?
After applying your syntax, the 2020 data only has around 21k people whereas I still have 31k people in 2019, 29k people in 2021, and 27k people in 2022.
As noted in "Step 2. Extra adjustments needed when pooling 2019 and 2020 samples" in the variance estimation user note, the first drop removes participants from the 2020 sample who were observed in both 2019 and 2020, so they are not counted twice in your pooled dataset. Dropping these records is needed because some of the respondents in the 2019 sample were resurveyed in 2020 to correct for a higher level of nonresponse during the COVID-19 pandemic. As a result, the 2019-2020 samples provide a one-time longitudinal panel of observations and must be adjusted when they are pooled together.
There were 10,415 sample adults included in both 2019 and 2020, so the 21k people remaining in 2020 after the drop seems reasonable to me: 21k is about 10k lower than the number of observations in the 2019 sample, and somewhat below the 2021 and 2022 samples.
PARTWEIGHT is the final Partial Weight offered in the original NHIS public use files. It was created to allow estimates that are based only on the respondents from the original 2020 NHIS sample (i.e., excluding the longitudinal sample). In the syntax I provided, PARTWEIGHT replaces the pooled_weight value for the remaining 2020 sample adults so that they are properly weighted to be nationally representative of the non-institutionalized population. More detailed documentation about generating estimates using the 2020 NHIS sample can be found under Section II: Analyzing 2020 NHIS (page 44) in the 2020 NHIS Survey Description.
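One way to check the drop against the documented count is to count the flagged records before removing them. The sketch below does this on a few made-up rows; the data values and the local macro name n_followback are hypothetical. On a real extract, the count for 2020 would be compared with the 10,415 follow-back sample adults mentioned above.

```stata
* Toy records (hypothetical values, NOT real NHIS data)
clear
input year age partweight
2019 40 0
2020 52 0
2020 33 0
2020 29 1400
2020 10 0
2021 47 0
end

* Count the 2020 sample adults with a zero partial weight before dropping them
count if age > 17 & partweight == 0 & year == 2020
local n_followback = r(N)
display "Records flagged as follow-back duplicates: `n_followback'"

* Apply the drop, then review how many records remain in each year
drop if age > 17 & partweight == 0 & year == 2020
tabulate year
```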
Hi Megan, thank you for the explanation and the link. I have been using the MEPS data, a subset of the NHIS, and it has variables to identify rounds and panels. I thought the NHIS was similar, but apparently it is not. In other words, I cannot see the yearly trend across the four years even if I pool them together, right?
The NHIS is a cross-sectional household interview survey, meaning households are generally surveyed only once (the 2019-2020 follow-back sample described above is a one-time exception). Because of this, you are correct that you would not be able to see within-person trends across those four years. However, you would be able to estimate population-level trends across those four years when applying the weights to your analysis.
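As a rough illustration of a population-level trend, the sketch below declares a survey design and estimates a weighted mean by year. It assumes the extract includes the design variables STRATA and PSU alongside the adjusted weight; outcome is a hypothetical 0/1 analysis variable, and all data values are made up. The variance estimation user note remains the reference for the recommended design settings.

```stata
* Toy records (hypothetical values, NOT real NHIS data)
clear
input year strata psu pooled_weight_adj outcome
2019 1 1 250 1
2019 1 2 300 0
2019 2 1 275 1
2019 2 2 225 0
2020 1 1 375 1
2020 1 2 350 1
2020 2 1 400 0
2020 2 2 325 0
2021 1 1 260 0
2021 1 2 310 1
2021 2 1 280 1
2021 2 2 240 1
2022 1 1 270 1
2022 1 2 290 1
2022 2 1 265 0
2022 2 2 255 1
end

* Declare the survey design using the adjusted pooled weight
svyset psu [pweight = pooled_weight_adj], strata(strata)

* Weighted mean of the outcome within each survey year
svy: mean outcome, over(year)
```

Because every weight is divided by the same constant, the division by 4 does not change a weighted mean within a single year. It matters for estimates that pool across years, such as an average annual population total.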