Subpopulation variance: How to do it correctly?

vv28 · July 4, 2025, 7:25pm

Hello, I am doing a study comparing cancer prevalence and related health behaviors between veterans and non-veterans. I am pooling data from 2000 to 2018. I read the article “Analysis and Variance Estimation with IPUMS NHIS,” but I am confused.

How do I make sure I produce nationally representative subpopulation variance?
Do I have to adjust STRATA or PSU?

NOTE: Cancer prevalence variables are available all years. Some of the health behavior variables are available all years, others are available only certain years (2000, 2003, 2015).

Dan_Backman · July 8, 2025, 9:46pm

My response below summarizes two of our user notes: Analysis and Variance Estimation with IPUMS NHIS, and Use of Sampling Weights with IPUMS NHIS.

I am pooling data from 2000 to 2018.

You only need to adjust the sampling weights if you are creating estimates that are representative of the entire pooled time period. To do this, divide the weights by the number of samples in your pool (19 if you use every annual sample from 2000 through 2018). If you are instead creating estimates for each year separately, use the weights as-is. In that case, although your extract contains data from multiple samples, you are not conducting a “pooled” analysis.
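
As a rough illustration of the adjustment described above, the base R sketch below divides the weight by the number of samples in the pool. YEAR and PERWEIGHT are IPUMS NHIS variable names. The toy data frame, its values, and the PERWEIGHT_POOLED name are made up for illustration.

```r
# Toy extract with three annual samples; all values are invented.
nhis <- data.frame(
  YEAR      = c(2000, 2000, 2001, 2001, 2002, 2002),
  PERWEIGHT = c(3000, 5000, 4000, 4000, 2500, 5500)
)

# Number of samples (survey years) in the pool.
n_samples <- length(unique(nhis$YEAR))

# Pooled weight: original weight divided by the number of pooled samples.
# PERWEIGHT_POOLED is a hypothetical name, not an IPUMS variable.
nhis$PERWEIGHT_POOLED <- nhis$PERWEIGHT / n_samples

# Each year's weights sum to 8000 in this toy data, so the pooled weights
# also sum to 8000: an average annual population, not three years stacked.
sum(nhis$PERWEIGHT_POOLED)
```

The pooled weight would then be supplied to the survey design in place of the original weight. For separate annual estimates this step is skipped.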

How do I make sure I produce nationally representative subpopulation variance?

The following R syntax demonstrates, generally, how an analyst can conduct subpopulation analysis using IPUMS NHIS data without compromising the design structure of the data. This approach has the effect of producing estimates for the population of interest, while incorporating the full sample design information for variance estimation. This syntax uses, as an example, the population of those 65 and older.

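library(survey)
library(srvyr)
# Declare the full design first so variance estimation uses every stratum and PSU
design <- as_survey_design(data, ids = PSU, weights = PERWEIGHT, strata = STRATA, nest = TRUE)
# Subset the design object (not the raw data), then estimate for those aged 65 and older
design %>% filter(AGE >= 65) %>% summarise(var1_mean = survey_mean(var1, na.rm = TRUE))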

Do I have to adjust STRATA or PSU?

The integrated variables STRATA and PSU in the IPUMS NHIS database have been adjusted from the original NHIS design variables to account for sampling design changes across years. Thus, the analyst can simply select the STRATA and PSU variables to use for analysis of one year or for many years of IPUMS NHIS data.

NOTE: Cancer prevalence variables are available all years. Some of the health behavior variables are available all years, others are available only certain years (2000, 2003, 2015).

Your approach depends on whether you are pooling samples or creating annual estimates. If you are creating estimates for each year separately, no change to your analyses is needed. If you are pooling data across multiple years to produce a single estimate, you may need to restrict your pool to the years in which the variables of interest are available and adjust the weights accordingly (divide your weight variable by the number of samples in that analysis).
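
A minimal base R sketch of that restriction follows, assuming a health behavior variable collected only in 2000, 2003, and 2015. The toy values, the years_available vector, and the WEIGHT_POOLED name are illustrative assumptions. SAMPWEIGHT is used on the assumption that the variable is a sample adult item, so check the variable's Weights tab for the weight that actually applies.

```r
# Toy extract; all values are invented for illustration.
nhis <- data.frame(
  YEAR       = c(2000, 2000, 2001, 2003, 2003, 2010, 2015, 2015),
  SAMPWEIGHT = c(6000, 6000, 9000, 7000, 5000, 8000, 4000, 8000)
)

# Years in which the (hypothetical) variable of interest was collected.
years_available <- c(2000, 2003, 2015)

# Keep only the samples (whole survey years) that carry the variable.
pool <- nhis[nhis$YEAR %in% years_available, ]

# Count the samples that remain in the restricted pool.
n_samples <- length(unique(pool$YEAR))

# Divide the weight by the number of samples in this pool only.
# WEIGHT_POOLED is a hypothetical name, not an IPUMS variable.
pool$WEIGHT_POOLED <- pool$SAMPWEIGHT / n_samples
```

The divisor here is 3, not the 19 samples in the full 2000 to 2018 extract, because only three samples contribute to the estimate.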

Differential availability across time is not itself a reason to use a different weight, but it likely indicates that a variable belongs to a rotating topical supplement that is asked only of sample adults or sample children. On each variable's webpage, the Weights tab indicates which weight variable is appropriate for that variable. In general, use the most restrictive weight required by any of the variables in your analysis. Alternatively, run separate analyses for sets of variables that require different weights, as seems appropriate.

vv28 · August 3, 2025, 2:15am

Thank you for your help, Dan!
