Harmonizing data with missing years - weights?


I am reviewing a 15-year dataset (2004-2018) and one of the variables within it is for high cholesterol (CHOLHIGHEV). However, this variable (though consistent in the years it was used) is only available for 2008, 2012, and 2015-2018. This is 6 years rather than 15.
The weights have already been adjusted for the 15-year pooling, using exact fractions: the number of respondents in each year divided by the total number of respondents across the 15 years.
Is it possible to use these same weights and analyze high cholesterol, although it’s only available for 6 years? Would you have to adjust the weights somehow in the 15-year dataset? Or is it impossible to analyze it because of the years it is missing, and you would need to limit analyses to the years high cholesterol is available (make a new dataset)?
Thank you!! I really appreciate the help in this forum.

Without more details about your specific research project, I can only say that if CHOLHIGHEV is a focal variable for your analysis, I would not recommend including years where it is not available in your analytical subsample.
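To make the weighting question concrete, here is a minimal sketch on made-up data. It assumes the pooled weight is simply each record's base weight multiplied by its year's share of respondents, as described in the question; the record layout, the toy values, and the function names are all hypothetical. Under that assumption, the 15-year and 6-year fractions differ only by a constant factor within the 6-year subsample, so a weighted proportion comes out the same either way, while a weighted total does not. It does not address variance estimation or other survey design details.

```python
from collections import Counter

ALL_YEARS = range(2004, 2019)                          # 2004-2018
AVAILABLE_YEARS = {2008, 2012, 2015, 2016, 2017, 2018} # years with CHOLHIGHEV


def make_toy_records():
    """Build small, made-up records; 'chol' is None when the item was not asked."""
    records = []
    for year in ALL_YEARS:
        n_respondents = 3 + (year % 4)  # arbitrary, varies by year
        for i in range(n_respondents):
            asked = year in AVAILABLE_YEARS
            records.append({
                "year": year,
                "weight": 100.0 + 10 * i,  # hypothetical base weight
                "chol": (1 if i % 3 == 0 else 0) if asked else None,
            })
    return records


def pooling_fractions(records):
    """Each year's share of respondents among the years present in `records`."""
    counts = Counter(r["year"] for r in records)
    total = sum(counts.values())
    return {year: n / total for year, n in counts.items()}


def weighted_prevalence_and_total(records, fractions):
    """Weighted proportion with chol == 1, plus the sum of adjusted weights."""
    numerator = 0.0
    denominator = 0.0
    for r in records:
        w = r["weight"] * fractions[r["year"]]
        denominator += w
        if r["chol"] == 1:
            numerator += w
    return numerator / denominator, denominator


records = make_toy_records()
# Analytical subsample: only the years where the variable exists.
subsample = [r for r in records if r["year"] in AVAILABLE_YEARS]

# Fractions computed over all 15 years versus over the 6 available years only.
prev_15, total_15 = weighted_prevalence_and_total(subsample, pooling_fractions(records))
prev_6, total_6 = weighted_prevalence_and_total(subsample, pooling_fractions(subsample))

print("prevalence, 15-year fractions:", prev_15)
print("prevalence, 6-year fractions: ", prev_6)
print("weighted total, 15-year fractions:", total_15)
print("weighted total, 6-year fractions: ", total_6)
```

If the real weights were built differently (for example, with adjustments that are not a simple per-year fraction), this equivalence may not hold, so it is worth checking against the actual weight construction.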

Thank you for your reply! My colleague is looking at several health conditions (e.g., cancer, diabetes, high blood pressure); high cholesterol is just one of them. The populations of interest are defined by industry (comparing X workers to Y workers). My concern is that the prevalence estimates may be incorrect: the variable technically covers 6 years, not 15, so the weights would need to be changed. On the other hand, could the industries still be compared, since the same years would be missing for each? However, since high cholesterol is not a focal point, my current inclination is to remove it from the analysis.