Pre-1996 Prevalence Rate Calculation

Dongyue_Ying · March 21, 2019, 8:48pm

Hi,

I have some questions about calculating prevalence rates with pre-1996 data.

I understand that I need to use different weights for different years. I also learned that each question has its own universe. For example, to calculate the prevalence of diabetes, I should use DIABETICYRC, and that:

For 1973 and 1975, DIABETICYRC should be weighted with PERWEIGHT.
For 1978 to 1981, DIABETICYRC should be weighted with DIABWT.
For 1982 to 1996, DIABETICYRC should be weighted with CONDWT4.
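
The year-specific weights listed above could be folded into a single variable so that later commands do not need to switch weights by hand. This is a minimal Stata sketch, not a verified recipe: wt is a hypothetical variable name, and it assumes PERWEIGHT, DIABWT, and CONDWT4 are all present in the extract under lowercase names.

```stata
* Sketch only: wt is a hypothetical name for the combined weight.
* Assumes perweight, diabwt and condwt4 are all in the extract.
generate double wt = .
replace wt = perweight if inlist(year, 1973, 1975)
replace wt = diabwt    if inrange(year, 1978, 1981)
replace wt = condwt4   if inrange(year, 1982, 1996)
```

Years outside these ranges keep a missing weight and drop out of any weighted command.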

Question:
How should I calculate the prevalence rate specifically?

What I have done is like this:

if year == 1996:
    numerator = sum over respondents of ONE(DIABETICYRC is 20, 21, or 22) * CONDWT4
    denominator = sum over respondents of ONE(DIABETICYRC is 10, 20, 21, or 22) * CONDWT4
    prevalence rate = numerator / denominator
where ONE(x) returns 1 if condition x is satisfied and 0 if not.

DIABETICYRC:
00: not in universe
10: no
20: yes
21: Yes, indicated by response to direct survey question
22: Yes, indicated by other source

Is my method correct? I ask because the result I get for diabetes seems unreasonable.

Thank you!

Dongyue

JeffBloem · March 22, 2019, 8:30pm

This seems correct in general. A few notes that might be helpful. First, in 1996 the only available response categories are 00 “NIU,” 10 “No,” and 20 “Yes.” I don’t think specifying unused codes will make any difference, but I thought it was worth mentioning. Additionally, it is not clear to me from what is shared above how you are specifying the sampling weight. Be sure to check the documentation of your statistical software for how to correctly specify the sampling weight.
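
On specifying the weight in Stata, one common approach is to code a 0/1 indicator and take its weighted mean with a probability weight. The sketch below works under stated assumptions: diab is a hypothetical variable name, and the code ignores any strata or PSU variables, which would need to be declared with svyset to get design-based standard errors.

```stata
* Sketch only: diab is a hypothetical 0/1 indicator.
* It is 1 for codes 20-22, 0 for code 10, and missing for 00 (not in universe).
generate byte diab = inrange(diabeticyrc, 20, 22) if inrange(diabeticyrc, 10, 22)

* The weighted mean of a 0/1 variable is the weighted prevalence.
mean diab [pweight = condwt4] if year == 1996
```

The point estimate equals the ratio of summed weights described in the question. Using aweight instead of pweight gives the same estimate and changes only the standard error.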

Dongyue_Ying · March 24, 2019, 5:28pm

Thank you, Jeff! I am using STATA 15.2 and treating the sampling weight as an analytic weight (aweight). I summarize the numerator and the denominator separately, both with the analytic weight, store the sum of the weights from each, and divide one by the other. The code is below; do you think it is appropriate?

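* Numerator: sum of weights for "yes" responses (codes 20 and above)
sum diabeticyrc [aweight = condwt4] if year == 1996 & diabeticyrc >= 20 & diabeticyrc <= 30
local a1 = r(sum_w)

* Denominator: sum of weights for everyone in universe (codes 10 and above)
sum diabeticyrc [aweight = condwt4] if year == 1996 & diabeticyrc >= 10 & diabeticyrc <= 30
local a2 = r(sum_w)

* Prevalence: ratio of the two weight sums
local a3 = `a1' / `a2'

dis as text "`a3'"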

JeffBloem · March 25, 2019, 2:46pm

Yes, this code seems appropriate to me.

Dongyue_Ying · March 26, 2019, 3:39pm

Thanks, Jeff.

I have another question, about the choice of samples. If I would like to calculate the prevalence rate for subgroups (age, gender, income, etc.), should I stick with the current method of computation and only add more restrictive conditions (e.g. age >= 50 & age <= 59)? Or should I change the weight and sample? Thank you!

JeffBloem · March 26, 2019, 3:57pm

You are facing a trade-off between temporal precision and statistical precision. Sticking with one sample and adding more restrictive conditions will reduce the number of observations that meet the given criteria and will increase the margin of error associated with your prevalence estimates. Pooling a number of samples together will give you more observations that meet the given criteria, but the result will only represent an average of the prevalence rate over the pooled time period. So I can’t tell you which approach is best, as this depends on your ultimate research objectives.
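
The two options can be compared side by side by looking at the confidence intervals each produces. This is a hedged Stata sketch: diab is a hypothetical 0/1 indicator, the 1994 to 1996 pooling window is chosen purely for illustration, and strata or PSU information is ignored.

```stata
* Sketch only: diab is a hypothetical 0/1 indicator (missing when not in universe).
* capture skips this line without error if diab already exists.
capture generate byte diab = inrange(diabeticyrc, 20, 22) if inrange(diabeticyrc, 10, 22)

* Single year, one subgroup: fewer observations, wider confidence interval.
mean diab [pweight = condwt4] if year == 1996 & inrange(age, 50, 59)

* Pooled years (illustrative window): more observations, but the estimate
* is an average over the whole pooled period.
mean diab [pweight = condwt4] if inrange(year, 1994, 1996) & inrange(age, 50, 59)
```

If the single-year interval is already narrow enough for the research question, pooling adds little; if it is too wide, the pooled estimate trades year-specific detail for precision.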

Dongyue_Ying · March 27, 2019, 6:47pm

Thank you for your help, Jeff!
