Hi!
I am trying to find the percent of below-poverty households in 1950 that were headed by single mothers. What weights should I use to calculate this?
I see that the sample-line weights (SLWT) should be used in situations where poverty/income variables are used for the 1950 survey. But in my situation, I’m using poverty as a filtering mechanism to get below-poverty households, and am then calculating percentages of single-mother households based on that group. Would it therefore be feasible to just use HHWT?
And beyond this: can the SLWT weight be used to get information on a household level like this in general (rather than on an individual level)?
Hi Jacob,
Thank you for your patience on this. After further discussion, our conclusion is that you do not need either HHWT or SLWT to calculate the percentage of below-poverty households headed by single mothers, because every household has the same probability of being in the universe for POVERTY. Let me explain.
Each person in the 1950 census had a 1-in-330 chance of being selected as a sample-line person in the public use sample. Once a person was selected, their entire household joined them in the public use sample. However, only when the sample-line person was the head of the household did they answer the supplementary income questions that are necessary to construct POVERTY (this is noted in the variable universe). Two effects therefore pull in opposite directions: the more members a household has, the more likely it is to be included in the public use sample, but the less likely it is that the person selected was the household head.
Consider two cases, each a population of 330 people from which one sample-line person is drawn: 165 families of 2, or 55 families of 6. In the first case, each of the 165 families has a 1/165 chance of being selected. Once a family is selected, its head is the selected person with probability 1/2, so the likelihood of a given household head being selected is 1/165 x 1/2 = 1/330. In the second case, each of the 55 families has a 1/55 chance of being selected. Once a family is selected, its head is the selected person with probability 1/6, so the likelihood of a given household head being selected is 1/55 x 1/6 = 1/330. The higher chance of a larger household being selected is exactly offset by the lower chance of the head being selected within it. Therefore, each household has an equal likelihood of its head being selected (i.e. 1/330).
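The arithmetic above can be checked for any household size with a short sketch. This is only an illustration of the toy example (one sample-line person drawn from 330 people); the function name and the extra household sizes are assumptions made for the example, not part of the census documentation.

```python
from fractions import Fraction

POPULATION = 330  # people in the toy example; one sample-line person is drawn


def head_selection_probability(household_size, population=POPULATION):
    """Probability that a given household's head is the sample-line person."""
    # Chance that the sample-line person falls in this household.
    p_household = Fraction(household_size, population)
    # Chance that the selected person within the household is the head.
    p_head_given_household = Fraction(1, household_size)
    return p_household * p_head_given_household


# Household sizes 2 and 6 are the cases from the text; 1 and 10 are extra checks.
for size in (1, 2, 6, 10):
    print(size, head_selection_probability(size))  # each line ends in 1/330
```

The household size cancels out of the product, which is why the result does not depend on it.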
Since the probability is the same for all households, there is no need to use weights when generating percentages or averages: you would be multiplying both the numerator and the denominator by the same constant. If you’re interested in estimating population-level totals, such as the total number of female-headed households, you will want to multiply the number of households you find satisfying the criteria by the inverse of the probability of being included in the sample (i.e. 330).
HHWT should only be used to analyze household-level variables that are available for all households (e.g. geography, household composition). Using HHWT for POVERTY will give too much weight to smaller households, which have a high probability of having their householder be the sample-line person.
As an additional note, you should consider dropping all households with multiple families (FAMUNIT). POVERTY is calculated based on family, rather than household, income. A household refers to a residence, which may have unrelated individuals living among several families. If so, POVERTY will only provide information on the family that the household head belongs to. Dropping these households will make sure the family definition in POVERTY is congruent with your household analysis.
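As a rough sketch of the percentage and total calculations described above, assuming you already have one record per in-universe household: the records and the flag names `below_poverty` and `single_mother` are invented for illustration and are not IPUMS variables.

```python
SAMPLING_INTERVAL = 330  # inverse of the 1-in-330 selection probability

# Hypothetical data: one row per household with an in-universe POVERTY value.
households = [
    {"below_poverty": True, "single_mother": True},
    {"below_poverty": True, "single_mother": False},
    {"below_poverty": True, "single_mother": False},
    {"below_poverty": True, "single_mother": True},
    {"below_poverty": False, "single_mother": False},
]

poor = [h for h in households if h["below_poverty"]]
poor_single_mother = [h for h in poor if h["single_mother"]]

# Equal selection probabilities: the unweighted share is the estimate.
share = len(poor_single_mother) / len(poor)

# Applying the constant weight to numerator and denominator changes nothing.
weighted_share = (len(poor_single_mother) * SAMPLING_INTERVAL) / (
    len(poor) * SAMPLING_INTERVAL
)

# Population total: each sampled household stands for 330 households.
estimated_total = len(poor_single_mother) * SAMPLING_INTERVAL

print(f"{share:.1%}")   # 50.0%
print(estimated_total)  # 660
```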
Please let me know if you have any further questions. This is a complicated issue and we appreciate you bringing it up so we can provide clear documentation for future users.
Thank you so, so much for this response - it is much appreciated. I will let you know if I have any further questions as I go back through the data!
One further question: is there a way to do a similar analysis on a person-level (rather than HH-level) with the 1950 data using sample-line weight data? For example, if I wanted to say “X percent of children in 1949 lived in poverty”, is there a weighting scheme that would allow for this?
Thanks again for your help and patience on this!
Yes, you can run a similar analysis on the person-level and estimate percentages without needing to use any of the provided weights. This is because each person in your sample of respondents with in-universe values of POVERTY (i.e. POVERTY > 0) will have a weight of 330. Since each household has a 1-in-330 chance of its household head being selected to answer supplementary income questions, each individual also has the same 1-in-330 chance of being in one of these households. Since the weights are identical for all individuals, you can ignore them when calculating percentages and ratios. When estimating totals (e.g. the total number of children living in poverty), each observation will represent 330 people in your estimate.
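A minimal person-level sketch of the same idea follows. The records are invented, and two things are assumptions made for the example rather than anything stated in this thread: that "children" means AGE under 18, and that a POVERTY value below 100 (percent of the poverty threshold) marks a person as below poverty.

```python
SAMPLING_INTERVAL = 330  # each in-universe person represents 330 people

CHILD_AGE_LIMIT = 18     # assumption: children are persons with AGE < 18
POVERTY_LINE = 100       # assumption: POVERTY < 100 means below poverty

# Hypothetical person records; POVERTY == 0 means not in universe.
persons = [
    {"AGE": 5, "POVERTY": 80},
    {"AGE": 12, "POVERTY": 150},
    {"AGE": 40, "POVERTY": 80},   # adult: not counted
    {"AGE": 9, "POVERTY": 0},     # not in universe: excluded
    {"AGE": 16, "POVERTY": 95},
    {"AGE": 3, "POVERTY": 300},
    {"AGE": 7, "POVERTY": 210},
]

children = [
    p for p in persons
    if p["POVERTY"] > 0 and p["AGE"] < CHILD_AGE_LIMIT
]
poor_children = [p for p in children if p["POVERTY"] < POVERTY_LINE]

# Percentages need no weights because every record carries the same weight.
pct_poor = 100 * len(poor_children) / len(children)

# Totals scale each observation by 330.
estimated_poor_children = len(poor_children) * SAMPLING_INTERVAL

print(pct_poor)                 # 40.0
print(estimated_poor_children)  # 660
```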
A couple additional notes:
For your household analysis, you should only keep one individual for each household in your sample. That individual will represent the entire household and will have a weight of 330 since each household has a 1-in-330 chance of having its household head be the sample-line person. Typically the household head (i.e. PERNUM = 1) is kept, but in this case you might be more interested in keeping the sample-line person instead (SLREC = 2) since there might be other variables you’re interested in that are only available for the sample-line person.
If you’re using Stata, you should consider using the svy prefix together with the subpop() option. Because the option defines the subpopulation as observations where varname ≠ 0 and is not missing, it lets you quickly exclude not-in-universe values of POVERTY, and it also calculates survey standard errors where your subpopulation is constructed without replacement.
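For the first note above (keeping one record per household), a possible sketch outside Stata is shown below. It assumes person records keyed by the IPUMS variable names SERIAL, PERNUM, SLREC and POVERTY; the data values are invented, the helper name is hypothetical, and coding non-sample-line records as SLREC = 1 is an assumption.

```python
# Hypothetical person records, several per household (SERIAL).
persons = [
    {"SERIAL": 1, "PERNUM": 1, "SLREC": 2, "POVERTY": 80},
    {"SERIAL": 1, "PERNUM": 2, "SLREC": 1, "POVERTY": 80},
    {"SERIAL": 2, "PERNUM": 1, "SLREC": 1, "POVERTY": 0},
    {"SERIAL": 2, "PERNUM": 2, "SLREC": 2, "POVERTY": 0},  # head not sample-line
    {"SERIAL": 3, "PERNUM": 1, "SLREC": 2, "POVERTY": 240},
]


def one_record_per_household(records):
    """Keep the sample-line person from each in-universe household."""
    kept = {}
    for record in records:
        # POVERTY > 0 marks in-universe records; SLREC == 2 is the sample-line person.
        if record["POVERTY"] > 0 and record["SLREC"] == 2:
            kept.setdefault(record["SERIAL"], record)
    return list(kept.values())


household_sample = one_record_per_household(persons)
print([r["SERIAL"] for r in household_sample])  # [1, 3]
```

Each retained record then stands for 330 households when estimating totals.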