Hello! Let’s say I need to know how many children under 10 live with mothers who meet poverty level D in a given region. I’m able to find each of a mother’s children via MOMLOC, and then count how many are under 10 per mother, and make my own new variable CHILDCOUNT10 per mother.
My concern has to do with weighting. There are N=200 mothers who meet poverty level D in the given region, but I see that weighted N=80,000. My question is this: am I able to use weighted data along with CHILDCOUNT10? Conceptually, I know that a weight represents how frequently a person with given characteristics appeared in the data. Since CHILDCOUNT10 is not in the data, will it be compatible with all weighted N=80,000 mothers who meet poverty level D?
Thank you so much for any help.
As long as I am understanding what you are doing correctly, using the person-level weights should be valid in this case. Regardless of the fact that you are creating new variables, you are still performing individual-level analysis and, if you want to calculate representative statistics, need to correct for the sampling procedure used in the CPS. This can be done using the person-level weights. More details and resources about sample weights are available in this blog post.
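To make the steps concrete, here is a minimal sketch of the workflow described above: link children to mothers via MOMLOC, build CHILDCOUNT10 per mother, then apply the mother's person-level weight. The toy records, the column names SERIAL, PERNUM, AGE and WEIGHT, and the POVERTY_D flag are assumptions for illustration (a real extract would also need year/month identifiers and the appropriate CPS weight variable); only MOMLOC and CHILDCOUNT10 come from the question.

```python
from collections import Counter

# Toy person records (assumed layout, not a real extract).
# MOMLOC = 0 means no mother is present in the household.
persons = [
    {"SERIAL": 1, "PERNUM": 1, "AGE": 34, "MOMLOC": 0, "POVERTY_D": True, "WEIGHT": 400.0},
    {"SERIAL": 1, "PERNUM": 2, "AGE": 6,  "MOMLOC": 1, "POVERTY_D": True, "WEIGHT": 410.0},
    {"SERIAL": 1, "PERNUM": 3, "AGE": 12, "MOMLOC": 1, "POVERTY_D": True, "WEIGHT": 405.0},
    {"SERIAL": 2, "PERNUM": 1, "AGE": 29, "MOMLOC": 0, "POVERTY_D": True, "WEIGHT": 500.0},
    {"SERIAL": 2, "PERNUM": 2, "AGE": 3,  "MOMLOC": 1, "POVERTY_D": True, "WEIGHT": 520.0},
    {"SERIAL": 2, "PERNUM": 3, "AGE": 8,  "MOMLOC": 1, "POVERTY_D": True, "WEIGHT": 515.0},
]

def mothers_with_childcount10(persons):
    """Return mother records with a derived CHILDCOUNT10 field."""
    under10 = Counter()   # children under 10, keyed by (household, mother's PERNUM)
    mother_keys = set()   # every person that some child points to
    for p in persons:
        if p["MOMLOC"] > 0:
            key = (p["SERIAL"], p["MOMLOC"])
            mother_keys.add(key)
            if p["AGE"] < 10:
                under10[key] += 1
    mothers = []
    for p in persons:
        key = (p["SERIAL"], p["PERNUM"])
        if key in mother_keys and p["POVERTY_D"]:
            mothers.append({**p, "CHILDCOUNT10": under10[key]})
    return mothers

mothers = mothers_with_childcount10(persons)

# Unweighted N is the number of sampled mothers; the weighted N is the
# sum of their weights, i.e. the estimated number of such mothers.
unweighted_n = len(mothers)
weighted_n = sum(m["WEIGHT"] for m in mothers)

# Each mother's derived count is scaled by her own weight.
weighted_children = sum(m["WEIGHT"] * m["CHILDCOUNT10"] for m in mothers)
mean_children_per_mother = weighted_children / weighted_n
```

Note that the derived variable sits on the mother's record, so it is the mother's weight that gets applied, exactly as it would be for any variable that came with the original data.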
Thanks for your response, Jeff, and the blog link!
Let me try rewording, and see if it in fact will work:
Let’s say that for my region, I find 200 mothers (unweighted) who meet the criteria I’m interested in. For each mother in the sample, I create a new variable, CHILDCOUNT10, by counting the number of children under age 10 who point to her. Let’s say that the features of one of the 200 mothers, Mother_X, can be represented as a vector, X*. When I multiply her features by the weight she’s assigned – which I do to make sure that my sample reflects true population ratios – I’m kind of “generating” a bunch of other mothers, about whom data wasn’t collected, but who share the features that Mother_X has in the original CPS sample.
CHILDCOUNT10 wasn’t an original feature of Mother_X – it was not an original variable in X*. So when I multiply Mother_X’s features by the appropriate weight in order to simulate the presence of however many other mothers in my dataset, would the variable CHILDCOUNT10 still be applicable? Mother_X represents, say, 1,000 other women about whom we don’t have data… but my intuition holds that she only represents them insofar as they share her characteristics in X*.
MAYBE, and this is my hope, she also represents them in terms of number of children under 10. But this seems like a lot to ask.
Thank you again so much for your timely help!
This should still work. You are right that the number of children under 10 is likely not directly reflected in the generation of the sampling weight, but variables that correlate with CHILDCOUNT10 (such as age, geographic location, and other demographic factors) are included in X*. So, when you apply the sampling weight in your analysis, your estimates based on CHILDCOUNT10 (for example, the number of children under 10 living with these mothers) should be representative of the population. I hope this helps.
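As a rough illustration of the "generating other mothers" intuition from the question, the sketch below shows that a weighted mean of a derived variable equals the plain mean over a dataset in which each sampled mother is copied as many times as her weight. The weights and counts are made-up numbers, and whole-number weights are used only so the copying can be done literally; real CPS weights are not integers.

```python
# Hypothetical (weight, CHILDCOUNT10) pairs for three sampled mothers.
sample = [(3, 0), (2, 1), (5, 2)]

def weighted_mean(pairs):
    """Mean of the values, each counted in proportion to its weight."""
    total_weight = sum(w for w, _ in pairs)
    return sum(w * x for w, x in pairs) / total_weight

def expanded_mean(pairs):
    """Plain mean after copying each value `weight` times."""
    expanded = [x for w, x in pairs for _ in range(w)]
    return sum(expanded) / len(expanded)

wm = weighted_mean(sample)
em = expanded_mean(sample)
```

The two results agree because the weight only determines how many times a record is counted. It does not care whether the value being counted was an original variable or one derived afterward, such as CHILDCOUNT10.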