Hi! I’m working with a merged sample of ASEC and basic monthly samples that includes data from March 1996-2006.
I want to create two variables that need to be constructed by using family interrelationships:
- A dummy variable which indicates whether anyone in the family of the individual of interest (except for him/herself) is in bad health, with bad health defined as HEALTH>=4 (which means fair or poor health of the respondent)
- A variable which counts the number of family members in bad health, the individual of interest excluded
Could someone help me with the Stata code?
Best regards
Sandra
A preliminary note: The HEALTH variable is only available in ASEC samples. So, you will not be able to include the basic monthly samples in your analysis. This is not much of an issue, since in general, you will want to perform any analysis in either the ASEC samples or the basic monthly samples and not both. This is because the same individuals are included in the ASEC and basic monthly samples.
I’ll discuss some ideas for how to create each of these variables, one at a time:
(1) I’m not sure how you are defining your “individual of interest.” The code will depend on how you define this individual within each household. One way to do this is to generate a dummy variable for all observations that equals 1 if HEALTH>=4 (i.e., “fair or poor health”). Then, I’d create a dummy variable that equals 1 if the person is your “individual of interest.” Finally, you can replace the values of the “bad health” dummy variable for your “individual of interest” with a zero.
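As a minimal sketch of the first step, the “bad health” dummy could be built as below. The toy values are made up, and the name bad_health is simply the one used in this thread. The if-qualifier is an added precaution: Stata treats missing values as larger than any number, so a missing HEALTH would otherwise be coded as bad health.

```stata
* Toy data: the variable name follows the thread; the values are made up.
clear
input health
1
3
4
5
.
end

* 1 if fair (4) or poor (5) health, 0 otherwise, missing if HEALTH is missing.
* Without the if-qualifier, a missing HEALTH would be coded as bad health,
* because Stata treats missing as larger than any number.
generate byte bad_health = (health >= 4) if !missing(health)

list
```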
(2) The following code will count the number of individuals within households that report “bad health.” Note: households are slightly different from families, because multiple families can live within a household.
egen count = total(bad_health==1), by(serial)
If you want to perform this count within families, then the code will look something like this:
egen count = total(bad_health==1), by(serial famunit)
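If every person in the data is a potential “individual of interest,” one possible alternative to zeroing out the dummy is to count all family members in bad health and then subtract the person’s own indicator. The sketch below is only an illustration under stated assumptions: the toy data and the names fam_bad, n_other_bad, and any_other_bad are hypothetical, and YEAR is added to by() on the assumption that SERIAL identifies a household only within a given sample, so the same SERIAL can recur in different years.

```stata
* Toy data: two families in household 1 in 1996, one family in 1997.
clear
input year serial famunit pernum bad_health
1996 1 1 1 1
1996 1 1 2 0
1996 1 1 3 1
1996 1 2 4 0
1997 1 1 1 0
1997 1 1 2 1
end

* Number of family members in bad health, including the person.
* YEAR is in by() on the assumption that SERIAL repeats across years.
egen fam_bad = total(bad_health == 1), by(year serial famunit)

* Leave the person out by subtracting his or her own indicator.
generate n_other_bad = fam_bad - (bad_health == 1)

* Dummy: at least one other family member is in bad health.
generate byte any_other_bad = (n_other_bad > 0)

list, sepby(year serial famunit)
```

Because the grouping includes the year, each family is counted separately in each year of the panel, so a person who appears twice in long format gets one value per year rather than a pooled count.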
Thank you for the quick response!
Maybe I did not explain my sample in enough detail: it’s a panel data set. I need variables from the basic monthly as well as the ASEC samples. Therefore, I merged the March 1996-2006 basic monthly samples 1:1 with the 1996-2006 ASEC samples via MARBASECIDP, dropped the ASEC oversample observations, and identified individuals via CPSIDP, validating the matches by sex, age and race. To my understanding, I have therefore created a one-year panel of individuals which inherits variables from both surveys.
By “individual of interest” I just meant that I want to analyse working-age adults who have (1) any family members in bad health and (2) a specific number of family members in bad health.
The main difficulties I see are that I want to create (1) and (2) without counting the individual him/herself and that, because the panel data are in long format, every individual/family appears twice in the data set. Therefore, “simply” using SERIAL and FAMUNIT will not work…
Okay, yes, I think you are on the right track regarding merging the March basic monthly sample with the ASEC sample. I just wanted to flag this detail in case there was any confusion.
Regarding the difficulties you’ve noted here:
On not wanting to count the “individual of interest”: setting the “bad_health” dummy variable to something other than one for that person will exclude them from the count variable, because the egen total() function above only adds up the observations for which bad_health==1 is true.
On every individual being included in the data set twice: because you’ve created a panel data set, this is a feature of the data set by construction. You’ll need to decide whether this is a feature you need to correct or not. If you do want to correct for it, you will need to decide whether to use the first response of every person or the second. Once you’ve made this decision, you can use the duplicates command in Stata to identify duplicate individual observations by CPSIDP. This will allow you to focus on either the first or the second observation for each individual.
All this being said, we (i.e., IPUMS) are really not your best source for coding advice. There are a number of approaches for coding what you are describing here (e.g., converting the data from long format into wide format), and I encourage you to get in touch with someone within your field of study.
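For illustration only, flagging repeated individuals and keeping one response per person might look like the sketch below. The toy data and the names dup and obs_no are hypothetical, and it assumes a YEAR variable is available to order each person’s observations.

```stata
* Toy data: persons 1001 and 1002 appear in two years, person 1003 in one.
clear
input double cpsidp year bad_health
1001 1996 0
1001 1997 1
1002 1996 1
1002 1997 1
1003 1996 0
end

* Flag persons that appear more than once (dup = number of additional copies).
duplicates tag cpsidp, generate(dup)

* Number each person's observations in chronological order.
bysort cpsidp (year): generate obs_no = _n

* Keep only the first response; use obs_no == 2 to keep the second instead.
keep if obs_no == 1

list
```

Any family-level counts should be computed before dropping observations, since removing a person’s record also removes them from the other family members’ totals for that year.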
Thank you very much. You have already helped me a lot.