I am trying to generate a continuous income variable for respondents in non-March samples (specifically November). I cannot use the categorical version of the income variable available in this sample because I am using income to roughly calculate where a person falls relative to the Federal Poverty Level (FPL) for that year. The best option that I could think of was to use the earnweek variable and combine it with any potential spouse’s earnweek value (earnweek_sp). I understand that this is a flawed method for many reasons, but it is the best solution that I could devise. Before I continue with this question, is anyone aware of a better way to get a continuous income variable for the November supplement that would allow me to calculate FPL?
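To make the arithmetic concrete, here is a minimal sketch of the rough calculation I have in mind. It assumes weekly earnings can be annualized by multiplying by 52 and that missing or not-in-universe earnweek codes have already been recoded to None. The function name and the threshold value are hypothetical placeholders, not actual poverty guidelines.

```python
WEEKS_PER_YEAR = 52  # assumption: earnings are steady over the whole year


def fpl_ratio(earnweek, earnweek_sp, threshold):
    """Annualized own-plus-spouse weekly earnings divided by a poverty threshold.

    None stands for a missing or not-in-universe value and is counted as zero.
    """
    weekly = (earnweek or 0.0) + (earnweek_sp or 0.0)
    return weekly * WEEKS_PER_YEAR / threshold


# Hypothetical inputs: 500 and 300 per week, and a made-up threshold of 20000.
ratio = fpl_ratio(500.0, 300.0, threshold=20000.0)
single = fpl_ratio(500.0, None, threshold=20000.0)
```

In practice the threshold would have to be looked up by family size and year rather than passed as a single number.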
If not, I have run into a few problems as I proceed with using earnweek to calculate family income. Because earnweek is asked only of the outgoing rotation groups, I wanted to attach the earnweek values from the subsequent months that still include participants from the November sample (i.e., December, January, and February) to increase my sample size. I tried to link the IPUMS CPS samples for these months to the November sample using HRHHID, HRHHID2, and LINENO, but I get a number of false matches: when I use race as the test variable, 8137 matches are incorrect. Including race and sex in the match does not solve the problem. Do you know how I can merge this information without generating duplicates or false matches?
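For reference, the linking logic I am attempting looks roughly like the sketch below. It is only an illustration: the function name and the sample records are hypothetical, and the default key is the HRHHID, HRHHID2, LINENO combination described above. Instead of adding sex and race to the match key, it matches on the key alone and then checks sex, race, and age afterwards, so that implausible links and non-unique keys are set aside rather than silently merged. If the extract includes the IPUMS person identifier CPSIDP, that could presumably be passed as the key instead.

```python
from collections import defaultdict


def link_records(base, later, key=("HRHHID", "HRHHID2", "LINENO")):
    """Link each base-month record to a later-month record on `key`.

    Returns (matches, ambiguous, rejected):
      matches   - (base, later) pairs that agree on SEX and RACE, with AGE
                  equal or one year higher in the later month
      ambiguous - key values that occur more than once in the later month
      rejected  - (base, later) pairs that share a key but fail the checks
    """
    index = defaultdict(list)
    for rec in later:
        index[tuple(rec[f] for f in key)].append(rec)

    matches, ambiguous, rejected = [], [], []
    for rec in base:
        k = tuple(rec[f] for f in key)
        candidates = index.get(k, [])
        if len(candidates) > 1:
            ambiguous.append(k)  # a non-unique key would duplicate rows
            continue
        for cand in candidates:
            plausible = (
                cand["SEX"] == rec["SEX"]
                and cand["RACE"] == rec["RACE"]
                and 0 <= cand["AGE"] - rec["AGE"] <= 1
            )
            (matches if plausible else rejected).append((rec, cand))
    return matches, ambiguous, rejected


def person(hh, sex, race, age, earnweek=None):
    # Hypothetical helper for building illustrative records.
    return {"HRHHID": hh, "HRHHID2": "11", "LINENO": 1,
            "SEX": sex, "RACE": race, "AGE": age, "EARNWEEK": earnweek}


nov = [person("001", 1, 100, 40), person("002", 2, 100, 30),
       person("003", 1, 200, 55)]
dec = [person("001", 1, 100, 40, 650.0),   # same person, plausible link
       person("002", 1, 100, 62, 400.0),   # same key, different person
       person("003", 1, 200, 55, 500.0),   # key appears twice in December
       person("003", 1, 200, 55, 500.0)]

matches, ambiguous, rejected = link_records(nov, dec)
```

With these made-up records, one link is accepted, one is rejected on the sex and age checks, and one key is flagged as ambiguous because it is not unique in the later month.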
Once I solve this problem, I plan to attach this generated “income” variable to every person in the same household. First, I plan to use the variable RELATE to delete anyone who is not part of the family by Federal Poverty Level calculation standards. I can then extend the “income” variable to everyone remaining in the household by first creating a new variable called Fam_earnweek. I would then use YEAR, SERIAL, and PERNUM to uniquely identify households, and select the maximum value of Fam_earnweek within a given household to use as the household value. I would then drop all other records with the same serial number. Finally, I plan to merge this household-level information back onto the original person-level data using a many-to-one merge on YEAR, PERNUM, and SERIAL. Does this make sense?
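The household step could be sketched as follows. This is a simplified illustration rather than a finished implementation: the function name, the HH_earnweek field, and the sample records are hypothetical, and the RELATE codes are illustrative values that should be checked against the codebook. One assumption differs from the description above: the sketch groups on YEAR, MONTH, and SERIAL only, since PERNUM distinguishes persons within a household rather than households. Writing the household maximum straight onto each remaining person is meant to be equivalent to collapsing to one record per household and merging back.

```python
from collections import defaultdict


def broadcast_household_max(persons, family_codes,
                            hh_key=("YEAR", "MONTH", "SERIAL")):
    """Keep family members and give each the household's largest Fam_earnweek.

    `family_codes` is the set of RELATE values treated as family.
    A Fam_earnweek of None (missing) is counted as zero.
    """
    family = [p for p in persons if p["RELATE"] in family_codes]

    hh_max = defaultdict(float)  # household key -> largest value seen so far
    for p in family:
        k = tuple(p[f] for f in hh_key)
        hh_max[k] = max(hh_max[k], p["Fam_earnweek"] or 0.0)

    # Copy each record so the input list is left unchanged.
    return [dict(p, HH_earnweek=hh_max[tuple(p[f] for f in hh_key)])
            for p in family]


def row(serial, pernum, relate, fam_earnweek):
    # Hypothetical helper for building illustrative records.
    return {"YEAR": 2010, "MONTH": 11, "SERIAL": serial, "PERNUM": pernum,
            "RELATE": relate, "Fam_earnweek": fam_earnweek}


people = [row(1, 1, 101, 800.0),   # householder, own plus spouse earnings
          row(1, 2, 201, 800.0),   # spouse
          row(1, 3, 301, None),    # child with no earnings record
          row(1, 4, 9999, 300.0),  # placeholder code for a non-family member
          row(2, 1, 101, 450.0)]

result = broadcast_household_max(people, family_codes={101, 201, 301})
```

Here the non-family member is dropped, the child inherits the household value of 800, and the second household keeps its own value of 450.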
My apologies for the length of these questions! I greatly appreciate your help.